Extracting conceptual relationships from specialized documents

Conceptual modeling has been fundamental to the management of structured data, and its value is increasingly being recognized for knowledge management in general. Developing suitable conceptual models for unstructured information raises issues such as the level of representation and the complexity of the processing techniques. Here, we investigate a conceptual model that is simple enough to allow efficient automatic extraction from two kinds of documents: scientific research papers and patents. Our model focuses on the problem-solution relationship that is central to the analysis of scientific papers, while allowing supporting relationships such as methods and claims. We evaluated the utility of the approach by building a prototype system and carrying out experiments that assessed the accuracy of the techniques used in building the model, along with preliminary user studies that assessed the acceptability of the model. The results are promising and support our choice of tradeoff between the granularity of the model and the processing techniques used. We discuss a variety of issues that arose from this project and describe several directions for future work.
