Automatically structuring domain knowledge from text: An overview of current research

This paper presents an overview of automatic methods for building domain knowledge structures (domain models) from text collections. Domain models have a long history of application in knowledge engineering and artificial intelligence. Over the last couple of decades they have also become a prominent tool in natural language processing, information retrieval and Semantic Web technology. Motivated by the growing use of domain model structures across several research disciplines, we survey the current research landscape together with representative techniques and approaches. We also discuss the trade-offs between different approaches and point to some recent trends.
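One of the simplest ways to build part of a domain model from text is to harvest is-a (hyponymy) pairs with lexico-syntactic patterns, in the spirit of Hearst's (1992) "X such as Y, Z and W" patterns. The sketch below is an illustrative assumption and not a method described in this overview. The regular expression, the function name `extract_hyponyms` and the restriction to single-word terms are hypothetical simplifications. Practical systems add part-of-speech tagging, noun-phrase chunking and frequency filtering.

```python
import re
from collections import defaultdict

# Hypothetical, simplified Hearst-style pattern: "<hypernym> such as <t1>, <t2> and/or <t3>".
# Only single-word terms are matched; the negative lookahead keeps an Oxford-comma
# ", and"/", or" from being read as an enumerated term.
SUCH_AS = re.compile(
    r"(\w+) such as (\w+(?:, (?!(?:and|or)\b)\w+)*(?:,? (?:and|or) \w+)?)"
)

# Separators inside the enumeration: ", and " / " or " or a plain comma.
SEPARATOR = re.compile(r",?\s+(?:and|or)\s+|,\s*")


def extract_hyponyms(text):
    """Map each hypernym to the set of hyponyms enumerated after 'such as'."""
    model = defaultdict(set)
    for match in SUCH_AS.finditer(text.lower()):
        hypernym = match.group(1)
        for item in SEPARATOR.split(match.group(2)):
            item = item.strip()
            if item:
                model[hypernym].add(item)
    return dict(model)


if __name__ == "__main__":
    sample = ("Domain models support applications such as retrieval, "
              "summarisation and filtering. Resources such as WordNet "
              "or ConceptNet are widely used.")
    result = extract_hyponyms(sample)
    # Sort the sets so the printed output is deterministic.
    print({k: sorted(v) for k, v in result.items()})
    # → {'applications': ['filtering', 'retrieval', 'summarisation'], 'resources': ['conceptnet', 'wordnet']}
```

Each hypernym maps to the terms enumerated with it. Aggregating such pairs over a whole collection yields a rudimentary taxonomy, which is one of the simplest forms of domain model.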
