Hypatia: An expert system proposal for documentation departments

The vast amount of text-based information stored in organizations calls for new approaches and tools to manage it adequately. This paper presents Hypatia, an expert system that supports documentation departments and regular users by exploiting not only local information but also external resources from the Web (e.g., Linked Data). The expert system combines several modules: Natural Language Processing (NLP) analysis, categorization, semantic disambiguation, Automatic Query Expansion (AQE), semantic search, summarization, knowledge extraction, and aggregation. Users can interact with the system in different ways, ranging from issuing very specific orders to writing a simple list of keywords. The latter method requires an interpretation step before the system decides how to respond. The results are enriched with semantic links to complementary data, which improves both the presentation of the information and navigation through the data.
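
The two interaction styles mentioned above (specific orders versus a plain keyword list) can be illustrated with a minimal sketch. This is not Hypatia's implementation: the order names in `KNOWN_ORDERS`, the `Request` type, the `interpret` function, and the choice of semantic search as the default for keyword lists are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical order names; the abstract does not list the actual orders.
KNOWN_ORDERS = {"search", "summarize", "categorize", "extract"}


@dataclass
class Request:
    module: str        # name of the module chosen to handle the input
    terms: List[str]   # remaining words handed to that module
    interpreted: bool  # True when the intent had to be guessed


def interpret(user_input: str) -> Request:
    """Route user input either as a specific order or as a keyword list."""
    words = user_input.lower().split()
    if words and words[0] in KNOWN_ORDERS:
        # Specific order: the first word names the module directly.
        return Request(module=words[0], terms=words[1:], interpreted=False)
    # Plain keyword list: assumed here to fall back to semantic search.
    return Request(module="search", terms=words, interpreted=True)
```

In a fuller system the keyword branch would presumably pass through disambiguation and query expansion before any search is run; here it only flags that an interpretation step took place.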
