Name identification and extraction with formal concept analysis

One application of formal concept analysis (FCA) is the extraction of structured information from textual documents. Typically, one defines a set of attributes that characterize the objects of interest, and standard FCA algorithms then extract the objects so characterized. In this paper, we describe how FCA identifies and extracts personal names as units of thought, in a manner similar to the decoding of text sequences by the Viterbi algorithm as used with hidden Markov models. We further show how FCA mimics the thought process that goes into a rule-based information extraction system. We then observe that the formal approach of FCA, combined with established computational techniques such as the bottom-up intersection algorithm, avoids the difficulties associated with hand-coding and maintaining rule-based systems.
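As a rough illustration of the idea, the sketch below builds a toy formal context and derives its concepts by closing the object intents under intersection, which is one simple bottom-up way to enumerate concept intents. The tokens, the attribute names, and the function names (`concept_intents`, `extent`, `concepts`) are assumptions made for this example; they are not the attribute set or the implementation used in the paper.

```python
from itertools import combinations

# Hypothetical toy context: objects are tokens, attributes are surface features.
CONTEXT = {
    "Dr.": {"capitalized", "title"},
    "John": {"capitalized", "first_name"},
    "Smith": {"capitalized", "last_name"},
    "visited": {"lowercase"},
}


def concept_intents(context):
    """Return all concept intents by closing object intents under intersection."""
    all_attrs = frozenset().union(*context.values())
    intents = {frozenset(attrs) for attrs in context.values()}
    intents.add(all_attrs)  # intent of the concept with the empty extent
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for a, b in combinations(list(intents), 2):
            common = a & b
            if common not in intents:
                intents.add(common)
                changed = True
    return intents


def extent(context, intent):
    """Return every object that carries all attributes in the intent."""
    return {obj for obj, attrs in context.items() if intent <= attrs}


def concepts(context):
    """Pair each intent with its extent to form the formal concepts."""
    return {
        (frozenset(extent(context, intent)), intent)
        for intent in concept_intents(context)
    }


if __name__ == "__main__":
    for ext, intent in sorted(concepts(CONTEXT), key=lambda c: len(c[1])):
        print(sorted(ext), sorted(intent))
```

In this toy context, the concept whose intent is `{"capitalized"}` groups the three tokens that could form a personal name, which hints at how a suitable choice of attributes lets the lattice isolate name-like units without hand-written rules.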
