Cheating to achieve Formal Concept Analysis over a Large Formal Context

Researchers face one of the main problems of the Information Era: as more articles become electronically available, it gets harder to follow trends in the different research domains. Knowledge models of research domains that are cheap, coherent, and fast to construct will be much needed as the volume of information becomes unmanageable. Formal Concept Analysis (FCA) has been widely used in several areas to construct knowledge artifacts for this purpose (ontology development, information retrieval, software refactoring, knowledge discovery). However, the large number of documents and the breadth of terminology in research domains make it a poor fit, because of its high computational cost and an output too large for humans to process. In this article we propose a novel heuristic for creating a taxonomy from a large term-document dataset using Latent Semantic Analysis and Formal Concept Analysis. We present and discuss its implementation on a real dataset of 4400 documents from the Software Architecture community, obtained from the ISI Web of Knowledge.
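To make the cost argument concrete, the following is a minimal sketch of plain FCA on a toy term-document context. It is not the heuristic proposed in the article: the four documents, their terms, and the function names are hypothetical, chosen only for illustration. The sketch enumerates formal concepts naively by closing every subset of terms, so the work grows exponentially with the vocabulary size, which is one way to see why unreduced FCA becomes impractical on thousands of documents.

```python
from itertools import combinations

# Hypothetical toy formal context: document -> set of terms it contains.
CONTEXT = {
    "d1": {"architecture", "component"},
    "d2": {"architecture", "component", "connector"},
    "d3": {"architecture", "pattern"},
    "d4": {"pattern"},
}


def extent(terms, context):
    """Return the documents that contain every term in `terms`."""
    return frozenset(d for d, ts in context.items() if terms <= ts)


def intent(docs, context):
    """Return the terms shared by every document in `docs`."""
    shared = set().union(*context.values())  # start from the full vocabulary
    for d in docs:
        shared &= context[d]
    return frozenset(shared)


def formal_concepts(context):
    """Naively enumerate all (extent, intent) pairs of the context.

    Every subset of the vocabulary is closed, so this loop runs
    2**len(vocabulary) times: fine for a toy, hopeless at scale.
    """
    vocabulary = sorted(set().union(*context.values()))
    concepts = set()
    for k in range(len(vocabulary) + 1):
        for subset in combinations(vocabulary, k):
            ext = extent(frozenset(subset), context)
            concepts.add((ext, intent(ext, context)))
    return concepts


if __name__ == "__main__":
    # Print concepts from the most general (largest extent) to the most specific.
    for ext, itt in sorted(formal_concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(ext), sorted(itt))
```

Even this four-term context yields seven concepts. Each concept pairs a maximal set of documents with the maximal set of terms they share, and ordering the concepts by extent inclusion gives the lattice from which a taxonomy could be read.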
