Exploration of textual document archives using a fuzzy hierarchical clustering algorithm in the GAMBAL system

The Internet, together with the large amount of textual information available in document archives, has increased the relevance of information retrieval tools. In this work we present an extension of the GAMBAL system for clustering and visualizing documents, based on fuzzy clustering techniques. The tool allows the user to organize a set of documents hierarchically (using a fuzzy hierarchical structure) and to represent this structure in a graphical interface (a 3D sphere) that the user can navigate. GAMBAL analyzes documents and computes their similarity not only on the basis of the syntactic similarity between words but also using a dictionary (WordNet 1.7) and latent semantic analysis.
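To make the idea of fuzzy document clustering concrete, the following is a minimal sketch of standard fuzzy c-means applied to normalized term-frequency vectors. It is not the GAMBAL algorithm: the toy documents, the helper names (`tf_vector`, `sq_dist`, `fuzzy_c_means`), the fuzzifier `m = 2`, and the choice of initial centers are all assumptions made for illustration. It shows a single flat level only; a hierarchy could presumably be obtained by applying such a step recursively within each cluster.

```python
import math
from collections import Counter


def tf_vector(text, vocab):
    """Unit-length term-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fuzzy_c_means(points, centers, m=2.0, iters=50, eps=1e-9):
    """Standard fuzzy c-means; returns (memberships, centers).

    memberships[i][k] is the degree to which point i belongs to cluster k;
    each row sums to 1. Requires iters >= 1.
    """
    centers = [list(c) for c in centers]
    dim = len(points[0])
    memberships = []
    for _ in range(iters):
        # Membership update: u_ik is proportional to (d_ik^2)^(-1/(m-1)).
        memberships = []
        for p in points:
            inv = [(sq_dist(p, c) + eps) ** (-1.0 / (m - 1.0)) for c in centers]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Center update: mean of the points weighted by u_ik^m.
        for k in range(len(centers)):
            weights = [u[k] ** m for u in memberships]
            wsum = sum(weights)
            centers[k] = [
                sum(w * p[j] for w, p in zip(weights, points)) / wsum
                for j in range(dim)
            ]
    return memberships, centers


# Hypothetical toy archive (illustrative only, not data from the paper).
docs = [
    "fuzzy clustering of documents",
    "fuzzy clustering algorithm",
    "sphere visualization interface",
    "graphical sphere interface",
    "fuzzy clustering interface",  # shares terms with both groups
]
vocab = sorted({w for d in docs for w in d.lower().split()})
points = [tf_vector(d, vocab) for d in docs]

# Assumption: seed the two clusters with documents 0 and 2.
memberships, centers = fuzzy_c_means(points, [points[0], points[2]])

for doc, u in zip(docs, memberships):
    print(f"{doc!r}: " + ", ".join(f"{x:.2f}" for x in u))
```

Unlike hard clustering, each document receives a membership degree in every cluster, so a document that shares vocabulary with both groups (the last one here) is not forced into a single class.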
