Query expansion using an immune-inspired biclustering algorithm

Query expansion is a technique used to improve the performance of information retrieval systems by automatically adding related terms to the initial query. These additional terms can be obtained from documents stored in a database. Usually, this is done by clustering the documents and extracting representative terms from each cluster. A new search is then performed over the whole database using the expanded set of terms. Recently, the authors proposed an immune-inspired algorithm, BIC-aiNet, to perform biclustering of texts. Biclustering differs from standard clustering in that it can detect partial similarities, that is, groups of objects that are similar over only a subset of the attributes. Preliminary results indicated that the proposal groups similar texts effectively and that the generated biclusters consistently contain words relevant to representing a category of texts. Motivated by these results, this paper formalizes the proposal more carefully and investigates the usefulness of the whole methodology on larger datasets. BIC-aiNet was applied to a set of documents to identify the relevant terms associated with each bicluster, giving rise to a query expansion tool. The results were compared with those produced by two alternative proposals from the literature. The comparison indicates that the techniques tend to generate complementary results, as a consequence of their use of distinct similarity metrics.
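As a rough illustration of the query expansion step described above, the following minimal Python sketch adds terms to a query from biclusters that share at least one term with it. It is not the BIC-aiNet algorithm: the biclusters are assumed to be given already, and the function name `expand_query`, the overlap-count scoring, and the `max_new_terms` parameter are hypothetical choices made only for this example.

```python
from typing import Dict, List, Set, Tuple

# A bicluster pairs a subset of documents with the subset of terms they share.
Bicluster = Tuple[Set[str], Set[str]]  # (document ids, terms)


def expand_query(
    query: List[str],
    biclusters: List[Bicluster],
    max_new_terms: int = 3,
) -> List[str]:
    """Append terms from biclusters whose term set overlaps the query.

    Scoring is an assumption of this sketch: each candidate term is
    credited with the number of query terms found in its bicluster.
    """
    query_terms = set(query)
    scores: Dict[str, int] = {}
    for _docs, terms in biclusters:
        overlap = len(query_terms & terms)
        if overlap == 0:
            continue  # this bicluster is unrelated to the query
        for term in terms - query_terms:
            scores[term] = scores.get(term, 0) + overlap
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return list(query) + ranked[:max_new_terms]


# Hypothetical biclusters, hand-written for illustration only.
example_biclusters: List[Bicluster] = [
    ({"d1", "d2"}, {"immune", "network", "antibody"}),
    ({"d2", "d3"}, {"query", "retrieval", "ranking"}),
    ({"d4"}, {"immune", "clonal", "selection"}),
]

print(expand_query(["immune"], example_biclusters))
```

The expanded term list would then be submitted as a new search over the whole collection. A real system would likely weight candidate terms by their relevance within the bicluster rather than by a simple overlap count.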
