
Predicting missing values with biclustering: A coherence-based approach

This work proposes a novel biclustering-based approach to data imputation. The approach builds on the Mean Squared Residue metric, which evaluates the degree of coherence among objects of a dataset, and presents an algebraic development that allows the predictor to be modeled as a quadratic programming problem. The proposed methodology is positioned within the field of missing data, its theoretical aspects are discussed, and artificial and real-case scenarios are simulated to evaluate the performance of the technique. Relevant properties introduced by the biclustering process are also explored in post-imputation analysis to highlight further advantages of the methodology, specifically confidence estimation and interpretability of the imputation process.
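
To make the coherence idea concrete, the following is a minimal sketch, not the authors' formulation. It assumes the commonly used Cheng and Church definition of the Mean Squared Residue (the mean of the squared residues a_ij - rowmean_i - colmean_j + overallmean over a bicluster) and a toy setting with a single missing entry. With one unknown the MSR is a quadratic function of that value, so its minimizer can be read off a parabola fitted through three evaluations. The function names `mean_squared_residue` and `impute_single_entry` are hypothetical and chosen only for illustration.

```python
import numpy as np


def mean_squared_residue(block):
    """MSR of a fully observed bicluster (assumed Cheng-Church definition)."""
    block = np.asarray(block, dtype=float)
    row_means = block.mean(axis=1, keepdims=True)
    col_means = block.mean(axis=0, keepdims=True)
    overall_mean = block.mean()
    # Residue of each cell with respect to an additive row + column model.
    residue = block - row_means - col_means + overall_mean
    return float(np.mean(residue ** 2))


def impute_single_entry(block, row, col):
    """Toy imputation: choose the value at (row, col) that minimizes the MSR.

    The MSR is exactly quadratic in a single unknown entry, so three
    evaluations determine the parabola and its vertex is the minimizer.
    This is an illustrative one-variable case, not a general QP solver.
    """
    filled = np.array(block, dtype=float)  # copy; the entry may be NaN
    trial_values = np.array([0.0, 1.0, 2.0])
    msr_values = []
    for value in trial_values:
        filled[row, col] = value
        msr_values.append(mean_squared_residue(filled))
    a, b, _ = np.polyfit(trial_values, msr_values, 2)
    return -b / (2.0 * a)


# Usage: a perfectly additive bicluster (row effect + column effect).
rows = np.array([1.0, 2.0, 4.0])
cols = np.array([0.0, 3.0, 5.0, 10.0])
block = np.add.outer(rows, cols)

with_gap = block.copy()
with_gap[1, 2] = np.nan  # hide one entry; its true value is 2 + 5 = 7

estimate = impute_single_entry(with_gap, 1, 2)
print(round(estimate, 6))  # close to 7.0, the value implied by the additive pattern
```

In this toy case the hidden value is recovered because the remaining entries are perfectly coherent. With several missing entries the objective remains quadratic in the unknowns, which is consistent with the quadratic programming view described in the abstract, although the constraints and solver used in the paper are not reproduced here.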

Fabrício Olivetti de França | Fernando José Von Zuben | Guilherme Palermo Coelho
