Pattern Decomposition with Complex Combinatorial Constraints: Application to Materials Discovery

Identifying important components or factors in large amounts of noisy data is a key problem in machine learning and data mining. Motivated by a pattern decomposition problem in materials discovery, where the goal is to find new materials for renewable energy applications such as fuel cells and solar cells, we introduce CombiFD, a framework for factor-based pattern decomposition that allows a priori knowledge to be incorporated as constraints, including complex combinatorial constraints. In addition, we propose a new pattern decomposition algorithm, called AMIQO, based on solving a sequence of (mixed-integer) quadratic programs. Our approach considerably outperforms the state of the art on the materials discovery problem, scaling to larger datasets and recovering more precise and physically meaningful decompositions. We also show the effectiveness of our approach for enforcing background knowledge in other application domains.
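
As a rough sketch of what "a sequence of (mixed-integer) quadratic programs" could look like, assume the usual least-squares factorization objective, which the abstract does not spell out. The symbols A (data matrix), W and H (factors), k (number of factors), and the constraint sets \mathcal{C}_W and \mathcal{C}_H are illustrative assumptions here, not notation taken from the paper:

```latex
% Assumed objective: constrained least-squares factorization of the data matrix A.
\min_{W,\,H} \; \lVert A - WH \rVert_F^2
\quad \text{s.t.} \quad W \in \mathcal{C}_W,\; H \in \mathcal{C}_H,
\qquad A \in \mathbb{R}^{n \times m},\; W \in \mathbb{R}^{n \times k},\; H \in \mathbb{R}^{k \times m}.

% Assumed alternating scheme: fix one factor, optimize the other, and repeat.
H^{(t+1)} \in \arg\min_{H \in \mathcal{C}_H} \lVert A - W^{(t)} H \rVert_F^2,
\qquad
W^{(t+1)} \in \arg\min_{W \in \mathcal{C}_W} \lVert A - W H^{(t+1)} \rVert_F^2 .
```

Under these assumptions, each subproblem has a convex quadratic objective in the free factor, so it is a quadratic program when the constraint set is described by linear constraints, and a mixed-integer quadratic program when the constraints also involve integer variables.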
