Deriving Compact Laws Based on Algebraic Formulation of a Data Set

In many disciplines, input and output parameters are linked by compact and consistent relationships. Discovering these relationships, which we call compact laws, in a data set is of great interest in fields such as physics, chemistry, and finance. Although data discovery has made great progress in practice thanks to the recent success of machine learning, analytical approaches for finding the theory behind the data have developed relatively slowly. In this paper, we develop an approach for discovering compact laws from a data set. By proposing a novel algebraic equation formulation, we convert the problem of deriving meaning from data into formulating a linear algebra model and searching for relationships that fit the data. We present a rigorous proof that validates the approach. The algebraic formulation allows equation candidates to be searched in an explicit mathematical manner. We also propose search algorithms that find the governing equations with improved efficiency. For a certain type of compact theory, our approach guarantees convergence, and the discovery is computationally efficient and mathematically precise.
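
To make the idea of searching equation candidates against a data set concrete, the following is a minimal sketch and not the formulation developed in this paper. It assumes, purely for illustration, that a compact law is a product of powers of the variables that stays constant across all rows of the data. Taking logarithms turns each candidate into a linear combination of log-columns, so a candidate can be checked with simple arithmetic. The function name `find_power_laws`, the exponent bound `max_exp`, the tolerance `tol`, and the synthetic ideal-gas data are all hypothetical choices made for this example.

```python
import itertools
import math
from functools import reduce


def find_power_laws(rows, max_exp=1, tol=1e-9):
    """Brute-force search for integer exponent vectors e such that
    prod(x_i ** e_i) is constant over every row (illustrative assumption)."""
    n_vars = len(rows[0])
    log_rows = [[math.log(x) for x in row] for row in rows]
    laws = []
    for exps in itertools.product(range(-max_exp, max_exp + 1), repeat=n_vars):
        if not any(exps):
            continue  # skip the trivial all-zero candidate
        first = next(e for e in exps if e != 0)
        if first < 0:
            continue  # keep one representative of each +/- pair
        if reduce(math.gcd, (abs(e) for e in exps)) != 1:
            continue  # skip integer multiples of a smaller candidate
        # In log space the candidate is a linear combination of the columns.
        values = [sum(e * lx for e, lx in zip(exps, lr)) for lr in log_rows]
        if max(values) - min(values) < tol:
            laws.append(exps)
    return laws


# Synthetic data generated from P * V = n * R * T; columns are (P, V, n, T).
R = 8.314
rows = []
for V, n, T in [(1.0, 1.0, 300.0), (2.0, 0.5, 350.0), (0.7, 1.3, 280.0),
                (1.5, 2.0, 410.0), (3.1, 0.9, 333.0)]:
    rows.append((n * R * T / V, V, n, T))

laws = find_power_laws(rows)
print(laws)
```

On this noise-free synthetic data the search should recover the exponent vector `(1, 1, -1, -1)`, which corresponds to P·V/(n·T) being constant. Exhaustive enumeration grows exponentially with the number of variables and the exponent bound, which is one reason more efficient search algorithms are of interest.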
