Inverse predictions on continuous models in scientific databases

Using continuous models in scientific databases has received increased attention in recent years. Such models allow more efficient and accurate querying, as well as prediction of outputs where no measurements were performed. The most common queries ask what the output looks like for a given input setting. In this paper we study inverse model-based queries on continuous models, where one specifies a desired output and searches for an appropriate input setting, a task that falls into the reverse engineering category. We propose two approaches. The first is an extension of the inverse regression paradigm. Simply switching the roles of input and output variables, however, poses new challenges, which we overcome by using partial least squares. The second approach formulates inverse prediction queries as linear optimization problems. We show that although these two approaches seem completely different, they are closely related, and that the latter is more general. It facilitates the formulation of a wide range of queries, with specifications of fixed values and ranges in both input and output space, enabling intuitive exploration of the experimental data and understanding of the underlying process.
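
To make the notion of an inverse query with fixed values and ranges concrete, the following is a minimal sketch, not the paper's method. It assumes a single linear model y = w·x + b with a box range per input; the function name `inverse_query` and all numbers are hypothetical. For one output the feasibility problem can be solved directly by interpolating between the box corners that minimize and maximize the output; with several outputs or extra constraints a general linear programming solver would be needed instead.

```python
def predict(weights, bias, x):
    """Forward query: output of the (assumed) linear model for input x."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def inverse_query(weights, bias, target, ranges):
    """Inverse query: find x inside `ranges` with predict(x) == target.

    `ranges` holds one (low, high) pair per input; a fixed value v is
    written as (v, v). Returns None if the target is unreachable.
    """
    # Box corners that give the smallest and the largest model output.
    x_low = [lo if w >= 0 else hi for w, (lo, hi) in zip(weights, ranges)]
    x_high = [hi if w >= 0 else lo for w, (lo, hi) in zip(weights, ranges)]
    y_low = predict(weights, bias, x_low)
    y_high = predict(weights, bias, x_high)

    if not (y_low <= target <= y_high):
        return None  # no input setting in the box yields the target
    if y_high == y_low:
        return x_low  # output is constant over the box

    # The output is linear along the segment between the two corners,
    # so interpolate to the point where it equals the target.
    t = (target - y_low) / (y_high - y_low)
    return [a + t * (b - a) for a, b in zip(x_low, x_high)]


# Hypothetical model: y = 2*x1 - x2 + 1, with x1 in [0, 4] and x2 fixed at 1.
weights, bias = [2.0, -1.0], 1.0
x = inverse_query(weights, bias, target=5.0, ranges=[(0.0, 4.0), (1.0, 1.0)])
print(x, predict(weights, bias, x))
```

The returned setting is one feasible answer among possibly many; choosing among them (for example, the setting closest to a reference point) is where an explicit optimization objective would come in.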
