The Sparse Regression Cube: A Reliable Modeling Technique for Open Cyber-Physical Systems

Understanding the end-to-end behavior of complex systems in which computing technology interacts with properties of the physical world is a core challenge in cyber-physical computing. This paper develops a hierarchical modeling methodology for open cyber-physical systems that combines techniques from estimation theory with techniques from data mining to reliably capture complex system behavior at different levels of abstraction. The technique is also novel in that it provides a measure of confidence in its predictions. An application to green transportation is discussed, where the goal is to reduce vehicular fuel consumption and carbon footprint. First-principles models of cyber-physical systems can be very complex and include a large number of parameters, whereas empirical regression models are often unreliable when many parameters are involved. Our new modeling technique, called the Sparse Regression Cube, simultaneously (i) partitions sparse, high-dimensional measurements into subspaces within which reliable linear regression models apply and (ii) determines the best reliable model for each partition, quantifying the uncertainty in output prediction. Evaluation results show that the framework significantly improves modeling accuracy compared to previous approaches and correctly quantifies prediction error, while maintaining high efficiency and scalability.
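The partition-then-fit idea in (i) and (ii) can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a single categorical partitioning attribute, ordinary least squares per partition, a fixed sample-count threshold (`min_samples`) as a stand-in for the reliability criterion, and the residual standard error as a stand-in for the confidence measure. All function names, group labels, and data are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept; returns (coefficients, residual std error)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(y) - A.shape[1]
    sigma = float(np.sqrt(resid @ resid / dof)) if dof > 0 else float("inf")
    return coef, sigma

def fit_cells(X, y, groups, min_samples=10):
    """Fit one model per group; sparse groups fall back to the coarser parent model."""
    groups = np.asarray(groups)
    parent = fit_ols(X, y)  # model over all data (coarsest level)
    models = {}
    for g in np.unique(groups).tolist():
        mask = groups == g
        # Assumed reliability rule: enough samples to trust a cell-level fit.
        models[g] = fit_ols(X[mask], y[mask]) if mask.sum() >= min_samples else parent
    return models, parent

def predict(models, parent, x, g):
    """Return (prediction, residual std error of the model that produced it)."""
    coef, sigma = models.get(g, parent)
    return float(coef[0] + coef[1:] @ x), sigma

# Hypothetical, noise-free data: two dense groups and one sparse group.
x_dense = np.linspace(0.0, 1.0, 20)
x_sparse = np.array([0.1, 0.5, 0.9])
X = np.concatenate([x_dense, x_dense, x_sparse]).reshape(-1, 1)
y = np.concatenate([2 + 3 * x_dense, 1 + 5 * x_dense, 4 + 1 * x_sparse])
groups = np.array(["car"] * 20 + ["truck"] * 20 + ["bus"] * 3)

models, parent = fit_cells(X, y, groups)
car_pred, car_sigma = predict(models, parent, np.array([0.5]), "car")
bus_pred, bus_sigma = predict(models, parent, np.array([0.5]), "bus")
```

In this sketch the dense groups get their own tight models, while the sparse group inherits the pooled model along with its larger error estimate, which mirrors the trade-off the abstract describes between model specificity and reliability.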
