Scaling Log-Linear Analysis to High-Dimensional Data

Association discovery is a fundamental data mining task. The primary statistical approach to association discovery between variables is log-linear analysis. Classical approaches to log-linear analysis do not scale beyond about ten variables. We develop an efficient approach to log-linear analysis that scales to hundreds of variables by melding the classical statistical machinery of log-linear analysis with advanced data mining techniques from association discovery and graphical modeling.
