Factor Graphs for Relational Regression

Traditional methods for supervised learning treat the input data as a set of independent and identically distributed points in a high-dimensional space. These methods ignore the rich underlying relational structure that may be inherent in many important problems. For instance, the data samples may be related to each other in such a way that the unknown variables associated with any sample depend not only on its individual attributes but also on the variables associated with related samples. One regression problem of this nature, whose importance is emphasized by the present economic crises, is understanding real estate prices. The price of a house clearly depends on its individual attributes, such as the number of bedrooms. However, the price also depends on the neighborhood in which the house lies and on the time period in which it was sold. This effect of neighborhood and time on the price is not directly measurable; it is merely reflected in the prices of other houses in the vicinity that were sold around the same time. Uncovering and using these spatio-temporal dependencies can help explain house prices while also improving prediction accuracy. The models used for this task fall in the class of Statistical Relational Learning. The underlying probabilistic graphical model takes as input a single instance of the entire collection of samples along with their relationship structure. The dependencies among samples are learnt with the help of parameter sharing and collective inference. The drawback of most such models proposed so far is that they cater only to classification problems. To address this, we propose a relational factor graph framework for regression in relational data. A single factor graph is used to capture (1) dependencies among the individual variables of each data point and (2) dependencies among variables associated with multiple data points.
The proposed models capture hidden inter-sample dependencies via latent variables. They also allow for log-likelihood functions that are non-linear in the parameter space, thereby allowing for considerably more complex architectures. Efficient inference and learning algorithms are proposed. The models are applied to predicting the prices of real estate properties, and a by-product of this application is a house price index. The relational aspect of the model accounts for the hidden spatio-temporal influences on the price of every house. The experiments show that one can achieve considerably superior performance by identifying and using the underlying spatio-temporal structure associated with the problem. To the best of our knowledge, this is the first work in the direction of relational regression, especially in the frame-based class of statistical relational learning models. Furthermore, this is also the first work to construct house price indices by simultaneously accounting for the spatio-temporal effects on house prices using a large-scale, industry-standard data set.
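
The idea that a house's price reflects both its own attributes and a hidden neighborhood effect visible only through nearby sales can be illustrated with a toy sketch. This is not the model proposed in the paper: the additive decomposition, the linear attribute term, the Gaussian kernel, and every name below (`intrinsic_price`, `latent_desirability`, `predict_price`, `bandwidth`) are assumptions made purely for illustration. The latent term is estimated here as a kernel-weighted average of the neighbors' residuals, and a sale time could be appended to `location` as an extra coordinate to mimic a spatio-temporal neighborhood.

```python
import math

def intrinsic_price(features, weights):
    # Hypothetical parametric part: depends only on the house's own attributes.
    return sum(w * x for w, x in zip(weights, features))

def latent_desirability(location, neighbors, weights, bandwidth=1.0):
    # Hypothetical latent part: kernel-weighted average of the residuals
    # (observed price minus intrinsic price) of nearby sold houses.
    # `neighbors` is a list of (features, location, price) tuples.
    num = 0.0
    den = 0.0
    for feats, loc, price in neighbors:
        dist2 = sum((a - b) ** 2 for a, b in zip(location, loc))
        k = math.exp(-dist2 / (2.0 * bandwidth ** 2))  # Gaussian kernel (assumed)
        num += k * (price - intrinsic_price(feats, weights))
        den += k
    # With no usable neighbors, fall back to a zero neighborhood effect.
    return num / den if den > 0 else 0.0

def predict_price(features, location, neighbors, weights, bandwidth=1.0):
    # Prediction = own-attribute term + neighborhood term inferred from related samples.
    return (intrinsic_price(features, weights)
            + latent_desirability(location, neighbors, weights, bandwidth))

if __name__ == "__main__":
    weights = [100.0, 50.0]  # made-up weights for (bedrooms, bathrooms)
    sold = [
        ([2, 1], (0.0, 0.0), 300.0),
        ([3, 1], (0.0, 0.0), 400.0),
    ]
    print(predict_price([1, 1], (0.0, 0.0), sold, weights))
```

In this sketch the weights are fixed by hand. A relational model of the kind described above would instead learn the shared parameters and the latent variables jointly across all samples.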
