Comparison between Explicit Learning and Implicit Modeling of Relational Features in Structured Output Spaces

Relational models for sequence labeling, a structured output classification problem, have recently been explored in a few research works. Such models are interpretable and capture much more information about the domain than models built directly from basic attributes, which results in accurate predictions. Discovering optimal relational features is hard, however: the space of relational features is exponentially large, so an exhaustive search is infeasible and the space is usually explored using heuristics. Recently, we proposed a Hierarchical Kernels-based feature learning approach (StructHKL) for sequence labeling [?] that optimally learns emission features in the form of conjunctions of basic inputs at a sequence position. However, StructHKL cannot be trivially applied to learn complex relational features derived from relative sequence positions. In this paper, we seek to learn optimal relational sequence labeling models by leveraging a relational kernel that computes the similarity between instances in an implicit space of relational features. To this end, we employ relational subsequence kernels at each sequence position, computed over a time window of observations around the pivot position, in the classification model. Although this approach gives up interpretability, relational subsequence kernels efficiently capture relational sequential information in the inputs. We present an experimental comparison of explicit learning and implicit modeling of relational features and explain the trade-offs involved.
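
As a rough illustration of the implicit approach, the sketch below computes a gap-weighted subsequence kernel between two windows of observations, each taken around a pivot position. It is not the authors' implementation. It assumes that each sequence position holds a set of basic input features, that two positions match in proportion to the number of features they share, and that a subsequence is penalized by a decay factor raised to the length of the span it covers in each window. The function names, the parameter `lam`, and the window radius are hypothetical, and the code enumerates index tuples by brute force for readability rather than using the dynamic program that a practical kernel would need.

```python
from itertools import combinations


def window(seq, pivot, radius):
    """Return the observations within `radius` positions of `pivot`,
    clipped at the sequence boundaries."""
    lo = max(0, pivot - radius)
    hi = min(len(seq), pivot + radius + 1)
    return seq[lo:hi]


def subsequence_kernel(s, t, n, lam=0.5):
    """Gap-weighted subsequence kernel of order n between two windows.

    `s` and `t` are lists of feature sets, one set per position.
    Every pair of length-n index tuples (one from each window) contributes
    the product of shared-feature counts at the aligned positions,
    scaled by lam ** (span in s + span in t), so subsequences with
    gaps count for less than contiguous ones.
    """
    total = 0.0
    for i in combinations(range(len(s)), n):
        for j in combinations(range(len(t)), n):
            common = 1
            for a, b in zip(i, j):
                common *= len(s[a] & t[b])
                if common == 0:
                    break  # some aligned position shares no feature
            if common:
                span = (i[-1] - i[0] + 1) + (j[-1] - j[0] + 1)
                total += common * lam ** span
    return total


def position_kernel(x, p, y, q, radius=2, max_n=2, lam=0.5):
    """Similarity between position p of sequence x and position q of
    sequence y: the sum of subsequence kernels of order 1..max_n over
    the two windows centred on the pivot positions."""
    wx, wy = window(x, p, radius), window(y, q, radius)
    return sum(subsequence_kernel(wx, wy, n, lam) for n in range(1, max_n + 1))
```

A kernel of this kind never lists the relational features themselves. Each feature corresponds to an ordered pattern of basic inputs at relative positions in the window, and the kernel value is the inner product in that implicit space. This is why the resulting model cannot be read off as a set of rules, in contrast to explicitly learned features.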
