Network Motif Model: An Efficient Approach for Extracting Features from Relational Data

This paper proposes the Network Motif Model (NMM), a novel and efficient approach for extracting features from relational data. The approach first constructs a data network from the relations among data instances. Significant sub-graphs are then identified by extracting basic network motifs from the data network, a step inspired by the motif concept from complex-network theory. Finally, the first-order information of the original data is combined with the extracted significant sub-graphs to form the network motif features of the relational data. Because basic motifs are easy to detect, the computation is efficient. Moreover, this form of feature extraction preserves both the relational structure and the label information of the original data. Our experiments show that NMM achieves better classification accuracy than several inductive logic programming methods and probabilistic relational models. The model is therefore a potentially useful feature extraction strategy for statistical learning on multi-relational data.
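The abstract does not give the paper's exact algorithm, but the core step it describes, detecting basic motifs in a directed data network, can be illustrated with a minimal sketch. The example below counts one classic three-node motif, the feed-forward loop (a→b, b→c, a→c), by brute-force enumeration over node triples; the function name and edge-list representation are assumptions for illustration, not the paper's API.

```python
from itertools import permutations

def count_ffl(edges):
    """Count feed-forward loops (a->b, b->c, a->c) in a directed graph.

    `edges` is a list of (source, target) pairs. This brute-force scan
    over ordered node triples is only meant to illustrate motif
    detection; real motif finders use far more efficient enumeration.
    """
    adj = {}
    nodes = set()
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        nodes.update((u, v))
    count = 0
    for a, b, c in permutations(nodes, 3):
        # Feed-forward loop: direct edge a->c plus indirect path a->b->c.
        if b in adj.get(a, ()) and c in adj.get(b, ()) and c in adj.get(a, ()):
            count += 1
    return count

# Small example network: x->y->z with the shortcut x->z forms one FFL.
edges = [("x", "y"), ("y", "z"), ("x", "z"), ("z", "w")]
print(count_ffl(edges))  # prints 1
```

In a feature-extraction setting along the lines the abstract sketches, counts like this (one per motif type, possibly per node or per local neighborhood) would become entries in a feature vector that is then combined with the instances' first-order attributes.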
