MapReduce-Based Network Motif Detection for Traditional Chinese Medicine

Network motifs are the basic building blocks of complex networks, and motif detection has recently attracted much attention as a means of uncovering the structural design principles of such networks. Pattern finding is the most computationally expensive step of motif detection. In this chapter, we design a pattern-finding algorithm based on Google MapReduce to make complex network analysis more efficient. We reorganize the pattern-finding process and implement each step in the MapReduce framework, which makes MRPF (MapReduce-based pattern finding) parallelizable and extensible. The MRPF framework performs frequent pattern finding on complex graphs on top of Hadoop and consists of four steps: distributed storage, neighbor-vertex finding and pattern initialization, pattern extension, and frequency computation. Performance evaluation shows that our algorithm makes it feasible to detect larger motifs in large networks and that it scales well. We apply it to a prescription network and find several commonly occurring prescription network motifs, which may help to further uncover the laws of prescription compatibility.
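The four MRPF steps can be illustrated with a small single-machine simulation. This is a minimal sketch under stated assumptions, not the chapter's implementation: it assumes an undirected edge list small enough to fit in memory, replaces Hadoop with a hypothetical `run_job` helper that performs map, shuffle (group by key), and reduce in one process, treats pattern instances as connected induced vertex subsets, and uses brute-force canonical labeling (feasible only for small motifs) with a plain instance count as the frequency measure. The names `run_job`, `canonical_label`, and `find_patterns` are illustrative. Step 1 (distributed storage) corresponds to the input edge list, which HDFS would split across nodes.

```python
from collections import defaultdict
from itertools import permutations


def run_job(records, mapper, reducer):
    """Hypothetical in-memory stand-in for one MapReduce job:
    map every record, group emitted values by key (shuffle), then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(reducer(key, values))
    return out


def canonical_label(vertices, adj):
    """Brute-force canonical form of the induced subgraph on `vertices`:
    the lexicographically largest upper-triangle adjacency vector over all
    vertex orderings. Exponential in size, so only for small motifs."""
    best = None
    for order in permutations(vertices):
        bits = tuple(
            1 if order[j] in adj[order[i]] else 0
            for i in range(len(order))
            for j in range(i + 1, len(order))
        )
        if best is None or bits > best:
            best = bits
    return (len(vertices), best)


def find_patterns(edges, k):
    """Count connected induced k-vertex patterns, grouped by isomorphism class."""
    # Step 2a: neighbor-vertex finding; each undirected edge emits both directions.
    adj = dict(run_job(
        edges,
        lambda e: [(e[0], e[1]), (e[1], e[0])],
        lambda v, nbrs: [(v, frozenset(nbrs))],
    ))
    # Step 2b: pattern initialization; each distinct edge is a size-2 instance.
    instances = run_job(edges, lambda e: [(frozenset(e), None)],
                        lambda key, _vals: [key])
    # Step 3: pattern extension; one job per added vertex. Keying on the
    # vertex set lets the reducer remove duplicate instances.
    for _ in range(k - 2):
        def extend(inst):
            frontier = set().union(*(adj[v] for v in inst)) - inst
            return [(inst | {w}, None) for w in frontier]
        instances = run_job(instances, extend, lambda key, _vals: [key])
    # Step 4: frequency computation; sum instance counts per canonical label.
    counts = run_job(
        instances,
        lambda inst: [(canonical_label(inst, adj), 1)],
        lambda label, ones: [(label, sum(ones))],
    )
    return dict(counts)
```

For the edge list `[(1, 2), (2, 3), (1, 3), (3, 4)]` (a triangle with one pendant vertex), `find_patterns(edges, 3)` reports one triangle, labeled `(3, (1, 1, 1))`, and two 3-vertex paths, labeled `(3, (1, 1, 0))`. In a real Hadoop deployment, the adjacency map captured here in a closure would instead have to reach the mappers through an extra join job or Hadoop's distributed cache.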
