A new logic correlation rule for HIV-1 protease mutation

Research highlights

► This paper presents Apriori_BFS, a pre- and post-processing approach that discovers logic correlation rules for mutations in HIV-1 protease sequences from the HIV Drug Resistance Database by combining an Apriori-based method with a Boolean function simplification technique.

► Statistically, correlation shows whether and how strongly pairs of variables are related. In this study, a logic correlation rule indicates the logical relation between two or more variables; it does not indicate that one variable "caused" the other, only that as one variable changes, the other changes likewise.

► To design effective HIV protease inhibitors, identifying the cleaved HIV subtype is crucial. This study therefore attempted to distinguish the diversity of mutation relationships between HIV-1 subtypes by discovering logic correlation rules.

Mining association rules in a large database is a technique for finding relations among attributes. In the last decade, most studies have been devoted to boosting efficiency, but few have concentrated on analyzing the logic correlation among variables. Furthermore, association rule mining in a large database, when applied to a bio-sequence data set, generally yields rules that are medically irrelevant and difficult to analyze. This paper presents Apriori_BFS, a pre- and post-processing approach that discovers a logic correlation rule by combining an Apriori-based method with a Boolean function simplification technique. The objective of the proposed method is to effectively reduce the number of rules and present an integrated logic correlation rule to readers. The experiment was conducted on a real-world case, the HIV Drug Resistance Database, and the results show that Apriori_BFS not only presents the logic correlation among variables but also provides more condensed rules than the Apriori method alone.
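
The general idea of mining rules with Apriori and then condensing them with Boolean simplification can be illustrated with a minimal sketch. It is not the authors' Apriori_BFS implementation: the transactions are made-up toy data (not taken from the HIV Drug Resistance Database), the function names `apriori`, `rules_for`, and `absorb` are hypothetical, and the simplification step uses only the absorption law (X OR (X AND Y) = X) as a stand-in for whatever Boolean simplification technique the paper applies.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    level = [frozenset([i]) for i in items]
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        kept = {c: k / n for c, k in counts.items() if k / n >= min_support}
        frequent.update(kept)
        # Join step: union pairs of frequent k-itemsets into (k+1)-candidates.
        size = len(next(iter(kept))) + 1 if kept else 0
        level = list({a | b for a, b in combinations(kept, 2)
                      if len(a | b) == size})
    return frequent

def rules_for(frequent, consequent, min_conf):
    """Antecedents A for which A -> consequent has confidence >= min_conf."""
    found = []
    for itemset, support in frequent.items():
        if consequent in itemset and len(itemset) > 1:
            antecedent = itemset - {consequent}
            if support / frequent[antecedent] >= min_conf:
                found.append(antecedent)
    return found

def absorb(antecedents):
    """Simplify an OR of ANDs with the absorption law: X or (X and Y) == X."""
    return [a for a in antecedents if not any(b < a for b in antecedents)]

# Toy data (assumption): each set lists mutations seen in one sequence.
transactions = [
    {"M46I", "I54V", "V82A"},
    {"M46I", "I54V", "V82A"},
    {"M46I", "V82A"},
    {"I54V", "V82A", "L90M"},
    {"M46I", "L90M"},
    {"L90M"},
]

frequent = apriori(transactions, min_support=0.3)
rules = rules_for(frequent, consequent="V82A", min_conf=0.7)
merged = absorb(rules)

# Present the surviving antecedents as one combined rule.
expression = " OR ".join(
    "(" + " AND ".join(sorted(a)) + ")" for a in sorted(merged, key=sorted)
)
print(f"{len(rules)} rules condensed to: {expression} -> V82A")
```

In this toy run, the rule whose antecedent is the conjunction of both mutations is absorbed by the two single-mutation rules, so three separate rules collapse into one disjunctive rule. A full Boolean minimizer that also handles negated variables would be needed to go beyond this monotone case.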
