Mining fuzzy amino acid associations in peptide sequences of Mycobacterium tuberculosis complex (MTBC)

Biological databases are flooded with genomic and proteomic data that can be analyzed to generate information and knowledge useful for understanding the molecular mechanisms involved in the disease and health states of living beings. Tuberculosis is a pandemic infectious disease that causes a large number of deaths every year. In this paper, an attempt has been made to develop a model for mining amino acid association patterns in peptide sequences of the Mycobacterium tuberculosis complex (MTBC). The peptide sequences of MTBC species are taken from NCBI. Variation in the length of these sequences leads to variation in the degree of relationship among the amino acids present in each sequence. Fuzzy sets are employed to model this uncertainty in the degree of relationship among the amino acids of the MTBC peptide sequences. Crisp and fuzzy amino acid association rules have been generated from the peptide sequences of MTBC; on comparison, it is observed that the fuzzy set approach is able to address the under-prediction and over-prediction of amino acid association patterns caused by uncertainty in the degree of relationship among the amino acids. As an illustration, the amino acid association patterns have been used to predict secondary structure and physicochemical properties. Thus, the patterns generated can be useful in understanding the molecular mechanisms involved in MTBC by predicting physicochemical properties, structures, protein–protein interactions, etc.
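The contrast between crisp and fuzzy association rules can be illustrated with a minimal sketch. It is not the paper's model: the abstract does not specify a membership function, so the sketch assumes that an amino acid's fuzzy membership in a sequence is its relative frequency (count divided by sequence length) and that the membership of an itemset is the minimum over its members. The function names and the toy sequences are hypothetical and are not MTBC data.

```python
from collections import Counter

def crisp_membership(seq):
    """Crisp view: an amino acid either occurs in the sequence (1.0) or is absent."""
    return {aa: 1.0 for aa in set(seq)}

def fuzzy_membership(seq):
    """Fuzzy view (assumed here): membership = relative frequency in the sequence,
    so sequences of different lengths contribute graded, length-normalised degrees."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}

def support(itemset, memberships):
    """Mean over sequences of the minimum membership of the items (min as t-norm).
    With crisp memberships this reduces to the usual fraction of sequences
    that contain every item."""
    total = sum(min(m.get(aa, 0.0) for aa in itemset) for m in memberships)
    return total / len(memberships)

def confidence(antecedent, consequent, memberships):
    """Confidence of the rule antecedent -> consequent."""
    base = support(antecedent, memberships)
    if base == 0.0:
        return 0.0
    return support(set(antecedent) | set(consequent), memberships) / base

# Hypothetical toy peptides of different lengths (one-letter amino acid codes).
sequences = ["AAL", "AG", "LLGA"]
crisp = [crisp_membership(s) for s in sequences]
fuzzy = [fuzzy_membership(s) for s in sequences]

crisp_conf = confidence({"A"}, {"L"}, crisp)
fuzzy_conf = confidence({"A"}, {"L"}, fuzzy)
print(f"crisp confidence A -> L: {crisp_conf:.3f}")
print(f"fuzzy confidence A -> L: {fuzzy_conf:.3f}")
```

Under these assumptions the crisp rule counts a single occurrence in a long sequence the same as a dominant residue in a short one, whereas the fuzzy version weights each sequence by how strongly the amino acids are actually represented in it. This is one plausible way to picture the over-prediction and under-prediction the abstract refers to.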
