Protein Sequence Classification in Data Mining: A Study

With computerized applications in use around the world, vast amounts of data are being collected. The valuable information hidden in this data has drawn researchers from many disciplines to develop effective approaches for extracting it. Data mining can be regarded as the process of extracting useful and valuable knowledge from large amounts of data. It spans several domains, including text mining, image mining, sequential pattern mining, and web mining. Among these, sequence mining is one of the most important research areas; it helps to find sequential relationships in data. Sequence mining is applied in a wide range of areas, such as the analysis of customer purchase patterns, web access patterns, weather observations, protein sequencing, and DNA sequencing. In protein and DNA analysis, sequence mining techniques are used for sequence alignment, sequence searching, and sequence classification. Within protein sequence analysis, researchers have shown particular interest in protein sequence classification, which can discover the recurring structures that exist in protein sequences. This paper explains the techniques that different researchers have used to classify proteins and provides an overview of protein sequence classification methods.
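
As a rough illustration of what sequence classification involves, the sketch below represents each protein sequence by its counts of overlapping k-mers (short substrings, one simple way to capture recurring patterns) and assigns a query to the class whose pooled k-mer profile is most similar. This is not a method taken from the surveyed work: the function names, the choice of k = 2, the cosine similarity measure, and the made-up toy sequences and labels are all assumptions chosen for brevity.

```python
from collections import Counter
import math


def kmer_counts(seq, k=2):
    """Count the overlapping k-mers in an amino-acid string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a, b):
    """Cosine similarity between two sparse k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(labeled, k=2):
    """Pool the k-mer counts of all training sequences per class label."""
    profiles = {}
    for seq, label in labeled:
        profiles.setdefault(label, Counter()).update(kmer_counts(seq, k))
    return profiles


def classify(seq, profiles, k=2):
    """Return the label whose pooled profile is most similar to seq."""
    query = kmer_counts(seq, k)
    return max(profiles, key=lambda label: cosine(query, profiles[label]))


# Hypothetical toy data: invented strings, not real protein families.
TRAINING = [
    ("AAAAGAAAAG", "family_a"),
    ("AAGAAAAGAA", "family_a"),
    ("WWCWWWCWWC", "family_b"),
    ("CWWWCWWWCW", "family_b"),
]

profiles = train(TRAINING)
print(classify("AAAGAAAA", profiles))
```

A practical classifier would replace the similarity rule with a trained model and use curated, labeled protein data; the sketch only shows how recurring subsequences can be turned into features for classification.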
