Motif Discovery and Data Mining in Bioinformatics

Bioinformatics analyses huge amounts of biological data, which demands in-depth understanding. Data mining research, in turn, develops methods for discovering motifs in biosequences. Motif discovery brings both benefits and challenges. We show how the two fields, data mining and bioinformatics, can be bridged for successful mining of biological data. We found that the motivation and justification factors lead to a preference for naturalistic-method research in bioinformatics, because the naturalistic method depends on real data. The method empowers bioinformatics techniques to handle the true properties of the data and reduces the assumptions made about un-modeled or not yet uncovered biodata phenomena. This empowerment comes from recognizing and understanding biodata properties and processes.
