A Review on Hierarchical Clustering-Based Covariance Model to ncRNA Identification

The functional characterization of noncoding RNAs (ncRNAs) has attracted growing attention among bioinformatics researchers. ncRNA families are believed to be responsible for a variety of biological functions, ranging from regulation of gene expression to catalytic activity, while other functions remain to be uncovered. These discoveries have opened new directions in ncRNA research, for example the discovery of functional subgroups. As a result, cross-disciplinary solutions drawing on computational intelligence concepts and algorithms have started to achieve promising results. For instance, data clustering, a popular technique in many domains, has been applied to Covariance Model (CM)-based ncRNA identification. Hierarchical clustering is the most frequently used mathematical technique for grouping a set of human ncRNAs into families based on sequence similarity. However, conventional algorithms have shortcomings; for example, the sequence structure of each family becomes significantly diluted as the number of sequence features in a known-family dataset increases. This study presents a literature review of the hierarchical clustering algorithm and its variants for ncRNA family identification using sequence structure.
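To make the idea of grouping ncRNAs by sequence similarity concrete, the following is a minimal sketch of agglomerative (bottom-up) hierarchical clustering with average linkage. It is an illustration under stated assumptions, not the method of any reviewed work: the distance is a simple k-mer Jaccard distance on the raw sequence, secondary structure and covariance models are not used, and the function names, the toy sequences, and the `threshold` value are all hypothetical.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=3):
    """1 - |shared k-mers| / |all k-mers|; 0 means identical k-mer content."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def average_linkage(c1, c2, seqs, k=3):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(jaccard_distance(seqs[i], seqs[j], k) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(seqs, threshold=0.7, k=3):
    """Merge the two closest clusters until the closest pair exceeds threshold."""
    clusters = [[i] for i in range(len(seqs))]  # start with one cluster per sequence
    while len(clusters) > 1:
        (a, b), dist = min(
            (((a, b), average_linkage(clusters[a], clusters[b], seqs, k))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:  # assumed cut-off for "same family"
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]


# Toy RNA-like sequences (made up for illustration only).
toy = ["GGGAAACCCUUU", "GGGAAACCCUUA", "AUAUAUAUGCGC", "AUAUAUAUGCGG"]
families = agglomerate(toy)
print(families)
```

On this toy input the first two and the last two sequences end up in separate clusters, because the two groups share no 3-mers. In practice the choice of distance and of the cut-off strongly affects the resulting families, which is one reason structure-aware variants are of interest.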
