Feature extraction approaches for biological sequences: a comparative study of mathematical features
André C P L F de Carvalho | Fabrício M. Lopes | Robson P Bonidia | Lucas D H Sampaio | Douglas S Domingues | Alexandre R Paschoal | Danilo S Sanches