Applications of Machine Learning in Genomics and Systems Biology

With the completion of the Human Genome Project, techniques that can analyze large amounts of data are urgently needed. Computational techniques for analyzing high-throughput data in genomics, proteomics, and visualization have been studied extensively and have played vital roles in understanding biological mechanisms. Machine learning and related techniques such as support vector machines, Markov models, decision trees, and neural networks are increasingly used to solve problems in genomics and systems biology.

Machine learning has been defined as a “computer program that can learn from experience with respect to some class of tasks and performance measure” [1]. If we can design machine learning algorithms that learn from past experience and thereby improve their performance automatically, we can address complicated problems such as those in genomics and systems biology.

In this special issue, we explore the topics of identifying biomarkers, transcription factor binding, and novel type III effectors; predicting breeding values for dairy cattle; and gene selection and tumor classification. The papers in this volume revisit previously researched domains and also investigate new approaches to bioinformatics problems. They reflect the urgency of using machine learning techniques to develop more efficient and accurate algorithms for biological problems. We hope that these papers broaden the view of current machine learning approaches in genomics and systems biology and inspire ideas for designing new approaches to existing biological problems.

Chunmei Liu
Dongsheng Che
Xumin Liu
Yinglei Song

[1]  Alan Collmer,et al.  Pseudomonas syringae Type III Secretion System Targeting Signals and Novel Effectors Studied with a Cya Translocation Reporter , 2004, Journal of bacteriology.

[2]  Michael I. Jordan,et al.  Variational inference for Dirichlet process mixtures , 2006 .

[3]  Yoshiharu Sato,et al.  Meta-analytic approach to the accurate prediction of secreted virulence effectors in gram-negative bacteria , 2011, BMC Bioinformatics.

[4]  Charles Elkan,et al.  Fitting a Mixture Model By Expectation Maximization To Discover Motifs In Biopolymer , 1994, ISMB.

[5]  Isao Hayashi,et al.  NN-driven fuzzy reasoning , 1991, Int. J. Approx. Reason.

[6]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res.

[7]  R. Lacroix,et al.  Induction and evaluation of decision trees for lactation curve analysis , 2003 .

[8]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[9]  M. M. Garner,et al.  A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: application to components of the Escherichia coli lactose operon regulatory system , 1981, Nucleic Acids Res.

[10]  P. Shannon,et al.  Cytoscape: a software environment for integrated models of biomolecular interaction networks. , 2003, Genome research.

[11]  Manuel J. Maña López,et al.  Research and applications: Improving image retrieval effectiveness via query expansion using MeSH hierarchical structure , 2013, J. Am. Medical Informatics Assoc.

[12]  R. Lacroix,et al.  EFFECTS OF LEARNING PARAMETERS AND DATA PRESENTATION ON THE PERFORMANCE OF BACKPROPAGATION NETWORKS FOR MILK YIELD PREDICTION , 1998 .

[13]  Victor Maojo,et al.  A knowledge engineering approach to recognizing and extracting sequences of nucleic acids from scientific literature , 2010, 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology.

[14]  C W Heald,et al.  A computerized mastitis decision aid using farm-based records: an artificial neural network approach. , 2000, Journal of dairy science.

[15]  J. Galán,et al.  Type III Secretion Machines: Bacterial Devices for Protein Delivery into Host Cells , 1999 .

[16]  R. K. Sharma,et al.  Prediction of first lactation 305-day milk yield in Karan Fries dairy cattle using ANN modeling , 2007, Appl. Soft Comput.

[17]  Brian J Staskawicz,et al.  Direct biochemical evidence for type III secretion-dependent translocation of the AvrBs2 effector protein into plant cells , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[18]  Davar Giveki,et al.  Automatic detection of erythemato-squamous diseases using PSO-SVM based on association rules , 2013, Eng. Appl. Artif. Intell.

[19]  Henry H. N. Lam,et al.  Data analysis and bioinformatics tools for tandem mass spectrometry in proteomics. , 2008, Physiological genomics.

[20]  D. Galas,et al.  DNAse footprinting: a simple method for the detection of protein-DNA binding specificity. , 1978, Nucleic acids research.

[21]  Kenneth Levenberg A METHOD FOR THE SOLUTION OF CERTAIN NON-LINEAR PROBLEMS IN LEAST SQUARES , 1944 .

[22]  Wei Kong,et al.  Hybrid particle swarm optimization and tabu search approach for selecting genes for tumor classification using gene expression data , 2008, Comput. Biol. Chem.

[23]  Chong Wang,et al.  Simultaneous image classification and annotation , 2009, CVPR.

[24]  Carlo Tomasi,et al.  Exploratory Dijkstra forest based automatic vessel segmentation: applications in video indirect ophthalmoscopy (VIO) , 2012, Biomedical optics express.

[25]  Tinghua Wang Improving SVM Classification by Feature Weight Learning , 2010, 2010 International Conference on Intelligent Computation Technology and Automation.

[26]  S. Lisa,et al.  Use of 2D Barcode to Access Multimedia Content and the Web from a Mobile Handset , 2008, IEEE GLOBECOM 2008 - 2008 IEEE Global Telecommunications Conference.

[27]  Tommy W. S. Chow,et al.  Textual and Visual Content-Based Anti-Phishing: A Bayesian Approach , 2011, IEEE Transactions on Neural Networks.

[28]  G. Stormo Computer methods for analyzing sequence recognition of nucleic acids. , 1988, Annual Review of Biophysics and Biophysical Chemistry.

[29]  Fuhui Long,et al.  Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy , 2003, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Qing Zhang,et al.  High-accuracy prediction of bacterial type III secreted effectors based on position-specific amino acid composition profiles , 2011, Bioinform.

[31]  Jerry Zeyu Gao,et al.  Understanding 2D-BarCode Technology and Applications in M-Commerce - Design and Implementation of A 2D Barcode Processing Solution , 2007, 31st Annual International Computer Software and Applications Conference (COMPSAC 2007).

[32]  Yang Yang A comparative study on sequence feature extraction for type III secreted effector prediction , 2011, 2011 Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD).

[33]  J. F. Hayes,et al.  Prediction of Cow Performance with a Connectionist Model , 1995 .

[34]  Monica Vencato,et al.  Whole-genome expression profiling defines the HrpL regulon of Pseudomonas syringae pv. tomato DC3000, allows de novo reconstruction of the Hrp cis element, and identifies novel coregulated genes. , 2006, Molecular plant-microbe interactions : MPMI.

[35]  O. Nelles Nonlinear System Identification , 2001 .

[36]  Yong Shi,et al.  Laplacian twin support vector machine for semi-supervised classification , 2012, Neural Networks.

[37]  Oliver Nelles,et al.  Nonlinear system identification with local linear neuro-fuzzy models , 1999 .

[38]  Max Costa,et al.  Histone modifications and cancer: biomarkers of prognosis? , 2012, American journal of cancer research.

[39]  Li-Yeh Chuang,et al.  Improved binary PSO for feature selection using gene expression data , 2008, Comput. Biol. Chem.

[40]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[41]  Nicola J. Rinaldi,et al.  Transcriptional regulatory code of a eukaryotic genome , 2004, Nature.

[42]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[43]  Martin T. Hagan,et al.  Neural network design , 1995 .

[44]  Robert J. McQueen,et al.  Applying machine learning to agricultural data , 1995 .

[45]  Jun S. Liu,et al.  Bayesian Models for Multiple Local Sequence Alignment and Gibbs Sampling Strategies , 1995 .

[46]  Tim J. P. Hubbard,et al.  Data growth and its impact on the SCOP database: new developments , 2007, Nucleic Acids Res.

[47]  James Kennedy,et al.  Particle swarm optimization , 2002, Proceedings of ICNN'95 - International Conference on Neural Networks.

[48]  G. Church,et al.  Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation , 1998, Nature Biotechnology.

[49]  R. Lacroix,et al.  FUZZY SET-BASED ANALYTICAL TOOLS FOR DAIRY HERD IMPROVEMENT , 1998 .

[50]  Bo Jin,et al.  Support vector machines with evolutionary feature weights optimization for biomedical data classification , 2005, NAFIPS 2005 - 2005 Annual Meeting of the North American Fuzzy Information Processing Society.

[51]  Shutao Li,et al.  Gene selection using hybrid particle swarm optimization and genetic algorithm , 2008, Soft Comput.

[52]  Thomas Rattei,et al.  Sequence-Based Prediction of Type III Secreted Proteins , 2009, PLoS pathogens.

[53]  Tan-Hsu Tan,et al.  2D Barcode and Augmented Reality Supported English Learning System , 2007, 6th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2007).

[54]  D Haussler,et al.  Knowledge-based analysis of microarray gene expression data by using support vector machines. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[55]  M. Hestenes,et al.  Methods of conjugate gradients for solving linear systems , 1952 .

[56]  David S Guttman,et al.  A functional screen for the type III (Hrp) secretome of the plant pathogen Pseudomonas syringae. , 2002, Science.

[57]  R. Lacroix,et al.  Performance analysis of a fuzzy decision-support system for culling of dairy cows , 1998 .

[58]  Sheng Yang He,et al.  Type III protein secretion mechanism in mammalian and plant pathogens. , 2004, Biochimica et biophysica acta.

[59]  Jason Weston,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2002, Machine Learning.

[60]  Samuel H. Payne,et al.  Accurate annotation of peptide modifications through unrestrictive database search. , 2008, Journal of proteome research.

[61]  Alan F. Scott,et al.  Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders , 2004, Nucleic Acids Res.

[62]  M. Nielen,et al.  Comparison of analysis techniques for on-line detection of clinical mastitis. , 1995, Journal of dairy science.

[63]  Francesca Chiaromonte,et al.  Scoring Pairwise Genomic Sequence Alignments , 2001, Pacific Symposium on Biocomputing.

[64]  Alan Collmer,et al.  Genomewide identification of proteins secreted by the Hrp type III protein secretion system of Pseudomonas syringae pv. tomato DC3000 , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[65]  Charles Elkan,et al.  Unsupervised learning of multiple motifs in biopolymers using expectation maximization , 1995, Mach. Learn.

[66]  P A Oltenacu,et al.  A decision support system for evaluating mastitis information. , 1995, Journal of dairy science.

[67]  H Hogeveen,et al.  Automatic detection of clinical mastitis is improved by in-line monitoring of somatic cell count. , 2008, Journal of dairy science.

[68]  George Stephanopoulos,et al.  Determination of minimum sample size and discriminatory expression patterns in microarray data , 2002, Bioinform.

[69]  A. D. Whittaker,et al.  Snack Quality Evaluation Method Based on Image Features and Neural Network Prediction , 1995 .

[70]  Paul Chen,et al.  SPECTRUM ANALYSIS OF MIXING POWER CURVES FOR NEURAL NETWORK PREDICTION OF DOUGH RHEOLOGICAL PROPERTIES , 1997 .

[71]  Lloyd A. Smith,et al.  An investigation into the use of machine learning for determining oestrus in cows , 1996 .

[72]  R. Lacroix,et al.  Improving dairy yield predictions through combined record classifiers and specialized artificial neural networks. , 1998 .

[73]  Gary D. Stormo,et al.  Identifying DNA and protein patterns with statistically significant alignments of multiple sequences , 1999, Bioinform.

[74]  Y. Z. Chen,et al.  Protein function classification via support vector machine approach. , 2003, Mathematical biosciences.

[75]  Jens Sadowski,et al.  Comparison of Support Vector Machine and Artificial Neural Network Systems for Drug/Nondrug Classification , 2003, J. Chem. Inf. Comput. Sci.

[76]  R M de Mol,et al.  Application of fuzzy logic in automated cow status monitoring. , 2001, Journal of dairy science.

[77]  Feng Luan,et al.  Diagnosing Breast Cancer Based on Support Vector Machines , 2003, J. Chem. Inf. Comput. Sci.

[78]  Yang Yang,et al.  Extracting Features from Protein Sequences Using Chinese Segmentation Techniques for Subcellular Localization , 2005, 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology.

[79]  Seema Mattoo,et al.  A genome‐wide screen identifies a Bordetella type III secretion effector and candidate effectors in other species , 2005, Molecular microbiology.

[80]  Bao-Gang Hu,et al.  A novel support vector machine with its features weighted by mutual information , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[81]  M. Ringnér,et al.  Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks , 2001, Nature Medicine.

[82]  Siti Zaiton Mohd Hashim,et al.  A model for gene selection and classification of gene expression data , 2007, Artificial Life and Robotics.

[83]  Minoru Itou,et al.  Lipid profile is associated with the incidence of cognitive dysfunction in viral cirrhotic patients: A data‐mining analysis , 2013, Hepatology research : the official journal of the Japan Society of Hepatology.

[84]  Khaled Rasheed,et al.  MDGA: motif discovery using a genetic algorithm , 2005, GECCO '05.

[85]  Jos Boekhorst,et al.  Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle? , 2012, Briefings Bioinform.

[86]  Martín López-Nores,et al.  Monitoring medicine intake in the networked home: The iCabiNET solution , 2008, Pervasive 2008.

[87]  S. Kim,et al.  NEURAL NETWORK MODELING AND FUZZY CONTROL SIMULATION FOR BREAD-BAKING PROCESS , 1997 .

[88]  G. Stormo,et al.  Identifying protein-binding sites from unaligned DNA fragments. , 1989, Proceedings of the National Academy of Sciences of the United States of America.

[89]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[90]  Thomas G. Dietterich What is machine learning? , 2020, Archives of Disease in Childhood.

[91]  Louis Sanzogni,et al.  Milk Production estimates using feed forward artificial neural networks , 2001 .

[92]  R. J. Cole,et al.  Estimation of aflatoxin contamination in preharvest peanuts using neural networks , 1997 .

[93]  Jun S. Liu,et al.  Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. , 1993, Science.

[94]  D Gianola,et al.  Analysis of reproductive performance of lactating cows on large dairy farms using machine learning algorithms. , 2006, Journal of dairy science.

[95]  Driss Aboutajdine,et al.  A two-stage gene selection scheme utilizing MRMR filter and GA wrapper , 2011, Knowledge and Information Systems.

[96]  Wei Kong,et al.  A combination of modified particle swarm optimization algorithm and support vector machine for gene selection and tumor classification. , 2007, Talanta.

[97]  Zhang Jianqi,et al.  Face recognition method based on support vector machine and particle swarm optimization , 2011 .

[98]  Jesús S. Aguilar-Ruiz,et al.  Incremental wrapper-based gene selection from microarray data for cancer classification , 2006, Pattern Recognit.

[99]  Rong-Ming Chen,et al.  FMGA: finding motifs by genetic algorithm , 2004, Proceedings. Fourth IEEE Symposium on Bioinformatics and Bioengineering.

[100]  Taioun Kim,et al.  Inducing inference rules for the classification of bovine mastitis , 1999 .

[101]  J J Domecq,et al.  Expert system for evaluation of reproductive performance and management. , 1991, Journal of dairy science.