An efficient method for mining cross-timepoint gene regulation sequential patterns from time course gene expression datasets

Background: Observing gene expression changes that imply gene regulation through repeated time course experiments has become increasingly important. However, there is no effective method for handling this kind of data. For instance, in a clinical or biological progression such as inflammatory response or cancer formation, a large number of differentially expressed genes can be identified at different time points through a large-scale microarray approach. For each repeated experiment with different samples, converting the microarray datasets into transactional databases that hold the significant singleton genes at each time point would allow sequential patterns implying gene regulation to be identified. Traditional sequential pattern mining methods have been widely and successfully applied to other problems, such as mining customer purchasing sequences from a transactional database. To our knowledge, however, these methods are not suitable for such biological datasets, because every transaction in the converted database may contain too many items (genes).

Results: In this paper, we propose a new algorithm called CTGR-Span (Cross-Timepoint Gene Regulation Sequential pattern) to efficiently mine CTGR-SPs (Cross-Timepoint Gene Regulation Sequential Patterns), even on larger datasets where traditional algorithms are infeasible. CTGR-Span includes several biologically motivated parameters based on the characteristics of gene regulation. We tune these parameters using a GO enrichment analysis so that the resulting CTGR-SPs are more biologically meaningful. The proposed method was evaluated on two publicly available human time course microarray datasets and outperformed the traditional methods in terms of execution efficiency. When compared against previous literature, the resulting patterns also correlated strongly with the experimental backgrounds of the datasets used in this study.

Conclusions: We propose CTGR-Span, an efficient algorithm for mining biologically meaningful CTGR-SPs. We postulate that biologists can benefit from the new algorithm, since patterns implying gene regulation could provide further insight into the mechanisms of novel gene regulation during a biological or clinical progression. The Java source code, program tutorial and other related materials used in this program are available at http://websystem.csie.ncku.edu.tw/CTGR-Span.rar.
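The conversion step described in the Background can be illustrated with a minimal Python sketch. It is not CTGR-Span and is not taken from the authors' Java code. It only shows how one sample's time course might be turned into a sequence of gene itemsets, and how the support of a cross-timepoint pattern could then be counted. The toy data, the log2 fold change threshold of 1.0, the `+`/`-` item suffixes, and all function names are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy data: one list per sample, one dict per time point,
# mapping gene name -> log2 fold change relative to baseline.
samples: List[List[Dict[str, float]]] = [
    [{"A": 2.0, "B": 0.1}, {"B": -1.5, "C": 1.2}, {"C": 0.3}],
    [{"A": 1.4}, {"B": -2.0}, {"C": 1.1}],
    [{"A": 0.2}, {"B": -1.1, "C": 1.3}, {"A": 1.0}],
]


def to_sequence(sample: List[Dict[str, float]],
                threshold: float = 1.0) -> List[FrozenSet[str]]:
    """Convert one sample's time course into a sequence of itemsets.

    A gene becomes the item "GENE+" (up) or "GENE-" (down) at a time point
    when its absolute log2 fold change reaches the assumed threshold.
    """
    sequence = []
    for timepoint in sample:
        items = set()
        for gene, log_fc in timepoint.items():
            if log_fc >= threshold:
                items.add(gene + "+")
            elif log_fc <= -threshold:
                items.add(gene + "-")
        sequence.append(frozenset(items))
    return sequence


def contains(sequence: List[FrozenSet[str]],
             pattern: List[FrozenSet[str]]) -> bool:
    """Check that the pattern's itemsets occur in order at strictly
    increasing time points (greedy earliest match)."""
    position = 0
    for itemset in pattern:
        while position < len(sequence) and not itemset <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1  # the next itemset must match a later time point
    return True


def support(database: List[List[FrozenSet[str]]],
            pattern: List[FrozenSet[str]]) -> float:
    """Fraction of samples whose sequence contains the pattern."""
    return sum(contains(seq, pattern) for seq in database) / len(database)


# One sequence per sample: the "transactional database" of the abstract.
database = [to_sequence(sample) for sample in samples]

# Candidate pattern: gene A up-regulated, then gene B down-regulated later.
pattern = [frozenset({"A+"}), frozenset({"B-"})]
pattern_support = support(database, pattern)
```

In this toy setting, a pattern would be reported when `pattern_support` reaches a user-chosen minimum support. Real microarray time points can contain hundreds of significant genes per itemset, which is the scale problem the abstract attributes to traditional sequential pattern miners.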
