A genome analysis based on repeat sharing gene networks

Motivated by an interest in understanding how information is organized within genomes, and how genes communicate with each other in the transcription process, in this paper we propose a novel network-based methodology for genomic sequence analysis, applied specifically to three organisms: Nanoarchaeum equitans, Escherichia coli, and Saccharomyces cerevisiae. We extend a previously introduced dictionary-based approach with an analysis of repeats in genic and intergenic regions. The key results come from a biological and computational analysis of novel parametrized gene networks, defined by means of fixed-length motifs occurring inside multiple genes. Cliques emerge as groups of genes sharing a long repeat with a clear biological interpretation, while a (complete, paralog) cluster analysis has revealed some unexpected regularity. Repeat sharing gene networks may be applied in comparative genomics, as an investigation methodology for understanding the evolutionary and functional properties of genes.
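
As an illustration of the network definition above, the following minimal Python sketch links two genes whenever they share at least one motif of fixed length k. It is an assumption-laden toy, not the method of the paper: the edge rule (one shared k-mer suffices), the function names `kmers` and `repeat_sharing_network`, and the sequences in `toy_genes` are all hypothetical, and the actual parametrization used by the authors may differ.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def repeat_sharing_network(genes, k):
    """Build an undirected graph linking genes that share a k-mer.

    genes: dict mapping gene name -> DNA string (toy data here).
    Returns (edges, motif_to_genes), where edges maps each gene pair
    to the set of k-mers the two genes have in common.
    """
    # Index every motif by the genes in which it occurs.
    motif_to_genes = defaultdict(set)
    for name, seq in genes.items():
        for motif in kmers(seq, k):
            motif_to_genes[motif].add(name)

    edges = defaultdict(set)
    for motif, names in motif_to_genes.items():
        # All genes containing the same motif are pairwise connected,
        # so each motif shared by several genes induces a clique.
        for a, b in combinations(sorted(names), 2):
            edges[(a, b)].add(motif)
    return dict(edges), dict(motif_to_genes)


# Hypothetical toy sequences, not real gene data.
toy_genes = {
    "geneA": "ATGCGTACGTTAGC",
    "geneB": "TTGCGTACGTAACC",
    "geneC": "GGGCGTACGTCCAA",
    "geneD": "ATATATATATATAT",
}

edges, motif_to_genes = repeat_sharing_network(toy_genes, k=8)

for pair, shared in sorted(edges.items()):
    print(pair, sorted(shared))
```

In this toy input, geneA, geneB, and geneC all contain the 8-mer GCGTACGT and therefore form a clique, while geneD shares no 8-mer with the others and stays isolated. Varying `k` changes which edges survive, which is one simple way to read the word "parametrized" in the description above.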
