Detecting seasonal marine microbial communities with symmetrical non-negative matrix factorization

With the development of high-throughput, low-cost sequencing technology, large numbers of marine microbial sequences are being generated, making it possible to study more uncultivated marine microbes. Marine microbial diversity and the association patterns among marine microbial species and environmental factors are hidden in these large sequence collections. Understanding these association patterns has high potential for exploiting marine resources. Yet very few marine microbial association patterns are well characterized, despite the research effort presently devoted to this field. In this paper, using 16S rRNA tag pyrosequencing data sampled monthly over 6 years at a temperate marine coastal site in the Western English Channel, we first introduce a neighbor-seeds based heuristic clustering method called NbHCluster, which incorporates an adaptive neighbor-set expanding procedure and a greedy heuristic clustering procedure, to generate the operational taxonomic units (OTUs). We then use the mutual information (MI) algorithm to construct spring, summer, fall, and winter seasonal marine association networks of microbes and environmental factors. Next, we use a fuzzy clustering framework, defining a clique-node similarity matrix and adopting the symmetrical non-negative matrix factorization method, to detect the association community patterns and structures in the four seasonal marine networks. The results show that the four seasonal marine microbial association networks have the characteristics of complex networks, and that the marine microbial association patterns are related to seasonal variability; that the same environmental factor influences different species in the four seasons; and that the correlative relationships between OTUs (taxa) are stronger than those between OTUs and environmental factors.
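As an illustration of the factorization step, the following is a minimal sketch of symmetrical non-negative matrix factorization used for fuzzy community detection. It is not the authors' implementation: the damped multiplicative update rule, the function names `symmetric_nmf` and `fuzzy_memberships`, the parameter values, and the toy node-node similarity matrix (a stand-in for the clique-node similarity matrix) are all assumptions made for this example.

```python
import numpy as np

def symmetric_nmf(A, k, n_iter=1000, beta=0.5, seed=0, eps=1e-12):
    """Approximate a symmetric non-negative matrix A by H @ H.T with H >= 0.

    Uses a damped multiplicative update (an assumed choice for this sketch);
    the paper may use a different optimization scheme.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Random positive start, scaled so that H @ H.T is on the scale of A.
    H = rng.random((n, k)) * np.sqrt(A.mean() / k)
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + eps  # eps avoids division by zero
        H *= (1.0 - beta) + beta * numer / denom
    return H

def fuzzy_memberships(H, eps=1e-12):
    """Row-normalize H so each row gives soft community memberships."""
    return H / (H.sum(axis=1, keepdims=True) + eps)

# Hypothetical similarity matrix with two groups of three nodes each.
A = np.array([
    [1.00, 0.90, 0.80, 0.00, 0.00, 0.00],
    [0.90, 1.00, 0.85, 0.00, 0.00, 0.00],
    [0.80, 0.85, 1.00, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.00, 1.00, 0.90, 0.80],
    [0.00, 0.00, 0.00, 0.90, 1.00, 0.85],
    [0.00, 0.00, 0.00, 0.80, 0.85, 1.00],
])

H = symmetric_nmf(A, k=2)
U = fuzzy_memberships(H)      # soft membership of each node in each community
labels = U.argmax(axis=1)     # hard assignment, if one is needed
print(np.round(U, 3))
print(labels)
```

Each row of `U` can be read as the degree to which a node belongs to each community, so a node with comparable weights in several columns would be a candidate overlap between communities.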
