BioJava 5: A community driven open-source bioinformatics library

BioJava is an open-source project that provides a Java library for processing biological data. The project aims to simplify bioinformatic analyses by implementing parsers, data structures, and algorithms for common tasks in genomics, structural biology, ontologies, phylogenetics, and more. Since 2012, we have released two major versions of the library (4 and 5) that include many new features to tackle the challenges posed by increasingly complex macromolecular structure data. BioJava requires Java 8 or higher and is freely available under the LGPL 2.1 license. The project is hosted on GitHub at https://github.com/biojava/biojava. More information and documentation can be found on the BioJava website (http://www.biojava.org) and in the BioJava tutorial (https://github.com/biojava/biojava-tutorial). All inquiries should be directed to the GitHub page or the BioJava mailing list (http://lists.open-bio.org/mailman/listinfo/biojava-l).
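To illustrate the kind of parsing task BioJava automates, the sketch below reads FASTA text into an ordered map from header to sequence. It is a minimal, standard-library-only illustration written for this summary and is not BioJava's implementation or API. BioJava itself provides its own FASTA readers in the core module (for example the `FastaReaderHelper` utilities), which return typed sequence objects rather than plain strings. The class name `FastaSketch` and method `parseFasta` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of BioJava.
public class FastaSketch {

    // Minimal sketch: parse FASTA text into an insertion-ordered map of
    // header (text after '>') -> sequence with line breaks removed.
    // No residue validation; a repeated header overwrites the earlier record.
    public static Map<String, String> parseFasta(String text) {
        Map<String, String> records = new LinkedHashMap<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record, if any.
                if (header != null) {
                    records.put(header, seq.toString());
                }
                header = line.substring(1).trim();
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line); // sequence lines before any header are ignored
            }
        }
        if (header != null) {
            records.put(header, seq.toString()); // flush the last record
        }
        return records;
    }

    public static void main(String[] args) {
        String fasta = ">seq1 example\nACGT\nTTGA\n\n>seq2\nMKV\n";
        for (Map.Entry<String, String> e : parseFasta(fasta).entrySet()) {
            // Prints "seq1 example -> ACGTTTGA" then "seq2 -> MKV"
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A `LinkedHashMap` keeps records in file order, which matches how multi-record FASTA files are usually consumed; a full library parser would additionally validate residues against an alphabet and build dedicated sequence objects.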
