A First Look at an Automated Pipeline for NGS-Based Breast-Cancer Diagnosis: The CArDIGAN Approach

Continuous improvements in Next-Generation Sequencing (NGS) approaches are providing a wealth of data for the life sciences, and the large-scale application of ICT (Information and Communication Technologies) to molecular research has become essential for the progress of medical research. While several software tools have been developed to assist in analysis, focused and integrated solutions are still lacking for several specific research goals. One such goal is the molecular diagnostics of Breast Cancer (BC), the most common malignancy in females. In this paper we present, describe, and experimentally evaluate, on real data, part of a software pipeline that we have designed and are implementing, specifically aimed at assisting in BC diagnostics. The results show that the pipeline is effective in assisting and enhancing BC diagnostics, and they encourage further automation.
