Transcript annotation tool (TransAT): an R package for retrieving annotations for transcript-specific genetic variants

Background: An individual’s genetic makeup influences how RNA transcripts are generated from DNA and, consequently, how they are translated into protein. Transcriptional and translational profiling of patients can show that a specific marker is present, but it does not indicate whether the marker correlates with response to a therapeutic agent. Comparing the frequencies of genetic variants, such as single nucleotide polymorphisms (SNPs), between diseased and general populations can help identify pathogenic variants in individual patients. This is partly because SNPs can substantially affect protein function when they occur in coding regions and gene expression when they occur in regulatory sequences. A tool that helps users obtain allele frequencies for the variants in a given transcript is therefore needed. Several annotation tools, such as SNPnexus and VariED, are publicly available; however, none of them accepts transcript IDs as input and returns the corresponding genomic positions of variants.

Results: In this study, we developed an R package, the transcript annotation tool (TransAT), that provides (i) the SNP IDs and genomic positions of variants for user-provided transcript IDs from patients, and (ii) the allele frequencies of these SNPs in publicly available global populations. All data elements are extracted, collected, and displayed in an easily downloadable format with two simple commands. TransAT runs on Windows, Linux, and macOS with R version 4.0.4 or later. It is available at https://github.com/ShihChingYu/TransAT and can be downloaded and installed from the R console with devtools::install_github("ShihChingYu/TransAT", force = TRUE). All functions then become available after loading the package with library(TransAT).

Conclusions: TransAT is a novel tool that seamlessly provides genetic annotations for queried transcripts. Such readily obtainable information could help physicians make individualized decisions about specific drug treatments. Moreover, allele frequencies from user-chosen global ethnic populations will highlight the importance of ethnicity and its effect on pathogenicity in patients.
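The two-step workflow described in the Results (transcript ID to SNP IDs and genomic positions, then SNPs to population allele frequencies) can be illustrated with a small base-R sketch. It does not call TransAT: the tables and the helper functions lookup_snps() and add_frequencies() are hypothetical stand-ins, and every ID, position, and frequency is an invented placeholder. See the package repository for TransAT's actual functions and data sources.

```r
# Minimal sketch of a transcript -> SNP -> allele-frequency lookup.
# All tables, helper names, and values below are hypothetical placeholders,
# not TransAT's API or real annotation data.

# Hypothetical transcript-to-SNP mapping with genomic positions
snp_map <- data.frame(
  transcript_id = c("tx_demo_1", "tx_demo_1", "tx_demo_2"),
  snp_id        = c("snp_demo_a", "snp_demo_b", "snp_demo_c"),
  chrom         = c("chr1", "chr1", "chr2"),
  pos           = c(1000L, 1500L, 2000L),
  stringsAsFactors = FALSE
)

# Hypothetical alternate-allele frequencies per SNP and population
allele_freqs <- data.frame(
  snp_id     = c("snp_demo_a", "snp_demo_a", "snp_demo_b", "snp_demo_c"),
  population = c("pop_X", "pop_Y", "pop_X", "pop_Y"),
  alt_freq   = c(0.12, 0.30, 0.01, 0.45),
  stringsAsFactors = FALSE
)

# Step 1 (hypothetical helper): SNP IDs and positions for the queried transcripts
lookup_snps <- function(transcript_ids, snp_map) {
  snp_map[snp_map$transcript_id %in% transcript_ids, , drop = FALSE]
}

# Step 2 (hypothetical helper): attach frequencies for the chosen populations
add_frequencies <- function(snps, allele_freqs, populations) {
  freqs <- allele_freqs[allele_freqs$population %in% populations, , drop = FALSE]
  # Left join: SNPs without a record for these populations get NA frequencies
  merge(snps, freqs, by = "snp_id", all.x = TRUE)
}

snps <- lookup_snps("tx_demo_1", snp_map)
annotated <- add_frequencies(snps, allele_freqs, populations = "pop_X")
print(annotated)
```

The left join (all.x = TRUE) is a deliberate choice in this sketch: a SNP with no frequency record for the selected population is reported with NA instead of being silently dropped.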