A machine learning model to predict the origin of forensically relevant body fluids
Abstract DNA profiling alone is often not sufficient to determine the nature of a crime. In such cases, identifying the cellular origin and composition of crime-scene traces, referred to as body fluid identification (BFI), can provide contextual information about the circumstances in which the crime unfolded. Our approach uses a targeted mRNA sequencing protocol for body fluid identification, based on a multiplexed panel of highly specific biomarkers corresponding to the five categories of forensically relevant body fluids: blood, saliva, semen, vaginal secretions and menstrual blood. Because targeted mRNA sequencing provides both quantitative and qualitative information, it is a powerful method for RNA profiling. The raw sequencing data were used to build a gene expression pipeline that reveals the expression levels of the biomarkers and their correlations with the body fluids. The resulting expression profiles were then used to build a multi-class random forest probabilistic classifier that predicts the origin of both single-source and mixed samples. The novelty of this approach lies in incorporating probabilistic information into a machine learning prediction model while also providing a high level of explainability for the prediction outputs.
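To illustrate what a probabilistic output from a random forest can look like, the minimal sketch below converts hard votes from individual trees into one probability per body fluid, then reports every fluid above a cutoff so that a mixed sample can yield more than one call. This is an assumption-laden illustration and not the authors' pipeline: the function names `forest_probabilities` and `call_fluids`, the 0.25 threshold, and the example votes are all hypothetical, and the abstract does not state how mixtures are actually resolved.

```python
from collections import Counter

# The five body fluid classes named in the abstract.
BODY_FLUIDS = ["blood", "saliva", "semen", "vaginal secretions", "menstrual blood"]


def forest_probabilities(tree_votes):
    """Turn one hard vote per tree into a probability per body fluid."""
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    unknown = set(tree_votes) - set(BODY_FLUIDS)
    if unknown:
        raise ValueError(f"unknown class labels: {sorted(unknown)}")
    counts = Counter(tree_votes)
    total = len(tree_votes)
    # A Counter returns 0 for classes that received no votes.
    return {fluid: counts[fluid] / total for fluid in BODY_FLUIDS}


def call_fluids(probabilities, threshold=0.25):
    """Report every fluid whose probability reaches the threshold.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return [fluid for fluid in BODY_FLUIDS if probabilities[fluid] >= threshold]


# Hypothetical votes from a 10-tree forest for a single sample.
votes = ["blood"] * 6 + ["menstrual blood"] * 3 + ["saliva"]
probs = forest_probabilities(votes)
called = call_fluids(probs)
```

Reporting the full probability vector rather than only the top class is what lets a reader see how strongly each fluid is supported. Library implementations may average per-tree class probabilities instead of counting hard votes, so the numbers from a real forest can differ from this simplified scheme.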