Big data analytics for drug discovery

Big Data refers to data sets so large and complex that traditional data processing tools and technologies cannot cope with them. The process of examining such data to uncover hidden patterns is referred to as Big Data Analytics. Drug discovery is closely tied to big data analytics because it may require collecting, processing and analyzing extremely large volumes of structured and unstructured biomedical data from a wide range of experiments and surveys, gathered by hospitals, laboratories and pharmaceutical companies, or even from social media. These data may include sequencing and gene expression data; drug data, including molecular data; protein and drug interaction data; clinical trial and electronic patient record data; patient behavior and self-reported data in social media; regulatory monitoring data; and the scientific literature, in which trends, drug repurposing opportunities and protein-protein interaction data may be found. To analyze such diverse data types in large volumes for the purpose of drug discovery, we need algorithms that are simple, effective, efficient and scalable. In this talk, we discuss how recent developments in big data analytics can be used to improve the drug discovery process. We describe what has recently been done and what remains to be done to develop big data algorithms for drug discovery. We present our recent efforts to develop such algorithms for drug side-effect prediction. These algorithms uncover hidden patterns in data such as unreported drug side-effect discussions in social media, patient records and sequencing data, regulatory monitoring data, drug-protein interaction data and protein-chemical interaction data. We also show how such predictions may be used to determine possible drug structures with different desirable properties. Finally, we discuss how big data analytics may help pharmaceutical companies and regulators achieve better drug efficacy and safety.