Evaluating the disparity between active areas of biomedical research and the global burden of disease employing Linked Data and data-driven discovery

Although biomedical research has brought substantial benefit to people all over the world by dramatically improving their life expectancy and quality of life, this benefit is not distributed equitably. An important contributor to the inequity is the current absence of accurate, interlinked data that would enable a precise description of the gap between current efforts in biomedical research and global health care needs. In this position paper we present an approach for evaluating this disparity, which involves converting relevant datasets into Linked Data, interlinking them, and analyzing them to represent the disparity as a visual map. We identify the datasets relevant for answering the research question. Since biomedical statistical data is of paramount importance in this data integration project, we describe a tool and a methodology for representing such data as RDF. We perform a preliminary integration of the datasets and outline how prospective queries can be formulated. We conclude by discussing the limitations of the approach, taking the current biomedical data curation landscape into account.
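To make the idea of representing statistical data as RDF concrete, the following minimal Python sketch serializes one statistical observation as N-Triples lines. It is an illustration under stated assumptions, not the tool described in the paper: the `http://example.org/stat/` namespace, the `Observation` class, the dimension names, and the function `observation_to_ntriples` are all hypothetical, and the sample values are placeholders rather than real figures.

```python
from urllib.parse import quote

# Hypothetical namespace for illustration only; not a vocabulary used by the paper.
EX = "http://example.org/stat/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def observation_to_ntriples(row_id, dimensions, value):
    """Serialize one statistical observation as a list of N-Triples lines.

    dimensions maps a dimension name (e.g. "country") to its label;
    each pair becomes one triple pointing at a resource URI.
    """
    subject = f"<{EX}observation/{quote(str(row_id))}>"
    triples = [f"{subject} <{RDF_TYPE}> <{EX}Observation> ."]
    # Sort the dimensions so the output order is deterministic.
    for name, label in sorted(dimensions.items()):
        predicate = f"<{EX}dimension/{quote(name)}>"
        obj = f"<{EX}{quote(name)}/{quote(label)}>"
        triples.append(f"{subject} {predicate} {obj} .")
    # The measured value is attached as a typed literal.
    triples.append(f'{subject} <{RDF_VALUE}> "{value}"^^<{XSD_DECIMAL}> .')
    return triples


# Placeholder row: the labels and the number are invented for the example.
example = observation_to_ntriples(
    1, {"country": "Country A", "disease": "Disease X"}, 12.5
)
for line in example:
    print(line)
```

Because each dimension value becomes a URI rather than a string, observations from different datasets that refer to the same country or disease can later be linked and queried together, which is the property the integration step relies on.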
