SOCR data dashboard: an integrated big data archive mashing medicare, labor, census and econometric information

Introduction: Intuitive formulation of informative and computationally efficient queries on big and complex datasets presents a number of challenges. As data collection becomes increasingly streamlined and ubiquitous, data exploration, discovery and analytics get considerably harder. Exploratory querying of heterogeneous, multi-source information is both difficult and necessary to advance our knowledge about the world around us.

Research design: We developed a mechanism to integrate dispersed multi-source data and serve the mashed information via human and machine interfaces in a secure, scalable manner. This process facilitates the exploration of subtle associations between variables, population strata, or clusters of data elements, which may be opaque to standard independent inspection of the individual sources. This new platform includes a device-agnostic tool (Dashboard webapp, http://socr.umich.edu/HTML5/Dashboard/) for graphical querying, navigating and exploring the multivariate associations in complex heterogeneous datasets.

Results: The paper illustrates this core functionality and service-oriented infrastructure using healthcare data (e.g., US data from the 2010 Census, Demographic and Economic surveys, Bureau of Labor Statistics, and Centers for Medicare & Medicaid Services) as well as Parkinson’s Disease neuroimaging data. Both the back-end data archive and the front-end dashboard interfaces are continuously expanded to include additional data elements and new ways to customize the human and machine interactions.

Conclusions: A client-side data import utility allows for easy and intuitive integration of user-supplied datasets. This completely open-science framework may be used for exploratory analytics, confirmatory analyses, meta-analyses, and education and training purposes in a wide variety of fields.
