The NIST data science initiative

We examine foundational issues in data science, including current challenges, basic research questions, and expected advances, as the basis for a new Data Science Initiative and evaluation series introduced by the National Institute of Standards and Technology (NIST) in the fall of 2015. The evaluations will facilitate research and collaboration, leverage shared infrastructure, and address cross-cutting challenges faced by diverse data science communities. They will comprise multiple research tracks championed by members of the data science community and will enable rigorous comparison of approaches through common tasks, datasets, metrics, and shared research challenges. The tracks will measure several different data science technologies in a wide range of fields, starting with a pre-pilot. In addition to developing data science evaluation methods and metrics, the initiative will address computing infrastructure, standards for an interoperability framework, and domain-specific examples.
