Leveraging Data and People to Accelerate Data Science
Doing data science (extracting insight by analyzing data) is not easy. Data science is used to answer interesting questions that typically involve multiple diverse data sources, many different types of analysis, and often large, messy data volumes. Answering one of these questions may require several types of expertise: to understand the context and domain being served, to import and transform individual data sets, to implement effective machine learning and/or statistical methods, to design and program applications and interfaces that extract and share data and insights, and to manage the data and systems used for analysis and storage. In the IBM Research Accelerated Discovery Lab, we are studying how data scientists work and using what we learn to help them gain insights faster. In this talk, I will look at what we have learned to date, through user studies and experience with tens of analytics projects, and at the environment that we’ve built as a result. In particular, I will describe how we capture information to enable contextual search, provenance queries, and other functionality that helps teams make faster progress in data-intensive investigations. I will also touch on our efforts to leverage data and people to explain what happens during an investigation, with the ultimate goal of moving from descriptive to prescriptive analytics in order to accelerate data science and the analytic process. I will illustrate these efforts using an ambitious current project on applying metagenomics to food safety, and will conclude with a discussion of where more work is needed and our future directions.