Paper information - Big data trends and evolution: a human perspective

Big data trends and evolution: a human perspective

The Big Data revolution has already happened and, through it, organizations have started to realize the potential of using data to make better-informed decisions, mitigate risks and, overall, better control their destiny. Along with its benefits, Big Data also creates new challenges, the growing talent gap possibly being the most representative of them all. To leverage Big Data effectively, a new profession is emerging: the data scientist. Tasked with understanding the methodologies to process and analyze vast and complex data, this professional must possess knowledge in a broad spectrum of domains, including mathematics (calculus, linear algebra, statistics, probability and possibly even category theory), programming languages (Python and R being frequently cited), data processing and analysis (profiling, parsing, cleansing, linking), machine learning techniques (supervised and unsupervised learning, dimensionality reduction, feature selection, etc.) and business domain knowledge. While it is conceivable to find individuals who can achieve this breadth of knowledge with significant depth, it is unreasonable to expect this to be the norm; such individuals usually fall far into the upper tail of the population distribution. To make things worse, the toolsets currently available to data scientists tend to be very involved and require considerable time to develop applications, reducing the overall effectiveness of these experts. The solution to this talent gap is certainly not to try to breed a new step up the evolutionary ladder that can cope with this vast knowledge, but to create radically different abstractions in the toolsets that data scientists use, increasing efficiency and reducing the scope of the basic knowledge required to build Big Data applications. In this presentation we will explore this challenge and provide a new perspective on more efficient toolsets for Big Data applications.

Flavio Villanustre