Hybrid.Poly: Performance Evaluation of Linear Algebra Analytical Extensions

Anecdotal evidence suggests that Variety is one of the most challenging problems in Big Data research [1]. Different data providers use different data models and formats to represent their data, which creates a significant impediment for data scientists, whose goal is to make sense of all relevant data regardless of its source. Hybrid.Poly [2], [3] is an analytical polystore data management system designed to make all data accessible to the analyst without requiring awareness of the differences between sources. In this paper, we focus on an in-depth analysis and performance evaluation of the Linear Algebra extensions added to the Hybrid.Poly language.

[1] Michael N. Gubanov, et al. CognitiveDB: An Intelligent Navigator for Large-scale Dark Structured Data, 2017, WWW.

[2] Shirish Tatikonda, et al. SystemML: Declarative Machine Learning on Spark, 2016, Proc. VLDB Endow.

[3] Lin Ma, et al. Query-based Workload Forecasting for Self-Driving Database Management Systems, 2018, SIGMOD Conference.

[4] Michael N. Gubanov, et al. Model Management Engine for Data Integration with Reverse-Engineering Support, 2008, 2008 IEEE 24th International Conference on Data Engineering.

[5] Michael N. Gubanov. PolyFuse: A Large-Scale Hybrid Data Fusion System, 2017, 2017 IEEE 33rd International Conference on Data Engineering (ICDE).

[6] Hidekazu Oiwa, et al. Scalable Semantic Querying of Text, 2018, Proc. VLDB Endow.

[7] Thomas Heinis, et al. QUASII: QUery-Aware Spatial Incremental Index, 2018, EDBT.

[8] Michael Stonebraker, et al. Large-scale Semantic Profile Extraction, 2014, EDBT.

[9] Carlo Curino, et al. Towards Geo-Distributed Machine Learning, 2017, IEEE Data Eng. Bull.

[10] Michael N. Gubanov, et al. MEDREADFAST: A structural information retrieval engine for big clinical text, 2012, 2012 IEEE 13th International Conference on Information Reuse & Integration (IRI).

[11] Anastasia Ailamaki, et al. Interactive Visual Exploration of Spatio-Temporal Urban Data Sets using Urbane, 2018, SIGMOD Conference.

[12] Michael N. Gubanov, et al. Hybrid.poly: An Interactive Large-Scale In-memory Analytical Polystore, 2017, 2017 IEEE International Conference on Data Mining Workshops (ICDMW).

[13] Linda G. Shapiro, et al. ReadFast: Structural Information Retrieval from Biomedical Big Text by Natural Language Processing, 2013.

[14] Michael Stonebraker, et al. Text and structured data fusion in data tamer at scale, 2014, 2014 IEEE 30th International Conference on Data Engineering.

[15] Michael N. Gubanov, et al. Scalable Linear Algebra on a Relational Database System, 2017, 2017 IEEE 33rd International Conference on Data Engineering (ICDE).

[16] Linda G. Shapiro, et al. Using unified famous objects (UFO) to automate Alzheimer's disease diagnostics, 2011, 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW).

[17] Shirish Tatikonda, et al. SPOOF: Sum-Product Optimization and Operator Fusion for Large-Scale Machine Learning, 2017, CIDR.

[18] Michael N. Gubanov, et al. Hybrid.JSON: High-velocity parallel in-memory polystore JSON ingest, 2017, 2017 IEEE International Conference on Big Data (Big Data).

[19] Hiren Patel, et al. Computation Reuse in Analytics Job Service at Microsoft, 2018, SIGMOD Conference.

[20] Michael N. Gubanov, et al. Hybrid.media: High velocity video ingestion in an in-memory scalable analytical polystore, 2017, 2017 IEEE International Conference on Big Data (Big Data).

[21] Michael N. Gubanov, et al. Type-aware Web-search, 2016, EDBT.

[22] Maksim Podkorytov, et al. IntelliLIGHT: A Flashlight for Large-scale Dark Structured Data, 2017.

[23] Michael N. Gubanov, et al. Hybrid.AI: A Learning Search Engine for Large-scale Structured Data, 2018, WWW.