Paper Information - FlashView: An Interactive Visual Explorer for Raw Data

New data is being generated at an unexpectedly high speed. To gain insight into this data, analysts perform thorough studies using state-of-the-art big data analytical tools. Before the analysis starts, there is a preprocessing step in which the analyst tends to issue a few ad-hoc queries on a new dataset to explore it and gain a better understanding. However, it is costly to perform such ad-hoc queries on large-scale data using traditional data management systems, e.g., a DBMS, because data loading and indexing are very expensive. In this demo, we propose a novel visual data explorer system, FlashView, which omits the loading process by querying raw data directly. FlashView applies approximate query processing techniques to return query results in real time. It builds both an in-memory index and a disk index to facilitate data scanning. It also supports tracking and updating multiple queries concurrently. Note that FlashView is not designed as a replacement for a full-fledged DBMS. Instead, it tries to help analysts quickly understand the characteristics of the data, so that they can selectively load data into the DBMS for more sophisticated analysis.
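The abstract does not say how FlashView computes its approximate answers, so the following is only a minimal sketch of the general idea behind sampling-based approximate query processing on raw data: estimate an aggregate from a uniform random sample of raw lines, parse only the sampled lines, and report an approximate 95% confidence interval. The function name `approximate_mean`, its parameters, and the synthetic CSV data are all hypothetical and are not taken from the FlashView system.

```python
import math
import random


def approximate_mean(raw_lines, column, sample_size, seed=0):
    """Estimate the mean of one numeric CSV column from a uniform sample.

    Only the sampled lines are split and parsed; all other raw lines are
    left untouched. Returns (estimate, half_width), where half_width is an
    approximate 95% confidence half-width based on the normal approximation.
    """
    rng = random.Random(seed)
    n = min(sample_size, len(raw_lines))
    sampled_indices = rng.sample(range(len(raw_lines)), n)

    # Welford's running mean and sum of squared deviations.
    count, mean, m2 = 0, 0.0, 0.0
    for i in sampled_indices:
        value = float(raw_lines[i].split(",")[column])
        count += 1
        delta = value - mean
        mean += delta / count
        m2 += delta * (value - mean)

    if count < 2:
        # Too few samples to estimate the variance.
        return mean, float("inf")

    variance = m2 / (count - 1)
    half_width = 1.96 * math.sqrt(variance / count)
    return mean, half_width


# Synthetic "raw file": 100,000 unparsed CSV lines of the form "id,value".
raw_lines = [f"{i},{i % 100}" for i in range(100_000)]

estimate, half_width = approximate_mean(raw_lines, column=1, sample_size=2_000)
print(f"AVG(value) is roughly {estimate:.2f} +/- {half_width:.2f}")
```

In this sketch the query touches 2,000 of the 100,000 lines, and the interval narrows as `sample_size` grows. A real system would sample from files on disk and refine the estimate incrementally rather than hold all lines in memory.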

Lidan Shou | Ke Chen | Sai Wu | Gang Chen | Zhifei Pang
