Big Data Analytics

Efforts to derive maximum value from data have led to an expectation that the extensive collection and exploitation of personal data is “just the cost of living in the modern world.” Ultimately, this form of data exploitation will not be sustainable. It will end either through customer dissatisfaction or through government intervention to ensure that private information receives the same level of protection currently found in paper-based systems. Legal, technical, and moral boundaries need to be placed on how personal information is used. They also need to govern how it can be combined to create inferences that are often highly accurate but not guaranteed to be correct. Agrawal’s initial call to arms in 2002 has generated a large volume of work. However, the analytics and privacy communities are still not truly communicating toward a common goal: extracting high utility from collected data without violating the purpose for which it was originally collected [2]. This paper describes the current state of the art and calls for a true dialog between these two communities. Such a dialog may be the only way current analytics practices will be allowed to continue without severe government intervention. It may also be the only way to avoid severe action by the people whose data is collected and analyzed, whether they refuse to work with exploitative corporations or litigate the harms arising from current practices.

[1] P. Krishna Reddy, et al. An Efficient Approach to Mine Periodic-Frequent Patterns in Transactional Databases, 2011, PAKDD Workshops.

[2] Masaru Kitsuregawa, et al. Discovering Recurring Patterns in Time Series, 2015, EDBT.

[3] Ayellet V. Segrè, et al. Common Inherited Variation in Mitochondrial Genes Is Not Enriched for Associations with Type 2 Diabetes or Related Glycemic Traits, 2010, PLoS Genetics.

[4] Philip S. Yu, et al. Mining Asynchronous Periodic Patterns in Time Series Data, 2003, IEEE Transactions on Knowledge and Data Engineering.

[5] Heng Li, et al. Tabix: fast retrieval of sequence features from generic TAB-delimited files, 2011, Bioinformatics.

[6] Heather J. Ruskin, et al. Cross-Platform Microarray Data Normalisation for Regulatory Network Inference, 2010, PLoS ONE.

[7] Lalit Kumar, et al. An efficient map-reduce algorithm for computing formal concepts from binary data, 2015, IEEE International Conference on Big Data (Big Data).

[8] Das Amrita, et al. Mining Association Rules between Sets of Items in Large Databases, 2013.

[9] P. Krishna Reddy, et al. An Alternative Interestingness Measure for Mining Periodic-Frequent Patterns, 2011, DASFAA.

[10] Philippe Lenca, et al. Mining Top-K Periodic-Frequent Pattern from Transactional Databases without Support Threshold, 2009, IAIT.

[11] Jiawei Han, et al. Efficient mining of partial periodic patterns in time series database, 1999, Proceedings of the 15th International Conference on Data Engineering.

[12] Bernhard Ganter, et al. Formal Concept Analysis: Mathematical Foundations, 1998.

[13] Matthew E. Ritchie, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies, 2015, Nucleic Acids Research.

[14] Vilém Vychodil, et al. Distributed Algorithm for Computing Formal Concepts Using Map-Reduce Framework, 2009, IDA.

[15] Masaru Kitsuregawa, et al. Novel Techniques to Reduce Search Space in Periodic-Frequent Pattern Mining, 2014, DASFAA.

[16] Walid G. Aref, et al. Incremental, online, and merge mining of partial periodic patterns in time-series databases, 2004, IEEE Transactions on Knowledge and Data Engineering.

[17] Shih-Sheng Chen, et al. New and efficient knowledge discovery of partial periodic patterns with multiple minimum supports, 2011, Journal of Systems and Software.

[18] M. Schatz, et al. Big Data: Astronomical or Genomical?, 2015, PLoS Biology.

[19] Philip S. Yu, et al. InfoMiner+: mining partial periodic patterns with gap penalties, 2002, Proceedings of the 2002 IEEE International Conference on Data Mining.

[20] Ronan M. T. Fleming, et al. A community-driven global reconstruction of human metabolism, 2013, Nature Biotechnology.

[21] Manuel A. R. Ferreira, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses, 2007, American Journal of Human Genetics.

[22] Masaru Kitsuregawa, et al. Discovering Quasi-Periodic-Frequent Patterns in Transactional Databases, 2013, BDA.

[23] D. Barh, et al. XomAnnotate: Analysis of Heterogeneous and Complex Exome - A Step towards Translational Medicine, 2015, PLoS ONE.

[24] Nikos Mamoulis, et al. Discovering Partial Periodic Patterns in Discrete Data Sequences, 2004, PAKDD.

[25] Wynne Hsu, et al. Mining association rules with multiple minimum supports, 1999, KDD '99.

[26] Young-Koo Lee, et al. Discovering Periodic-Frequent Patterns in Transactional Databases, 2009, PAKDD.

[27] M. Ritchie, et al. Methods of integrating data to uncover genotype–phenotype interactions, 2015, Nature Reviews Genetics.

[28] Walid G. Aref, et al. On the Discovery of Weak Periodicities in Large Time Series, 2002, PKDD.