Database Systems for Advanced Applications

As data grows into big data, enhanced or entirely new data mining algorithms often become necessary. The real value of big data is often exposed only when we can adequately mine and learn from it. We provide an overview of new scalable techniques for knowledge discovery, focusing on cloud data mining and machine learning, semi-supervised processing, and deep learning. We also give practical advice for choosing among the different methods and discuss open research problems and concerns.
