Managing and Mining Uncertain Data

Managing and Mining Uncertain Data is a survey whose chapters are written by well-known researchers in data mining. It presents the most recent models, algorithms, and applications for uncertain data in a structured and concise way. The book is organized to be accessible to applications-driven practitioners solving real problems. Because structurally organized information on this topic is scarce, it also provides insights that are not easily found elsewhere. The book is designed for a professional audience of researchers and practitioners in industry. It is also suitable as a reference for advanced-level students in computer science and engineering, and for members of the ACM, IEEE, SIAM, INFORMS, and AAAI communities.
