Dealing with Uncertainty: A Survey of Theories and Practices

Uncertainty accompanies everyday life and pervades almost every field of scientific study. Two general categories of uncertainty exist: aleatory uncertainty and epistemic uncertainty. Aleatory uncertainty refers to the inherent randomness of nature, arising from the natural variability of the physical world (e.g., which side a flipped coin lands on). Epistemic uncertainty, by contrast, originates from humans' limited knowledge of the physical world and their limited ability to measure and model it (e.g., computing the distance between two cities). Different kinds of uncertainty call for different handling methods. Aggarwal and Yu, Sarma et al., and Zhang et al. have published thorough surveys of uncertain database management based on probability theory. This paper reviews uncertainty processing activities across diverse disciplines. Beyond the dominant probability theory and fuzzy theory, we also review information-gap theory and the recently developed uncertainty theory, and we describe how these theories are applied in economics, engineering, ecology, and the information sciences. We hope this study gives the database community insight into how uncertainty is managed in other disciplines, and that it challenges and inspires database researchers to develop more advanced data management techniques and tools for the variety of uncertainty issues that arise in the real world.
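The aleatory/epistemic distinction matters in practice when uncertainty is propagated numerically. Hoffman et al. [42] argue that the two kinds should be kept apart, and one common way to do so is two-dimensional (nested) Monte Carlo simulation. The Python sketch below illustrates that technique; it is not a method proposed in this survey. It reuses the coin example. The coin's bias is treated as fixed but unknown (epistemic), is assumed to lie somewhere in an interval, and is sampled in the outer loop. The outcome of each flip is inherent randomness (aleatory) and is sampled in the inner loop. The function name `nested_monte_carlo`, the uniform sampling over the interval, and all parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def nested_monte_carlo(bias_low, bias_high, n_outer, n_inner, n_flips, seed=0):
    """Two-dimensional Monte Carlo separating epistemic from aleatory uncertainty.

    Outer loop (epistemic): the coin's probability of heads is fixed but unknown;
    we assume (hypothetically) only that it lies in [bias_low, bias_high] and
    sample it uniformly.
    Inner loop (aleatory): given one plausible bias, simulate n_inner trials of
    n_flips flips each, capturing the inherent randomness of the coin.

    Returns a list of (bias, mean_heads, stdev_heads) tuples, one per outer sample.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    results = []
    for _ in range(n_outer):
        # Epistemic step: draw one candidate value for the unknown bias.
        p = rng.uniform(bias_low, bias_high)
        # Aleatory step: count heads in each simulated trial under that bias.
        heads = [sum(rng.random() < p for _ in range(n_flips)) for _ in range(n_inner)]
        results.append((p, statistics.mean(heads), statistics.pstdev(heads)))
    return results


if __name__ == "__main__":
    runs = nested_monte_carlo(0.4, 0.6, n_outer=50, n_inner=200, n_flips=10)
    means = [m for _, m, _ in runs]
    spreads = [s for _, _, s in runs]
    # Spread of means across outer samples: epistemic (reducible with more knowledge).
    print(f"mean heads ranges from {min(means):.2f} to {max(means):.2f}")
    # Spread within each inner loop: aleatory (irreducible randomness of the flips).
    print(f"typical within-bias stdev: {statistics.mean(spreads):.2f}")
```

Keeping the loops nested, rather than sampling bias and flips together in one loop, preserves the distinction. The spread of means across outer samples could shrink if we learned more about the coin. The standard deviation within each inner loop would remain even if the bias were known exactly.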

[1] Robert Ries, et al. Characterizing, Propagating, and Analyzing Uncertainty in Life-Cycle Assessment: A Survey of Quantitative Approaches, 2007.

[2] Amihai Motro, et al. Management of uncertainty in database systems, 1995.

[3] Mohamed A. Ismail, et al. Fuzzy query processing using clustering techniques, 1990, Inf. Process. Manag.

[4] Chi-Yin Chow, et al. Probabilistic Verifiers: Evaluating Constrained Nearest-Neighbor Queries over Uncertain Data, 2008, 2008 IEEE 24th International Conference on Data Engineering.

[5] Bryan Beresford-Smith, et al. Managing credit risk with info-gap uncertainty, 2007.

[6] Peter J. Haas, et al. MCDB-R, 2010, Proc. VLDB Endow.

[7] J M Alho, et al. Stochastic methods in population forecasting, 1990, International journal of forecasting.

[8] Feifei Li, et al. Efficient Processing of Top-k Queries in Uncertain Databases with x-Relations, 2008, IEEE Transactions on Knowledge and Data Engineering.

[9] Wolfgang Lutz, et al. Frontiers of Population Forecasting, 1999.

[10] Subramanian Arumugam, et al. Evaluation of probabilistic threshold queries in MCDB, 2010, SIGMOD Conference.

[11] Yakov Ben-Haim, Info-Gap Economics, 2010.

[12] Henri Prade, et al. Fuzzy relational databases: Representational issues and reduction using similarity measures, 1987, J. Am. Soc. Inf. Sci.

[13] Dan Olteanu, et al. MayBMS: a probabilistic database management system, 2009, SIGMOD Conference.

[14] Christophe Schinckus, et al. Economic uncertainty and econophysics, 2009.

[15] Tim Kraska, et al. CrowdDB: answering queries with crowdsourcing, 2011, SIGMOD '11.

[16] T. Koopmans, Three Essays on the State of Economic Science, 1958.

[17] Zongmin Ma, et al. A Literature Overview of Fuzzy Database Models, 2008, J. Inf. Sci. Eng.

[18] Etienne Kerre, et al. An Overview of Fuzzy Data Models, 1995.

[19] Suk Kyoon Lee, et al. Imprecise and uncertain information in databases: an evidential approach, 1992, [1992] Eighth International Conference on Data Engineering.

[20] Peter Walley, et al. Measures of Uncertainty in Expert Systems, 1996, Artif. Intell.

[21] B. Fischhoff, et al. Assessing uncertainty in physical constants, 1986.

[22] Jennifer Widom, et al. Working Models for Uncertain Data, 2006, 22nd International Conference on Data Engineering (ICDE'06).

[23] L. Zadeh, Fuzzy sets as a basis for a theory of possibility, 1999.

[24] Jian Li, et al. Ranking continuous probabilistic datasets, 2010, Proc. VLDB Endow.

[25] Kenneth Lange, et al. Numerical analysis for statisticians, 1999.

[26] Jian Pei, et al. Query answering techniques on uncertain and probabilistic data: tutorial summary, 2008, SIGMOD Conference.

[27] Jian Li, et al. A unified approach to ranking in probabilistic databases, 2009, The VLDB Journal.

[28] Henri Prade, et al. Generalizing Database Relational Algebra for the Treatment of Incomplete/Uncertain Information and Vague Queries, 1984, Inf. Sci.

[29] Jian Pei, et al. Ranking queries on uncertain data: a probabilistic threshold approach, 2008, SIGMOD Conference.

[30] Elisabeth Paté-Cornell, An Introduction to Probabilistic Risk Analysis for Engineered Systems, 2011.

[31] David Baccarini, et al. The Logical Framework Method for Defining Project Success, 1999.

[32] Mohamed A. Soliman, et al. Top-k Query Processing in Uncertain Databases, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[33] Adnan Yazici, et al. A survey of conceptual and logical data models for uncertainty management, 1992.

[34] Peter Walley, et al. Coherent Upper And Lower Previsions, 1998.

[35] Raymond R. Tan, et al. Using fuzzy numbers to propagate uncertainty in matrix-based LCI, 2008.

[36] Christopher Ré, et al. Queries and materialized views on probabilistic databases, 2011, J. Comput. Syst. Sci.

[37] F. Knight, The economic nature of the firm: From Risk, Uncertainty, and Profit, 2009.

[38] Ming-Lung Hung, et al. Quantifying system uncertainty of life cycle assessment based on Monte Carlo simulation, 2008.

[39] Rob Miller, et al. Crowdsourced Databases: Query Processing with People, 2011, CIDR.

[40] Xiang Lian, et al. Probabilistic Inverse Ranking Queries over Uncertain Data, 2009, DASFAA.

[41] Christopher Ré, et al. Understanding cardinality estimation using entropy maximization, 2012, ACM Trans. Database Syst.

[42] F. O. Hoffman, et al. Propagation of uncertainty in risk assessments: the need to distinguish between uncertainty due to lack of knowledge and uncertainty due to variability, 1994, Risk analysis: an official publication of the Society for Risk Analysis.

[43] Jon C. Helton, et al. Treatment of Uncertainty in Performance Assessments for Complex Systems, 1994.

[44] Leo Egghe, et al. Uncertainty and information: Foundations of generalized information theory, 2007, J. Assoc. Inf. Sci. Technol.

[45] H. Chernoff, A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the sum of Observations, 1952.

[46] Jian Pei, et al. Continuously monitoring top-k uncertain data streams: a probabilistic threshold method, 2009, Distributed and Parallel Databases.

[47] Ramana V. Grandhi, et al. Comparison of evidence theory and Bayesian theory for uncertainty modeling, 2004, Reliab. Eng. Syst. Saf.

[48] Lise Getoor, et al. Exploiting shared correlations in probabilistic databases, 2008, Proc. VLDB Endow.

[49] Patrick Bosc, et al. SQLf: a relational database language for fuzzy querying, 1995, IEEE Trans. Fuzzy Syst.

[50] Lise Getoor, et al. PrDB: managing and exploiting rich correlations in probabilistic databases, 2009, The VLDB Journal.

[51] Ihab F. Ilyas, et al. A survey of top-k query processing techniques in relational database systems, 2008, CSUR.

[52] Amol Deshpande, et al. Efficient Query Evaluation over Temporally Correlated Probabilistic Streams, 2009, 2009 IEEE 25th International Conference on Data Engineering.

[53] Glenn Shafer, et al. A Mathematical Theory of Evidence, 2020.

[54] S Greenland, et al. Sensitivity Analysis, Monte Carlo Risk Analysis, and Bayesian Uncertainty Assessment, 2001, Risk analysis: an official publication of the Society for Risk Analysis.

[55] Tim Kraska, et al. Crowdsourcing Applications and Platforms: A Data Management Perspective, 2011, Proc. VLDB Endow.

[56] Xuemin Lin, et al. Efficient rank based KNN query processing over uncertain data, 2010, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010).

[57] J. Baker, et al. Treatment of Uncertainties in Life Cycle Assessment, 2009.

[58] Daniel Kahneman, et al. Judgment under uncertainty: Variants of uncertainty, 1982.

[59] Arie Tzvieli, Possibility theory: An approach to computerized processing of uncertainty, 1990, J. Am. Soc. Inf. Sci.

[60] Tomasz Imielinski, et al. Incomplete Information in Relational Databases, 1984, JACM.

[61] Jeffrey Scott Vitter, et al. Efficient join processing over uncertain data, 2006, CIKM '06.

[62] Christian Böhm, et al. ProVeR: Probabilistic Video Retrieval using the Gauss-Tree, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[63] Serge Abiteboul, et al. On the representation and querying of sets of possible worlds, 1987, SIGMOD '87.

[64] Yufei Tao, et al. Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions, 2005, VLDB.

[65] Feifei Li, et al. Efficient Processing of Top-k Queries in Uncertain Databases with x-Relations, 2008, IEEE Trans. Knowl. Data Eng.

[66] Zongmin Ma, et al. A Literature Overview of Fuzzy Conceptual Data Modeling, 2010, J. Inf. Sci. Eng.

[67] Norbert Fuhr, et al. A probabilistic relational algebra for the integration of information retrieval and database systems, 1997, TOIS.

[68] Sihem Amer-Yahia, et al. Challenges in Searching Online Communities, 2007, IEEE Data Eng. Bull.

[69] Hans-Peter Kriegel, et al. A novel probabilistic pruning approach to speed up similarity queries in uncertain databases, 2011, 2011 IEEE 27th International Conference on Data Engineering.

[70] M. Huijbregts, et al. Evaluating uncertainty in environmental life-cycle assessment. A case study comparing two insulation options for a Dutch one-family dwelling, 2003, Environmental science & technology.

[71] Dan Suciu, et al. Efficient query evaluation on probabilistic databases, 2004, The VLDB Journal.

[72] Feifei Li, et al. Semantics of Ranking Queries for Probabilistic Data and Expected Ranks, 2009, 2009 IEEE 25th International Conference on Data Engineering.

[73] A. N. Kolmogorov, et al. Foundations of the theory of probability, 1960.

[74] Christopher Ré, et al. Event queries on correlated probabilistic streams, 2008, SIGMOD Conference.

[75] Wolfgang Lutz, et al. Introduction: How to Deal with Uncertainty in Population Forecasting?, 2004.

[76] Reynold Cheng, et al. Evaluating probability threshold k-nearest-neighbor queries over uncertain data, 2009, EDBT '09.

[77] Douglas W. Hubbard, et al. How to Measure Anything: Finding the Value of "Intangibles" in Business, 2007.

[78] Motohide Umano, et al. Fuzzy relational algebra for possibility-distribution-fuzzy-relational model of fuzzy data, 1994, Journal of Intelligent Information Systems.

[79] Arthur P. Dempster, et al. Upper and Lower Probabilities Induced by a Multivalued Mapping, 1967, Classic Works of the Dempster-Shafer Theory of Belief Functions.

[80] Jennifer Widom, et al. Human-assisted graph search: it's okay to ask questions, 2011, Proc. VLDB Endow.

[81] Amol Deshpande, et al. Lineage processing over correlated probabilistic databases, 2010, SIGMOD Conference.

[82] Spyros Makridakis, et al. Forecasting and uncertainty in the economic and business world, 2009.

[83] Ambuj K. Singh, et al. APLA: Indexing Arbitrary Probability Distributions, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[84] Dan Suciu, et al. Computing query probability with incidence algebras, 2010, PODS '10.

[85] Liz Eckermann, et al. Uncertainty and risk: multidisciplinary perspectives, 2011.

[86] Christopher Ré, et al. Efficient Top-k Query Evaluation on Probabilistic Data, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[87] Norbert Fuhr, et al. A Probabilistic Framework for Vague Queries and Imprecise Information in Databases, 1990, VLDB.

[88] F. T. Dweiri, et al. Using fuzzy decision making for the evaluation of the project management internal efficiency, 2006, Decis. Support Syst.

[89] Prithviraj Sen, et al. Representing and Querying Correlated Tuples in Probabilistic Databases, 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[90] René von Schomberg, et al. Controversies and Political Decision Making, 1993.

[91] Philippe Smets, et al. Belief Functions and the Transferable Belief Model, 2000.

[92] A. Tversky, et al. Variants of uncertainty, 1982, Cognition.

[93] Pankaj K. Agarwal, et al. Nearest-neighbor searching under uncertainty, 2012, PODS.

[94] E. F. Codd, A Relational Model of Data for Large Shared Data Banks, 2000.

[95] A. Rollett, et al. The Monte Carlo Method, 2004.

[96] Jennifer Widom, et al. Making Aggregation Work in Uncertain and Probabilistic Databases, 2011, IEEE Transactions on Knowledge and Data Engineering.

[97] P.H.A.J.M. van Gelder, et al. Statistical methods for the risk-based design of civil structures, 2000.

[98] M. Elisabeth Paté-Cornell, et al. Uncertainties in risk analysis: Six levels of treatment, 1996.

[99] Lotfi A. Zadeh, et al. Fuzzy Logic, 2009, Encyclopedia of Complexity and Systems Science.

[100] R. Yarwood, et al. Beyond Six Billion: Forecasting the World's Population, 2003.

[101] Heather Booth, et al. Demographic forecasting: 1980 to 2005 in review, 2006.

[102] Sunil Prabhakar, et al. Evaluating probabilistic queries over imprecise data, 2003, SIGMOD '03.

[103] Dan Suciu, et al. Management of probabilistic data: foundations and challenges, 2007, PODS '07.

[104] Baoding Liu, et al. Uncertainty Theory - A Branch of Mathematics for Modeling Human Uncertainty, 2011, Studies in Computational Intelligence.

[105] S Reid, et al. The Apology of Socrates, 1975, Psychoanalytic Review.

[106] Jan Rotmans, et al. Uncertainty in Integrated Assessment Modelling, 2002.

[107] Maurice King, et al. Beyond Six Billion: Forecasting the World's Population, 2001, BMJ: British Medical Journal.

[108] Yakov Ben-Haim, et al. Info-Gap Economics: An Operational Introduction, 2010.

[109] Philip S. Yu, et al. A Survey of Uncertain Data Algorithms and Applications, 2009, IEEE Transactions on Knowledge and Data Engineering.

[110] Jakub Bijak, et al. What do Bayesian methods offer population forecasters, 2010.

[111] T. S. Jayram, et al. Efficient aggregation algorithms for probabilistic data, 2007, SODA '07.

[112] Raymond R. Tan, et al. Application of possibility theory in the life-cycle inventory assessment of biofuels, 2002.

[113] Peter Green, et al. Markov chain Monte Carlo in Practice, 1996.

[114] AnHai Doan, et al. Crowds, clouds, and algorithms: exploring the human side of "big data" applications, 2010, SIGMOD Conference.

[115] Peter J. Haas, et al. MCDB: a monte carlo approach to managing uncertain data, 2008, SIGMOD Conference.

[116] Richard M. Karp, et al. Monte-Carlo algorithms for enumeration and reliability problems, 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[117] Christopher Ré, et al. Managing Uncertainty in Social Networks, 2007, IEEE Data Eng. Bull.

[118] Hans-Peter Kriegel, et al. Scalable Probabilistic Similarity Ranking in Uncertain Databases, 2010, IEEE Transactions on Knowledge and Data Engineering.

[119] Chris A. McMahon, et al. Uncertainty in Through-Life Costing – Review and Perspectives, 2010, IEEE Transactions on Engineering Management.

[120] B. Plato, et al. Apology of Socrates, 1877.

[121] Jennifer Widom, et al. Representing uncertain data: models, properties, and algorithms, 2009, The VLDB Journal.

[122] Baoding Liu, Uncertainty Theory: An Introduction to its Axiomatic Foundations, 2004.

[123] Ihab F. Ilyas, et al. Efficient search for the top-k probable nearest neighbors in uncertain databases, 2008, Proc. VLDB Endow.

[124] Parag Agrawal, et al. Trio: a system for data, uncertainty, and lineage, 2006, VLDB.

[125] Aditya G. Parameswaran, et al. Answering Queries using Humans, Algorithms and Databases, 2011, CIDR.

[126] Hans-Peter Kriegel, et al. Probabilistic frequent itemset mining in uncertain databases, 2009, KDD.

[127] Dan Olteanu, et al. Fast and Simple Relational Processing of Uncertain Data, 2007, 2008 IEEE 24th International Conference on Data Engineering.