Effects of data quality and quantity in systems modelling: a case study

When a probability distribution is constructed from incomplete and imprecise data, the effects of the quantity and the quality of the data are a serious concern in practical applications. Consider building the matrix of a joint probability distribution in which the probabilities of some events are available only approximately and those of most events are not available at all. When the known values are exact, this type of problem is traditionally solved by maximizing the Shannon entropy of the distribution with the known values as constraints. Here, however, the available information is approximate and represented by fuzzy numbers. A multi-objective optimization method is proposed that employs the well-known principles of maximum and minimum uncertainty: the Shannon entropy is maximized while, at the same time, values are sought for the known elements whose membership grades in the corresponding fuzzy numbers are as high as possible. The method is applied to the construction of an origin–destination (O–D) table for a transit line from incomplete and imprecise data. The behaviour of the solution with respect to the quantity and quality of the available data is tested by sensitivity analysis using real-world data from four transit lines. This analysis reveals how changes in the quantity and quality of the data affect the level of acceptability of the resulting O–D table. Furthermore, the question of how to combine O–D tables developed from different sets of approximate values is examined using a method that minimizes the sum of the relative Shannon entropies.
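
The abstract does not say how the two objectives are aggregated or how the optimization is solved, so the following is only a minimal Python sketch under stated assumptions. It assumes triangular fuzzy numbers for the approximately known cells, uses the normalized entropy H/ln n as the degree to which the entropy goal is met, and aggregates the goals with a Bellman–Zadeh style max-min rule (maximize the smallest degree), found by brute-force grid search. For the combination step it assumes the relative entropy is D(p || q_k), whose minimizer over all distributions p is the normalized cell-wise geometric mean of the tables. The function names and the flattened-table layout are hypothetical.

```python
import itertools
import math


def triangular_membership(x, low, mode, high):
    """Membership grade of x in the triangular fuzzy number (low, mode, high)."""
    if x < low or x > high:
        return 0.0
    if x == mode:
        return 1.0
    if x < mode:
        return (x - low) / (mode - low)
    return (high - x) / (high - mode)


def shannon_entropy(probs):
    """Shannon entropy in nats, with the convention 0 * log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def max_entropy_fuzzy(n_cells, fuzzy_cells, steps=200):
    """Assumed max-min trade-off between entropy and membership, by grid search.

    n_cells     -- number of cells in the flattened joint table (hypothetical layout)
    fuzzy_cells -- {cell_index: (low, mode, high)} for approximately known cells
    Returns (table, lam); lam is the smallest of the normalized entropy and
    the membership grades of the chosen values (table is None if nothing fits).
    """
    free = [i for i in range(n_cells) if i not in fuzzy_cells]
    if n_cells < 2 or not free:
        raise ValueError("sketch needs >= 2 cells and >= 1 cell with no data")
    keys = sorted(fuzzy_cells)
    # Candidate values for each known cell: a uniform grid over its support.
    grids = [
        [low + (high - low) * k / steps for k in range(steps + 1)]
        for low, _, high in (fuzzy_cells[c] for c in keys)
    ]
    best_table, best_lam = None, -1.0
    for values in itertools.product(*grids):
        remaining = 1.0 - sum(values)
        if remaining < 0:
            continue  # the known cells alone would exceed total probability
        table = [0.0] * n_cells
        for c, v in zip(keys, values):
            table[c] = v
        # With only the known cells fixed, entropy is maximized by spreading
        # the remaining mass uniformly over the cells that have no data.
        for c in free:
            table[c] = remaining / len(free)
        grades = [
            triangular_membership(v, *fuzzy_cells[c]) for c, v in zip(keys, values)
        ]
        # Degree to which each goal is met; the worst one is maximized.
        lam = min([shannon_entropy(table) / math.log(n_cells)] + grades)
        if lam > best_lam:
            best_table, best_lam = table, lam
    return best_table, best_lam


def combine_tables(tables):
    """Combine tables q_k by minimizing sum_k D(p || q_k) over distributions p.

    Setting the Lagrangian derivative to zero gives p proportional to the
    cell-wise geometric mean of the q_k.
    """
    k = len(tables)
    geo = [math.prod(cells) ** (1.0 / k) for cells in zip(*tables)]
    total = sum(geo)
    if total == 0:
        raise ValueError("no cell has positive probability in every table")
    return [g / total for g in geo]
```

For example, `max_entropy_fuzzy(4, {0: (0.1, 0.2, 0.3)})` settles cell 0 near 0.2 with `lam` near 0.995: moving toward the uniform value 0.25 would raise the entropy slightly but lower the membership grade much faster. Grid search grows exponentially with the number of approximately known cells, so a realistic O–D table would call for a nonlinear programming solver instead.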
