Linear discrimination with strategically missing values

This study analyzes a problem in which a decision maker must estimate values that agents have hidden strategically, so that further analysis can proceed as if the data were complete. Data can be missing for various reasons. When data are provided by intelligent agents, information is most often hidden strategically to obtain a favorable classification (or potential transaction) from the decision maker. Anticipating such strategic moves, we find a set of default vectors that the decision maker can use to replace missing values, such that she minimizes her misclassification rate and at the same time gives agents an incentive to disclose information. In theoretical and empirical studies, the performance of this set of default vectors is compared with that of some common statistical methods for handling missing data. Empirical results show that default vectors chosen from this set dominate the other methods in terms of misclassification rate. (Full text of this dissertation may be available via the University of Florida Libraries web site. Please check http://www.uflib.ufl.edu/etd.html)
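The idea of replacing hidden attributes with a default vector before applying a linear discriminant can be illustrated with a small sketch. The abstract does not say how the default vectors are derived, so everything below is an assumption made for illustration only: the weights `w`, the offset `b`, both default vectors, and the helper names `fill_missing` and `classify` are hypothetical and are not taken from the dissertation.

```python
def fill_missing(x, default):
    """Replace hidden entries (None) in x with the matching entry of default."""
    return [d if v is None else v for v, d in zip(x, default)]


def classify(x, w, b, default):
    """Linear discriminant: +1 if w.x + b >= 0 after filling hidden values, else -1."""
    filled = fill_missing(x, default)
    score = sum(wi * xi for wi, xi in zip(w, filled)) + b
    return 1 if score >= 0 else -1


# Hypothetical discriminant and default vectors (illustrative values only).
w = [1.0, 2.0]
b = -3.0
mean_default = [1.5, 1.5]    # e.g. a mean-style fill that ignores strategy
strict_default = [0.0, 0.0]  # a less favorable fill for hidden attributes

# An agent whose true second attribute is 0.2 chooses to hide it.
truthful = [2.0, 0.2]
hiding = [2.0, None]

print(classify(truthful, w, b, mean_default))   # full disclosure
print(classify(hiding, w, b, mean_default))     # hiding, mean-style default
print(classify(hiding, w, b, strict_default))   # hiding, stricter default
```

With these made-up numbers, hiding the weak attribute turns a negative classification into a positive one under the mean-style default, while the stricter default removes that advantage. This is the kind of incentive effect the abstract describes, shown on a toy case and not as the author's actual construction.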
