A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Computer Science and Engineering