Data Classification: Algorithms and Applications

This series aims to capture new developments and applications in data mining and knowledge discovery, while summarizing the computational tools and techniques useful in data analysis. It encourages the integration of mathematical, statistical, and computational methods and techniques through the publication of a broad range of textbooks, reference works, and handbooks. The inclusion of concrete examples and applications is highly encouraged. The scope of the series includes, but is not limited to, titles in the areas of data mining and knowledge discovery methods and applications, modeling, algorithms, theory and foundations, data and knowledge visualization, data mining systems and tools, and privacy and security issues.

Data Classification: Algorithms and Applications, edited by Charu C. Aggarwal. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, 35. Includes bibliographical references and index.

Summary: "This book homes in on three primary aspects of data classification: the core methods for data classification, including probabilistic classification, decision trees, rule-based methods, and SVM methods; different problem domains and scenarios, such as multimedia data, text data, biological data, categorical data, network data, data streams, and uncertain data; and different variations of the classification problem, such as ensemble methods, visual methods, transfer learning, semi-supervised methods, and active learning. These advanced methods can be used to enhance the quality of the underlying classification results." (Provided by publisher.)
