Real-time Knowledge Discovery and Dissemination for Intelligence Analysis

This paper describes the issues and challenges for real-time knowledge discovery and then discusses approaches and challenges for real-time data mining and stream mining. Our goal is to extract accurate information to support the emergency responder, the war fighter, as well as the intelligence analyst in a timely manner.

[1]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Itemset Mining , 2002, SDM.

[2]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD 2000.

[3]  Laks V. S. Lakshmanan,et al.  Exploratory mining and pruning optimizations of constrained associations rules , 1998, SIGMOD '98.

[4]  Sudipto Guha,et al.  Streaming-data algorithms for high-quality clustering , 2002, Proceedings 18th International Conference on Data Engineering.

[5]  Alexander J. Smola,et al.  Learning with Kernels: support vector machines, regularization, optimization, and beyond , 2001, Adaptive computation and machine learning series.

[6]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD 2000.

[7]  Charu C. Aggarwal,et al.  On High Dimensional Projected Clustering of Uncertain Data Streams , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[8]  Hongyan Liu,et al.  Error-Adaptive and Time-Aware Maintenance of Frequency Counts over Data Streams , 2006, WAIM.

[9]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[10]  David Heckerman,et al.  Bayesian Networks for Knowledge Discovery , 1996, Advances in Knowledge Discovery and Data Mining.

[11]  Anthony K. H. Tung,et al.  Ranking Outliers Using Symmetric Neighborhood Relationship , 2006, PAKDD.

[12]  Shashi Shekhar,et al.  Detecting graph-based spatial outliers: algorithms and applications (a summary of results) , 2001, KDD '01.

[13]  Philip S. Yu,et al.  A Framework for Clustering Evolving Data Streams , 2003, VLDB.

[14]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[15]  Geoff Hulten,et al.  Mining high-speed data streams , 2000, KDD '00.

[16]  JOHANNES GEHRKE,et al.  RainForest—A Framework for Fast Decision Tree Construction of Large Datasets , 1998, Data Mining and Knowledge Discovery.

[17]  Wei-Yin Loh,et al.  A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms , 2000, Machine Learning.

[18]  Dimitrios Gunopulos,et al.  Automatic subspace clustering of high dimensional data for data mining applications , 1998, SIGMOD '98.

[19]  Jiawei Han,et al.  MAIDS: mining alarming incidents from data streams , 2004, SIGMOD '04.

[20]  Jian Pei,et al.  Efficient computation of Iceberg cubes with complex measures , 2001, SIGMOD '01.

[21]  Jennifer Widom,et al.  Continuous queries over data streams , 2001, SGMD.

[22]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[23]  Kamesh Munagala,et al.  Local Search Heuristics for k-Median and Facility Location Problems , 2004, SIAM J. Comput..

[24]  Vipin Kumar,et al.  Introduction to Data Mining , 2022, Data Mining and Machine Learning Applications.

[25]  Wolfgang Maass,et al.  Algorithms and Lower Bounds for On-Line Learning of Geometrical Concepts , 1994, Machine Learning.

[26]  Leonid Khachiyan,et al.  On the Complexity of Dualization of Monotone Disjunctive Normal Forms , 1996, J. Algorithms.

[27]  Jiawei Han,et al.  Discovering an Evolutionary Classifier over a High-speed Nonstatic Stream , 2005 .

[28]  Geoff Hulten,et al.  Mining time-changing data streams , 2001, KDD '01.

[29]  Jiawei Han,et al.  High-Dimensional OLAP: A Minimal Cubing Approach , 2004, VLDB.

[30]  S. Muthukrishnan,et al.  Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries , 2001, VLDB.

[31]  Jiawei Han,et al.  Efficient and Effective Clustering Methods for Spatial Data Mining , 1994, VLDB.

[32]  J. Ross Quinlan,et al.  Induction of Decision Trees , 1986, Machine Learning.

[33]  Jorma Rissanen,et al.  An MDL Framework for Data Clustering , 2005 .

[34]  Philip S. Yu,et al.  Fast algorithms for projected clustering , 1999, SIGMOD '99.

[35]  Jennifer Widom,et al.  Models and issues in data stream systems , 2002, PODS.

[36]  Jianyong Wang,et al.  Mining sequential patterns by pattern-growth: the PrefixSpan approach , 2004, IEEE Transactions on Knowledge and Data Engineering.

[37]  Daniel A. Keim,et al.  An Efficient Approach to Clustering in Large Multimedia Databases with Noise , 1998, KDD.

[38]  Philip S. Yu,et al.  A framework for on-demand classification of evolving data streams , 2006, IEEE Transactions on Knowledge and Data Engineering.

[39]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[40]  Christopher J. C. Burges,et al.  A Tutorial on Support Vector Machines for Pattern Recognition , 1998, Data Mining and Knowledge Discovery.

[41]  Hans-Peter Kriegel,et al.  OPTICS: ordering points to identify the clustering structure , 1999, SIGMOD '99.

[42]  James Kelly,et al.  AutoClass: A Bayesian Classification System , 1993, ML.

[43]  Vipin Kumar,et al.  Chameleon: Hierarchical Clustering Using Dynamic Modeling , 1999, Computer.

[44]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques, 3rd Edition , 1999 .

[45]  Yixin Chen,et al.  Multi-Dimensional Regression Analysis of Time-Series Data Streams , 2002, VLDB.

[46]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[47]  Raymond T. Ng,et al.  Algorithms for Mining Distance-Based Outliers in Large Datasets , 1998, VLDB.

[48]  Paul S. Bradley,et al.  Scaling Clustering Algorithms to Large Databases , 1998, KDD.

[49]  Jiawei Han,et al.  IncSpan: incremental mining of sequential patterns in large database , 2004, KDD.

[50]  Jeffrey F. Naughton,et al.  An array-based algorithm for simultaneous multidimensional aggregates , 1997, SIGMOD '97.

[51]  Philip S. Yu,et al.  A Framework for Projected Clustering of High Dimensional Data Streams , 2004, VLDB.

[52]  Peter J. Rousseeuw,et al.  Finding Groups in Data: An Introduction to Cluster Analysis , 1990 .

[53]  Sudipto Guha,et al.  Data-streams and histograms , 2001, STOC '01.

[54]  Jiawei Han,et al.  Star-Cubing: Computing Iceberg Cubes by Top-Down and Bottom-Up Integration , 2003, Very Large Data Bases Conference.

[55]  Pedro M. Domingos,et al.  On the Optimality of the Simple Bayesian Classifier under Zero-One Loss , 1997, Machine Learning.

[56]  Sudipto Guha,et al.  CURE: an efficient clustering algorithm for large databases , 1998, SIGMOD '98.

[57]  Rajeev Rastogi,et al.  Processing complex aggregate queries over data streams , 2002, SIGMOD '02.

[58]  Robert Tibshirani,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition , 2001, Springer Series in Statistics.

[59]  Philip S. Yu,et al.  Mining concept-drifting data streams using ensemble classifiers , 2003, KDD '03.

[60]  Yixin Chen,et al.  Stream Cube: An Architecture for Multi-Dimensional Analysis of Data Streams , 2005, Distributed and Parallel Databases.

[61]  Philip S. Yu,et al.  On demand classification of data streams , 2004, KDD.

[62]  Jiawei Han,et al.  CloseGraph: mining closed frequent graph patterns , 2003, KDD '03.

[63]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[64]  Raghu Ramakrishnan,et al.  Bottom-up computation of sparse and Iceberg CUBE , 1999, SIGMOD '99.

[65]  Mohammed J. Zaki,et al.  SPADE: An Efficient Algorithm for Mining Frequent Sequences , 2004, Machine Learning.

[66]  Hongyan Liu,et al.  C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[67]  Sudipto Guha,et al.  Clustering Data Streams , 2000, FOCS.

[68]  David G. Stork,et al.  Pattern Classification (2nd ed.) , 1999 .