Strategies to Process Voluminous Data in Support of Counter-Terrorism

In this paper we survey techniques and strategies for processing high volumes of data in support of counter-terrorism. Data reduction is a critical problem for counter-terrorism: large collections of documents must be analyzed and processed, raising issues related to performance, lossless reduction, polysemy (i.e., the meaning of an individual word depending on its surrounding words), and synonymy (i.e., the same concept being expressed by different terms). Our main objective is to survey data reduction strategies, ranging from data clustering to learning to latent semantic indexing.
