Chapter 3 Advancements in Text Mining Algorithms and Software

In this chapter, we present two advancements in algorithms and software for mining textual information. For large-scale indexing needs, we present the General Text Parser (GTP) software environment with network storage capability. This object-oriented software (C++, Java) gives information retrieval (IR) and data mining specialists the ability to parse and index large text collections. GTP uses Latent Semantic Indexing (LSI) to construct a vector space IR model. Users can store the files generated by GTP on a remote network to overcome local storage restrictions and facilitate file sharing. For
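
To make the vector space IR model concrete, the following is a minimal sketch of the underlying idea: parse documents into terms, build a term-by-document count matrix, and rank documents against a query by cosine similarity. It is not GTP's code or API; the function names (`tokenize`, `build_term_document_matrix`, `rank_documents`) and the raw-count weighting are assumptions made for illustration, and the sketch omits the truncated singular value decomposition that LSI applies to this matrix before matching.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (hypothetical parser)."""
    return re.findall(r"[a-z]+", text.lower())


def build_term_document_matrix(docs):
    """Return (vocab, matrix), where matrix[i][j] counts term i in document j."""
    vocab = sorted({term for doc in docs for term in tokenize(doc)})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = [[0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for term, count in Counter(tokenize(doc)).items():
            matrix[index[term]][j] = count
    return vocab, matrix


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query, docs):
    """Return (score, doc_index) pairs sorted from best to worst match."""
    vocab, matrix = build_term_document_matrix(docs)
    index = {term: i for i, term in enumerate(vocab)}
    # Project the query into the same term space; unknown terms are ignored.
    query_vec = [0] * len(vocab)
    for term in tokenize(query):
        if term in index:
            query_vec[index[term]] += 1
    scores = []
    for j in range(len(docs)):
        doc_vec = [row[j] for row in matrix]
        scores.append((cosine(query_vec, doc_vec), j))
    # Highest score first; ties broken by document order.
    return sorted(scores, key=lambda s: (-s[0], s[1]))
```

Calling `rank_documents("text mining", docs)` on a small list of strings returns one `(score, index)` pair per document. In an LSI setting, the same cosine comparison would be carried out in the reduced-rank space produced by the SVD rather than on raw term counts.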
