Mining Text with Pimiento

Information systems increasingly rely on unstructured information in the form of text to support analysis, decision-making, and knowledge management. This influx of data has, in turn, created a need for better text-mining technologies for information retrieval, filtering, and classification. This article compares several of the available options. In particular, the authors focus on Pimiento, a new object-oriented application framework that lets developers build distributed applications that use machine-learning and statistical techniques to process documents automatically.
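
The abstract describes Pimiento only at a high level: an object-oriented framework into which machine-learning components are plugged to process documents. The sketch below illustrates that general idea in Python under stated assumptions. A framework class owns the control flow, and the tokenizer and classifier are replaceable components behind abstract interfaces. All class and method names (Tokenizer, Classifier, LowercaseWordTokenizer, MultinomialNaiveBayes, TextCategorizationPipeline) are hypothetical and are not Pimiento's API. Multinomial naive Bayes with add-one smoothing is simply one standard statistical text classifier, chosen here for illustration.

```python
import math
import re
from abc import ABC, abstractmethod
from collections import Counter, defaultdict


class Tokenizer(ABC):
    """Hypothetical extension point: turns raw text into tokens."""

    @abstractmethod
    def tokenize(self, text: str) -> list[str]:
        ...


class Classifier(ABC):
    """Hypothetical extension point: learns from labelled token lists."""

    @abstractmethod
    def train(self, docs: list[list[str]], labels: list[str]) -> None:
        ...

    @abstractmethod
    def predict(self, tokens: list[str]) -> str:
        ...


class LowercaseWordTokenizer(Tokenizer):
    def tokenize(self, text: str) -> list[str]:
        # Lowercase the text, then keep runs of ASCII letters and digits.
        return re.findall(r"[a-z0-9]+", text.lower())


class MultinomialNaiveBayes(Classifier):
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def train(self, docs, labels):
        self.doc_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_docs = len(labels)

    def predict(self, tokens):
        best_label, best_score = None, -math.inf
        vocab_size = len(self.vocab)
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / self.total_docs)
            for w in tokens:
                if w in self.vocab:  # skip words never seen during training
                    score += math.log((counts[w] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TextCategorizationPipeline:
    """Framework side: fixes the control flow and calls the plugged-in components."""

    def __init__(self, tokenizer: Tokenizer, classifier: Classifier):
        self.tokenizer = tokenizer
        self.classifier = classifier

    def fit(self, texts: list[str], labels: list[str]) -> "TextCategorizationPipeline":
        self.classifier.train([self.tokenizer.tokenize(t) for t in texts], labels)
        return self

    def classify(self, text: str) -> str:
        return self.classifier.predict(self.tokenizer.tokenize(text))


pipeline = TextCategorizationPipeline(LowercaseWordTokenizer(), MultinomialNaiveBayes()).fit(
    ["cheap pills buy now", "meeting agenda attached", "buy cheap watches", "project meeting notes"],
    ["spam", "ham", "spam", "ham"],
)
print(pipeline.classify("Buy cheap pills"))  # spam
```

Because the pipeline depends only on the abstract Tokenizer and Classifier types, a stemming tokenizer or a different learner can be substituted without changing the pipeline class. This inversion of control, where the framework calls the developer's components rather than the other way around, is what distinguishes an application framework from a plain library.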