In Search of Unstructured Documents Categorization

The main aim of communication is to transfer information from one part of the world to another. Nowadays, that information is available in the form of documents or files created as requirements arise, and the more requirements there are, the larger the documents become. Because both their creation and their storage are haphazard, the documents end up unstructured, and dealing with them becomes a headache. For ease of processing, frequently required data should follow a certain pattern. Unfortunately, most of the time we face problems such as erroneous data retrieval, modification anomalies, or long delays in retrieving even a single document. To overcome this situation, a solution called unstructured document categorization has arisen. This is a vast field, containing solutions for many types of document categorization. In essence, unstructured documents are categorized according to a set of given constraints. In this paper we highlight the most popular techniques in the field, such as text and data mining, genetic algorithms, lexical chaining, and binarization methods, so that the desired categorization of unstructured documents can be achieved.
