Adding Structure to Unstructured Text

This paper presents an overview of the authors’ research program in document engineering. The program is developing techniques for agile parsing of unstructured and semi-structured text in order to extract metadata. XML technologies are applied in novel ways to support complex querying, analysis, and transformation of large text bases, and new methods for difference analysis are being developed to support document evolution and maintenance. In addition, advanced information retrieval methods, namely latent semantic indexing combined with clustering techniques, are used to extract high-level features and concepts from large corpora.
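As a rough illustration of what adding structure to plain text and then querying it through XML can look like, the following is a minimal sketch and not the authors' tooling. It assumes a single regular expression can stand in for a lightweight parser: only ISO-style dates are recognized, and everything else is passed through untouched. The names `DATE`, `mark_up`, and the `<doc>`/`<date>` element names are hypothetical and chosen only for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: only dates of the form YYYY-MM-DD are recognized;
# all other text is left uninterpreted.
DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def mark_up(text: str) -> ET.Element:
    """Wrap plain text in a <doc> element, tagging each date as <date>."""
    doc = ET.Element("doc")
    doc.text = ""
    last = None  # most recently added <date> element, if any
    pos = 0
    for m in DATE.finditer(text):
        unparsed = text[pos:m.start()]
        if last is None:
            doc.text += unparsed   # text before the first date
        else:
            last.tail = unparsed   # text between two dates
        last = ET.SubElement(doc, "date", year=m.group(1))
        last.text = m.group(0)
        pos = m.end()
    rest = text[pos:]
    if last is None:
        doc.text += rest
    else:
        last.tail = rest
    return doc


sample = "Draft sent 2004-03-15; revised 2004-06-01 after review."
doc = mark_up(sample)

# Query the extracted metadata with ElementTree's limited XPath support.
dates = [d.text for d in doc.findall("date")]
in_2004 = doc.findall("date[@year='2004']")
```

Because the markup only wraps the recognized spans, the original text can be recovered by concatenating the text nodes of the result, which is the property that makes this style of markup usable for later transformation and differencing.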
