Mining Diagnostic Text Reports by Learning to Annotate Knowledge Roles

Several text mining tasks, such as text categorization, document clustering, and information retrieval, operate at the document level and rely on the so-called bag-of-words model. Other tasks, such as document summarization, information extraction, and question answering, must operate at the sentence level to meet their specific requirements. Both groups of tasks are typically affected by data sparsity, but the problem is more pronounced for the latter group. Thus, while tasks in the first group can be tackled by statistical and machine learning methods based on a bag-of-words approach alone, tasks in the second group need natural language processing (NLP) at the sentence or paragraph level in order to produce more informative features.
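To illustrate why a bag-of-words representation can fall short for sentence-level tasks, here is a minimal sketch. The two sentences and the `bag_of_words` helper are hypothetical and are not taken from the paper. The sketch only shows that word order, and with it any indication of who did what to whom, is lost once a sentence is reduced to token counts.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split on whitespace, and count tokens.

    Word order is discarded, which is the defining property of the model.
    """
    return Counter(text.lower().split())


# Two hypothetical sentences in which the participants swap roles.
s1 = "the pump damaged the valve"
s2 = "the valve damaged the pump"

# Both sentences map to the same multiset of tokens, so a model that sees
# only these counts cannot tell which component caused the damage.
same_bag = bag_of_words(s1) == bag_of_words(s2)
print(same_bag)
```

Distinguishing the two sentences requires features derived from sentence structure, which is the kind of information that sentence-level NLP is meant to supply.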
