Creating a structured knowledge base by parsing natural language text

This research develops the tools and methodologies needed to convert natural language text automatically into a structured knowledge base. Specifically, it takes a chapter on cardiovascular pathophysiology from a textbook written by Joel Michael (see Appendix A) and converts it into a structured knowledge base that serves as the domain knowledge base of a tutoring system. The chapter contains background medical knowledge that is important for solving problems in cardiovascular physiology. The project was divided into four phases, each of which places the information in a more convenient form for the phase that follows: (1) Preparation of Text, (2) Parsing of Text, (3) Conversion of Parse Trees to Information Formats, and (4) Conversion of Information Formats to Frames. The first two phases operate on single sentences, while the last two require larger units, such as paragraphs or sections, as input. The Preparation of Text phase converts the raw text of the chapter into a form suitable for the parser by removing incompatible data, correcting errors, numbering sentences, and converting all characters to upper case. The tools used in this phase are a series of C programs. The chapter was parsed with the Linguistic String Parser (LSP) developed at New York University. To parse the chapter successfully, the vocabulary and grammar supplied with the LSP had to be extended, and a sublanguage study was performed to identify the words and grammatical constructs that were missing. The conversion of parse trees to information formats, and then of information formats to frames, was done in the AI language CLIPS, which was developed at NASA's Johnson Space Center.
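The Preparation of Text phase described above can be illustrated with a minimal C sketch. It is not the project's actual code: the function name prepare_text, the rule for what counts as "incompatible data" (non-ASCII and non-printable bytes), the sentence boundary rule (a sentence ends at '.', '?' or '!'), and the output layout (sentence number, a space, then the sentence on its own line) are all assumptions made for illustration. The input format that the LSP really expects is not specified here, and error correction is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not taken from the original tools.
 * Copies `in` to `out` as numbered, upper-cased sentences, one per line.
 * Returns the number of sentences written, or -1 if `out` is too small. */
int prepare_text(const char *in, char *out, size_t outsz)
{
    size_t pos = 0;        /* next free index in out */
    int count = 0;         /* sentences started so far */
    int in_sentence = 0;   /* nonzero while inside a sentence */
    int pending_space = 0; /* a run of whitespace is waiting to be emitted */

    assert(in != NULL && out != NULL);
    if (outsz == 0)
        return -1;

    for (const unsigned char *p = (const unsigned char *)in; *p; ++p) {
        unsigned char c = *p;

        /* Assumed rule: non-ASCII and control bytes are "incompatible". */
        if (c >= 128 || (!isprint(c) && !isspace(c)))
            continue;

        /* Collapse whitespace runs (including line breaks) to one space. */
        if (isspace(c)) {
            if (in_sentence)
                pending_space = 1;
            continue;
        }

        if (!in_sentence) {
            /* Start a new sentence with its number as a prefix. */
            int n = snprintf(out + pos, outsz - pos, "%d ", count + 1);
            if (n < 0 || (size_t)n >= outsz - pos)
                return -1;
            pos += (size_t)n;
            ++count;
            in_sentence = 1;
            pending_space = 0;
        } else if (pending_space) {
            if (pos + 1 >= outsz)
                return -1;
            out[pos++] = ' ';
            pending_space = 0;
        }

        if (pos + 1 >= outsz)
            return -1;
        out[pos++] = (char)toupper(c);

        /* Naive boundary rule: '.', '?' or '!' closes the sentence. */
        if (strchr(".?!", c) != NULL) {
            if (pos + 1 >= outsz)
                return -1;
            out[pos++] = '\n';
            in_sentence = 0;
        }
    }

    /* Close a final sentence that lacks a terminator. */
    if (in_sentence) {
        if (pos + 1 >= outsz)
            return -1;
        out[pos++] = '\n';
    }
    out[pos] = '\0';
    return count;
}
```

Given the input "The heart pumps blood. Is pressure regulated?", the sketch writes two lines, "1 THE HEART PUMPS BLOOD." and "2 IS PRESSURE REGULATED?", and returns 2. The boundary rule is deliberately simple and would split abbreviations and decimal numbers incorrectly; handling such cases is presumably part of what the error correction step in the real tools has to address.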