Semantic-based classification extraction system for text mining of large amounts of data


The present invention relates to a semantic-based classification extraction system for text mining of large amounts of data. More particularly, it relates to a system that automatically extracts and selects, from electronic documents, the categories required to perform text mining that yields fast and accurate search results when searching large amounts of data through a computer system and the Internet.

The semantic-based classification extraction system for text mining of large amounts of data according to the present invention comprises: a sentence unit extraction means (100) including a morpheme analysis unit (110) that extracts words, a sentence-end character information DB (120) that stores the words or character codes that can appear at the end of a sentence, and a sentence defining unit (130) that, with reference to the sentence-end character information DB, defines as one sentence the character string up to the position of such a sentence-final word or character code; a sentence subject and propensity extraction means (200) including a propensity word dictionary (210) that stores propensity information for each word, an inverted-propensity word dictionary (220) that stores propensity information for cases where the meaning is inverted when words are used in succession, an accuracy score information DB (230) that stores assigned accuracy scores and propensity accuracy scores, and an accuracy score calculation unit (240) that determines the subject and propensity of each sentence from the sentence information extracted by the sentence unit extraction means and then calculates the accuracy score and propensity accuracy score with reference to the propensity word dictionary, the inverted-propensity word dictionary, and the accuracy score information DB; a whole-document subject and propensity extraction means (300) that, with reference to the subject and propensity information of the sentences extracted by the sentence subject and propensity extraction means, calculates a score from the ratio of each sentence's length to the entire document, a weight according to the position of the sentence in the entire document, and a weight applied when sentences share the same subject and propensity, and extracts the subject and propensity information of the sentence with the highest score in the entire document; and a classification data selection means (400) that selects, as classification data, the subject of the document and the propensity word of the entire document derived by the whole-document subject and propensity extraction means.

According to the present invention, the categories required to perform text mining for fast and accurate search results, when searching large amounts of data through a computer system and the Internet, are automatically extracted and selected from electronic documents. This provides classification data that improves the accuracy of the text mining operation. In addition, accurate classification (categories) makes it possible to find exactly the data a user wants among large amounts of data.
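
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The dictionary contents, the position weights, the `same_pair_bonus` parameter, and the choice to multiply the three factors are all assumptions made for illustration, since the abstract names the factors but not how they are combined. Morpheme analysis (110), subject detection, and the accuracy scores (230, 240) are left out, so the input is assumed to be already tokenized and each sentence's subject is assumed to be known.

```python
from dataclasses import dataclass

# Hypothetical placeholder data, not taken from the source.
SENTENCE_END_TOKENS = {".", "!", "?"}   # stands in for the sentence-end DB (120)
PROPENSITY_WORDS = {"good": 1, "fast": 1, "bad": -1, "slow": -1}  # dictionary (210)
INVERTING_WORDS = {"not", "never"}      # stands in for dictionary (220)


def split_sentences(tokens):
    """Close a sentence after each sentence-end token (role of unit 130)."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if tok in SENTENCE_END_TOKENS:
            sentences.append(current)
            current = []
    if current:  # trailing tokens with no sentence-end token
        sentences.append(current)
    return sentences


def sentence_propensity(sentence):
    """Sum word propensities; the next propensity word after an
    inverting word has its sign flipped."""
    score, invert = 0, False
    for tok in sentence:
        word = tok.lower()
        if word in INVERTING_WORDS:
            invert = True
        elif word in PROPENSITY_WORDS:
            value = PROPENSITY_WORDS[word]
            score += -value if invert else value
            invert = False
    return score


@dataclass
class SentenceInfo:
    subject: str     # subject of the sentence (assumed to be given)
    propensity: int  # sentence propensity, e.g. 1, -1, or 0
    length: int      # sentence length in tokens


def position_weight(index, total):
    """Assumed weighting: first and last sentences count more."""
    return 1.5 if index in (0, total - 1) else 1.0


def document_subject_and_propensity(infos, same_pair_bonus=0.5):
    """Return (subject, propensity) of the highest-scoring sentence."""
    if not infos:
        raise ValueError("at least one sentence is required")
    total_len = sum(s.length for s in infos)
    best, best_score = None, float("-inf")
    for i, s in enumerate(infos):
        length_ratio = s.length / total_len
        # Number of other sentences with the same subject and propensity.
        same = sum(
            1 for j, o in enumerate(infos)
            if j != i and (o.subject, o.propensity) == (s.subject, s.propensity)
        )
        # Assumed combination: product of the three factors.
        score = (length_ratio
                 * position_weight(i, len(infos))
                 * (1 + same_pair_bonus * same))
        if score > best_score:
            best, best_score = s, score
    return best.subject, best.propensity
```

In this sketch the pair returned by `document_subject_and_propensity` plays the role of the classification data chosen by the selection means (400). Swapping in a different way of combining the factors only requires changing the `score` expression.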

이재희 | 배성환