Extracting Academic Activity Transaction in Chinese Documents

Academic relationship networks are implicit in the academic activities that make up scholars' study and work experiences, so extracting the transaction information of those activities is important for mining such networks. To handle the long-distance dependencies characteristic of Chinese sentences describing academic activities, this paper proposes extracting the transaction information from Chinese documents by sequence labeling based on Conditional Random Fields (CRF). In research project applications, the resumes of the applicant and team members often contain many complex Chinese sentences about academic activities, and we design novel methods to analyze the special sentence patterns in those resumes. Specifically, we design feature templates according to the characteristics of academic-activity sentences and employ regular-expression matching to correct inaccurate word segmentation, especially for academic-specific words. Through evaluation experiments, we select the best-performing feature templates and feed them to the CRF++ model to label the trunk words of the sentences, from which the transaction information of academic activities is extracted. Experimental results show the effectiveness of the proposed approach.
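
As a rough illustration of how pattern matching might repair inaccurate word segmentation before CRF labeling, the sketch below merges adjacent tokens whose concatenation matches a list of academic-specific words. It is a minimal sketch under stated assumptions, not the method used in this paper: the term list `ACADEMIC_TERMS`, the function `merge_academic_terms`, and the `max_span` parameter are all hypothetical names introduced here for illustration.

```python
import re

# Hypothetical list of academic-specific words (assumption, not from the paper):
# postdoc, associate professor, visiting scholar, doctoral student.
ACADEMIC_TERMS = re.compile(r"博士后|副教授|访问学者|博士研究生")


def merge_academic_terms(tokens, pattern=ACADEMIC_TERMS, max_span=4):
    """Merge runs of adjacent tokens whose concatenation matches `pattern`.

    `tokens` is the output of any word segmenter; `max_span` bounds how many
    tokens may be merged into one term.
    """
    merged = []
    i = 0
    while i < len(tokens):
        # Try the widest window first so longer terms win over shorter ones.
        for j in range(min(len(tokens), i + max_span), i + 1, -1):
            candidate = "".join(tokens[i:j])
            if pattern.fullmatch(candidate):
                merged.append(candidate)
                i = j
                break
        else:
            # No window starting at i matched; keep the token unchanged.
            merged.append(tokens[i])
            i += 1
    return merged


# An over-segmented sentence: "博士后" was split into "博士" and "后".
tokens = ["2005", "年", "在", "清华大学", "做", "博士", "后"]
result = merge_academic_terms(tokens)
print(result)
```

The corrected token sequence could then be written out one token per line, with its feature columns, as input for a CRF toolkit such as CRF++.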