On developing data integration and mining platform for classical Chinese literature study

With the advancement of data digitization hardware and software, computer-aided methods have been applied successfully to many tasks in literature study and analysis. Data integration and mining techniques have recently become an important research topic, because they are a critical yet challenging part of building effective digital platforms that support practical literature research tasks, such as categorizing large literature collections or identifying authors or interested readers from given texts. In this paper, we study the processing of classical Chinese literature data, discuss a group of related data processing techniques, and provide a few general suggestions for applying these techniques effectively.
