Historical inference based on semi-supervised learning

Abstract In the past, most historical research was carried out manually, by exploring historical facts and reading between the lines of documents. Today, large volumes of historical data are available electronically, and advances in machine learning techniques allow us to analyze them. From a historical perspective, inferring the political stances of historical figures is important for grasping the rivalries and power structures of an era. In this paper, we therefore propose an approach to the systematic inference of power mechanisms based on a human network constructed from historical data. In this network, people are linked according to their degree of kinship, taken from genealogy records, and are assigned to a political force according to their stances on agendas recorded in the annals of a dynasty. A machine learning algorithm, semi-supervised learning, then classifies the people whose political stances cannot be identified into political forces, based on their links in the network. The data consist of the genealogy of the Andong Gwon clan, a record of the family relations of 10,243 people in Korea from the 10th to the 15th century, and the Annals of the Joseon Dynasty, a historical record that covers 472 years of the Joseon Dynasty and is composed of 1894 fascicles and 888 books. From these data, we construct a human network for a historically meaningful period (1443–1488) and classify people into two political forces using the proposed method. We suggest that this machine learning approach to historical study could be used as a potent reference tool, free from the subjectivity of human experts in the field of history.
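As an illustration of the idea described in the abstract, the following is a minimal sketch of one common form of graph-based semi-supervised learning. It is not taken from the paper: the function name `infer_stances`, the regularization parameter `mu`, the six-person network, and all edge weights are assumptions made for this example. People are nodes, edge weights stand in for closeness of kinship, and a label of +1 or -1 marks a known political force while 0 marks an unknown stance. The scores are obtained by solving (I + mu * L) f = y, where L is the graph Laplacian, so that each person's score is pulled toward the scores of close relatives.

```python
import numpy as np


def infer_stances(W, y, mu=1.0):
    """Return a score per person by solving (I + mu * L) f = y.

    W  : symmetric matrix of kinship weights (hypothetical values here).
    y  : +1 / -1 for people with a known political force, 0 for unknown.
    mu : assumed trade-off between fitting known labels and smoothness.
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    laplacian = np.diag(W.sum(axis=1)) - W  # L = D - W
    return np.linalg.solve(np.eye(len(y)) + mu * laplacian, y)


# Toy network of six people: two tightly linked families (0-1-2 and 3-4-5)
# joined by one weak kinship tie between persons 2 and 3.
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 0.2), (3, 4, 1.0), (4, 5, 1.0)]:
    W[i, j] = W[j, i] = w

# Only persons 0 and 5 have stances recorded; the rest are unknown.
y = np.array([1, 0, 0, 0, 0, -1])

scores = infer_stances(W, y)
forces = np.where(scores > 0, 1, -1)  # assign each person to one of two forces
print(forces)
```

In this toy case the unlabeled members of each family take the label of their labeled relative, because the tie between the two families is weak compared with the ties inside them. A real application would need weights derived from the genealogy records and labels derived from the annals, neither of which is reproduced here.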
