Roper Center is one of the largest public opinion data archives in the world. It collects data sets of survey questions from polls conducted by numerous media outlets and organizations. The volume of data complicates searching over survey questions and makes search trends difficult to analyze. Roper Center's question-level retrieval applications have relied on human metadata experts to assign topics to content. This has not achieved the required level of consistency and provides an inadequate basis for an advanced search experience. The objective of this work is to combine the human expert teams' knowledge of the survey questions, and of the concepts and topics they express, with the ability of multi-label classifiers to learn this knowledge and apply it in an automated, fast and accurate classification mechanism. This approach significantly reduces question analysis and tagging time and improves the consistency and scalability of topic descriptions. At the same time, an ensemble of machine learning classifiers combined with expert knowledge is expected to enhance the search experience and provide much-needed analytic capabilities for survey question databases. In our design, we use classifications from several machine learning algorithms, such as SVM and Decision Trees, combined with expert knowledge in the form of handcrafted rules, data analysis and result review. We consolidate these techniques into a Multipath Classifier with a Confidence point system that decides on the relevance of topics assigned to survey questions with nearly perfect accuracy.
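The abstract does not specify how the Confidence point system weighs each path, so the following is only a minimal sketch under assumed rules: every path (an ML classifier or a handcrafted expert rule) awards a fixed number of points to each topic it assigns, and a topic is kept if its total meets a threshold. The names `WeightedPath` and `multipath_classify`, the point values, the threshold, and the keyword-based stubs standing in for trained SVM and Decision Tree models are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# A "path" is any topic predictor: a wrapper around a trained classifier
# or a handcrafted expert rule. It returns the set of topics it assigns.
Path = Callable[[str], Set[str]]


@dataclass
class WeightedPath:
    name: str
    predict: Path
    points: float  # hypothetical confidence points per topic this path assigns


def multipath_classify(question: str, paths: List[WeightedPath],
                       threshold: float) -> Dict[str, float]:
    """Sum confidence points per topic across all paths; keep topics >= threshold."""
    scores: Dict[str, float] = {}
    for path in paths:
        for topic in path.predict(question):
            scores[topic] = scores.get(topic, 0.0) + path.points
    return {topic: s for topic, s in scores.items() if s >= threshold}


def svm_stub(q: str) -> Set[str]:
    # Hypothetical stand-in for a trained multi-label SVM; keyword lookup only.
    q = q.lower()
    topics: Set[str] = set()
    if "president" in q or "approve" in q:
        topics.add("Presidency")
    if "economy" in q or "jobs" in q:
        topics.add("Economy")
    return topics


def tree_stub(q: str) -> Set[str]:
    # Hypothetical stand-in for a trained Decision Tree classifier.
    q = q.lower()
    topics: Set[str] = set()
    if "economy" in q:
        topics.add("Economy")
    if "health" in q:
        topics.add("Health care")
    return topics


def rule_presidential_approval(q: str) -> Set[str]:
    # Hypothetical handcrafted expert rule: approval questions about the president.
    q = q.lower()
    return {"Presidency"} if "approve" in q and "president" in q else set()


# Assumed weighting: expert rules count double relative to each ML path.
PATHS = [
    WeightedPath("svm", svm_stub, 1.0),
    WeightedPath("tree", tree_stub, 1.0),
    WeightedPath("rule", rule_presidential_approval, 2.0),
]

if __name__ == "__main__":
    q1 = "Do you approve of the way the president is handling the economy?"
    print(sorted(multipath_classify(q1, PATHS, threshold=2.0).items()))
    # [('Economy', 2.0), ('Presidency', 3.0)]
    q2 = "Is health care affordable?"
    print(multipath_classify(q2, PATHS, threshold=2.0))
    # {}
```

Under these assumed weights, a topic proposed by only one ML path (1 point) is rejected, while agreement between two paths or support from an expert rule lifts it over the threshold. This is one simple way to let expert knowledge and classifier agreement jointly gate topic assignment.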