Cross-Domain Opinion Word Identification with Query-By-Committee Active Learning

Opinion word identification (OWI) is an important task in opinion mining. In OWI, the exact positions of opinion word mentions must be found. Supervised learning approaches can locate such mentions with high accuracy, but constructing an OWI system for a new domain requires annotating enough data to represent that domain’s characteristics. Since annotating every new domain extensively is costly, making the best use of existing annotated data is an important challenge for mention-based OWI systems. In this work, we propose a cross-domain OWI system. The query-by-committee (QBC) active learning scheme is used to select a controlled amount of data in the new domain for manual annotation. The newly annotated data complements the existing annotated data of the original domain. We compile three annotated datasets, one for each of three different domains, and conduct domain adaptation experiments on all six domain pairs. Our experiments show that by adding only 1,000 newly annotated sentences from the new domain to the existing annotated data, our system achieves nearly the same level of accuracy as a system trained on 10,000 annotated new-domain sentences. Our system with the QBC active learning scheme also outperforms the same system with a random selection scheme.
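
To make the QBC selection step concrete, the following is a minimal sketch rather than the system described above. It assumes that committee disagreement is measured by per-token vote entropy averaged over a sentence, which is one common choice; the abstract does not state which measure is used. All names (`vote_entropy`, `sentence_disagreement`, `select_for_annotation`, `make_lexicon_tagger`) are hypothetical, and the toy lexicon taggers merely stand in for trained committee members.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's label votes for a single token."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def sentence_disagreement(sentence, committee):
    """Mean per-token vote entropy over a sentence (a list of tokens)."""
    if not sentence:
        return 0.0
    # One predicted label sequence per committee member.
    predictions = [member(sentence) for member in committee]
    # zip(*predictions) groups the members' votes token by token.
    per_token = [vote_entropy(votes) for votes in zip(*predictions)]
    return sum(per_token) / len(per_token)


def select_for_annotation(pool, committee, budget):
    """Return the `budget` sentences the committee disagrees on most."""
    ranked = sorted(
        pool,
        key=lambda s: sentence_disagreement(s, committee),
        reverse=True,
    )
    return ranked[:budget]


def make_lexicon_tagger(lexicon):
    """Hypothetical stand-in for a trained tagger: OP = opinion word, O = other."""
    def tag(sentence):
        return ["OP" if tok.lower() in lexicon else "O" for tok in sentence]
    return tag


# Toy committee: three taggers that differ only in their (assumed) lexicons.
committee = [
    make_lexicon_tagger({"great", "bad"}),
    make_lexicon_tagger({"great", "crisp"}),
    make_lexicon_tagger({"great", "bad", "sluggish"}),
]

# Toy unlabeled pool from the new domain.
pool = [
    ["the", "screen", "is", "great"],
    ["the", "battery", "is", "sluggish", "and", "bad"],
    ["it", "arrived", "on", "monday"],
]

selected = select_for_annotation(pool, committee, budget=1)
```

In this toy pool the committee agrees on every token of the first and third sentences, so only the second sentence has non-zero disagreement and is chosen for annotation. A random selection baseline would instead sample `budget` sentences from `pool` without consulting the committee.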