A Sentence-Wide Collocation Recommendation System with Error Detection for Academic Writing

Collocation plays an important role in writing English articles. This research builds a collocation corpus for academic writing in engineering and science fields and, based on that corpus, establishes a sentence-wide collocation recommendation and error detection system. The corpus is built from Science Citation Index (SCI) papers and industry-field theses, which are collected and processed by a formal procedure developed in this research. The first step of the procedure uses the Stanford Parser to parse those papers and theses and retrieve collocations sentence by sentence. The second step classifies the collected collocations into different types and gathers their information to establish a collocation corpus specifically for academic writing. The corpus is accessed through a web-based collocation system built in this study. Unlike other collocation systems currently available on the web, the system can detect collocation errors and make recommendations across a full sentence. In several experiments, the system proved capable of giving satisfactory feedback and recommendations to authors of scientific articles. Although the collocation corpus is not yet complete enough to give the most precise results, the formal procedure can continue to enhance the corpus and improve the system by automatically collecting articles from various fields.
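
The two-step procedure and the sentence-wide check can be illustrated with a minimal sketch. It is not the system's actual implementation: it assumes each sentence has already been parsed into lemmatized (relation, head, dependent) triples of the kind the Stanford Parser emits, and the relation-to-type table, the function names, and the rule "flag a collocate never seen with a known base word" are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from a dependency relation to a collocation type.
# The boolean says whether the head word is the "base" (the word kept fixed)
# and the dependent is the collocate to be checked, or the other way round.
RELATION_RULES = {
    "dobj": ("verb-noun", False),       # base = object noun, collocate = verb
    "amod": ("adjective-noun", True),   # base = noun, collocate = adjective
    "advmod": ("adverb-verb", True),    # base = verb, collocate = adverb
}


def extract_collocations(triples):
    """Step 1: yield (type, base, collocate) for every relation of interest."""
    for relation, head, dependent in triples:
        rule = RELATION_RULES.get(relation)
        if rule is None:
            continue  # relation is not treated as a collocation here
        ctype, base_is_head = rule
        base, collocate = (head, dependent) if base_is_head else (dependent, head)
        yield ctype, base, collocate


def build_corpus(parsed_sentences):
    """Step 2: group collocations by type and base, counting each collocate."""
    corpus = defaultdict(Counter)
    for triples in parsed_sentences:
        for ctype, base, collocate in extract_collocations(triples):
            corpus[(ctype, base)][collocate] += 1
    return corpus


def check_sentence(triples, corpus, top_n=3):
    """Check every collocation in one sentence against the corpus.

    A collocation is flagged when its base is known to the corpus but the
    collocate was never observed with it; the most frequent observed
    collocates are returned as recommendations.
    """
    report = []
    for ctype, base, collocate in extract_collocations(triples):
        seen = corpus.get((ctype, base), Counter())
        if seen and collocate not in seen:
            suggestions = [word for word, _ in seen.most_common(top_n)]
            report.append((ctype, base, collocate, suggestions))
    return report


# Toy data standing in for parsed sentences from the collected papers.
training = [
    [("dobj", "conduct", "experiment"), ("amod", "experiment", "preliminary")],
    [("dobj", "conduct", "experiment")],
    [("dobj", "perform", "experiment")],
]
corpus = build_corpus(training)
result = check_sentence([("dobj", "do", "experiment")], corpus)
print(result)
```

In this toy run, "do (an) experiment" is flagged because "experiment" is a known base that was only seen with "conduct" and "perform", which are returned as suggestions. A base word that never occurs in the corpus is left unflagged, since the sketch has no evidence either way.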