Enhanced Representation Learning for Examination Papers with Hierarchical Document Structure

Representation learning of examination papers is the cornerstone of Examination Paper Analysis (EPA) in the education domain, which includes tasks such as Paper Difficulty Prediction (PDP) and Finding Similar Papers (FSP). Previous work has mainly focused on learning representations of individual test items, while few studies have considered the hierarchical document structure of examination papers. To address this gap, we propose a novel Examination Organization Encoder (EOE) that learns a robust representation of an examination paper by exploiting its hierarchical document structure. Specifically, we first propose a syntax parser that recovers the hierarchical document structure and converts an examination paper into an Examination Organization Tree (EOT), in which the test items are the leaf nodes and each internal node summarizes its child nodes. Next, we apply a two-layer GRU-based module to obtain the representation of each leaf node. We then design a subtree encoder module that aggregates the leaf node representations and computes an embedding for each layer of the EOT. Finally, we feed all the layer embeddings into an output module to obtain the examination paper representation, which can be used in downstream tasks. Extensive experiments on real-world data demonstrate the effectiveness and interpretability of our method.
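To make the data flow of the pipeline concrete (paper, then EOT, then leaf embeddings, then subtree aggregation, then per-layer embeddings, then paper representation), the following is a minimal, non-learned sketch in Python. Every component is a hypothetical stand-in rather than the EOE model itself: the parser assumes sections are introduced by Roman-numeral headers such as "I. Multiple Choice"; a hashed bag-of-words vector replaces the two-layer GRU leaf encoder; mean pooling replaces the learned subtree encoder; and concatenation replaces the output module. The names `parse_eot`, `encode_leaf`, `encode_subtree`, `layer_embeddings`, `paper_representation`, and `DIM` are all illustrative.

```python
import re
from dataclasses import dataclass, field
from typing import List

DIM = 8  # hypothetical embedding size

@dataclass
class Node:
    """A node of the Examination Organization Tree (EOT)."""
    text: str
    children: List["Node"] = field(default_factory=list)
    embedding: List[float] = field(default_factory=list)

# Assumed section-header pattern, e.g. "I. Multiple Choice".
SECTION = re.compile(r"^[IVX]+\.\s")

def parse_eot(lines: List[str]) -> Node:
    """Toy syntax parser: Roman-numeral headers open a section node; every
    other non-empty line becomes a test item (leaf) under the current section."""
    root = Node("paper")
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if SECTION.match(line):
            current = Node(line)
            root.children.append(current)
        else:
            if current is None:  # items before any header get an implicit section
                current = Node("(untitled section)")
                root.children.append(current)
            current.children.append(Node(line))
    return root

def encode_leaf(text: str) -> List[float]:
    """Stand-in for the GRU leaf encoder: a hashed bag-of-words count vector."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[sum(map(ord, token)) % DIM] += 1.0  # deterministic token bucket
    return vec

def mean(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def encode_subtree(node: Node) -> List[float]:
    """Stand-in for the subtree encoder: an internal node is the mean of its children."""
    if not node.children:
        node.embedding = encode_leaf(node.text)
    else:
        node.embedding = mean([encode_subtree(c) for c in node.children])
    return node.embedding

def layer_embeddings(root: Node) -> List[List[float]]:
    """One embedding per EOT layer: the mean of node embeddings at that depth."""
    layers, frontier = [], [root]
    while frontier:
        layers.append(mean([n.embedding for n in frontier]))
        frontier = [c for n in frontier for c in n.children]
    return layers

def paper_representation(lines: List[str]) -> List[float]:
    """Parse, encode bottom-up, then concatenate layer embeddings (stand-in output module)."""
    root = parse_eot(lines)
    encode_subtree(root)
    return [x for layer in layer_embeddings(root) for x in layer]

paper = [
    "I. Multiple Choice",
    "1. Which of the following is a prime number?",
    "2. Solve x + 2 = 5.",
    "II. Short Answer",
    "3. Prove that the sum of two even numbers is even.",
]
print(len(paper_representation(paper)))  # 3 layers (paper, sections, items) * DIM = 24
```

In the actual method each stand-in would be a trained module; the sketch only illustrates why the tree yields one embedding per layer, since the paper root, the section nodes, and the test items form three distinct depths here.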