Automatic Webpage Briefing

We introduce the task of webpage briefing (WB), which summarizes a webpage hierarchically, from the broad topic of the webpage down to finer-grained key attributes. A straightforward approach to this task is to train a machine learning model to generate topics and extract key attributes. However, such a model may not perform well on webpages from domains not seen in the training data. An ideal model should adapt to unseen domains while preserving knowledge learned from the seen domains. Knowledge distillation (KD) offers a potential solution: a teacher pre-trained on specific domains can pass its knowledge to a student, while unseen domains can be added to increase the robustness of the model. However, existing work usually assumes the models have no access to seen domains during distillation, so knowledge of the seen domains may be lost. In our setting, we have access to the generated topics, which carry representative knowledge of the seen domains and can help preserve that knowledge during distillation. Moreover, vanilla KD does not pass on knowledge about the location patterns of informative content in webpages, which are essential for identifying the topics to be generated or the key attributes to be extracted. To preserve more knowledge of the seen domains and to better exploit location patterns, we propose a Dual Distillation model consisting of identification distillation (ID) and understanding distillation (UD): ID distills knowledge about identifying informative content under the guidance of the topics learned from the seen domains, while UD distills knowledge about topic generation or key attribute extraction. Since Dual Distillation distills topics and key attributes separately into two students, the inherent correlations between them are not exploited. To better exploit such correlations, we propose a Triple Distillation model consisting of a shared ID and two UDs, one for topic generation and the other for key attribute extraction. We further propose a joint model for WB with signal enhancement and exchange among a key attribute extractor, a topic generator, and an informative section predictor. Experiments on real-world webpages show that our models achieve high performance on WB and validate the superiority of Dual Distillation and Triple Distillation in their target settings. The experiments also show that the proposed joint model outperforms single-task baselines and other joint models.
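The abstract gives no implementation details, but the objective it describes follows a familiar pattern: a student is trained to match two kinds of teacher signal, one over the identification of informative content (ID) and one over the topic/attribute output distribution (UD). The sketch below is a minimal, hypothetical PyTorch illustration of how such a combined loss might look, built on the standard soft-target KD formulation of Hinton et al. (2015). All tensor names, shapes, and the weighting scheme (alpha) are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, temperature=2.0):
        """Soft-target distillation loss (Hinton et al., 2015): KL divergence
        between temperature-softened teacher and student distributions,
        scaled by T^2 to keep gradient magnitudes stable."""
        t = temperature
        soft_teacher = F.softmax(teacher_logits / t, dim=-1)
        log_student = F.log_softmax(student_logits / t, dim=-1)
        return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)

    def dual_distillation_loss(
        student_id_logits,   # student scores for section informativeness, (batch, sections, 2)
        teacher_id_logits,   # teacher scores for the same sections
        student_ud_logits,   # student output distribution for topic generation / attribute extraction
        teacher_ud_logits,   # teacher output distribution
        alpha=0.5,           # hypothetical weight balancing the ID and UD terms
    ):
        """Combine identification distillation (ID) over informative-content
        scores with understanding distillation (UD) over the output
        distribution; the split into two terms mirrors the abstract's
        description, not a published loss."""
        loss_id = kd_loss(student_id_logits, teacher_id_logits)
        loss_ud = kd_loss(student_ud_logits, teacher_ud_logits)
        return alpha * loss_id + (1.0 - alpha) * loss_ud

    # Toy usage with random tensors; shapes are illustrative only.
    s_id, t_id = torch.randn(4, 10, 2), torch.randn(4, 10, 2)
    s_ud, t_ud = torch.randn(4, 20, 500), torch.randn(4, 20, 500)
    loss = dual_distillation_loss(s_id, t_id, s_ud, t_ud)

Under the same assumptions, a Triple Distillation variant would share the ID term while adding a second UD term (one student head for topics, one for key attributes), so that the shared identification signal ties the two outputs together.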
