Doc2Dial: A Framework for Dialogue Composition Grounded in Documents

We introduce Doc2Dial, an end-to-end framework for generating conversational data grounded in given documents. It takes the documents as input and generates a pipeline of tasks for collecting the annotations needed to produce simulated dialogue flows. The dialogue flows then guide the collection of utterances through an integrated crowdsourcing tool. The outcomes include human-human dialogue data grounded in the given documents, as well as various types of automatically or human-labeled annotations that help ensure the quality of the dialogue data and provide the flexibility to (re)compose dialogues. We expect that such data can facilitate building automated dialogue agents for goal-oriented tasks. We demonstrate the Doc2Dial system on customer-care documents from various domains.
