SpeakNav: Voice-based Route Description Language Understanding for Template-driven Path Search

Many navigation applications accept natural language speech as input, which spares users from typing and thus improves traffic safety. However, these applications often fail to understand a user’s free-form description of a route. In addition, they support only the input of a specific source or destination, so users cannot specify additional route requirements. We propose SpeakNav, a framework that lets users describe intended routes via speech and then recommends appropriate routes. Specifically, we propose a novel Route Template based Bidirectional Encoder Representation from Transformers (RT-BERT) model for understanding natural language route descriptions. The model extracts the intended POI keywords and the related distances. We then formalize a template-driven path query that uses the extracted information. To enable efficient query processing, we develop a hybrid label index for computing network distances between POIs, and we propose a branch-and-bound algorithm along with a pivot reverse B-tree (PB-tree) index. Experiments with real and synthetic data indicate that RT-BERT offers high accuracy and that the proposed algorithm is capable of outperforming baseline algorithms.

PVLDB Reference Format: Bolong Zheng, Lei Bi, Juan Cao, Hua Chai, Jun Fang, Lu Chen, Yunjun Gao, Xiaofang Zhou, Christian S. Jensen. SpeakNav: Voice-based Route Description Language Understanding for Template-driven Path Search. PVLDB, 14(12): 3056-3068, 2021. doi:10.14778/3476311.3476383
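The abstract does not spell out how the hybrid label index computes network distances between POIs. As a point of reference only, the sketch below shows a generic 2-hop (hub) label distance query, a standard way to answer network-distance queries from precomputed labels. It is not the paper's hybrid index. The function name `label_distance`, the `LABELS` table, and the toy path graph are illustrative assumptions.

```python
from typing import Dict

# A label maps hub vertex -> shortest distance from the labelled vertex to that hub.
Label = Dict[str, float]


def label_distance(label_u: Label, label_v: Label) -> float:
    """Return min over common hubs h of d(u, h) + d(h, v).

    Returns infinity when the two labels share no hub.
    """
    # Iterate over the smaller label and probe the larger one.
    if len(label_u) > len(label_v):
        label_u, label_v = label_v, label_u
    best = float("inf")
    for hub, d_uh in label_u.items():
        d_hv = label_v.get(hub)
        if d_hv is not None and d_uh + d_hv < best:
            best = d_uh + d_hv
    return best


# Hand-built labels for a toy path graph a -2- b -3- c -4- d (assumed data).
# Every pair of vertices shares at least one hub on its shortest path.
LABELS: Dict[str, Label] = {
    "a": {"a": 0, "b": 2},
    "b": {"b": 0},
    "c": {"b": 3, "c": 0},
    "d": {"b": 7, "c": 4, "d": 0},
}

if __name__ == "__main__":
    print(label_distance(LABELS["a"], LABELS["d"]))
    print(label_distance(LABELS["c"], LABELS["d"]))
```

Once labels are precomputed, a query touches only the two labels involved and never traverses the graph, which is why label indexes suit workloads that issue many POI-to-POI distance lookups.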
