Detecting Institutional Dialog Acts in Police Traffic Stops

We apply computational dialog methods to police body-worn camera footage to model conversations between police officers and community members in traffic stops. Relying on the theory of institutional talk, we develop a labeling scheme for police speech during traffic stops, and a tagger to detect institutional dialog acts (Reasons, Searches, Offering Help) from transcribed text at the turn (78% F-score) and stop (89% F-score) level. We then develop speech recognition and segmentation algorithms to detect these acts at the stop level from raw camera audio (81% F-score, with even higher accuracy for crucial acts like conveying the reason for the stop). We demonstrate that the dialog structures produced by our tagger could reveal whether officers follow law enforcement norms like introducing themselves, explaining the reason for the stop, and asking permission for searches. This work may therefore inform and aid efforts to ensure the procedural justice of police-community interactions.
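The difference between turn-level and stop-level scores can be illustrated with a minimal sketch. It assumes, as a simplification the abstract does not confirm, that a stop counts as containing an act if any of its turns is tagged with that act. The label names, the `stop_level_acts` and `f_score` helpers, and the toy data are all hypothetical.

```python
from collections import defaultdict


def stop_level_acts(turn_tags):
    """Collapse (stop_id, acts) pairs, one per turn, into one act set per stop.

    Assumption: a stop contains an act if at least one of its turns does.
    """
    acts_by_stop = defaultdict(set)
    for stop_id, acts in turn_tags:
        # An empty act set still registers the stop, so stops with no acts are kept.
        acts_by_stop[stop_id] |= set(acts)
    return dict(acts_by_stop)


def f_score(gold, pred, act):
    """F-score (harmonic mean of precision and recall) for one act over stops."""
    stops = set(gold) | set(pred)
    tp = sum(1 for s in stops if act in gold.get(s, set()) and act in pred.get(s, set()))
    fp = sum(1 for s in stops if act not in gold.get(s, set()) and act in pred.get(s, set()))
    fn = sum(1 for s in stops if act in gold.get(s, set()) and act not in pred.get(s, set()))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): each entry is one turn, tagged with zero or more acts.
gold_turns = [("s1", {"REASON"}), ("s1", set()), ("s2", {"SEARCH"}), ("s3", set())]
pred_turns = [("s1", set()), ("s1", {"REASON"}), ("s2", set()), ("s3", {"REASON"})]

gold_stops = stop_level_acts(gold_turns)
pred_stops = stop_level_acts(pred_turns)

for act in ("REASON", "SEARCH"):
    print(act, round(f_score(gold_stops, pred_stops, act), 3))
```

In the toy data, stop `s1` has the Reason tagged on the wrong turn, which would count as an error at the turn level but is still correct at the stop level. This is one plausible reason stop-level scores can exceed turn-level scores, though the abstract does not state the cause.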
