Write, Attend and Spell

Text entry on a smartwatch is challenging due to its small form factor. Handwriting recognition using the watch's built-in sensors (motion sensors, microphones, etc.) offers an efficient and natural solution to this problem. However, prior work mainly focuses on recognizing individual letters rather than whole words, and therefore requires users to pause between adjacent letters for segmentation, which is counter-intuitive and significantly reduces input speed. In this paper, we present 'Write, Attend and Spell' (WriteAS), a word-level text-entry system that enables free-style handwriting recognition from the motion signals of a smartwatch. First, we design a multimodal convolutional neural network (CNN) to extract motion features across modalities. A stacked dilated convolutional network with an encoder-decoder architecture then bypasses letter segmentation and outputs words in an end-to-end fashion. More importantly, we leverage a multi-task sequence learning method to enable handwriting recognition in a streaming manner. We construct the first sequence-to-sequence handwriting dataset collected with a smartwatch. WriteAS achieves a 9.3% character error rate (CER) on 250 words for new users and a 3.8% CER on words unseen in the training set. In addition, WriteAS handles various writing conditions well. Given this promising performance, we envision WriteAS as a fast and accurate input tool for smartwatches.
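
To make the pipeline concrete, below is a minimal PyTorch sketch of the architecture the abstract describes: per-modality CNNs, a stacked dilated convolutional encoder, and a dual CTC/attention output for multi-task sequence learning. All layer sizes, the two-modality split (accelerometer and gyroscope), the vocabulary size, and the module names are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative sketch only: module sizes, the acc/gyro modality split, and the
# CTC-plus-attention multi-task heads are assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class ModalityCNN(nn.Module):
    """Per-modality 1-D CNN that extracts local motion features."""
    def __init__(self, in_ch=3, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
        )
    def forward(self, x):               # x: (batch, channels, time)
        return self.net(x)

class DilatedEncoder(nn.Module):
    """Stacked dilated convolutions widen the receptive field exponentially,
    so whole words can be modeled without explicit letter segmentation."""
    def __init__(self, ch=128, layers=4):
        super().__init__()
        blocks = []
        for i in range(layers):
            d = 2 ** i                  # exponentially growing dilation
            blocks += [nn.Conv1d(ch, ch, 3, padding=d, dilation=d), nn.ReLU()]
        self.net = nn.Sequential(*blocks)
    def forward(self, x):
        return self.net(x)

class WriteASSketch(nn.Module):
    def __init__(self, vocab=30):       # e.g. 26 letters + blank/sos/eos/pad (assumed)
        super().__init__()
        self.acc_cnn = ModalityCNN()    # accelerometer branch
        self.gyr_cnn = ModalityCNN()    # gyroscope branch
        self.encoder = DilatedEncoder(ch=128)
        self.ctc_head = nn.Linear(128, vocab)            # frame-level (streaming) branch
        self.decoder = nn.GRU(128, 128, batch_first=True)
        self.attn = nn.MultiheadAttention(128, 4, batch_first=True)
        self.out = nn.Linear(128, vocab)                 # attention (spelling) branch

    def forward(self, acc, gyr, prev_emb):
        # acc, gyr: (batch, 3, time); prev_emb: embedded previous characters
        # for teacher forcing, shape (batch, out_len, 128) -- hypothetical input.
        feats = torch.cat([self.acc_cnn(acc), self.gyr_cnn(gyr)], dim=1)
        enc = self.encoder(feats).transpose(1, 2)        # (batch, time, 128)
        ctc_logits = self.ctc_head(enc)                  # for a CTC loss
        dec, _ = self.decoder(prev_emb)
        ctx, _ = self.attn(dec, enc, enc)                # attend over encoder states
        att_logits = self.out(ctx)                       # for a cross-entropy loss
        return ctc_logits, att_logits
```

In training, the two heads would plausibly be optimized jointly, e.g. loss = lambda * ctc_loss + (1 - lambda) * cross_entropy, in the spirit of hybrid CTC/attention models; the frame-synchronous CTC branch is what would permit streaming decoding. The paper's actual multi-task formulation may differ.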
