EchoPrint: Two-factor Authentication using Acoustics and Vision on Smartphones

User authentication on smartphones must satisfy both security and convenience, an inherently difficult balancing act. Apple's FaceID is arguably the latest such effort, but it comes at the cost of additional hardware (e.g., a dot projector, a flood illuminator, and an infrared camera). We propose EchoPrint, a novel user authentication system that leverages acoustics and vision for secure and convenient authentication without requiring any special hardware. EchoPrint actively emits almost inaudible acoustic signals from the earpiece speaker to "illuminate" the user's face, and authenticates the user by the unique features extracted from the echoes bouncing off the 3D facial contour. To cope with changes in phone-holding pose, and thus in the echoes, a Convolutional Neural Network (CNN) is trained to extract reliable acoustic features, which are then combined with visual facial landmark locations and fed to a binary Support Vector Machine (SVM) classifier for final authentication. Because the echo features depend on 3D facial geometry, EchoPrint, unlike 2D visual face recognition systems, is not easily spoofed by images or videos. It needs only commodity hardware, thus avoiding the extra cost of the special sensors used in solutions like FaceID. Experiments with 62 volunteers and non-human objects such as images, photos, and sculptures show that EchoPrint achieves 93.75% balanced accuracy and 93.50% F-score, with an average precision of 98.05%, and no image- or video-based attack was observed to succeed in spoofing.
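For readers unfamiliar with the metrics reported above, the following minimal sketch shows how balanced accuracy, precision, and F-score are conventionally computed for a binary accept/reject classifier. The `Confusion` class, the helper functions, and the example counts are hypothetical illustrations, not the authors' evaluation code or data.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion counts for a binary authenticator (hypothetical helper)."""
    tp: int  # legitimate user accepted
    fp: int  # impostor or spoof accepted
    tn: int  # impostor or spoof rejected
    fn: int  # legitimate user rejected


def _ratio(num: int, den: int) -> float:
    # Return 0.0 instead of raising when a class is empty.
    return num / den if den else 0.0


def precision(c: Confusion) -> float:
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: Confusion) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: Confusion) -> float:
    return _ratio(c.tn, c.tn + c.fp)


def balanced_accuracy(c: Confusion) -> float:
    # Mean of the true positive rate and the true negative rate.
    return (recall(c) + specificity(c)) / 2


def f_score(c: Confusion) -> float:
    # Harmonic mean of precision and recall (F1).
    p, r = precision(c), recall(c)
    return _ratio(2 * p * r, p + r) if (p + r) else 0.0


# Made-up counts for illustration only; not results from the paper.
example = Confusion(tp=90, fp=5, tn=95, fn=10)
print(f"precision         = {precision(example):.4f}")
print(f"balanced accuracy = {balanced_accuracy(example):.4f}")
print(f"F-score           = {f_score(example):.4f}")
```

Balanced accuracy is a common choice when the legitimate-user and impostor classes differ in size, because it weights the two error types equally regardless of how many samples each class has.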
