Improving hidden Markov models with a similarity histogram for typing pattern biometrics

A highly feasible biometric for user authentication is to identify the typing patterns that distinguish individuals, so that users can be authenticated at any point during a session. We interpret keystroke data as four-dimensional timing vectors, so clusters of keystroke vectors form the static basis for analysis. Each key transition also has a more dynamic probability distribution associated with it, which can be used to form a Markov chain. Ideally, a particular keystroke corresponds to exactly one cluster of vectors; but because the clusters actually observed are usually formed from several different keys, only a set of probabilities for which key a given observation really represents can be assured. This can be modeled as a hidden state being observed in a hidden Markov model (HMM). In our previous research, we implemented a user-authentication process with an HMM that learned the particular typing patterns of individuals and then identified them. During the training stage, the timing information of each keystroke within a word was first gathered from each user as the word was typed repeatedly. The key-transition and observation probability matrices for each user were then built from the gathered typing data. Afterward, the model was tested with typing data from words that each user had entered separately, to match them against the stored profiles. We chose to examine the typing patterns of users' login names, since these would be the most frequently typed individual words. We found the HMM approach well suited to our process because typing patterns are stochastic in nature, and because the model reveals pattern distributions and predicts possible keystroke sequences. However, in the first-order non-ergodic HMM that was applied, we noted an issue with how accurately the probability matrices translate the observations.
Extending our previous work, we propose a method that improves the process with a histogram of the similarity measured between each actual observation and the cluster centroid that it resembles. Such a histogram can easily be applied to higher-order models as the key factor in setting matching thresholds. The adaptive histogram thereby improves the experimental results.
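
To illustrate the idea of a similarity histogram, the following is a minimal sketch and not the implementation used in this work. It assumes that keystroke vectors are 4-tuples in normalized units, that similarity is defined as 1 / (1 + Euclidean distance) to the nearest cluster centroid, and that the matching threshold is taken as the histogram bin edge above which a chosen fraction of training observations fall. The function names, the similarity formula, the bin count, and the `keep` parameter are all hypothetical choices made for this example.

```python
import math


def nearest_centroid(vec, centroids):
    """Return (index, distance) of the centroid closest to vec."""
    dists = [math.dist(vec, c) for c in centroids]
    idx = min(range(len(dists)), key=dists.__getitem__)
    return idx, dists[idx]


def similarity(distance):
    """Map a distance in [0, inf) to a similarity in (0, 1].

    Assumed form for illustration; the abstract does not define the measure.
    """
    return 1.0 / (1.0 + distance)


def similarity_histogram(vectors, centroids, bins=10):
    """Count how many training vectors fall into each similarity bin."""
    counts = [0] * bins
    for v in vectors:
        _, d = nearest_centroid(v, centroids)
        s = similarity(d)
        # A similarity of exactly 1.0 is placed in the last bin.
        counts[min(int(s * bins), bins - 1)] += 1
    return counts


def threshold_from_histogram(counts, keep=0.9):
    """Return the highest bin edge such that at least a fraction `keep`
    of the training samples have a similarity at or above it."""
    total = sum(counts)
    bins = len(counts)
    covered = 0
    for b in range(bins - 1, -1, -1):
        covered += counts[b]
        if covered >= keep * total:
            return b / bins
    return 0.0


# Toy data: two centroids and four training vectors (made-up values).
centroids = [(0, 0, 0, 0), (10, 10, 10, 10)]
training = [(0, 0, 0, 0), (0, 0, 0, 1), (10, 10, 10, 10), (0, 0, 3, 0)]

hist = similarity_histogram(training, centroids)
threshold = threshold_from_histogram(hist, keep=0.75)
```

Under these assumptions, a new observation would be accepted as resembling its nearest cluster only if its similarity is at or above `threshold`. Because the threshold is derived from each user's own training data, it adapts to how tightly that user's keystrokes cluster. How such a threshold is combined with the HMM probabilities is not specified here and is left open in this sketch.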
