Identification scheme for romanized Indian languages from their plain and ciphered bit stream

Abstract: This paper addresses the identification of Indian languages when they are communicated as a plain bit stream after their script has been romanized. An attempt has also been made to identify them from their enciphered bit stream obtained through standard encryption schemes. In this context, the plain and cipher bit streams of four Indian languages, viz. Hindi, Punjabi, Oriya and Bengali, have been studied. A novel method proposed earlier [6] has been extended for the extraction of statistical features. Several other feature extraction and feature selection techniques have been used in experiments with four classifiers, and the results are summarized. The Maximum Likelihood Classifier (MLC) performed better than the Minimum Distance Classifier (MDC), Linear Statistical Classifier (LSC) and Piecewise Linear Classifier (PLC) in terms of accuracy and consistency.
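The abstract does not spell out the feature set or the classifier details, so the following is only a minimal sketch of the general idea: turn romanized text into a bit stream, extract simple statistical features, and assign a language with a minimum distance classifier. The choice of 8-bit ASCII encoding, non-overlapping 4-bit words as features, Euclidean distance, and all function names are assumptions made for illustration, not the authors' method. The enciphered case is not covered.

```python
from collections import Counter
import math


def text_to_bits(text):
    """Encode romanized text as a '0'/'1' string, 8 bits per ASCII byte (assumed encoding)."""
    return "".join(f"{byte:08b}" for byte in text.encode("ascii"))


def word_frequencies(bits, word_len=4):
    """Relative frequency of every non-overlapping binary word of length word_len."""
    words = [bits[i:i + word_len]
             for i in range(0, len(bits) - word_len + 1, word_len)]
    counts = Counter(words)
    total = len(words)
    # One feature per possible word, in numeric order 0 .. 2**word_len - 1.
    return [counts.get(f"{w:0{word_len}b}", 0) / total
            for w in range(2 ** word_len)]


def class_means(training):
    """Map each label to the component-wise mean of its training feature vectors."""
    return {label: [sum(column) / len(vectors) for column in zip(*vectors)]
            for label, vectors in training.items()}


def mdc_predict(means, features):
    """Minimum distance classifier: pick the label whose mean is nearest (Euclidean)."""
    return min(means, key=lambda label: math.dist(means[label], features))


# Hypothetical toy training data: one short romanized sample per language.
training = {
    "hindi": [word_frequencies(text_to_bits("namaste aap kaise hain"))],
    "bengali": [word_frequencies(text_to_bits("tumi kemon acho"))],
}
means = class_means(training)
label = mdc_predict(means, word_frequencies(text_to_bits("aap kaise hain")))
```

A realistic experiment would need much longer samples per language and a held-out test set. A maximum likelihood classifier would replace the distance in `mdc_predict` with a class-conditional likelihood estimated from the training data.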

[1]  C. A. Murthy, et al. Unsupervised Feature Selection Using Feature Similarity, 2002, IEEE Trans. Pattern Anal. Mach. Intell.

[2]  D. A. Bell, et al. Applied Statistics, 1953, Nature.

[3]  M. Thomason. Interactive Pattern Recognition, 1981, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  B. Ripley, et al. Pattern Recognition, 1968, Nature.

[5]  Neelam Verma, et al. An effective source recognition algorithm: extraction of significant binary words, 2000, Pattern Recognit. Lett.

[6]  염흥렬, et al. [Book review] "Applied Cryptography", 1997.

[7]  J. Wade Davis, et al. Statistical Pattern Recognition, 2003, Technometrics.

[8]  William Stallings, et al. Cryptography and Network Security: Principles and Practice, 1998.