Information Measure Similarity Theory: Message Importance Measure via Shannon Entropy

Rare events attract more attention and interest in many big data scenarios, such as anomaly detection and security systems. To characterize the importance of rare events from a probabilistic perspective, the message importance measure (MIM) has been proposed as a semantic analysis tool. Like Shannon entropy, the MIM plays its own functional role in information processing, and its parameter $\varpi$ is central to that role. In fact, $\varpi$ determines the properties of the MIM, which has three work regions, corresponding to $0 \le \varpi \le 2/\max\{p(x_i)\}$, $\varpi > 2/\max\{p(x_i)\}$ and $\varpi < 0$, respectively. Moreover, when $0 \le \varpi \le 2/\max\{p(x_i)\}$, the MIM and Shannon entropy share some similarities in information compression and transmission, which offers a new viewpoint for information theory. This paper first constructs a system model with the message importance measure and proposes the message importance loss to enrich information processing strategies. We then propose the message importance loss capacity to measure the information importance harvested in a transmission. Furthermore, the message importance distortion function is presented to give an upper bound on information compression based on the message importance measure. Additionally, the transmission bitrate constrained by the message importance loss is investigated to broaden the scope of Shannon information theory.
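To make the three work regions concrete, the following Python sketch assumes the MIM definition used in earlier MIM papers, $L(\boldsymbol{p},\varpi)=\log\sum_i p(x_i)\,e^{\varpi(1-p(x_i))}$ with the natural logarithm. This formula is not stated in the abstract itself, so treat it as an assumption. The helper names `mim`, `shannon_entropy` and `work_region` are hypothetical; the sketch only evaluates the measures and classifies $\varpi$, and it does not implement the message importance loss, capacity or distortion function proposed in the paper.

```python
import math


def mim(p, w):
    """Message importance measure L(p, w) = log sum_i p_i * exp(w * (1 - p_i)).

    ASSUMPTION: this is the definition from earlier MIM papers (natural log);
    the abstract does not state the formula explicitly.
    """
    if any(pi < 0 for pi in p) or not math.isclose(sum(p), 1.0):
        raise ValueError("p must be a probability distribution")
    return math.log(sum(pi * math.exp(w * (1.0 - pi)) for pi in p))


def shannon_entropy(p):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def work_region(p, w):
    """Classify the parameter w into one of the three MIM work regions."""
    bound = 2.0 / max(p)  # the threshold 2 / max{p(x_i)}
    if w < 0:
        return "w < 0"
    if w <= bound:
        return "0 <= w <= 2/max p"
    return "w > 2/max p"


if __name__ == "__main__":
    uniform = [0.25] * 4
    skewed = [0.7, 0.1, 0.1, 0.1]
    for name, dist in (("uniform", uniform), ("skewed", skewed)):
        print(name, mim(dist, 1.0), shannon_entropy(dist), work_region(dist, 1.0))
```

For the uniform distribution over four symbols with $\varpi = 1$, the assumed formula reduces to $\varpi(1 - 1/4) = 0.75$ (up to floating-point rounding), and in this single example both the MIM and the entropy are larger for the uniform distribution than for `[0.7, 0.1, 0.1, 0.1]`. One numerical case is only an illustration, not a proof of the similarity the abstract refers to.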
