Digital Watermarking and Perceptual Hashing of Audio Signals with Focus on their Evaluation

Digital watermarking is a growing research area concerned with marking digital content by embedding information into the content itself. Perceptual hashing is used to identify specific content or to detect integrity violations up to a specific threshold. The evaluation of watermarking algorithms provides a fair and automated analysis of specific watermarking schemes for selected application fields. In this paper, we present a theoretical framework together with the design and formalization of an evaluation profile intended especially for perceptual hashing algorithms. Based on this profile, either the transparency of the digital watermarking scheme or the robustness of the perceptual hashing function can be evaluated. The introduction of practical tests and their setup shows the benefit of the evaluation profile.

1 Motivation and Introduction

The development of digital watermarks is, in most cases, directly connected to their evaluation. In the literature, different strategies are introduced to provide a widely applicable and comparable evaluation of watermarking algorithms [2–4]. The existing evaluation techniques differ in their strategy and procedure. For example, a simple evaluation procedure for robustness can be provided by applying single attacks followed by detection or retrieval of the embedded information. If watermarking schemes are to be evaluated in the context of application scenarios, profiles are more realistic and more useful for the evaluation. With profiles, the evaluation is easier, more abstract, and usable by developers as well as by end users with little or no inside knowledge of watermarking techniques. Therefore, application profiles have to model typical application scenarios. Furthermore, the comparability of given watermarking schemes can be analyzed for a specific application scenario or a subset of application fields.
In this paper, the application profile Perceptual Hashing P_{E/A−Perceptual Hash}¹ is introduced, designed and formalized especially to evaluate perceptual hashing algorithms. Thereby, the influence of the embedding function of a watermarking scheme on the perceptual hash, as well as the changed perceptual hash resulting from the embedded watermark within an application scenario, are analyzed and discussed.

¹ Note: we use the same notation as introduced in [3]. “E/A” means that the embedding and/or attacking function is addressed, whereby “E” addresses only the embedding and “A” only the attacking function.

All evaluation techniques and evaluation profiles are based on standardized measurements of one or more specific properties or characteristics. Seven basic profiles are introduced in [4], each defining a single characteristic of a digital watermarking scheme. These elementary properties are transparency, capacity, robustness, complexity, invertibility, verification and security. Derived from the introduction of the individual properties in [5], their formalization and measurement are defined and presented. Furthermore, test results are presented which compare different watermarking schemes with each other.

Perceptual hashing is also known as content fingerprinting or content-based identification [1, 7]. Considering the example of audio content, the principle of such a perceptual hash is that acoustically relevant characteristics of a piece of audio material are identified and the computed features are stored in a database. When an unknown piece of audio is presented, the same features are computed and matched against those stored in the database. The authors in [8] split the audio signal into n frames. The frequency-domain representation of each frame is divided, in the range from 300 Hz to 3000 Hz, into 32 frequency bands m to compute a 32-bit content hash value H(n, m). In this paper, we do not consider framing of the audio signal, for reasons of simplification.
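The principle described above (compute features, store them, match an unknown signal against the stored values) can be illustrated with a minimal sketch. It is not the algorithm of [8]: the logarithmic band spacing, the median-threshold rule for deriving bits, and all function names (`band_energies`, `perceptual_hash`, `hamming_distance`, `identify`) are assumptions made only for illustration. The sketch computes one 32-bit value from the complete signal, without framing, and uses a naive DFT so that it needs only the Python standard library.

```python
import cmath
import math
import statistics


def band_energies(signal, sample_rate, f_lo=300.0, f_hi=3000.0, n_bands=32):
    """Spectral energy of the whole signal in n_bands bands between f_lo and f_hi.

    Assumptions: log-spaced band edges and sample_rate > 2 * f_hi.
    """
    n = len(signal)
    # Precomputed DFT twiddle factors exp(-2*pi*i*j/n).
    twiddle = [cmath.exp(-2j * math.pi * j / n) for j in range(n)]
    edges = [f_lo * (f_hi / f_lo) ** (i / n_bands) for i in range(n_bands + 1)]
    energies = [0.0] * n_bands
    k_lo = math.ceil(f_lo * n / sample_rate)
    k_hi = math.floor(f_hi * n / sample_rate)
    band = 0
    for k in range(k_lo, k_hi + 1):
        freq = k * sample_rate / n
        # Advance to the band that contains this DFT bin.
        while band < n_bands - 1 and freq >= edges[band + 1]:
            band += 1
        coeff = sum(x * twiddle[(k * t) % n] for t, x in enumerate(signal))
        energies[band] += abs(coeff) ** 2
    return energies


def perceptual_hash(signal, sample_rate, n_bands=32):
    """One bit per band: 1 if the band energy exceeds the median band energy.

    The median rule is an illustrative assumption, not taken from [8].
    """
    energies = band_energies(signal, sample_rate, n_bands=n_bands)
    threshold = statistics.median(energies)
    bits = 0
    for energy in energies:
        bits = (bits << 1) | (1 if energy > threshold else 0)
    return bits


def hamming_distance(h1, h2):
    """Number of differing bits between two hash values."""
    return bin(h1 ^ h2).count("1")


def identify(database, query_hash, max_bit_errors=0):
    """Return the entry whose stored hash is closest to query_hash.

    database maps hash values to metadata; None is returned if the closest
    stored hash differs in more than max_bit_errors bits.
    """
    best = min(
        database.items(),
        key=lambda item: hamming_distance(item[0], query_hash),
        default=None,
    )
    if best is None or hamming_distance(best[0], query_hash) > max_bit_errors:
        return None
    return best[1]
```

With `max_bit_errors=0` the lookup demands an exact match; a larger value gives a threshold-based comparison that tolerates a limited number of differing bits.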
In our application scenarios and the watermark evaluation, the perceptual hash H(S) is computed from the complete audio signal S.

Many applications for perceptual hashes exist; some of them are summarized in [7]. The following list summarizes some of the most common of these application scenarios:

– Identification is used to identify an unknown audio signal, for example a piece of music, and to return metadata such as the author and title. This perceptual hash is also known as a secure perceptual hash. In this process, the perceptual hash H(S) is computed from the known audio signal S and stored in the database. If an unknown or modified audio signal S′ is presented, then H(S′) is computed and compared with the hash values stored in the database to identify S′. An exact match or a threshold-based comparison identifies a specific or known audio signal.

– Integrity Verification is an application in which perceptual hashes are used to detect an alteration of the audio signal. If the original audio signal S is modified into S′ by a malicious or non-malicious function, and if as a result H(S) ≠ H(S′) is computed, then this alteration is detectable by using a fragile perceptual hashing function. If the computation of the perceptual hash is chosen with knowledge of the possible audio modifications, the perceptual hash can be designed to tolerate certain modifications and to be fragile against others.

– As Watermark Support, perceptual hashes can be used to compute, from the audio signal itself, the secret key required by the embedding and detection/retrieval functions of a fragile audio watermark [10]. If the marked audio signal is altered, the perceptual hash differs, the detection/retrieval fails, and the alteration or tampering is detected.

– In the application field of Monitoring, which is a more specific application of the identification approach introduced above, the perceptual hash is used to identify content transmitted via radio or TV.
Thereby, the content distributor has an easy way to know when and by whom content was broadcast. This is important for the content owner, who needs to know whether the distributor has the right to broadcast and whether commercials are broadcast.

For the different application scenarios, different perceptual hashing algorithms with different required parameter sets exist. In [9], three different perceptual hashing algorithms are compared with each other. The authors compute H∗(S), where S is the original audio signal and H∗() is one of the perceptual hash functions. Then, the audio signal S is modified by 11 different kinds of attacks, and the perceptual hashes H∗(S′) are computed from the resulting modified audio signals S′. A comparison of H∗(S) with H∗(S′) indicates the robustness of H∗() against a specific attack. The evaluation presented in [9] is focused on the perceptual hashing algorithms and their robustness against selected attacks. The authors use a non-standardized evaluation strategy, which makes it difficult for others to compare their own test results with those presented in [9]. To bridge this gap, we introduce a profile-based evaluation built on established evaluation strategies, and we focus in this paper on the profile-based robustness and fragility evaluation of perceptual hashes.

This paper is organized as follows: Section 2 introduces the application profile. In subsection 2.1, the transparency measure for the embedding function is defined and formalized. In subsection 2.2, the attacking function as part of the digital watermarking scheme is in focus and its transparency measure is defined. Based on both definitions, subsection 2.3 introduces the transparency evaluation of a digital watermarking scheme in the context of perceptual hashing as well as the robustness evaluation of a perceptual hashing function with a digital watermarking scheme.
Section 3 summarizes our approach and its impact and draws conclusions for future work.

2 Application Profile: Design and Formalization

In this section, the application profile P_{E/A−Perceptual Hash} is designed and introduced. It serves the evaluation of the embedding and attacking functions of a digital watermarking scheme and the evaluation of a perceptual hash function in combination with a digital watermarking scheme.

The usage of a digital watermarking scheme and its general processes can be simplified as follows. An unmarked (mostly original) signal S is the source signal, into which the watermark w is embedded by using an embedding function E. The result is the marked signal S_E. This process can be assumed to take place in a secure environment. The following step could be, for example, the distribution of S_E over the Internet or its storage in order to provide authenticity or integrity checks. These processes can be seen as an insecure part, where attacks (A_{i,j} ∈ A) occur on S_E. After distribution of S_E, the signal is denoted S_{EA}, because potential attacks could destroy or modify the watermark. A detection function D tries to detect the watermark w, or a retrieval function R tries to retrieve the embedded message m′. The detection/retrieval can be done in a secure or insecure environment, depending on the application of the watermarking algorithm. The complete scenario is also called the life cycle of a watermark, because it begins with embedding and ends with detection/retrieval. Figure 1 illustrates this life cycle and shows where the secure and insecure parts are expected.

Secure Part Insecure Part Insecure Part Secure or