Videos versus still images: Asymmetric sensor pattern noise comparison on mobile phones

Nowadays, the most widely used devices for recording videos or capturing images are undoubtedly smartphones. Our work investigates the application of source camera identification on mobile phones. We present a dataset entirely collected with mobile phones, containing both still images and videos captured by 67 different smartphones. Part of the images consists of photos of uniform backgrounds, collected specifically for the computation of the Reference Sensor Pattern Noise (RSPN). Identifying the source camera of a video is particularly challenging due to the strong video compression. The experiments reported in this paper show the large variation in performance when testing a highly accurate technique on still images and on videos.

Introduction

Source camera identification is one of the most important topics in Image Forensics, since it can be used to associate videos or still images with illegal content to the source camera and possibly to its owner. Nowadays, the most widely used devices for recording videos or capturing images are undoubtedly smartphones. However, the large variety of imaging sensors and software with very different characteristics (e.g. resolution, image pre-processing, and file format) makes source camera identification on mobiles very challenging, in particular when dealing with videos subject to strong compression. Our study focuses on source camera identification on mobile devices and on analysing the variation in performance when the technique is applied to videos and when comparing videos versus still images. It is worth disambiguating between two main categories of source camera identification techniques. Both are based on the analysis of the traces left by the different processing steps in the image acquisition and storage phases. These traces mark the image with a kind of camera fingerprint, which can be used for authentication [1].
The first group of techniques tries to distinguish between different camera models by analysing acquisition artefacts produced by lenses or by Color Filter Array (CFA) interpolation. The second, more challenging, group aims to distinguish between single devices, even different exemplars of the same camera model. The latter techniques are based on the distinctive pattern due to imperfections introduced in the silicon wafer during sensor manufacturing. We adopt a well-known technique for Sensor Pattern Noise (SPN) extraction belonging to the second category, namely the Enhanced Sensor Pattern Noise technique presented by Li in 2010 [10]. The most important aspect of our work is the asymmetric comparison of the SPN extracted from videos and from still images. It is known that videos captured by mobile phones are strongly compressed, and this has a severe impact on SPN extraction. The results show the large gap in performance when using videos in place of still images for source camera identification. Experiments are carried out on a large image database collected specifically for source camera identification on mobile devices. Performance is assessed in terms of the Receiver Operating Characteristic (ROC) curve, the Cumulative Match Characteristic (CMC) curve, and the Equal Error Rate (EER).

Related Works

As stated before, we adopt the Enhanced Sensor Pattern Noise extraction technique presented in [10] by Li. This technique is based on the observation that imaging sensors have various defects that produce a noise pattern in the pixel values [13]. The sensor noise is the result of three main components: pixel defects, the fixed pattern noise (FPN), and the Photo Response Non-Uniformity (PRNU). Geradts et al. in [14] attempt to reconstruct pixel defect patterns by taking images of black or green backgrounds with 12 different cameras. The defect points are then compared, showing that each camera has distinct patterns, even across devices of the same model.
However, not all camera models contain defective pixels, and some cameras eliminate them by internal processing. Therefore, this method is not applicable to every digital camera [1]. FPN and PRNU are the two components of the so-called pattern noise, and depend on dark currents in the sensor and on pixel non-uniformities, respectively [1]. In [6], Lukas et al. propose to analyse the sensor pattern noise (SPN) for camera identification, as it is a unique stochastic characteristic of both CCD and CMOS sensors [1]. They show that SPNs extracted from images taken by the same camera are more correlated than those extracted from different cameras. The SPN is estimated by computing the difference between an image I and its denoised version:

n = DWT(I) − F(DWT(I))    (1)

where DWT() is the discrete wavelet transform applied to image I and F() is a denoising function applied in the DWT domain; F() is the filter proposed in Appendix A of [6]. In a later study, Li [10] proposes to refine the previous method by enhancing the SPN. Li observed that the SPN can be contaminated by fine details or structures of the depicted scene, since both the image noise and the scene details are located in the high frequencies. This contamination might reduce the probability of matching with a reference. Li therefore proposes to enhance the SPN estimation by weighting the noise components inversely proportionally to their magnitude, in order to suppress information derived from non-smooth image parts. As a result, high classification accuracy is obtained even on small image regions [1].

Figure 1. CMC and ROC curves for experiment (i) still images vs. still images and (ii) still images vs. videos.

The first large and publicly available image database for benchmarking source sensor recognition techniques was proposed in 2010, namely the "Dresden Image Database" [2]. It is composed of more than 14,000 images acquired with 73 cameras of 25 different models, and it has been used in a number of works [3][4][5][9].
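The SPN estimation of Eq. (1) can be sketched in a few lines. The snippet below is a simplified variant that computes the noise residual as the difference between the image and a wavelet-denoised version of it, using plain soft thresholding in PyWavelets as a stand-in for the filter of [6]; the function names and the threshold choice are illustrative, not the authors' exact implementation.

```python
import numpy as np
import pywt


def extract_spn(image, wavelet="db8", sigma=5.0):
    """Estimate the sensor pattern noise of a grayscale image as the
    residual between the image and its wavelet-denoised version (Eq. 1).
    Soft thresholding stands in for the denoising filter F()."""
    img = np.asarray(image, dtype=np.float64)
    # Cap the decomposition depth at what the image size allows.
    level = min(4, pywt.dwt_max_level(min(img.shape),
                                      pywt.Wavelet(wavelet).dec_len))
    coeffs = pywt.wavedec2(img, wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]
    denoised_details = []
    for (ch, cv, cd) in details:
        # Universal threshold on each detail sub-band.
        thr = sigma * np.sqrt(2.0 * np.log(ch.size))
        denoised_details.append(tuple(pywt.threshold(c, thr, mode="soft")
                                      for c in (ch, cv, cd)))
    denoised = pywt.waverec2([approx] + denoised_details, wavelet)
    denoised = denoised[:img.shape[0], :img.shape[1]]
    return img - denoised  # noise residual n


def reference_spn(flat_images):
    """Average the residuals of several flat-field shots (e.g. blue-sky
    pictures) to obtain the camera's reference SPN (RSPN)."""
    return np.mean([extract_spn(im) for im in flat_images], axis=0)
```

Averaging over many flat-field images suppresses the scene-independent random noise while the sensor's fixed pattern accumulates, which is why the dataset includes uniform-background pictures for the RSPN computation.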
Another, smaller database for blind source cell-phone model identification was presented in 2008 by Çeliktutan et al. [7]. It contains more than 3,000 pictures collected using 17 mobile phones of 15 different models.

Database

In order to perform our experiments, we collected a novel dataset of still images and videos, namely the SOCRatES database. In its current state, the database is made up of about 6,200 images and 680 videos captured with 67 different smartphones of 14 different makes and 42 different models. It also contains several pictures of uniform backgrounds for the RSPN extraction; note, however, that RSPN extraction can also be performed on non-uniform images while still obtaining good performance, as demonstrated in [11]. The acquisition has been performed in uncontrolled conditions: many people were involved and asked to use their personal smartphones to collect a set of pictures. The reason behind this choice is, on the one hand, to collect a database of heterogeneous pictures and to maximize the number of devices employed, and, on the other hand, to replicate realistic acquisition conditions. A total of 90 photos and 10 videos have been collected with each smartphone: 50 are photos of the blue sky, or of another uniformly coloured surface, needed for the RSPN computation; 40 pictures portray random scenes, avoiding privacy- and copyright-sensitive subjects; the 10 short video clips last from 2 to 5 seconds each. The persons involved in the first acquisition session are mostly EURECOM students. A naming convention has been adopted to distinguish the images/videos captured with different devices: an ID number has been assigned to each device, together with a label indicating the type of the acquired item, i.e. "background picture", "foreground picture", or "video". Along with the pictures and videos, annotation files describing the characteristics of the smartphones employed are provided.
In particular, they list the smartphone model, the operating system, the digital camera model, and the photo and video resolutions employed during acquisition. Thanks to this dataset, we analyse the advantages and disadvantages of performing source camera recognition on mobile phones and of using videos versus still images. The database and its description will soon be made available at the following URL: http://socrates.eurecom.fr

Video vs. still images SPN extraction on mobile devices

The problem addressed by this work is two-fold: (i) we assess the performance of Li's technique for source camera identification, for the first time, on a large database of images captured only by mobile devices; (ii) we analyse the problem of SPN extraction from strongly compressed videos, such as the ones generated by smartphones. SPN extraction from videos is a well-known issue [12]. The sensor pattern noise is strongly impacted by compression; moreover, compared to photos captured by the same sensor at the same resolution, the recorded scene is somehow cropped. It is observed that SPN comparison leads to a much lower correlation when comparing videos recorded by the same sensor. One possible way to mitigate the problem is to pre-select the video frames used in the SPN computation, taking into account mainly the I-frames [12], on which the impact of video compression is weaker. As mentioned before, Li proposes to enhance the weakest SPN components and to suppress the strongest ones, which are more likely to correspond to scene details [10]. Different models are proposed in [10] to compute the Enhanced SPN (ESPN); we adopt the following:

n_e(i, j) = e^(−0.5 n²(i, j)/α²),   if 0 ≤ n(i, j)
n_e(i, j) = −e^(−0.5 n²(i, j)/α²),  otherwise    (2)

where n_e is the ESPN, n is the SPN, i and j are the indices of the components of n and n_e, and α is a parameter set to 7, as indicated in [10]. To decide whether a given picture/video frame belongs to a specific sensor, the extracted ESPN is compared with the Reference SPN (RSPN).

Figure 2. CMC and ROC curves for experiment (iii) videos vs. still images and (iv) videos vs. videos.
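The enhancement model of Eq. (2) and the fingerprint comparison can be sketched as follows. This is a minimal illustration: the enhancement follows the piecewise formula with α = 7, while the normalized cross-correlation is our assumption of a standard SPN matching metric (the names `enhance_spn` and `ncc` are illustrative).

```python
import numpy as np


def enhance_spn(n, alpha=7.0):
    """Enhanced SPN (Eq. 2): attenuate strong noise components, which are
    more likely to stem from scene details than from the sensor, while
    preserving the sign of each component."""
    magnitude = np.exp(-0.5 * n**2 / alpha**2)
    return np.where(n >= 0, magnitude, -magnitude)


def ncc(a, b):
    """Normalized cross-correlation between two fingerprint maps; values
    close to 1 suggest the probe and the reference share the same sensor."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Note that the weighting is largest for components near zero and decays rapidly with magnitude, which implements Li's idea of trusting weak residual components more than strong, detail-contaminated ones.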

References

[1] J. Redi et al., "Digital image forensics: a booklet for beginners," Multimedia Tools and Applications, 2010.

[2] T. Gloe and R. Böhme, "The 'Dresden Image Database' for benchmarking digital image forensics," Proc. ACM SAC, 2010.

[3] L. Verdoliva et al., "A study of co-occurrence based local features for camera model identification," Multimedia Tools and Applications, 2016.

[4] G. M. Farinella et al., "On blind source camera identification," Proc. ACIVS, 2015.

[5] M. Kirchner et al., "Unexpected artefacts in PRNU-based camera identification: a 'Dresden Image Database' case-study," Proc. ACM MM&Sec, 2012.

[6] J. Lukáš, J. Fridrich, and M. Goljan, "Digital camera identification from sensor pattern noise," IEEE Transactions on Information Forensics and Security, 2006.

[7] O. Çeliktutan, B. Sankur, et al., "Blind identification of source cell-phone model," IEEE Transactions on Information Forensics and Security, 2008.

[9] J. Zhang et al., "Source camera identification using Auto-White Balance approximation," Proc. ICCV, 2011.

[10] C.-T. Li, "Source camera identification using enhanced sensor pattern noise," IEEE Transactions on Information Forensics and Security, 2010.

[11] C. Galdi, M. Nappi, and J.-L. Dugelay, "Multimodal authentication on smartphones: Combining iris and sensor recognition for a double check of user identity," Pattern Recognition Letters, 2016.

[12] W.-H. Chuang, H. Su, and M. Wu, "Exploring compression effects for improved source camera identification using strongly compressed video," Proc. IEEE ICIP, 2011.

[13] G. C. Holst, "CCD Arrays, Cameras, and Displays," 1996.

[14] Z. Geradts et al., "Methods for identification of images acquired with digital cameras," Proc. SPIE, 2001.