COMPARISON OF PHOTOGRAMMETRIC AND COMPUTER VISION TECHNIQUES - 3D RECONSTRUCTION AND VISUALIZATION OF WARTBURG CASTLE

In recent years, the demand for 3D virtual models has significantly increased in areas such as telecommunication, urban planning, and tourism. This paper describes parts of an ongoing project aiming at the creation of a tourist information system for the World Wide Web. The latter comprises multimedia presentations of interesting sites and attractions, for instance, a 3D model of Wartburg Castle in Germany. To generate the 3D model of this castle, different photographic data acquisition devices, i.e., high resolution digital cameras and a low resolution camcorder, were used. The paper focuses on the comparison of photogrammetric and automatic computer vision methods for camera calibration and image orientation based on the Wartburg data set.

1. THE WARTBURG PROJECT

Wartburg Castle is situated near Eisenach in the state of Thuringia in Germany. Founded in 1067, the castle complex (Fig. 1) was extended, destroyed, reconstructed, and renovated several times. The Wartburg is a very prominent site in German history: well known for the medieval Contest of Troubadours around the year 1200, famous as the place where Martin Luther translated the New Testament into German around 1520, and important as a meeting place and symbol for students and other young people opposed to feudalism around 1820. Nowadays, this monument full of historical reminiscences is a significant cultural heritage landmark and, obviously, a touristic centre of attraction.

Figure 1. Wartburg Castle
Figure 2. ReGeo system (Frech & Koch, 2003)

There is an ongoing research project funded by the European Union aiming at the development of a comprehensive online tourist information system for the Thuringian Forest area, based on a geo-multimedia database.
The project ReGeo ("Multimedia Geo-Information for E-Communities in Rural Areas with Eco-Tourism"; Frech & Koch, 2003) offers tourists essential and useful information about their holiday region, but also supports local businesses and administrators (Fig. 2). The touristic infrastructure and sights can be explored by map-guided quests as well as by alphanumeric thematic search. For geographical visualization, 2D tourist maps and aerial images together with vector data as well as 3D models and 3D sceneries are provided. To generate a 3D model of Wartburg Castle, a highlight of touristic interest in the Thuringian Forest, photogrammetric data acquisition and 3D object reconstruction have been carried out. Examples of modeling and visualizing architectural objects in 3D can be found in the literature on photogrammetry and surveying as well as computer vision. The "photogrammetric way", including a detailed and precise 3D object reconstruction, is explained, e.g., in Hanke & Oberschneider (2002) and Daskalopoulos et al. (2003), whilst computer vision techniques are emphasized, e.g., in Pollefeys et al. (2003) and El-Hakim (2002). The different approaches have their advantages and limitations. This paper is intended to compare photogrammetric and computer vision methods for camera calibration and image orientation based on parts of the Wartburg data set.

2. OBJECT RECORDING

The Wartburg is built on the top of a rocky hill. The terrain slopes steeply away on all four sides of the castle complex. Access to the inner courtyards is possible solely via a small drawbridge. The outer facades of the Wartburg could therefore be photographed from only a few viewpoints. Images taken from an ultra-light airplane had to be added to obtain a sufficient ray intersection geometry for the photogrammetric point determination. The inner courtyards were successfully recorded from the two towers, some windows, and the ground.
Three image acquisition devices were used: the Rollei d30 metric and Canon EOS D60 SLR cameras, and a Sony DCR-TRV738E camcorder (cf. Tab. 1 for technical specifications). The 5 megapixel Rollei camera provides some features of a metric camera, such as a fixed focal length and a rigid connection between the lens and the CCD chip inside the camera. In addition, two focusing stops can be fixed electronically, i.e., the interior orientation parameters at these two focus settings can be regarded as known over a period of time once they have been determined by camera calibration. The d30 metric used within the Wartburg project was calibrated for the focal lengths f1 = 15 mm and f2 = 30 mm. The setting f2 was used for the images taken from the ultra-light airplane, whilst f1 was employed for all other images. The Canon EOS D60, providing a wide angle lens, a 6 megapixel sensor, and a larger image size than the Rollei camera, was used to record several parts of the courtyards which could be acquired only from a shorter distance. The camcorder served in addition to collect some overviews and short image sequences (movies) to be presented in the tourist information system.

                     Rollei d30 metric     Canon EOS D60        Sony DCR-TRV738E
Number of pixels     2552 x 1920           3072 x 2048          720 x 576
Sensor format        9 mm x 7 mm           22 mm x 15 mm
Lens (focal length)  10 mm - 30 mm         20 mm (SLR)          3.6 mm - 54 mm
                     (= 40 mm - 120 mm     (= 32 mm             (= 48 mm - 720 mm
                     for 35 mm camera)     for 35 mm camera)    for 35 mm camera)
Image data           6.4 MB uncompressed   7.4 MB uncompressed  0.8 MB
                     raw data per image    raw data per image

Table 1. Technical specifications of the cameras used in the project

3. PHOTOGRAMMETRIC RESTITUTION

A subset of 10 Rollei images covering a part of the first courtyard was selected to compare the 3D reconstruction process normally used in close-range photogrammetry with methods preferably applied in computer vision. The digital images were taken parallel to the object facade with relatively short distances between the camera positions.
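The 35 mm-equivalent focal lengths quoted in Table 1 follow from scaling by the ratio of the 36 mm frame width of 35 mm film to the sensor width. A minimal sketch of this conversion (the helper name is ours, not from the paper):

```python
# 35 mm-equivalent focal length from the sensor width, assuming the
# conversion is based on frame width (36 mm for 35 mm film).
def equiv_35mm(focal_mm, sensor_width_mm):
    return focal_mm * 36.0 / sensor_width_mm

# Rollei d30 metric: 9 mm sensor width, 10-30 mm zoom range
print(equiv_35mm(10, 9), equiv_35mm(30, 9))  # 40.0 120.0, matching Table 1
```

With the rounded 22 mm width from Table 1, the Canon value comes out near 33 rather than the quoted 32; the quoted figure is consistent with the slightly wider actual sensor of the EOS D60.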
Image data were obtained by manual pointwise measurement within the AICON DPA-Pro and PhotoModeler software. Then, the interior and exterior orientation parameters as well as the 3D coordinates of object points were determined by self-calibrating bundle adjustment. Fig. 3 shows the result of the visualization of this part of the castle.

Figure 3. Wartburg Castle: Part of the first courtyard

4. FULLY AUTOMATIC SEQUENCE ORIENTATION, AUTO-CALIBRATION, AND 3D RECONSTRUCTION

The goal of this part was to investigate whether a sequence of images, about which nothing is known except that the images are perspective and mutually overlapping, is sufficient for a metric reconstruction of the scene. Additionally, we were interested in comparing the automatically computed camera calibration with the given calibration information.

4.1 Sequence Orientation Based on the Trifocal Tensor

While other approaches use image pairs as their basic building block (Pollefeys, 2002), our solution for the fully automatic orientation of an image sequence relies on triplets which are linked together (Hao and Mayer, 2003). To deal with the complexity of larger images, image pyramids are employed. By using the whole image as search space, the approach works without parameter adjustment for a large number of different types of scenes. The basic problem for the fully automatic computation of the orientation of the images of an image sequence is the determination of (correct) correspondences. We tackle this problem by using point features and by sorting out valid correspondences employing the redundancy in image triplets. Particularly, we make use of the trifocal tensor (Hartley and Zisserman, 2000) and RANSAC (random sample consensus; Fischler and Bolles, 1981). Like the fundamental matrix for image pairs, the trifocal tensor provides a linear means of describing the relation between three perspective images. Only because of this linearity is it feasible to obtain a solution when no approximate values are given.
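The role of linearity can be illustrated for the two-image case: the epipolar constraint x2ᵀ F x1 = 0 is linear in the nine entries of the fundamental matrix F, so F can be estimated from eight or more correspondences without any approximate values. A minimal NumPy sketch on synthetic, noise-free data (all variable names are ours; this is an illustration, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic perspective cameras: P1 = [I | 0], P2 = [R | t]
def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([rot_x(0.1), np.array([[1.0], [0.2], [0.0]])])

# Random 3D points in front of the cameras (homogeneous coordinates)
X = np.vstack([rng.uniform(-1, 1, (2, 12)),
               rng.uniform(4, 8, (1, 12)),
               np.ones((1, 12))])
x1 = P1 @ X; x1 /= x1[2]
x2 = P2 @ X; x2 /= x2[2]

# Each correspondence yields ONE linear equation x2^T F x1 = 0 in the
# nine entries of F; the null vector of the stacked design matrix
# (smallest right singular vector) gives F up to scale.
A = np.array([np.kron(x2[:, i], x1[:, i]) for i in range(12)])
F = np.linalg.svd(A)[2][-1].reshape(3, 3)

residuals = [abs(x2[:, i] @ F @ x1[:, i]) for i in range(12)]
print(max(residuals))  # essentially zero for noise-free correspondences
```

The trifocal tensor plays the analogous role for three images: its incidence relations are likewise linear in the tensor entries, which is what makes a direct solution without initial values possible.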
RANSAC, on the other hand, provides a means of finding a solution when many blunders exist. In practice, points are first extracted with the Förstner operator. In the first image, the number of points is reduced by regional non-maximum suppression. The points are then matched by (normalized) cross-correlation, and sub-pixel precise coordinates are obtained by least squares matching. To cope with the computational complexity of larger images, we employ image pyramids. On the coarsest level of the image pyramid, with a size of approximately 100 x 100 pixels, we use the whole image as search space and determine fundamental matrices for the image pairs. From the fundamental matrices, epipolar lines are computed. They reduce the search space on the next level, where the trifocal tensor is determined. With it, a point given in two images can be projected into a third image, allowing a triple of matches to be checked, i.e., blunders to be sorted out. For large images, the trifocal tensor is also computed for the third coarsest level. To achieve highly precise and reliable results, projection matrices are determined after the linear solution, and with them a robust bundle adjustment is computed for the pairs as well as for the triplets. To orient the whole sequence, the triplets are linked. This is done in two steps. First, the image points in the second and third image of the nth triplet are projected into the third image of the (n+1)th triplet by the known trifocal tensor of the (n+1)th triplet. As the (projective) 3D coordinates of the nth triplet are known, the orientation of the third image in the projective space of the nth triplet can be computed via inverse projection. To obtain high precision, a robust bundle adjustment is employed. In the second step, 3D coordinates in the coordinate system defined by the nth triplet are determined linearly for all points in the (n+1)th triplet that have not been computed before.
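The blunder-elimination principle of RANSAC is independent of the model being fitted: repeatedly estimate the model from a minimal random sample, count the observations consistent with it, and keep the largest consensus set. A toy NumPy sketch with a straight line in place of the fundamental matrix or trifocal tensor (all names and parameter values are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic observations: most lie on y = 2x + 1, roughly 40% are blunders.
n = 100
x = rng.uniform(0, 10, n)
y = 2.0 * x + 1.0
blunders = rng.random(n) < 0.4
y[blunders] = rng.uniform(-50, 50, blunders.sum())

def ransac_line(x, y, trials=200, tol=0.5):
    """Return (slope, intercept, inlier_mask) for the line y = a*x + b."""
    best_inliers = np.zeros(len(x), dtype=bool)
    for _ in range(trials):
        i, j = rng.choice(len(x), 2, replace=False)   # minimal sample: 2 points
        if x[i] == x[j]:
            continue
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        inliers = np.abs(y - (a * x + b)) < tol       # consensus set
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least-squares fit on the largest consensus set only
    a, b = np.polyfit(x[best_inliers], y[best_inliers], 1)
    return a, b, best_inliers

a, b, inliers = ransac_line(x, y)
```

In the approach described above, the minimal sample consists of point triples, the model is the trifocal tensor, and the consensus test is the reprojection of a point from two images into the third; the final least-squares step corresponds to the robust bundle adjustment.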
The solution is again improved by robust bundle adjustment. Starting with the first image, this incrementally results in the projective projection matrices for all images as well as in 3D points. After the sequence has basically been oriented on the two or three coarsest levels of the image pyramid, finally, the 3D points are projected into all images via the