FROM DEPOSIT TO POINT CLOUD – A STUDY OF LOW-COST COMPUTER VISION APPROACHES FOR THE STRAIGHTFORWARD DOCUMENTATION OF ARCHAEOLOGICAL EXCAVATIONS

Stratigraphic archaeological excavations demand high-resolution documentation techniques for 3D recording. Today, this is typically accomplished using total stations or terrestrial laser scanners. This paper demonstrates the potential of another technique that is low-cost and easy to execute. It takes advantage of software using Structure from Motion (SfM) algorithms, which are known for their ability to reconstruct camera poses and three-dimensional scene geometry (rendered as a sparse point cloud) from a series of overlapping photographs captured by a camera moving around the scene. When complemented by stereo-matching algorithms, detailed 3D surface models can be built from such relatively oriented photo collections in a fully automated way. The absolute orientation of the model can be derived by the manual measurement of control points. The approach is extremely flexible and suited to a wide variety of imagery, because this computer vision approach also works with imagery from a randomly moving camera (i.e. under uncontrolled conditions), and calibrated optics are not a prerequisite. For a few years now, these algorithms have been embedded in several free and low-cost software packages. This paper outlines how such a program can be applied to map archaeological excavations in a very fast and uncomplicated way, using imagery shot with a standard compact digital camera (even if the images were not taken for this purpose). Archived data from previous excavations of VIAS-University of Vienna were chosen, and the derived digital surface models and orthophotos were examined for their usefulness for archaeological applications. The absolute georeferencing of the resulting surface models was performed with the manual identification of fourteen control points. To express the positional accuracy of the generated 3D surface models, the NSSDA guidelines were applied. Simultaneously acquired terrestrial laser scanning data, processed in our standard workflow, were used to check the results independently. The vertical accuracy of the surface models generated by SfM was found to be within 0.04 m at the 95 % confidence level, while several visual assessments indicated a very high horizontal positional accuracy as well.


INTRODUCTION
The process of archaeological excavation aims at a complete description of a site's unique stratification. In practice, each single deposit has to be uncovered, identified, documented and interpreted. Since this can only be done within a destructive process, high-resolution documentation techniques for three-dimensional (3D) single-surface recording (as defined by [1,2]) are essential. Among the wide range of possible documentation techniques, total stations are typically used to document the outline and topography of the top and bottom surfaces of single deposits. While total stations have become standard tools for documenting archaeological excavations in many countries, detailed 3D single-surface recording with them is time-consuming and cost-intensive, and provides only a general trend of the topography when dealing with rough surfaces. Alternatively, terrestrial laser scanning (TLS) has been proposed as a particularly sophisticated method to produce accurate and detailed surface models [1,2,3]. Due to their high acquisition costs, however, laser scanners are for the time being rarely applied at archaeological excavations. Another option for fast 3D single-surface recording would be the adoption of a photogrammetric workflow. Until recently, however, many archaeologists did not consider this alternative, because photogrammetry was also thought to require a high level of expertise and expensive equipment (hardware and software). For some years now, the research field of computer vision, which has close ties to photogrammetry, has been developing innovative algorithms and techniques to obtain 3D information from photographs in a simple and flexible way without many prerequisites. These are embedded in several free and low-cost computer vision software packages, which allow an extremely flexible approach to modelling surfaces from a wide variety of imagery. This paper outlines how such a program can be applied to map archaeological excavations in a very fast and uncomplicated way, using imagery shot with a standard compact digital camera. In this way, the photographic record of the individual surfaces can be used to create digital surface models and orthophotos. To assess the accuracy of the method, the 3D surface models are compared to surface models generated from simultaneously acquired TLS data.
________________________________________________________________________________
Geoinformatics CTU FCE 2011 82

STRUCTURE FROM MOTION AND MULTI-VIEW STEREO
Many tools and methods exist to obtain information about the geometry of 3D objects and scenes from 2D images.
One possibility is to use multiple views of the same scene. Using photogrammetric techniques, an image point occurring in at least two views can be reconstructed in 3D. However, this is only possible if the projection geometry is known, i.e. the camera pose (the exterior orientation parameters) and the internal calibration parameters. A Structure from Motion (SfM [4]) approach allows the simultaneous computation of both the relative projection geometry and a set of 3D points from a series of overlapping images captured by a camera moving around the scene [5,6]. By detecting a set of image features in every photograph and subsequently tracking the position of those features throughout the multiple images, the locations of the feature points can be estimated and rendered as a sparse 3D point cloud that represents the geometry (structure) of the scene in a local coordinate frame [6,7]. SfM algorithms are used in a wide variety of applications but were developed in the field of computer vision, often defined as the science that develops mathematical techniques to recover the three-dimensional shape and appearance of objects in imagery [6]. Recently, SfM has received a great deal of attention thanks to two freely available SfM implementations: Bundler [8] and Microsoft's Photosynth [9]. In this study, the commercial package PhotoScan (from AgiSoft LLC) is applied. Besides the aforementioned SfM approach, PhotoScan comes with a variety of dense multi-view stereo-matching algorithms (see [10] for an overview). As these reconstruction solutions operate on the pixel values [11,12], this additional step generates detailed meshed models from the initially calculated sparse point clouds, enabling proper handling of fine details present in the scenes. In a final step, the mesh can be textured. At this stage, the reconstructed 3D scene, which is still expressed in a local coordinate system, is rotated, translated and scaled into the absolute coordinate frame using at least three manually measured Ground Control Points (GCPs). This means that the current approach relies only on a digital still camera, a computer, and a total station.
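The statement that an image point seen in at least two views can be reconstructed in 3D once the projection geometry is known can be illustrated with linear (DLT) triangulation. The Python sketch below assumes that the 3×4 projection matrices are already available (in an SfM pipeline they result from the relative orientation of the images); the function names and the synthetic camera values are hypothetical, and PhotoScan's internal algorithms are not described in this paper.

```python
import numpy as np


def triangulate_dlt(projections, image_points):
    """Linear (DLT) triangulation of one 3D point seen in two or more views.

    projections  -- list of 3x4 projection matrices P = K [R | t] (assumed known)
    image_points -- list of (x, y) pixel coordinates of the same feature
    Returns the point in the (local) coordinate frame of the cameras.
    """
    rows = []
    for P, (x, y) in zip(projections, image_points):
        P = np.asarray(P, dtype=float)
        # Each view yields two equations that are linear in the homogeneous
        # point X:  x * (p3 . X) - (p1 . X) = 0  and  y * (p3 . X) - (p2 . X) = 0
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.vstack(rows)
    # Least-squares solution of A X = 0 subject to ||X|| = 1:
    # the right singular vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P, X):
    """Project a 3D point with a 3x4 camera matrix and return pixel (x, y)."""
    x = np.asarray(P, dtype=float) @ np.append(X, 1.0)
    return x[:2] / x[2]


# Hypothetical example: two identical cameras (focal length 1000 px,
# principal point at (500, 500)) with a 1 m baseline along the x-axis.
K = np.array([[1000.0, 0.0, 500.0], [0.0, 1000.0, 500.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
point = np.array([0.5, 0.2, 5.0])
observations = [project(P1, point), project(P2, point)]
print(triangulate_dlt([P1, P2], observations))  # close to (0.5, 0.2, 5.0)
```

With noisy image measurements this linear solution is only an initial estimate; SfM software generally refines all points and camera parameters jointly by bundle adjustment.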

ARCHAEOLOGICAL CASE STUDY
In the past, similar approaches have been applied to digitize archaeological sites (e.g. [13,14]). However, SfM and multi-view stereo algorithms have improved over time (see [12]), and this earlier work did not include a rigorous comparison with simultaneously acquired TLS data. To test the validity of the presented computer vision approach, a case study was selected from an excavation in Schwarzenbach [15], a multi-period hill fort in the Federal State of Lower Austria, some 60 kilometres south of Vienna. Archaeological research has been conducted there by VIAS-University of Vienna since 1989, including various multidisciplinary projects focusing on archaeological prospection, environmental archaeology, and experimental reconstruction of settlement structures. The site has also functioned as a key excavation area for the development of exhaustive digital documentation techniques for stratigraphic excavations [1,2], conducting GIS-based single-surface documentation using a total station, digital photography, and TLS (to capture a detailed documentation of top and bottom surfaces and feature interfaces).

Scene reconstruction
The subject of this validity test is the top surface of the stratigraphic unit (deposit) SE608s. It was documented in trench 6 during the 2008 excavation campaign and is part of a burnt Bronze Age rampart structure. This surface is particularly suitable because its topographic altitude variation is about 0.5 m and the presence of many variously shaped, sized and oriented stones makes the surface reconstruction challenging. In addition, the top surface of the deposit and its surroundings were scanned with a Riegl LMS-Z420i laser scanner. The scanner was placed about 7 m above the documented surface, yielding scanning distances below 10 m. Two scanning positions were necessary to document the surface satisfactorily.

The imagery used in this reconstruction was shot in the summer of 2008 using a Sony Cybershot DSC-R1, a 10 MP digital bridge camera featuring a Carl Zeiss Vario-Sonnar T* 2.8-4.8/14.3-71.5 mm zoom lens. All Exif (EXchangeable Image File)-defined metadata tags were available for these images. To enable orthophoto production, the images were shot as vertically as possible: the photographer stood on a stepladder, hand-holding a 2 m long pole on top of which the camera was mounted, reaching a varying camera altitude of 5 to 6 m above the surface. For this study, a small collection of ten images was used (see Figure 1A). It should be noted that none of these images was specifically acquired for the following approach, but the selected set nicely covers the area of interest. After all images are imported into PhotoScan, feature points are automatically detected and described in all source images. The approach is similar to the well-known SIFT (Scale Invariant Feature Transform) algorithm developed by David Lowe [16], and the resulting features are very stable under viewpoint and lighting variations. Using these features, the SfM algorithm can relatively orient all the images and estimate the intrinsic camera parameters. The locations of the feature points result in a sparse 3D point cloud that roughly describes the scene in a local coordinate system (Figure 1B). In a second step, a dense surface reconstruction is computed. Because all pixels are utilized, this reconstruction step (which is based on pair-wise depth map computation) enables proper handling of fine details present in the scene and represents them as a 3D mesh (Figure 1C). Several algorithms are available to do this [10]. Three of them, which differ in the way the individual depth maps are merged into the final 3D model, were chosen to compute a total of fifteen digital surface models (DSMs; see Table 1). In a third stage, every DSM is georeferenced by importing the coordinates of fourteen GCPs and indicating their positions on the photographs (Figure 1D). Afterwards, a seven-parameter similarity transformation converts the surface model into an absolute coordinate system. The maximum horizontal error reported between the computed coordinates and the GCP values acquired by total station was 7 mm. To ensure identical absolute georeferencing for every DSM, DSMs 2 to 15 were computed using the images and GCPs embedded in the project file of DSM 1: by varying the reconstruction parameters, PhotoScan computed a new DSM (which was stored separately) while maintaining the GCP positions relative to each individual photograph. Although it is not necessary for the orthophoto or DSM output, the 3D models can be textured to obtain a more pleasing representation (Figure 1E). Finally, every DSM was exported as an ASCII file.
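The seven-parameter similarity transformation (one scale factor, three rotations, three translations) can be estimated from the GCPs in closed form. The Python sketch below uses Umeyama's least-squares method as a stand-in; PhotoScan's actual implementation is not described in this paper, and the function names and synthetic GCP coordinates are hypothetical. It also reports horizontal and vertical residuals at the GCPs, the kind of figure behind the maximum horizontal error quoted above.

```python
import numpy as np


def fit_similarity_3d(local_pts, world_pts):
    """Least-squares 7-parameter similarity transform (Umeyama's method).

    Finds scale s, rotation R and translation t such that
    world ~= s * R @ local + t. Needs at least three non-collinear points.
    """
    A = np.asarray(local_pts, dtype=float)
    B = np.asarray(world_pts, dtype=float)
    mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
    A0, B0 = A - mu_a, B - mu_b
    # Cross-covariance matrix of the centred point sets
    cov = B0.T @ A0 / len(A)
    U, S, Vt = np.linalg.svd(cov)
    # Force a proper rotation (det = +1) rather than a reflection
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0
    R = U @ D @ Vt
    var_a = (A0 ** 2).sum() / len(A)
    s = np.trace(np.diag(S) @ D) / var_a
    t = mu_b - s * R @ mu_a
    return s, R, t


def apply_similarity(s, R, t, pts):
    """Transform local model coordinates into the absolute frame."""
    return s * np.asarray(pts, dtype=float) @ R.T + t


def gcp_residuals(s, R, t, local_pts, world_pts):
    """Horizontal (XY) and vertical (Z) residuals at the control points."""
    diff = apply_similarity(s, R, t, local_pts) - np.asarray(world_pts, dtype=float)
    return np.hypot(diff[:, 0], diff[:, 1]), np.abs(diff[:, 2])


# Hypothetical check: 14 synthetic GCPs in a local model frame, mapped into
# an absolute frame with known parameters, should be recovered exactly.
rng = np.random.default_rng(0)
local = rng.uniform(-2.0, 2.0, size=(14, 3))
angle = np.radians(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
world = apply_similarity(2.5, R_true, np.array([100.0, 200.0, 300.0]), local)
s, R, t = fit_similarity_3d(local, world)
horizontal, vertical = gcp_residuals(s, R, t, local, world)
print(round(s, 6), horizontal.max() < 1e-9)  # 2.5 True
```

With real GCPs the residuals are no longer zero; their maxima and RMS values then indicate how well the model fits the total station measurements.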

Spatial accuracy and precision assessment
Although the 3D models are very easy to generate, it is prudent to evaluate their accuracy. Therefore, all fifteen DSMs were compared to TLS data acquired with the Riegl LMS-Z420i. The two scanning positions were absolutely georeferenced with Riegl reflectors (cylinders). The positions of the reflectors were measured with a total station, yielding an average absolute georeferencing RMSE of 0.011 m for the TLS data. Finally, RiSCAN PRO 1.6.1 was applied to resample, clean and filter (octree) the TLS data in order to reduce the noise and smooth the point cloud to a final point spacing of 1.7 cm (this is our standard workflow, which has proved useful for previous scanning tasks). The georeferenced 3D point cloud was loaded into ESRI®'s ArcGIS® 10 together with the fifteen DSMs. These DSMs were exported from PhotoScan with a 2 cm grid spacing, since previous research had already indicated that large cell sizes can result in significant accuracy losses when dealing with complex terrain [17]. Additionally, 2 cm seemed a feasible grid spacing considering the density of the laser point cloud. For the accuracy assessment, a rectangular test area (4 by 4 m) containing the complete topographic surface variation was chosen, and it was verified that the point spacing in this area was still 1.7 cm. In this area, all fifteen DSMs were sampled for their altitude value at more than 50,000 TLS point locations. As the TLS measurements were the basis for the comparison, they were treated as the true values. By treating the values of the DSMs as observed values, several metrics could be extracted from this dataset (Table 1): the maximum positive and negative altitude differences, the mean (μ) difference, the mean of all absolute altitude differences, the standard deviation (σ) and the Root-Mean-Square Error (RMSE). Since absolute accuracy defines how well the observed value corresponds to the true value, RMSE is often used to assess horizontal and vertical positional accuracy. Because the standard deviation describes the amount of variation between all successive measurements, this metric can be applied to indicate precision (often called relative accuracy in the DSM field). It should be noted that in this case, both metrics only provide information on the vertical component of the computed DSMs. To incorporate all possible uncertainties in the computed dataset (including those introduced by the GCPs), the final vertical accuracy values are expressed at the 95 % confidence level using the National Standard for Spatial Data Accuracy (NSSDA): Accuracy_z = 1.96 × RMSE_z [18]. This shows that the most accurate surface (DSM 15) has an NSSDA vertical accuracy of 0.041 m, while a vertical accuracy of 0.045 m is retrieved for DSM 10. Assuming normally distributed errors without systematic bias (the assumption behind the NSSDA factor of 1.96), these figures mean that 95 % of all computed 3D points have an error with respect to the true ground position that is smaller than or equal to the stated accuracy value. Given that both the TLS and PhotoScan georeferencing are accurate to within about 1 cm and that, additionally, the TLS data are characterised by a noise of ± 1-2 cm in the < 10 m range [19,20], the calculated RMSE falls more or less within the typical random error range. Therefore, this test suggests that the PhotoScan result has roughly the same overall accuracy as the TLS data set.
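The comparison described above (sampling each DSM at the TLS point locations, treating TLS heights as true values and deriving the listed metrics and the NSSDA value) can be sketched in plain Python. Bilinear interpolation, the grid layout and the use of the sample standard deviation are assumptions made for illustration only: the sampling itself was performed in ArcGIS 10, and the interpolation method is not stated. All function names are hypothetical.

```python
import math
import statistics


def sample_bilinear(grid, x0, y0, cell, x, y):
    """Bilinearly interpolate a regular DSM grid at map position (x, y).

    Assumed layout: grid[i][j] is the height at (x0 + j * cell, y0 + i * cell),
    so rows run along +y. Returns None if (x, y) is not strictly inside the
    grid (points on the last row or column are skipped in this sketch).
    """
    fx, fy = (x - x0) / cell, (y - y0) / cell
    j, i = math.floor(fx), math.floor(fy)
    if i < 0 or j < 0 or i + 1 >= len(grid) or j + 1 >= len(grid[0]):
        return None
    dx, dy = fx - j, fy - i
    return (grid[i][j] * (1 - dx) * (1 - dy) + grid[i][j + 1] * dx * (1 - dy)
            + grid[i + 1][j] * (1 - dx) * dy + grid[i + 1][j + 1] * dx * dy)


def vertical_accuracy(observed, true):
    """Vertical metrics for DSM heights (observed) against TLS heights (true)."""
    d = [o - t for o, t in zip(observed, true)]
    rmse = math.sqrt(sum(v * v for v in d) / len(d))
    return {
        "max_positive": max(d),
        "max_negative": min(d),
        "mean": statistics.fmean(d),
        "mean_absolute": statistics.fmean([abs(v) for v in d]),
        "std": statistics.stdev(d),  # sample standard deviation (assumption)
        "rmse": rmse,
        "nssda_95": 1.96 * rmse,  # NSSDA vertical accuracy, 95 % level
    }


def assess_dsm(grid, x0, y0, cell, tls_points):
    """Sample the DSM at every TLS point (x, y, z) and compute the metrics."""
    observed, true = [], []
    for x, y, z in tls_points:
        h = sample_bilinear(grid, x0, y0, cell, x, y)
        if h is not None:
            observed.append(h)
            true.append(z)
    return vertical_accuracy(observed, true)
```

Read backwards, the NSSDA value of 0.041 m reported for DSM 15 corresponds to an RMSE_z of about 0.041 / 1.96 ≈ 0.021 m.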

Additionally, a visual assessment of both vertical and horizontal positional accuracy is provided in Figure 2A, which displays a TLS-versus-PhotoScan difference grid and clearly reveals the largest differences (see also Table 1), which in this example are situated along some sharp edges. On the one hand, some of these errors are in accordance with the edge effects known from previous TLS research [19]. On the other hand, our applied TLS workflow (i.e. merging scan positions, resampling and octree filtering) generated a variety of wrong points, certainly when compared to the original point cloud displayed in Figure 2B. This is clearly shown in the profile. Still, it is remarkable that the computer vision approach was able to retrieve these sharp forms quite well. In the flatter areas, the profiles also expose the lower noise of the PhotoScan DSM, although the surface might be slightly oversmoothed. The true surface thus likely lies somewhere between the TLS and photographic results. Even though sub-centimetre accuracy is generally not of much importance in excavation recording, PhotoScan certainly proves its capabilities (at least in this test area) in detecting and modelling very small details. Finally, this comparison also revealed some shortcomings of our default TLS processing chain (data reduction in order to speed up processing), since the original point cloud (Figure 2B) represented the edges much better.

DISCUSSION AND PROSPECTS
In recent years, the demand for accurate and fast generation of 3D surface models has been increasing in several domains, and archaeology is no exception. However, since archaeologists often have to deal with cost constraints, using a laser scanner is not always feasible. The previous section clearly showed that very accurate 3D information about archaeological interfaces can be acquired using state-of-the-art computer vision approaches. It needs to be stressed again that the imagery used in this comparison was not specifically acquired for this type of approach: the amount of image overlap and the camera positions were not at all optimised for digital surface reconstruction.
Still, the accuracy obtained can be considered sufficient for archaeological work. Besides, the workflow is very straightforward, only little familiarity with photogrammetry or computer vision is assumed, and no expensive hardware or software is involved in data acquisition. However, generating high-quality models from large datasets does require adequate computing resources. Finally, even old imagery can be reprocessed into accurate 3D surfaces and orthophotographs. To illustrate this, our approach was applied to a set of six 1.6 MP handheld oblique images (Figure 3A). These were shot more than ten years ago using a Canon digital compact camera (PowerShot Pro70) and show a Late Neolithic pit (feature interface SE30i) found at the multi-period open settlement site of Platt in Lower Austria, 70 km north of Vienna [20]. Apart from the pixel values, no other data were preserved, meaning that PhotoScan had no initial focal length values to start from. As in the previous case study, four GCPs measured by total station were visible in each image, as well as some in-situ measured breaklines and surface points. As Figure 3B indicates, the 3D model retrieved from these archived images is still very useful and more than sufficient for visualising the feature interface. Only small parts of the interface are missing, since the bottom was not everywhere equally well covered by the photographs. Nevertheless, enough digital information was originally captured to allow the production of an accurate orthophotograph. Figure 3C shows the rectified photograph that was originally calculated from one of the oblique images using a simple projective transformation. When it is overlaid with the total station breakline measurements, large deviations due to topographic displacement and lens distortion become visible. Comparing this result with the output produced by PhotoScan more than a decade later (Figure 3D) again highlights the potential of the latter approach.
These results should in no way be interpreted as a statement that TLS should be replaced by image-based modelling approaches in excavation work. First of all, we were able to generate similar results with both techniques, and both are usable for archaeological interpretation. Secondly, TLS has proven its reliability over many years. Although the current examples show SfM algorithms to be a valid alternative for 3D single-surface recording, it has to be stressed that this approach is not perfect. When dealing with very large photo collections, highly oblique images or photographs with a dissimilar appearance, erroneous alignment of the imagery can occur. Moreover, high-quality reconstructions from large image files are very resource-intensive. A multicore processor, a decent amount of RAM (minimum 8 GB), a 64-bit operating system and, most importantly, a high-end graphics card are minimum requirements for successful processing. Table 1 also gives a short overview of the processing times recorded during the reconstruction of the aforementioned DSMs. Notice how the stepwise increase in output quality comes with a serious time penalty. Fortunately, the metrics of Table 1 show that even the lower-quality DSMs were more than sufficient to digitally represent the uncovered surface for archaeological documentation.
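The "simple projective transformation" used for the original rectification of the Platt pit image (Figure 3C) maps one plane onto another, so it cannot account for the relief of the pit or for lens distortion, which explains the deviations from the measured breaklines. A minimal sketch of such a rectification, estimated with the normalised DLT from four or more control points, is given below in Python. It is an illustration under stated assumptions (the function names and example coordinates are hypothetical), not the software originally used.

```python
import numpy as np


def _normalising_transform(pts):
    """Similarity that moves the centroid to the origin and scales the mean
    distance from it to sqrt(2) (Hartley normalisation)."""
    pts = np.asarray(pts, dtype=float)
    c = pts.mean(axis=0)
    scale = np.sqrt(2.0) / np.linalg.norm(pts - c, axis=1).mean()
    return np.array([[scale, 0.0, -scale * c[0]],
                     [0.0, scale, -scale * c[1]],
                     [0.0, 0.0, 1.0]])


def apply_homography(H, pts):
    """Map 2D points with a 3x3 homography (homogeneous coordinates)."""
    pts = np.asarray(pts, dtype=float)
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]


def fit_homography(image_pts, map_pts):
    """Normalised DLT estimate of H such that map ~ H @ image (>= 4 pairs)."""
    T_img, T_map = _normalising_transform(image_pts), _normalising_transform(map_pts)
    src = apply_homography(T_img, image_pts)
    dst = apply_homography(T_map, map_pts)
    rows = []
    for (u, v), (x, y) in zip(src, dst):
        # Two linear equations per control point in the nine entries of H
        rows.append([-u, -v, -1.0, 0.0, 0.0, 0.0, x * u, x * v, x])
        rows.append([0.0, 0.0, 0.0, -u, -v, -1.0, y * u, y * v, y])
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H_norm = vt[-1].reshape(3, 3)
    # Undo the normalisation and fix the arbitrary overall scale
    H = np.linalg.inv(T_map) @ H_norm @ T_img
    return H / H[2, 2]


# Hypothetical check: pixel corners of a 1200 x 900 image plus its centre,
# mapped with a known homography into local map coordinates (metres).
H_true = np.array([[0.002, 0.0004, 10.0],
                   [-0.0003, 0.0025, 20.0],
                   [1e-5, 2e-5, 1.0]])
image = np.array([[0, 0], [1200, 0], [1200, 900], [0, 900], [600, 450]], float)
mapped = apply_homography(H_true, image)
H = fit_homography(image, mapped)
print(np.allclose(H, H_true))  # True
```

Because every pixel is mapped as if it lay on one plane, points below or above that plane are displaced in the rectified image; the SfM-based orthophoto avoids this by projecting onto the reconstructed surface.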

CONCLUSION
In this paper, the goal was to present an inexpensive approach to fast and accurate 3D surface recording. The method is mainly based on several computer vision techniques and is very straightforward to execute and to integrate into the general excavation methodology. Moreover, it offers the enormous advantage of having only standard photographic recording prerequisites. Apart from a sufficient number of sharp images covering the scene to be reconstructed and at least three GCPs to transform the reconstruction into an absolute coordinate frame, no other information is needed (although Exif metadata, e.g. even GPS coordinates, can be utilized). Besides, only minimal technical knowledge and user interaction are required. Finally, this approach also works in the total absence of any information about the instrument with which the imagery was acquired. To illustrate this, archived data from previous excavations of VIAS-University of Vienna were chosen to model feature interfaces, which were then examined for their usefulness in terms of archaeological visualisation and extraction of metric information. To evaluate their geometric accuracy, the 3D models were compared to simultaneously acquired total station and TLS data. Although the imagery had been shot before the development of this approach, the DSMs generated by PhotoScan showed only small deviations from those produced by our standard TLS workflow and can therefore be considered useful for our excavation purposes. While it needs to be stressed that millimetre accuracy is not an archaeological aim in itself and will, for most archaeological excavations, not deeply change our fundamental understanding of the past compared to more conventional recording methods, archaeologists should always strive to document an excavation in as much detail and as accurately as reasonably possible, since it is a one-time and very destructive event. The lack of financial means to apply an on-site laser scanner, or the technical expertise required to use photogrammetric approaches, have often been considered the main hindrances to appropriate 3D excavation documentation, even today. Thanks to the world-wide availability of digital still cameras and the integration of state-of-the-art computer vision and photogrammetry algorithms in user-friendly software packages, all the tools are now available to overcome these constraints and establish a straightforward, low-cost workflow for excavation recording that can be executed by archaeologists with little technical training. The presented case studies already showed that both image-based and TLS approaches have their advantages and drawbacks. However, both can be considered valid techniques for fast and accurate 3D single-surface recording. Nevertheless, future investigations under different, controlled conditions are necessary to assess image-based modelling more thoroughly and to quantify whether, and under which conditions, SfM approaches are a reliable documentation technique for archaeological excavations.

Figure 1: One of the ten images (A) out of which PhotoScan calculated the camera poses (B), a sparse 3D point cloud (B) and a surface model (C). The latter can be georeferenced using GCPs (D) and textured (E).

Figure 2: (A) Difference grid between PhotoScan DSM 10 and the TLS data processed by our standard workflow; (B) the profile A-B indicated in (A), showing the differences between the original point cloud, the PhotoScan DSM and the DSM extracted from the TLS data by our standard workflow.

Figure 3: (A) One of the Platt pit images; (B) the surface and camera poses recovered by PhotoScan; (C) a rectified pit image and the PhotoScan (PS) orthophoto, both overlaid with measured breaklines (see text).

Table 1: Most important processing parameters and all computed metrics for the fifteen DSMs. All computations were performed using an Intel® Core™ i7-980X processor, an NVIDIA® GeForce® GTX 580 and PhotoScan Professional 0.8.1 beta running on a Microsoft® Windows™ 7 Ultimate 64-bit machine.