Comprehensive approach for building outline extraction from LiDAR data with an emphasis on sparse laser scanning point clouds

A method of building outline extraction based on segmentation of airborne laser scanning data is proposed and tested on a dataset comprising 1,400 buildings typical of residential and industrial urban areas. The algorithm starts by setting a height threshold to separate roof points from bare earth points and low objects. Next, local planes are fitted to each point using RANSAC and further refined by least squares adjustment. A normal vector is assigned to each point. Similarities among normal vectors are evaluated in order to assemble planar or curved roof segments. Finally, building outlines are formed from the detected segments using the α-shapes algorithm and further regularized. The extracted outlines were compared with reference polygons manually derived from the processed laser scanning point cloud and orthoimages. Area-based evaluation of the accuracy of the proposed method revealed completeness and correctness of 87% and 97%, respectively, for the test dataset. The influence of parameters such as the number of points per roof segment, complexity of the roof structure, roof type, and overlap with vegetation on accuracy is evaluated and discussed. The emphasis is on point clouds with a density of 1 or 2 points/m².


Introduction
Growing demand for 3D information across applications such as architecture, engineering, real estate, telecommunication, and tourism, together with advances in technologies such as airborne laser scanning (ALS) and multi-image processing, has triggered research on algorithms for deriving 3D building models from point clouds and on related issues such as accuracy assessment, transferability of the algorithms to different datasets, and fusion of LiDAR and image data. A comprehensive review of about 100 papers on building extraction from ALS data published in the last two decades, together with challenges and possible research trends, can be found in [17]. Point density is one of the issues discussed there.
Increasing point density also increases the accuracy of extracted building outlines [17]. The extraction algorithm mentioned in [16] requires conversion of the original point cloud to a raster that is further processed in the eCognition software (object-based image analysis, [18]); densities lower than 5 points/m² show very low accuracy in the presented case. On the other hand, sparser LiDAR point clouds are acquired for nationwide mapping, and their use for building extraction and modelling has therefore been investigated (e.g. [14,4,2]).
P. Hofman and M. Potůčková: Building outline extraction from LiDAR data
This article focuses on an automated process of building outline extraction applicable to a sparse LiDAR point cloud with a point density of about 1-2 points/m². Inputs include an original irregular point cloud containing only the 3D coordinates of the collected points plus a digital terrain model (DTM) of the entire area. The algorithm is based on segmentation in the parameter domain (the categories of approaches for the segmentation of surface features can be found e.g. in [9]). To derive building roof surfaces, an approach similar to that shown in [6] is used; however, the random sample consensus (RANSAC) algorithm [3] and similarity between point attributes (normal vectors) in a local neighbourhood are utilized. In this way, the proposed algorithm is applicable to lower point densities and non-planar surfaces. The influence of parameters such as the number of points per roof segment, complexity of the roof structure, roof type, and overlap with vegetation on accuracy is evaluated and discussed. Development of the algorithm was initiated by the Czech Office for Surveying, Mapping, and Cadastre.

Data and test sites
The laser scanning point cloud was acquired using the LMS Q680 scanner from RIEGL Laser Measurement Systems GmbH [12]. The overlap of the scanned lines was 50 %. The average point cloud density was 1.5 points/m² and the estimated accuracy in point elevation was 0.1 m.
In order to cover the most common roof types in the Czech Republic, two different urban areas, the municipalities of Ctiněves (50°22'29" N, 14°18'26" E) and Pardubice-Polabiny (50°03'05" N, 15°45'40" E), were chosen (Figure 1). Ctiněves is a small village in the Ústí nad Labem Region featuring typical rural architecture. Its smaller buildings, mostly with gabled roofs, are often partially overshadowed by vegetation. Polabiny is a part of the regional capital Pardubice. It contains large blocks of flats and industrial buildings with flat roofs as well as residential areas comprising detached houses with more complicated roof constructions. In total, 1,400 buildings and building complexes were processed.

Methodology
The proposed solution for building outline extraction is based on processing an irregular point cloud representing roof structures. The extracted outlines therefore correspond to roof outlines and not to ground plans. Moreover, it is assumed that roofs consist of planar (flat, gabled, hipped roofs) or curved (spherical, conical roofs) segments and that any roof can be assembled from such segments. A data-driven approach is applied; no pre-defined building models are used. Figure 2 shows the general workflow of the developed methodology. First, points on bare earth and points with low height are filtered out. Second, a plane is fitted to the neighbourhood of each point on artificial and natural objects, and the corresponding normal vector is assigned to that point.
Based on the similarity of normal vectors, points are divided into segments. Segment boundaries are geometrically linked together to form a roof outline. Finally, the outlines obtained are regularised. A detailed description of each processing step follows. The proposed methodology and its implementation require several parameters to be set. In the current implementation, their values are set automatically depending on the point cloud density and the type of urban area. The values were determined empirically and tested so that the algorithm is transferable among datasets with different densities and types of buildings.

Pre-processing
Prior to the roof outline extraction, a dataset representing only bare earth points (DTM) is required as an input. Existing filtering algorithms, e.g. [7,1], are nowadays used operationally (e.g. in the software packages TerraScan [15], SCOP++ [20], LasTools [11]) and will not be discussed further in this text. The program SCOP++ [20] was applied for filtering the test data. A DTM derived by other means could be utilised provided that its resolution and accuracy do not degrade the accuracy of the derived roof outlines.

Object detection
Building extraction from an original irregular point cloud can be a computationally demanding operation. Thus, reducing the dataset to building candidates is helpful. Using the DTM, the height above ground of each point is calculated. Points with a height of less than 2 m are excluded from further calculations. The remaining point clusters are delineated with closed, in general non-convex, polygons by means of the α-shape algorithm [8]. The next processing step aims at eliminating clusters that represent not buildings but high vegetation, pylons, cars, or their combination.
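The height filtering step can be illustrated with a minimal sketch. It assumes the DTM is available as a callable returning the terrain height at a given position; the names `keep_elevated_points` and `terrain_height` are hypothetical and do not come from the authors' implementation.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres


def keep_elevated_points(points: List[Point],
                         terrain_height: Callable[[float, float], float],
                         min_height: float = 2.0) -> List[Point]:
    """Keep points whose height above the terrain is at least min_height.

    terrain_height(x, y) stands in for a DTM lookup (an assumption of this
    sketch); points lower than min_height above ground are dropped.
    """
    return [(x, y, z) for (x, y, z) in points
            if z - terrain_height(x, y) >= min_height]
```

In practice the terrain lookup would interpolate a DTM grid; a constant function is enough to try the filter on a flat test area.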


Plane fitting
Points belonging to one planar roof segment show the same or very similar direction of normal vectors. In the case of curved surfaces, e.g. conical or cylindrical ones, the direction of normal vectors changes continuously. Based on these conditions, points not corresponding to buildings can be filtered out. On the other hand, accurate determination of the plane parameters and normal vectors is essential for successful extraction of buildings.
In our approach, a plane is fitted to the neighbourhood of each point. To decrease the computational time, the neighbourhood is defined by a distance threshold that depends on the average point density in the selected area. For the tested dataset, a bounding box with a side of 5 metres was used. Such a box provides 30 to 40 points at the start of plane fitting.
The chosen size of the bounding box was sufficient for a reliable definition of a plane even at the corners of the roof segments, where on average 5-10 points lying in the same plane were found. First, plane parameters are calculated using RANSAC [3] and subsequently refined by means of least squares adjustment.
Fitting a plane to points lying in the middle of a planar roof segment without additional objects (e.g. chimneys, dormers) is trivial. Nevertheless, there are often other roof constructions or overlapping vegetation, or a point falls on the edge of two or more surfaces. Thus, in order to find a plane fitting most of the points in a given neighbourhood and to exclude outliers, RANSAC is applied. In addition to the evaluated point, two random points are iteratively selected, plane parameters are calculated, and distances of all points from this plane are evaluated. Assuming that at least one fourth of the points in the neighbourhood of the evaluated point fall into the sought plane, the number of iterations is determined so that the probability of selecting three initial points from this plane is not lower than 99.9 %. For the tested dataset, the number of iterations was set to 110 according to Formula 1 [3]. The parameter values used are given in parentheses.
$$k = \frac{\log(1 - p)}{\log(1 - w^{n})} \qquad (1)$$
where
k is the number of iterations (equal to 107 for the parameters below; set to 110),
p is the expected probability of fitting a correct plane (99.9 %),
w is the minimal expected proportion of inliers in the evaluated sample of points (0.25),
n is the number of sought points (2; the third point was the evaluated one).
Only points with a distance smaller than a set threshold (in our case 0.1 m, i.e. the accuracy in height of the dataset) are accepted. The solution with the highest number of points falling within the distance threshold (the highest score) is taken as the final one. Plane parameters and the normal vector assigned to the evaluated point are recalculated by least squares adjustment using all points fulfilling the threshold condition. The plane fitting process is shown in Figure 3.
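The local plane fitting can be sketched as follows. The iteration count uses the standard RANSAC relation k = log(1 − p) / log(1 − w^n), which gives about 107 for p = 0.999, w = 0.25, n = 2. All function names are hypothetical, and the refinement step assumes a least squares fit of z = a·x + b·y + c with vertical residuals, since the text does not specify the adjustment model; including the evaluated point in the refinement is likewise an assumption.

```python
import math
import random


def ransac_iterations(p, w, n):
    """Iterations needed so that, with probability p, at least one sample of
    n random points contains inliers only (inlier proportion w)."""
    return math.log(1.0 - p) / math.log(1.0 - w ** n)


def plane_through(p0, p1, p2):
    """Unit normal n and offset d of the plane n.x = d through three points,
    or None when the points are collinear."""
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(c * c for c in n))
    if length < 1e-12:
        return None
    n = tuple(c / length for c in n)
    return n, sum(n[i] * p0[i] for i in range(3))


def refine_normal(points):
    """Least squares fit of z = a*x + b*y + c (assumed model, non-vertical
    planes only); returns the upward-pointing unit normal."""
    m = len(points)
    cx, cy, cz = (sum(p[i] for p in points) / m for i in range(3))
    sxx = sum((p[0] - cx) ** 2 for p in points)
    syy = sum((p[1] - cy) ** 2 for p in points)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in points)
    sxz = sum((p[0] - cx) * (p[2] - cz) for p in points)
    syz = sum((p[1] - cy) * (p[2] - cz) for p in points)
    det = sxx * syy - sxy * sxy
    a = (sxz * syy - sxy * syz) / det
    b = (sxx * syz - sxy * sxz) / det
    length = math.sqrt(a * a + b * b + 1.0)
    return (-a / length, -b / length, 1.0 / length)


def fit_local_plane(evaluated, neighbours, threshold=0.1, iterations=110, seed=0):
    """RANSAC with the evaluated point fixed and two neighbours drawn at
    random; the best inlier set is refined by least squares."""
    rng = random.Random(seed)
    best = []
    for _ in range(iterations):
        a, b = rng.sample(neighbours, 2)
        plane = plane_through(evaluated, a, b)
        if plane is None:
            continue
        n, d = plane
        inliers = [q for q in neighbours
                   if abs(sum(n[i] * q[i] for i in range(3)) - d) < threshold]
        if len(inliers) > len(best):
            best = inliers
    if len(best) < 2:
        raise ValueError("no usable plane found in this neighbourhood")
    return refine_normal(best + [evaluated]), best
```

Fixing the evaluated point in every sample means the resulting plane always passes close to that point, which is how a point on a breakline ends up attributed to a single roof face rather than to an average of two.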

Roof face segmentation
Points that match the criteria mentioned above can be further segmented and grouped into planar roof faces based on similarity in the direction of their normal vectors: points belonging to one plane have nearly parallel normal vectors corresponding to the local planes formed in the point neighbourhood (Figure 4). Due to the presence of curved surfaces with continuously changing slope and aspect, the search for similar normal vectors is not performed at once for the whole building but only in the close proximity of each point. Thus, even points with opposite directions of the normal vectors can form one segment provided there is no discontinuity, i.e. no difference in the angle between neighbouring normal vectors exceeding the given threshold. Such local determination of the similarity of normal vectors also makes it possible to discriminate roof planes that have the same slope and orientation but are physically separated.
Segments that include too few points to form a plane reliably (fewer than 5 in our case) are excluded from further processing. Thus, after this step, only points forming roof surfaces remain.
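The local grouping by normal vectors can be sketched as simple region growing. This is an illustration under stated assumptions, not the authors' implementation: the search radius and angle threshold are hypothetical parameters, normals are assumed to be unit vectors oriented upward, and neighbours are found by brute force.

```python
import math
from collections import deque


def angle_deg(n1, n2):
    """Angle in degrees between two unit vectors."""
    dot = sum(a * b for a, b in zip(n1, n2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))


def segment_by_normals(points, normals, radius, max_angle_deg, min_points=5):
    """Group point indices into segments by growing regions locally.

    Two points are linked when they lie within `radius` of each other and
    their normals differ by at most `max_angle_deg`. Because only nearby
    points are compared, a curved surface can form a single segment.
    Segments with fewer than `min_points` points are discarded.
    """
    unvisited = set(range(len(points)))
    segments = []
    while unvisited:
        seed = unvisited.pop()
        segment, queue = [seed], deque([seed])
        while queue:
            i = queue.popleft()
            linked = [j for j in unvisited
                      if math.dist(points[i], points[j]) <= radius
                      and angle_deg(normals[i], normals[j]) <= max_angle_deg]
            for j in linked:
                unvisited.remove(j)
                segment.append(j)
                queue.append(j)
        if len(segment) >= min_points:
            segments.append(sorted(segment))
    return segments
```

For a gabled roof, the two faces end up in separate segments because the normals jump at the ridge, while isolated points with deviating normals form segments that are too small and are dropped.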

Outline extraction and regularization
The method of building outline extraction published in [8] is used in the next step. First, the building outlines are derived by means of the α-shapes algorithm, which also enables the extraction of non-convex shapes and holes inside polygons. Next, irregular shapes are simplified using the adopted sleeve-fitting algorithm, which preserves critical points. Finally, the outlines are adjusted to the most common rectangular building shapes. The dominant building direction is calculated from the existing outlines. If the orientation of a single line does not differ significantly from the dominant direction (or the direction perpendicular to it), the line is transformed to that direction (see also [8] or [21]). Figure 5 shows an example of the outline extraction and regularization result.
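The direction snapping in the regularization step can be illustrated with a small sketch. It covers only the decision of whether a line's orientation is replaced by the dominant direction or its perpendicular; the function name and the tolerance are hypothetical, since the text does not state what counts as a significant difference.

```python
def snap_orientation(angle, dominant, tolerance):
    """Snap a line orientation to the dominant direction or its perpendicular.

    All angles are in degrees and treated modulo 180, because a line has no
    heading. If the orientation is within `tolerance` of one of the two
    target directions, that target is returned; otherwise the orientation
    is returned unchanged (normalised to [0, 180)).
    """
    for target in (dominant % 180.0, (dominant + 90.0) % 180.0):
        # Signed difference folded into [-90, 90) so that 178 and 0 are 2 apart.
        diff = (angle - target + 90.0) % 180.0 - 90.0
        if abs(diff) <= tolerance:
            return target
    return angle % 180.0
```

For example, with a dominant direction of 0° and a tolerance of 10°, a line at 93° would be snapped to 90°, while a line at 45° would be left as it is.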

Evaluation approach
The automatically extracted building outlines obtained by the above-described approach were compared with outlines derived manually from colour orthoimages (0.25 cm ground sampled distance) and a surface model calculated from the test point cloud. Thus, the stated accuracy and reliability values of outline extraction express only the quality of the applied method and do not reflect absolute errors in the data. In order to evaluate the quality of outline extraction, an automated area-based approach is applied. Its advantages in comparison with other evaluation methods are discussed in detail in [10].
The outline of each reference building is overlaid with extracted building outlines. Overlaying areas are divided into three groups (see also Figure 6):
• True positive (TP): Areas of the reference building that are correctly detected by the automated process.
• False negative (FN): Areas of the reference building that were not detected by the automated process.
• False positive (FP): Areas detected by the automated process that do not match any reference building.
Based on these areal values, two quality measures are computed:
• Completeness: Comp = TP/(TP + FN), i.e. the ratio of correctly detected building area to the area of the reference building
• Correctness: Corr = TP/(TP + FP), i.e. the ratio of correctly detected building area to the total detected building area
In addition to the area-based method, the object-based evaluation with a mutual overlap threshold of 70 % and weighting by building area [13] is also applied.
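The two area-based measures, Comp = TP/(TP + FN) and Corr = TP/(TP + FP), can be written out as a short sketch. The function name is hypothetical and the areas in the usage note are made-up illustrative values, not results from the test dataset.

```python
def area_based_quality(tp_area, fn_area, fp_area):
    """Return (completeness, correctness) from overlay areas in m².

    completeness = TP / (TP + FN): share of the reference area that was found
    correctness  = TP / (TP + FP): share of the detected area that is correct
    """
    completeness = tp_area / (tp_area + fn_area)
    correctness = tp_area / (tp_area + fp_area)
    return completeness, correctness
```

With illustrative areas of TP = 90 m², FN = 10 m², and FP = 5 m², completeness is 0.90 and correctness is about 0.947, which shows that the two measures penalise missed and falsely detected areas separately.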
In addition to the overall quality, the influence of different parameters of the input point cloud, buildings, and surrounding conditions was also evaluated, namely the number of points per roof segment, level of noise in the data, building size, type and complexity of the roof structure, and presence of vegetation. Only the area-based method was used for this evaluation.

Overall quality
The quality of building outline extraction achieved by applying the described algorithm to 1,400 buildings or building blocks is summarised in Table 1. The object-based approach only shows whether a building was roughly detected; unlike the area-based evaluation, it does not express the geometric similarity between the reference and extracted building outlines. While the object-based measures show that 94 % of reference buildings were successfully detected and 99 % of all extracted building outlines match reference buildings, the area-based quality measures show that 87 % of the building area was extracted correctly and only 3 % of the extracted building area does not overlap reference buildings. Considering that only spatial coordinates of laser points were available (without additional information, e.g. multiple echoes or intensity) and that the point density was rather low, the building outline extraction from the tested dataset was successful.

Quality in relation to building and point cloud parameters
The success rate of the extraction chiefly depends on the building size, specifically on the size of roof faces. Figure 7a demonstrates the relation between area-based correctness and completeness values and the average number of points per roof segment. Similarly, Figure 7b shows the relation between the same quality parameters and the size of the reference building. First, the relation between completeness and the observed parameters will be discussed. Correctness will be analysed separately at the end of this section.
Completeness depends strongly on the number of points per roof segment. This number is influenced by the density of the original point cloud, building size, and complexity of the roof structure. Building size has a similar effect; it is practically subsumed by the first parameter studied. The trends observed are not surprising because the proposed approach is data-driven and detection of single roof segments is crucial in the whole extraction process. In the case of small buildings, the reliability of an automated decision (whether a point cluster forms a smooth and continuous surface) is very low. With an increasing number of points per roof segment, the success rate increases rapidly. Starting from 30 points per roof segment, the building outline extraction can be considered satisfactory; completeness exceeds 65 %. Because the quality of building extraction is highly dependent on the number of points per roof segment and thus on the building size, further parameters were studied on buildings larger than 100 m².
Noise in the dataset can be described by σ₀ (standard deviation of the unit weight) resulting from least squares estimation of a best fitting tilted plane in a point neighbourhood. The software package OPALS was used for calculating σ₀ values [19]. As expected, the dependence of completeness on σ₀ is relatively high (see Figure 8a). A normal vector assigned to a point with an incorrect height deviates more strongly from the vectors in its neighbourhood. On the other hand, increasing the threshold in the roof segmentation step would raise the number of false positives and decrease the correctness value. Figure 8b documents that the results are not strongly influenced by adjacent or overlapping vegetation, owing to the applied RANSAC-based filtering approach, which effectively filters out a high proportion of outliers.
Figure 9a shows the dependency of completeness on the complexity of the roof expressed as the number of roof planes/surfaces. No trend was observed in this case, which corresponds with the principle of the proposed algorithm. The local, data-driven approach does not consider any building as one unit in the detection phase and is not limited to a pre-defined set of models, as is the case with the model-driven approach. Any roof is considered a union of an arbitrary number of either planar or curved surfaces. The type of the roof does not influence the extraction success rate either (compare Figure 9b). The proposed method can detect traditional gable or (half-)hipped roofs as well as modern flat and shed roofs. Due to the local determination of similarity between normal vectors, curved surfaces are detected with the same quality.
No strong relation was observed between correctness values and the studied parameters.
Excluding negligible inaccuracies at the edges of the buildings, false positives appeared mostly on vegetation (close and distant) that locally comprised clusters behaving as a continuous surface. Such surfaces were mostly small and isolated. Thus, the majority of them were excluded from further calculations. If such a cluster appeared in close proximity to a building, the vegetation was considered another segment of the roof. Therefore, correctness values slightly decreased in the case of buildings with adjacent or overlapping vegetation (Figure 8b). This problem could be minimised by utilising intensity information or by fusion with colour imagery.

Comparison to related work
A similar approach for roof plane detection was published in [6]. By means of linear regression, the authors fitted local planes to scanned points and determined local roughness and normal vectors. The points on vegetation were filtered out based on roughness values, and the building was segmented into roof planes according to normal vectors. It was not possible to use this approach on the test dataset due to the much lower point density (1.5 points/m² compared to 17 points/m²) and the higher percentage of outliers. Thus, linear regression was not sufficient in the case of our dataset. Moreover, a problem with missing breaklines between roof planes was mentioned in [6]; a normal vector corresponding to a point on a breakline does not match the normal vectors of any adjacent roof plane. These problems were solved by utilizing the RANSAC algorithm, which is able to eliminate outliers and chooses only one roof surface/plane for points on breaklines. Finally, the case in which the roof surface is not planar but generally curved was not addressed in [6].

Conclusion
The proposed methodology for building outline extraction shows promising results. It is fully automatic and based only on geometric attributes of the laser point cloud, i.e. on the spatial coordinates of the points. Moreover, it is suited for datasets with a lower point density (1.5 points/m² in the case of our test point clouds).
Completeness of 87 % and correctness of 97 % were achieved in two test areas comprising rural, industrial, and urban types of buildings. The success rate was similar for all roof types studied regardless of their complexity. The influence of adjacent or overlapping vegetation was low. The major influence on the resulting extraction quality was observed for the size of roof faces in relation to the point density.
In order to increase the number of detected small buildings, a higher point cloud density is required. On the other hand, increasing point density also brings a higher level of noise in the laser point cloud [5]. Thus, successful practical application of the proposed method requires more tests carried out on datasets with different point densities.

Figure 1: Subsets of point clouds from the test sites (a) Ctiněves and (b) Polabiny. For the purpose of visualization, the point clouds were automatically classified in SCOP++ (dark green: ground, light green: vegetation, red: roofs, white: not classified).

Figure 2: General workflow of the proposed building (roof) outline extraction algorithm.

Figure 3: Local plane fitting and refinement: (a) 2D section of an original point cloud with an evaluated point in green, (b) selected point surroundings, (c) fitting plane after using the RANSAC algorithm with a distance threshold, (d) final plane refined by means of least squares adjustment.

Figure 4: Point segmentation based on similarity of normal vectors in close proximity: (a) 2D section of an original point cloud and direction of normal vectors assigned to individual points, (b) clustering based on parallelism of normal vectors: points belonging to the left (red) and right (blue) planes, and two points (black) whose normal vector directions exceed the threshold with respect to their neighbouring points, (c) result of segmentation: points belonging to two roof faces as an input for the final building outline extraction.

Figure 5: Outline extraction and regularization. (a) Outline of a reference building (red polygon) derived by manual editing. (b) Outline of a cluster after applying the 2 m height threshold (green polygon). (c) Outline after the roof segmentation (blue polygon). (d) Final regularized building outline (cyan polygon).

Figure 7: Dependency of correctness and completeness accuracy measures (a) on the number of points per roof segment and (b) on the size of the building.

Figure 8: Dependency of correctness and completeness accuracy measures (a) on the noise level and (b) on overlapping vegetation.

Figure 9: Dependency of correctness and completeness accuracy measures (a) on building complexity expressed as a number of roof segments and (b) on roof type.

Table 1: Correctness and completeness of area- and object-based evaluation methods for building extraction. Results of the above-mentioned building outline extraction algorithm applied to 1,400 buildings in the Ctiněves and Polabiny test areas.