AUTOMATED BIM ENTITY RECONSTRUCTION FROM UNSTRUCTURED 3D POINTCLOUDS

Laser scanning is used widely in architecture and construction to document existing buildings by providing accurate data for creating a 3D model. The output is a set of data points in space, so-called point cloud. While point clouds can be directly rendered and inspected, they do not hold any semantics. Typically, engineers manually obtain floor plans, structural models, or the whole BIM model, which is a very time-consuming task for large building projects. In this contribution, we present the design and concept of a PointCloud2BIM library [1]. It provides a set of algorithms for automated or user assisted detection of fundamental entities from scanned point cloud data sets, such as floors, rooms, walls, and openings, and identification of the mutual relationships between them. The entity detection is based on a reasonable degree of human interaction (i.e., expected wall thickness). The results reside in a platform-agnostic JSON database allowing future integration into any existing BIM software.


Introduction
The Building Information Modelling (BIM) brings excellent opportunities for the efficient and more automated design process of building structures and maintenance and reconstruction planning of existing structures. Obtaining a digital BIM representation of the latter is often a challenging task, namely when no up-to-date (or none at all) floor plans are available.
Laser scanning is a convenient and accessible technology to document the existing infrastructure. The output is a dense set of data points in space, socalled point cloud. See Figure 1 for an example of a laser-scanned building. Each point carries its spatial coordinates, and some scanners also capture the RGB color value and intensity of the scanned point. While point clouds can be directly rendered and inspected, they carry no additional information about the scanned objects. There are no available tools to convert the raw point data into a BIM representation with a reasonable degree of automation, although research in this field is highly emerging [2,3].
Currently, the civil engineers have to manually identify the BIM entities from graphical visualization of point cloud and create their digital representation in BIM model. The long-term goal of the project is to develop an efficient and user friendly tool for semiautomated conversion of point clouds to BIM entities while maintaining user interaction. The idea is based on automated identification of entities, where each entity is labeled with estimated degree of identification reliability. The individual identified entities are presented to the user who can accept all suggested entities recognized with reliability higher than defined threshold and manually accepting/modifying remain- ing suggestions. The presented paper describes the concept of PointCloud2BIM library [1] for point cloud manipulation and BIM entity identification.

Pre-processing of the point clouds
Laser scanners produce very dense point clouds containing tens or hundreds of millions of points. The distance between the scanned points often ranges in millimeters. Therefore the point cloud data processing must be handled carefully in order to achieve competitive computation times and memory efficiency.

Storage
Point information such as the XYZ coordinates, RGB color, or intensity value is stored in an uncompressed binary file. The main advantage is that binary data can be directly read into C++ structures instead of the costly parsing of text files line-by-line. All the other project data reside in the project's JSON file, which serves as a database of point cloud metadata and other entities. The JSON format is widely supported across platforms and programming languages and was selected to allow smooth integration into existing software. An empty project file is created automatically before the first point cloud import.

Downsampling
While storing the points as binary data offers high reading speeds, with hundreds of millions of points, reading all the values into memory might not be possible on a standard computer with limited memory. For example, storing 100 million XYZ coordinates (3x64 bit float) occupies 2,4 GB of memory, which is more than half of a commonly available 4GB.
Reducing the number of points to work with is inevitable. In our case, the points were downsampled in a 3D voxel grid using the efficient implementation of a C++ hash map [4]. Reasonable selection of a grid step of say 1 × 1 × 1 cm reduces all the points within the 1 cm 3 cube to a single point with weighted coordinates.
Result of downsampling to a 1 × 1 × 1 grid is summarized in the Table 1 One can see that just by downsampling to a 1 cm 3 grid, the size of points goes down to at least 30%.

The normal vector of downsampled points
Further processing can take advantage of determining the normal vector at each downsampled point (voxel). This is done in the three following steps. A single point represents each voxel by averaging the coordinates of the inner points. There are up to 26 (3 × 3 × 3 − 1) voxels surrounding each voxel if all the neighbors are activated. Therefore, the total number of points n for the normal evaluation is at most 27.
The centroid of the n points is calculated as a weighted average of the XYZ coordinates.
The 3 × 3 covariance matrix of the point XYZ coordinates relative to the centroid is calculated using the formula for the covariance of A and B below.
The smallest Eigenvector of the covariance matrix is the plane normal [5]. The direct solver of eigenvalues was implemented for the 3 × 3 matrix using the arccos function, as described in [6].

BIM entity extraction
The overall strategy of point cloud transformation into the digital BIM representation of the building is based on a gradual segmentation of floors, rooms, walls, and openings. The overview of this complex process is demonstrated in Figure 2

Floor segmentation
The floor segmentation is based on the histogram of vertical coordinates. From the histogram, the ceiling and floor levels are extracted, corresponding to locations with significant location frequency. More accurate results can be obtained by only considering points with vertical normal in the calculation. The ceiling level can also be input manually by inspecting the point cloud in a suitable tool like Autodesk ReCap.

Room segmentation
The ceiling level of the floor is known either from the histogram calculation or from user input. All the ceiling plane points and points within 30 cm below (as suggested in [3]) are projected onto a 2D grid (floor plan) resulting in a pixel-like image. The step of the grid should be at least twice the step used for the initial downsampling, or the grid will contain missing points as some points may not be appropriately captured.
Region growing [7] is applied to join adjacent pixels into separate regions. Backfilling of each of the found regions is performed to include columns and skylights within the room. Knowing the pixel count and pixel area (step x step), area of the subject rooms or whole floor can be estimated.
Finally, the original point cloud is iterated, and points inside of the found regions (in terms of X, Y coordinates) and floor (given Z coordinate range) are selected. Separate point clouds of each room are stored in the project file. An example of room segmentation results can be seen in Figure 3.

Vertical plane segmentation
The wall planes are identified in each room using the modified Random sample consensus (RANSAC) [8].
Only points with horizontal normal are considered for plane finding, a slight deviation of 5 degrees is allowed to account for deviations in construction.
Each wall finding starts with hypothesis generation. Three non-collinear points A, B, C are randomly selected from the point cloud for which the four plane coefficients a, b, c, and d are calculated using the two vectors ! AB and ! AC To satisfy the plane equation ax + by + cz + d = 0, plane coefficients are calculated as follows After establishing the hypothesis, more points are selected randomly, and the distance from an arbitrary point to the plane is calculated. Points within sufficient distance are considered in the support set. For the first 100 accepted points, the plane coefficients are refined using the algorithm described in Section 2.3.

Wall segmentation
Wall segmentation is performed on each of the vertical planes found in the previous step. Only walls of constant thickness are supported. Therefore, comparing the angle between every two planes and the distance between the two subject planes is sufficient to mark wall segment candidates.
For each wall candidate, its points are projected to a single 2D plane. Such a transformation can be done simply without the transformation matrix. The normal vector is taken from plane segmentation by calculating the average value of all wall planes. Local X-axis is selected by picking two plane points A, B, and clipping their vector 's Z coordinate. By knowing the local X-axis and the normal vector, the Y-axis vector can be calculated using the cross product. This is sumarized below.

vol. 30/2021
Automated BIM entity reconstruction The wall local axes X loc and Y loc are then normalized, and for each wall point P, the local coordinates are calculated using the dot product. Wall candidates are filtered based on the user's overlap requirements, as partially overlapping planes are likely to belong to a common wall.

Opening segmentation
A naive way of finding openings in walls (on two parallel planes) is to find overlap of missing points on both planes. However, such treatment produces a lot of false results. For example, two cabinets on each side of the wall in the same place do not form an opening.
Points present between the walls can be taken into consideration to improve the accuracy of opening detection. By a combination of inverse region growing and boundary detection, openings with a sufficient density of intermediate points can be identified.
By calculating the bounding box of an opening in the local plane coordinates, dimensions can be calculated for rectangular openings (i.e., doors and some windows). Doors are recognized by comparing their bottom coordinates to the corresponding floor level. Opening visualized by their bounding box can be observed in Figure 6.

Demonstration on a sample point cloud
The example of a simple office building with just a single floor with eight rooms is selected to demonstrate the capabilities of the presented approach. As the whole floor was unfurnished, the overall floor envelope has little to none missing points except for the areas behind doors and heating elements. The original data set consists of 85 316 410 points is shown in Fig. 1. At first, the original point cloud was downsampled into 12 358 178 using a 1 × 1 × 1 cm grid. By inspecting the point cloud, we identify the floor and ceiling levels, and we use all the points in between ceiling level and 30 cm below for room finding. The room finding performed on a 2D floor plan grid with a 2 × 2 cm step yields all the eight identified rooms, visualized in Figure 3. In the same step, we also identify all the 93 vertical planes, as can be seen in Fig. 4. The planes assigned to their source rooms are then passed to the next step for wall matching even if some of the identified planes might be out of our interest (i.e., doors or noise).
In Figure 5, we can observe all the matched walls on the subject floor. While all the inner walls were identified, the exterior walls are missing as the exterior scan of the office building was not available. However, some of the external walls are obtained thanks to the scan capturing the window frames' internal and external planes. These reveal the estimated window positions as well as exterior wall location, excluding its wall thickness.

Conclusions
The library was implemented in C++ based on the previously formulated methodology. It can be inte-grated into any BIM software to assist the conversion of point clouds into digital BIM representation. The library API is documented in the Reference Manual [9]. The library also works as a standalone toolkit without any dependency on BIM software. A set of command-line executables is available to perform the individual steps introduced above. Each of the tools operates on top of a common database-like file (JSON) with point data stored separately in binary files.
A single floor of an unfurnished office building was successfully segmented into separate rooms (compare Figures 1 and 3). Plane matching results can be seen in Figure 4 with the 25 distinct wall segments considering the exterior windows as walls. The success of opening detection can be examined in Figure 6 where six doors were identified and eight exterior windows (two of which are found twice).

Future work
Our main goal is to create a robust and easy-tointegrate solution of converting the raw point cloud files into the corresponding BIM entities. While our library provides enough functionality to process, analyze, and visualize the point clouds and calculated results, more research is desired, especially in the area of opening detection and overall confidence rate evaluation of the reconstructed BIM entities.
Further research is needed to integrate the library into some of the industry-leading BIM applications (i.e. Autodesk Revit). Such integration is required to validate and refine our methodology using a variety of point clouds from different environments.