Extension of mathematical background for Nearest Neighbour Analysis in three-dimensional space

This paper deals with the development and testing of a module for GRASS GIS [1] based on Nearest Neighbour Analysis. The method is useful for assessing whether points located in an area of interest are distributed randomly, in clusters, or separately. Its main principle is to compare the observed average distance between nearest neighbours, r_A, with the average distance between nearest neighbours, r_E, that is expected for randomly distributed points. The result should be tested statistically. The methods for two- and three-dimensional space differ in how r_E is computed. The paper also extends the mathematical background by deriving the standard deviation of r_E, which is needed for the statistical test of the analysis result. As the disposition of the phenomena (e.g. the distribution of birds' nests or plant species) and the test results suggest, an anisotropic function would represent the relationships between points in three-dimensional space better than the isotropic function used in this work.


Introduction
The purpose of this work is to outline how Nearest Neighbour Analysis (NNA) in three-dimensional space can be implemented in a geographical information system (GIS) environment. First, the article summarizes the derivation of the mathematical background for an isotropic phenomenon. The next part describes the module of the open-source software GRASS GIS [1] that was developed on the basis of these relationships. Finally, the results of the module's tests are analysed.
NNA helps to assess whether points located in the tested area are distributed randomly, in clusters, or separately. It can be useful in biology to monitor the behaviour of plant or animal populations [2], in stellar statistics, in chemistry to analyse atomic structures, etc. In GIS, NNA may help to answer questions in biology (mentioned above) or to solve social problems (e.g. crime analysis). When analysing vertically divided phenomena (such as artifacts in an archaeological trench, birds' nests, or the behaviour of plant or animal populations in 3D space), the three-dimensional distance should be considered in NNA. This distance also depends on the difference of elevation between objects.
If we assume the phenomenon to be isotropic, NNA can be expressed as a function of the distance r. It is more probable that natural phenomena are anisotropic: they behave differently in the horizontal and vertical directions. NNA could then be a function of the horizontal distance and the vertical difference of elevation (or the zenith angle). In another case, the phenomenon may be anisotropic in all three dimensions.

Geoinformatics FCE CTU 11, 2013. Stopková, E.: Extension of mathematical background for NNA in 3D space

Keynote
NNA in three-dimensional space was outlined in [3] and [4]. The main principle of this work is also based on [2], which describes NNA for two-dimensional space. On the theoretical level, the goal of this article is to complete the idea of statistical testing in three-dimensional space.
There are N points located in the area of interest. Their average isotropic distance between nearest neighbours, r_A, is expressed as an arithmetic mean. If these points are randomly distributed, the average isotropic distance between nearest neighbours in three-dimensional space is expected to be (according to [4], formulas (674), (675)):

$$r_E = \frac{\Gamma\left(\frac{4}{3}\right)}{\left(\frac{4}{3}\pi n\right)^{1/3}} \approx 0.554\, n^{-1/3}$$

where n is the density of points per unit volume and Γ(4/3) is the value of the gamma function. This formula is based on the probability that the nearest neighbour of a point is located on the boundary of its spherical surrounding with radius r. A more detailed explanation of the topic can be found in [4] or in [3].
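As an illustration, the expected distance can be evaluated numerically. The following is a minimal sketch, assuming the closed form r_E = Γ(4/3) / (4πn/3)^(1/3) for a random pattern in three-dimensional space; the function name and the example box dimensions are hypothetical and are not taken from the module.

```python
import math

def expected_nn_distance_3d(n_points: int, volume: float) -> float:
    """Expected mean nearest-neighbour distance r_E for randomly
    distributed points in 3D (assumed closed form, see lead-in)."""
    density = n_points / volume  # n: number of points per unit volume
    return math.gamma(4.0 / 3.0) / (4.0 * math.pi * density / 3.0) ** (1.0 / 3.0)

# Illustrative numbers only: 2000 points in a 20 km x 20 km x 1 km box (metres).
r_e = expected_nn_distance_3d(2000, 20_000.0 * 20_000.0 * 1_000.0)
print(f"r_E = {r_e:.1f} m")
```

For unit density the function returns the constant factor of the formula, approximately 0.554.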
The degree to which the observed points are randomly distributed, R, is expressed as the ratio of the observed and the expected distance, R = r_A / r_E [2]. It may acquire these values:
- R = 1 → the points are distributed randomly, because r_A = r_E,
- R = 0 → the points are identical, because r_A = 0,
- R = max → it is necessary to derive the maximal value of the ratio R.
According to [2], if the points are located on a hexagonal pattern, R in two-dimensional space reaches a maximal value expressed in terms of k, the number of segments in a circle of infinite radius centred on the observed point. The results of module testing (Chap. 6), as well as the results obtained using the analytical tool Average Nearest Neighbor [6], do not correspond with this value. Determining the maximum of R will be the purpose of further work (Chap. 7).
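The observed distance r_A and the ratio R can be sketched as follows. This is a brute-force illustration, not the module's implementation: the function names are hypothetical, the expected distance uses the assumed closed form Γ(4/3) / (4πn/3)^(1/3), and the volume assigned to the example lattice (one unit cell per point) is an assumption made only for the demonstration.

```python
import math
from itertools import product

def mean_nn_distance(points):
    """Observed mean nearest-neighbour distance r_A (brute force, O(N^2)).
    Works for 2D and 3D tuples alike; 2D data may also be passed with z = 0."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)

def nn_ratio_3d(points, volume):
    """Ratio R = r_A / r_E for points in a box of the given volume."""
    density = len(points) / volume
    r_e = math.gamma(4.0 / 3.0) / (4.0 * math.pi * density / 3.0) ** (1.0 / 3.0)
    return mean_nn_distance(points) / r_e

# A regular 4 x 4 x 4 cubic lattice with unit spacing (strongly separated points).
grid = [tuple(float(v) for v in p) for p in product(range(4), repeat=3)]
print(mean_nn_distance(grid), nn_ratio_3d(grid, volume=64.0))
```

A regular lattice gives R well above 1, while coincident points give r_A = 0 and therefore R = 0, in agreement with the values listed above.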

Test of statistical significance
Deriving the standard deviation σ_rE of the average distance expected for randomly distributed points in three-dimensional space was one of the goals of this article. σ_rE is needed to evaluate the test statistic of Student's t-test of significance of the mean [7], which helps to assess the statistical significance of a deviation from a random distribution of points (clustering or, on the other hand, separation). The method of derivation (Appendix A) is analogous to the method for two-dimensional space [2].
The null hypothesis H_0: r_A = r_E means that the points are randomly distributed in space (according to [8]). The alternative hypothesis H_a: r_A < r_E ∨ r_A > r_E and statistical tables (e.g. [9]) of the cumulative standard normal distribution are the basis for determining the critical values W:

$$W = (-\infty;\ -u_{1-\alpha/2}\rangle \cup \langle u_{1-\alpha/2};\ \infty)$$

where u_{1−α/2} is the quantile of the standard normal distribution (1.96 for α = 0.05 and 2.58 for α = 0.01). If the null hypothesis of randomness of the point dataset is rejected with positive values of the test statistic c, the points are assumed to be separated (patterned). If the values of the test statistic c are negative and the null hypothesis is rejected, the points are assumed to be clustered.
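The decision rule can be sketched in a few lines. The sketch rests on two assumptions that are not confirmed by the text above: the test statistic is taken as c = (r_A − r_E) / σ_rE, following the two-dimensional approach of [2], and σ_rE is obtained from the second moment of the nearest-neighbour distance distribution of a random 3D pattern, which may differ in form from the derivation in Appendix A. All function names are hypothetical.

```python
import math

def sigma_r_e_3d(n_points: int, volume: float) -> float:
    """Standard error of the expected mean nearest-neighbour distance in 3D.
    Assumption: Var(r) = (Gamma(5/3) - Gamma(4/3)^2) / (4*pi*n/3)^(2/3),
    divided by the number of points N to obtain the variance of the mean."""
    density = n_points / volume
    scale = (4.0 * math.pi * density / 3.0) ** (-1.0 / 3.0)
    var_single = (math.gamma(5.0 / 3.0) - math.gamma(4.0 / 3.0) ** 2) * scale ** 2
    return math.sqrt(var_single / n_points)

def classify(r_a: float, r_e: float, sigma: float, critical: float = 1.96):
    """Two-sided test: return (c, verdict) for the critical value given
    (1.96 for alpha = 0.05, 2.58 for alpha = 0.01)."""
    c = (r_a - r_e) / sigma  # assumed form of the test statistic
    if c <= -critical:
        return c, "clustered"
    if c >= critical:
        return c, "separated"
    return c, "random"

print(classify(r_a=0.90, r_e=1.00, sigma=0.02, critical=2.58))
```

Passing σ_rE as an argument keeps the decision rule independent of the particular formula used for the standard deviation.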

Implementation of 3D Nearest Neighbour Analysis
The theoretical background described in the previous chapters was implemented in the module v.nn_spatial_stat developed in the environment of open-source software GRASS GIS [1].
The module contains functionality of NNA in two-dimensional space (implemented also in the analytical tool [6] of ArcGIS [10]) and solution for objects in three-dimensional space.
The most significant complications during development were connected with determining the Minimum Bounding Box (MBB).
In two-dimensional space, the density of points depends on the area of the surface where the points are located. According to [11], the area can be set by the user, or the area of the Minimum Bounding Rectangle (MBR) can be used. Analogously, the density of points in three-dimensional space can be determined using the volume of a box set by the user or the volume of the MBB.
The methods for determining the area (volume) of the MBR (MBB) are in principle quite similar:
- The coordinates of the convex hull that covers the point set must be obtained. In the 3D case it is also necessary to know which faces each vertex belongs to. Partially modified functions of the module v.hull [13], which builds a convex hull in a new vector layer, were used. The output of the modified functions is a list of coordinates of vertices (or faces), not a new vector body.
- The coordinates of the vertices must be transformed to coordinate systems
  - whose x axes are parallel to the lines between neighbouring vertices in the 2D case (the rotation is given by the bearing σ of the line),
  - whose xy planes are parallel to the planes of the faces of the convex hull in the 3D case (the rotation angles σ_x, σ_y are expressed from the coordinates x, y, z of the vertices belonging to a triangular face of the convex hull, and the transformation matrix is based on rotation of the x and y axes).
- The extent of the transformed coordinates should be determined.
- The area / volume of the extent must be computed and the values compared to obtain the smallest one. This value becomes the input for determining the density of points.
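For the two-dimensional case, the steps above can be sketched as follows. This is an illustrative pure-Python version, not the code of the module: the convex hull is built with Andrew's monotone chain instead of the v.hull functions, the rotation angle is measured from the x axis rather than as a bearing, and all names are hypothetical.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def min_bounding_rectangle_area(points):
    """Smallest area among the rectangles aligned with each hull edge."""
    hull = convex_hull(points)
    best = math.inf
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        angle = math.atan2(y2 - y1, x2 - x1)  # direction of the hull edge
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        # Rotate the hull so that the x axis is parallel to the current edge.
        xs = [x * cos_a + y * sin_a for x, y in hull]
        ys = [-x * sin_a + y * cos_a for x, y in hull]
        best = min(best, (max(xs) - min(xs)) * (max(ys) - min(ys)))
    return best

print(min_bounding_rectangle_area([(0, 0), (2, 1), (1, 3), (-1, 2), (0.5, 1.5)]))
```

Only the hull vertices need to be rotated, because the extent of the whole point set in any rotated system is determined by its convex hull. The 3D case follows the same pattern with one candidate box per hull face.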

Testing of the module v.nn_spatial_stat
For points located in two-dimensional space, the module was tested by comparing its numerical results with the outputs of the analytical tool Average Nearest Neighbor [6], which is part of the Spatial Statistics toolbox in ArcGIS 10.1 [10]. The 3D variant of NNA is not implemented in any accessible software. For this reason, the numerical results were verified by comparing them with values computed by scripts in Mathematica [14] and Matlab [15].

Testing in two-dimensional space
The module was tested on various sets of synthetic data. The configuration of observed points in the area of interest of 20 km x 20 km was designed to represent all possible cases: randomly distributed points, separated points, and clustered points.
The samples of randomly distributed points were generated in GRASS GIS [1] using the module v.random [16]. Besides numerical accuracy, processing speed was tested too. The condition of maximal separation in two-dimensional space is fulfilled by points arranged in a pattern of equilateral triangles, i.e. the nearest neighbouring points are located around the observed point in the shape of a regular hexagon (proved in [2]). Two datasets were generated: seven points arranged in a hexagon with its centre, and many points arranged in a hexagonal pattern. The coordinates of the points were computed in the Matlab [15] environment. Tables 2a and 2b summarize the results of the tests.
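The two maximally separated datasets can be generated as in the following sketch. It is written in Python rather than Matlab, and the function names, the grid dimensions and the spacing are illustrative assumptions, not the values used in the tests.

```python
import math

def hexagon_with_centre(cx: float, cy: float, spacing: float):
    """Seven points: a centre and the six vertices of a regular hexagon."""
    ring = [(cx + spacing * math.cos(math.radians(60 * k)),
             cy + spacing * math.sin(math.radians(60 * k))) for k in range(6)]
    return [(cx, cy)] + ring

def triangular_lattice(rows: int, cols: int, spacing: float):
    """Points on a pattern of equilateral triangles: every second row is
    shifted by half the spacing, rows are spacing * sqrt(3) / 2 apart."""
    points = []
    for r in range(rows):
        for c in range(cols):
            x = (c + 0.5 * (r % 2)) * spacing
            y = r * spacing * math.sqrt(3.0) / 2.0
            points.append((x, y))
    return points

seven = hexagon_with_centre(10_000.0, 10_000.0, 1_000.0)
pattern = triangular_lattice(rows=20, cols=20, spacing=1_000.0)
print(len(seven), len(pattern))
```

In such a pattern every point has its nearest neighbour at exactly the chosen spacing, so r_A equals the spacing regardless of the number of points.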

Testing in three-dimensional space
At present, there is no accessible tool for processing NNA in three-dimensional space, so it is not possible to compare the results of the module with the outputs of any verified software. For this reason, the results were checked using parts of the code scripted in Mathematica [14] or Matlab [15]. This method helped to verify numerical accuracy and to fix a few bugs.

The most complicated task was the verification of the volume of the Minimum Bounding Box (MBB).
This value is based on the coordinates of the vertices belonging to the faces of the convex hull. The coordinates of the convex hull are transformed to coordinate systems in which the plane of the x and y axes is parallel to the plane of each face. The output is the volume of the smallest box enclosing the transformed coordinates.
Functions of the module v.hull [13] were used to obtain the vertices belonging to the faces of the convex hull. These functions were modified to output matrices containing the coordinates of vertices and faces instead of a new vector layer with the convex hull. The numerical accuracy of the transformations and of the MBB volume was verified by comparing the results of the module with outputs computed in Matlab [15]. This method also makes it possible to verify the exported coordinates of the convex hull, which we suppose to be created correctly.
Other functions of the module were tested while debugging, or they were tested as part of the 2D NNA functionality. One example is the function for computing the average distance to the nearest neighbour r_A, which is identical for the 2D and 3D cases (only the input z coordinates differ; they are zeros for two-dimensional space). The numerical accuracy of the formulas for the expected average distance between nearest neighbours in a set of randomly distributed points r_E, the ratio R, and the test statistic c was tested by comparison with results obtained in Mathematica [14] while deriving the formulas and debugging.
The results of testing randomly distributed points are summarized in tables 4a and 4b. The same datasets as in the 2D tests were used, but the z coordinates were considered.

Table 5b: The results of testing of the module v.nn_spatial_stat in three-dimensional space using clusters of points with longer distances from local centres

Outline of future work
It will be appropriate to extend the testing of the module by adding a sample of points with maximal separation in three-dimensional space. Analogously to two-dimensional space, where neighbouring points are arranged in a hexagon around the observed point, it will be necessary to find a convex body with vertices located on equilateral triangles; a plane cutting its vertices and centre should also be composed of equilateral triangles. A body fulfilling the condition on the distances between vertices and centre definitely cannot be composed only of regular hexagons (proved e.g. in [17]: a vertex of the body is the intersection of three polygons (faces), and the sum of their interior angles must be less than 360°). Besides that, when several of these bodies are put together, there should be no empty spaces between them. Analysis of the properties of the truncated regular Platonic solids described in Timaeus [18] or the semi-regular Archimedean solids [19] will be the purpose of future work.
The next item is to develop the mathematical background to better model the fact that most phenomena may behave differently in the horizontal and vertical directions. This analysis will be based on the derivation of the average distance of the nearest neighbours expected in case of

The results are summarized in tables 1a and 1b.

Table 1a: The results of testing of the module v.nn_spatial_stat in two-dimensional space using 2000 randomly distributed points

Table 1b: The results of testing of the module v.nn_spatial_stat in two-dimensional space using 5000 randomly distributed points

According to the test statistic c ∈ (−1.96; 1.96), it can be assumed that in both cases the points were randomly distributed. Differences in the values of the MBR area may be caused by the different way of storing data in computer memory, as the results of the next experiments show. The significantly shorter processing time (compared with the analytical tool Average Nearest Neighbor [6]) may be achieved because the module does not generate a report with graphical outputs.

Table 2b: The results of testing of the module v.nn_spatial_stat in two-dimensional space using maximally separated points (hexagonal pattern)

Because the test statistic c ∈ ⟨2.58; ∞), the null hypothesis of a random distribution of the points is rejected at the significance level α = 0.01. The points are separated.

Clusters of points were created by locating points around each of a sample of n_0 randomly generated points. The coordinates of the new points were computed in Matlab [15] using bearings with a step of 36° and random distances (tables 3a and 3b).
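The construction of the clustered samples can be sketched as follows. The sketch is in Python rather than Matlab and makes several assumptions that the text does not state: bearings are measured clockwise from the y axis (north), the second parameter of the normal distribution is treated as a standard deviation, negative random distances are replaced by their absolute value, and only the newly created points are returned. The function name and the example centres are hypothetical.

```python
import math
import random

def make_clusters(centres, mean_dist, sd_dist, step_deg=36.0, seed=None):
    """Place one point on every bearing (0, step, 2*step, ...) around each
    centre, at a normally distributed random distance."""
    rng = random.Random(seed)
    n_dirs = int(round(360.0 / step_deg))  # 10 directions for a 36 degree step
    points = []
    for cx, cy in centres:
        for k in range(n_dirs):
            bearing = math.radians(k * step_deg)
            dist = abs(rng.gauss(mean_dist, sd_dist))  # assumption: no negative distances
            # Bearing measured clockwise from north: dx uses sin, dy uses cos.
            points.append((cx + dist * math.sin(bearing),
                           cy + dist * math.cos(bearing)))
    return points

rng = random.Random(1)
centres = [(rng.uniform(0, 20_000), rng.uniform(0, 20_000)) for _ in range(200)]
clustered = make_clusters(centres, mean_dist=30.0, sd_dist=10.0, seed=2)
print(len(clustered))
```

With a 36° step each local centre contributes ten points, so n_0 centres yield 10 n_0 clustered points.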

Table 3a: The results of testing of the module v.nn_spatial_stat in two-dimensional space using clusters of points with shorter distances from local centres

Table 3b: The results of testing of the module v.nn_spatial_stat in two-dimensional space using clusters of points with longer distances from local centres

The null hypothesis of randomly distributed points may be rejected at the significance level α = 0.01 because c ∈ (−∞; −2.58⟩. Negative values of the test statistic c indicate that the samples are clustered. The sample in table 3a, in which clusters were generated using random distances with normal distribution N(30, 10), is characterized by a significantly lower value of the test statistic c than the sample in table 3b, in which distances with normal distribution N(300, 1000) were used.