Geostatistical Methods in R

Geostatistics is a scientific field that provides methods for processing spatial data. In our project, geostatistics is used as a tool for describing spatial continuity and making predictions of natural phenomena. An open source statistical project called R is used for all calculations. Readers are provided with a brief introduction to R and its geostatistical packages and with the basic principles of the kriging and cokriging methods. The heavy mathematical background is omitted due to its complexity. In the second part of the paper, several examples show how to make a prediction over a whole area of interest where observations were made at just a few points. The results of these methods are compared.


Introduction
Spatial data, also known as geospatial data or shortly geodata, carry information about a natural phenomenon including a location. This location allows us to georeference the described phenomenon to a region on the Earth. It is usually specified by coordinates such as longitude and latitude. By mapping spatial data, a data model is created. We will focus on a raster data model, which provides a value of the phenomenon at each pixel of the area of interest. Mapping is a very common process in sciences such as geology, biology, and ecology. Geostatistics is a set of tools for predicting values at unsampled locations using the spatial correlation between neighboring observations. Making use of geostatistics requires difficult matrix computations, briefly described in chapters Ordinary Kriging and Multivariate Geostatistics. In order to make our predictions easier, we are going to use methods from R geostatistical packages introduced in chapter R Basics. The best known geostatistical prediction methods are called kriging (for a univariate data set) and cokriging (for a multivariate data set); examples of their use are shown in chapter Example of Kriging and Cokriging in R.

R Basics
There are many applications implementing geostatistical methods. Most of them are complex GIS and most of them are commercial. This does not hold for the project R. R is a language and environment for statistical computing and creating graphics. R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License. R is multi-platform, easy to learn, and comes with a huge number of additional packages that extend its functionality. For instance, such packages serve special branches of statistics such as geostatistics.

First Steps in R
After downloading and installing R, open the default R console editor, figure 1. A standard prompt > appears at the last line. When this prompt is shown, R is ready to accept a command. Another prompt exists (+) which is used for multi-line commands. There are two symbols for assigning a value to a variable: <- and =. When working in the R console, a command is executed right after hitting Enter. If the user wishes to create a set of commands that are saved and can be used later, it is necessary to create a script (File - New script). For executing commands from a script, select all commands to be executed and press Ctrl+R.

T[2]                       # Print 2nd column as data.frame
T[[2]]                     # Print 2nd column as vector
T[2,]                      # Print 2nd row
T[3,"Brand"]               # Print 3rd row in Brand column
T$Price                    # Print Price column into a row vector
colnames(T)                # Print column names
colnames(T) <- c(1,2,"x")  # Rename columns

# Plotting
x <- 1:5; y <- x^2
plot(x,y)                  # Plot data
points(x,y, pch="+")       # Add new data (use + symbols)
lines(x,y)                 # Add a solid line
text(10,12,"Some text")    # Write text on plot
abline(h=7)                # Add a horizontal line
abline(v=6, col="red")     # Add a vertical line

All necessary manuals and package documentation are stored in the Comprehensive R Archive Network (CRAN) [5].

Geostatistical Packages in R
A surprisingly large number of packages implementing geostatistical principles have been released. We are going to use only a few of them, the most common ones: geoR, gstat, and sp. However, for those who wish to explore more packages, a list of some spatial packages with a brief description is provided below. These packages were developed by different communities, therefore their functionality sometimes overlaps. Some of them are out of date and are no longer recommended for use. Please refer to the online documentation for more information on each package and a description of its methods.
geoR - is probably the most important package for geostatistical analysis and prediction.
geoRglm - extends the functionality of the geoR package. It is designed for dealing with generalized linear spatial models.
gstat - provides users with a vast number of methods for both univariate and multivariate geostatistics, variogram modeling, and very useful plotting functions.
intamap - provides classes and methods for automated spatial interpolation.
fields - is a package with functionality similar to the gstat package. It is useful for curve, surface and function fitting, manipulating spatial data, and spatial statistics. A covariance function implemented in R with the fields interface can be used for spatial prediction. This package also includes methods for visualization of spatial data.
RandomFields - provides methods for simulation and analysis of random spatial data sets. It also provides prediction methods such as kriging.
sgeostat - is an object-oriented framework for geostatistical modeling in S+.
There are many other packages dealing with spatial data in some way. A description of all of them is beyond the scope of this paper. For more information please refer to [4].

Spatial Statistics Basics
Spatial statistics is a set of statistical tools in which the location of data is considered. The main goal of geostatistics is to make a prediction of data x_i (i = 1, 2, ..., n) within an area of interest A where sample observations Z_i have been made. Each observation Z_i depends on the values of a stochastic process S(x) of spatial continuity at the corresponding points x_i.
Functions in the geoR package are based on a Gaussian model. According to [8], chapter 3: Gaussian stochastic processes are widely used in practice as models for geostatistical data. These models rarely have any physical justification. Rather, they are used as convenient empirical models which can capture a wide range of spatial behaviour according to the specification of their correlation structure.
Please refer to this book for more information on Gaussian processes.

Univariate Geostatistics
There are several statistics of spatial data that serve for a general overview of the data set, show potential outliers among the observations, and describe the distribution. These features are shown in the examples in section Analysis of Univariate Data. Now, let us focus on the best known geostatistical method for prediction: kriging. There are many kinds of kriging. Each type determines a linear constraint on the weights implied by the unbiasedness condition. We are going to focus on ordinary kriging, which assumes a constant but unknown mean.
In order to predict the phenomenon at unsampled locations, we need to specify the spatial dependence. A geostatistical tool describing this dependence is called a variogram (figure 2).
From now on, we assume isotropy in our data; the variogram is then the so-called omnidirectional variogram. It is defined as the variance of the difference between field values at two locations across realizations of the field [6]. When shown as a plot, the x-axis represents the distance h between two observations. The maximum distance h should be set such that we can expect two observations at this distance to be independent. The variance is depicted on the y-axis and is defined as

    γ(h) = 1/(2N(h)) Σ_{i=1}^{N(h)} [Z(x_i) − Z(x_i + h)]²,

where Z(x_i) is an observed value of a random field, h is the distance between two observations, and N(h) is the number of pairs separated by h. If the data are anisotropic, h becomes a vector and we need more variograms, one for each angle. A variogram determined directly from the measurements is called empirical. For a prediction, we need to create a theoretical variogram that fits the empirical one as well as possible. The necessity of having a theoretical variogram lies in its continuity, so we can evaluate it at any distance. Its main characteristics are:
- range - the value of a variogram increases with increasing distance h up to a certain distance; beyond it the variogram does not change much and we consider two observations farther apart than the range to be independent,
- sill - the upper value of a variogram,
- nugget - the value of a variogram for zero h is strictly zero; nevertheless, at the shortest distance h for which the variogram is computed, its value jumps from zero to a certain value (the nugget). This is called the nugget effect and it is caused, for instance, by measurement error.
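The empirical variogram formula above can be sketched in a few lines of base R. This is a minimal illustration on a synthetic one-dimensional transect (the data and lag values are made up for the example); in the rest of the paper this work is done by the variog and variogram functions of the geoR and gstat packages:

```r
# Empirical omnidirectional variogram on a 1-D transect (illustrative only)
set.seed(1)
x <- 1:100                               # sample coordinates x_i
z <- sin(x / 10) + rnorm(100, sd = 0.2)  # synthetic observations Z(x_i)

empirical_variogram <- function(coords, values, lags) {
  d <- abs(outer(coords, coords, "-"))   # pairwise distances
  sapply(lags, function(h) {
    pairs <- which(d == h, arr.ind = TRUE)
    pairs <- pairs[pairs[, 1] < pairs[, 2], , drop = FALSE]
    # gamma(h) = 1/(2 N(h)) * sum (Z(x_i) - Z(x_i + h))^2
    mean((values[pairs[, 1]] - values[pairs[, 2]])^2) / 2
  })
}

gamma_h <- empirical_variogram(x, z, lags = 1:20)
plot(1:20, gamma_h, xlab = "h", ylab = "semivariance")
```

With a suitable lag, the resulting points rise with h and level off near the sill, as described above.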

Ordinary Kriging
Since we have a model of spatial dependence (i.e. we know the formula of our theoretical variogram), we can predict the phenomenon at an unsampled location. Let us call this location x_0; the prediction is then

    Ẑ(x_0) = Σ_α λ_α Z(x_α),

where λ_α is the weight of the value Z(x_α) at x_α.
Ordinary kriging is also called BLUP (best linear unbiased predictor) and therefore the following conditions hold:
- the sum of the weights is equal to 1 (this guarantees the unbiasedness of the prediction),
- the variance of the estimation errors is minimal.

Geoinformatics FCE CTU 8, 2012
There is one thing left to determine for the prediction: the vector of weights. It is obtained by solving the ordinary kriging system

    | C_11 ... C_1n  1 |   | λ_1 |   | C_10 |
    | ...       ...  . | . | ... | = | ...  |
    | C_n1 ... C_nn  1 |   | λ_n |   | C_n0 |
    | 1    ... 1     0 |   | µ   |   | 1    |

where µ is a Lagrange parameter and C_ij is the covariance between Z(x_i) and Z(x_j). The relationship between a covariance and a variogram is

    C(h) = C(0) − γ(h),

where C(0) is the sill of the variogram model.
More detailed mathematical description is out of the scope of this paper.For more, please refer to [1,2,7].
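The kriging system above can be solved directly with base R matrix algebra. The sketch below uses an assumed exponential covariance model and three made-up sample points; it only shows the structure of the matrices, since the kriging functions in gstat and geoR build and solve this system internally:

```r
# Ordinary kriging weights by hand (illustrative; 3 samples, 1 target)
coords <- matrix(c(0, 0,  1, 0,  0, 1), ncol = 2, byrow = TRUE)
z      <- c(10, 12, 11)          # observed values Z(x_alpha)
x0     <- c(0.5, 0.5)            # prediction location

# assumed exponential covariance model: C(h) = sill * exp(-h / range)
covfun <- function(h, sill = 1, range = 0.7) sill * exp(-h / range)

D  <- as.matrix(dist(coords))    # distances between samples
C  <- covfun(D)                  # covariances C_ij
c0 <- covfun(sqrt(colSums((t(coords) - x0)^2)))  # covariances C_i0

n   <- nrow(coords)
A   <- rbind(cbind(C, 1), c(rep(1, n), 0))  # system matrix with Lagrange row
b   <- c(c0, 1)
sol <- solve(A, b)
lambda <- sol[1:n]               # kriging weights (sum to 1)
z_hat  <- sum(lambda * z)        # ordinary kriging prediction at x0
```

The last row of the system enforces the unbiasedness condition, so the computed weights always sum to 1 regardless of the covariance model chosen.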

Multivariate Geostatistics
Natural phenomena from one region can show some measure of dependency on each other. In such a case, we can take one variable for prediction (primary) and the other variable(s) (secondary) to enhance the prediction. This is applied in cases where obtaining data on the primary variable is expensive, technically very difficult, or where for any other reason we have an insufficient number of observations. In that case, we can look for dependent variables in the region which we can measure in a much easier or cheaper way. Besides other advantages, we can reveal extreme values of the primary variable at locations where its measurement has not even been made.

Covariables Dependency
We assume to have only one secondary variable from now on. In case we have the covariables measured at exactly the same locations, we can easily tell the strength of their dependency by computing a correlation coefficient and/or by plotting a scatterplot. A scatterplot is a figure with axes corresponding to the values of the variables, one axis for each variable. In case we do not have measurements at the same locations, the best way to reveal the dependency is to compare the variograms of the variables. We use so-called coregionalization when a cross-variogram is created [1].
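For collocated covariables, the correlation check is a one-liner. The snippet below uses two synthetic, deliberately correlated variables (not the data sets analyzed later in the paper):

```r
# Dependency check for two collocated variables (synthetic example)
set.seed(2)
u <- rnorm(100)                       # primary variable samples
v <- 0.8 * u + rnorm(100, sd = 0.3)   # made-up dependent secondary variable

r <- cor(u, v)                        # Pearson correlation coefficient
plot(u, v, main = sprintf("r = %.2f", r))  # scatterplot of the pairs
abline(lm(v ~ u), col = "red")        # linear trend for reference
```

A narrow, elongated cloud along the fitted line corresponds to a correlation coefficient close to 1.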

Cross-variogram
A cross-variogram describes the correlation between covariables and is given by

    γ_12(h) = 1/(2N(h)) Σ_{i=1}^{N(h)} [Z_1(x_i) − Z_1(x_i + h)] [Z_2(x_i) − Z_2(x_i + h)],

where Z_1 and Z_2 are the primary and secondary variables. In some cases (e.g. in R methods), a pseudo cross-variogram is computed. There are inconsistent opinions on its use [1] (p. 150); however, its advantage is that it yields many more points for an empirical cross-variogram. The pseudo cross-variogram is given by

    γ^p_12(h) = 1/(2N(h)) Σ_{i=1}^{N(h)} [Z_1(x_i) − Z_2(x_i + h)]².

Ordinary Cokriging
Since we have the variogram and cross-variogram models, we can use ordinary cokriging for the prediction. The value of the primary variable at an unsampled location x_0 is given by

    Ẑ_1(x_0) = Σ_{α∈S_1} λ_1α Z_1(x_α) + Σ_{α∈S_2} λ_2α Z_2(x_α),

where S_1 and S_2 are the sets of samples of the primary and secondary variables, respectively.
The following conditions hold:
- the sum of the weights λ_1α is equal to 1 and the sum of the weights λ_2α is equal to 0 (this guarantees the unbiasedness of the prediction),
- the variance of the estimation errors is minimal.
The relationship between a cross-variogram and a cross-covariance is

    C_12(h) = C_12(0) − γ_12(h).

The ordinary cokriging system in matrix form is then

    | C_11  C_12  1  0 |   | λ_1 |   | c_10 |
    | C_21  C_22  0  1 |   | λ_2 |   | c_20 |
    | 1'    0'    0  0 | . | µ_1 | = | 1    |
    | 0'    1'    0  0 |   | µ_2 |   | 0    |

where C_11 and C_22 are the covariance matrices of the primary and secondary variables, respectively, C_12 is the cross-covariance matrix, c_10 and c_20 are the vectors of (cross-)covariances between the samples and the target location, and µ_1, µ_2 are Lagrange parameters.
For a more detailed mathematical explanation of ordinary cokriging including proofs, please refer to [1,2].
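The cokriging system above can also be assembled by hand, extending the earlier ordinary kriging sketch to two collocated variables. The direct and cross covariance models below are assumptions chosen only so that they form a valid (proportional) model of coregionalization; real analyses fit these models to data, as shown later with fit.lmc:

```r
# Ordinary cokriging weights by hand (illustrative; collocated samples)
coords <- matrix(c(0, 0,  1, 0,  0, 1), ncol = 2, byrow = TRUE)
z1 <- c(10, 12, 11)                 # primary samples Z1(x_alpha)
z2 <- c(5.1, 6.2, 5.6)              # secondary samples Z2(x_alpha)
x0 <- c(0.5, 0.5)                   # prediction location

# assumed direct and cross covariance models (same exponential shape)
cov1  <- function(h) 1.0 * exp(-h / 0.7)   # C11
cov2  <- function(h) 0.8 * exp(-h / 0.7)   # C22
cov12 <- function(h) 0.7 * exp(-h / 0.7)   # C12, valid since 1*0.8 > 0.7^2

D   <- as.matrix(dist(coords))
h0  <- sqrt(colSums((t(coords) - x0)^2))
n   <- nrow(coords)
one <- rep(1, n); zero <- rep(0, n)

A <- rbind(
  cbind(cov1(D),  cov12(D), one,  zero),   # primary block + constraints
  cbind(cov12(D), cov2(D),  zero, one),    # secondary block + constraints
  c(one,  zero, 0, 0),                     # sum of lambda_1 = 1
  c(zero, one,  0, 0)                      # sum of lambda_2 = 0
)
b   <- c(cov1(h0), cov12(h0), 1, 0)
sol <- solve(A, b)
l1 <- sol[1:n]; l2 <- sol[(n + 1):(2 * n)]
z1_hat <- sum(l1 * z1) + sum(l2 * z2)      # cokriging prediction at x0
```

The two constraint rows reproduce the unbiasedness conditions listed above: the primary weights sum to 1 and the secondary weights sum to 0.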

Sampling Density and Location of Primary and Secondary Variables
There are several cases of how the covariables can be sampled:
- samples of both the primary and the secondary variable are obtained at exactly identical locations - this case is not very frequent, because we either have a sufficient data set for the primary variable, so the secondary is not necessary to include in the prediction, or we do not have enough samples of the primary variable for creating a valid prediction by ordinary kriging and the secondary variable will not provide us with more useful information about it,
- the secondary variable is measured with higher density and all the primary variable samples overlap in location with the secondary variable - this is one of the most common use cases of ordinary cokriging and leads to the best results; the primary variable measurement is not dense enough to make a good ordinary kriging prediction, so a significantly dependent secondary variable substantially increases the density of sampled locations and enhances the prediction of the primary variable,
- the secondary variable is measured with higher density and the covariable sample locations do not overlap - another very common case, however negatively affected by a worse determination of the model of coregionalization, which leads to a smaller improvement of the prediction (compared to the previous case),
- the number of samples of the secondary variable is smaller than the number of samples of the primary variable - this case is useless for getting a better prediction,
- the secondary variable samples correspond with all locations for prediction of the primary variable (very dense sampling) - for such a case another type of kriging is recommended, kriging with external drift [7]; the denser the samples of the secondary variable, the harder the prediction is to process when using ordinary cokriging.
See figure 3 for examples.

Spatial Statistics in R
For the purposes of this chapter, the sample data set from [2] has been used. These data are derived from a digital elevation model (DEM) of the Walker Lake area (Nevada, USA) and are available in the gstat package, hence anyone can obtain the same data set and get the same results.

Sample Data Set
The Walker Lake DEM has been modified for the sake of generality. There are 1.95 million points in the original data set. These points were divided into blocks of 5 by 5 points and the final values were derived from them. There are two variables which we are going to use:
- U - the variance of the 25 values x_1, x_2, ..., x_25, where x_i is elevation in meters,
- V - a function of the mean and the variance of the 25 values.
There are 78 000 values in a grid of 260 by 300 points. 470 points were chosen across the area to represent the measurement.
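The block derivation described above can be sketched as follows. The code aggregates a toy elevation grid into 5 by 5 blocks and computes a per-block variance; the grid is synthetic, not the actual Walker Lake DEM:

```r
# Derive a block-variance variable from a dense grid (synthetic stand-in)
set.seed(3)
elev <- matrix(rnorm(100 * 100, mean = 1500, sd = 200), nrow = 100)  # fake DEM

block_stat <- function(m, size, stat) {
  nr <- nrow(m) %/% size; nc <- ncol(m) %/% size
  out <- matrix(NA_real_, nr, nc)
  for (i in seq_len(nr)) for (j in seq_len(nc)) {
    block <- m[((i - 1) * size + 1):(i * size),
               ((j - 1) * size + 1):(j * size)]
    out[i, j] <- stat(as.vector(block))   # one value per 5x5 block
  }
  out
}

U <- block_stat(elev, 5, var)   # variance of the 25 values in each block
dim(U)                          # 20 x 20 blocks from the 100 x 100 grid
```

Replacing var by mean (or any function of both) produces variables analogous to V.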

Analysis of Univariate Data
A sample of 100 values in a regular grid of 10 by 10 points is used for the following basic data description.
We can obtain a lot of useful information when we arrange the data according to some order, plot them, or compute some summary statistics. Among other things, this is good for searching for outliers and errors in the measured data set. Some methods used later in this paper work better for normally distributed data. We can tell whether the data are normally distributed from a plot with the measurements on the x-axis and the cumulative frequency on the y-axis. In the case of a normal distribution, the points are arranged into a line. An example of this plot is in figure 5. For further data description, we look at summary statistics such as the minimal and maximal value, mean, median, mode, quantiles, standard deviation, variance, interquartile range, coefficient of skewness, coefficient of variation, etc. The first five statistics tell us about the location of important parts of the distribution. The next three values signify the variability of the distribution. The coefficient of skewness and the coefficient of variation describe the shape of the distribution and help to reveal potentially erroneous observations.
We can obtain the basic statistics using the summary function. Summary statistics for V are listed in the following example.
Max.   :145.00
# variance
> var = sum((V - 97.5)^2) / length(V)
689.69
# interquartile range
> IQR = 116.25 - 81.75
34.5
# coefficient of skewness
> CS = sum((V - 97.5)^3) / sqrt(var)^3 / length(V)
-0.771
# coefficient of variation
> CV = sqrt(var) / 97.5
0.269

The coefficient of skewness is, in this case, negative, which means the distribution rises slowly from the left and the median is greater than the mean. The closer the coefficient of skewness is to zero, the more symmetrical the distribution; the difference between the median and the mean then becomes smaller.
The coefficient of variation is quite low. If this value is greater than 1, a search for erroneous observations is recommended.
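The hand-computed statistics above can be reproduced with short base-R expressions. V here is a synthetic vector standing in for the Walker Lake sample, so the numbers will differ from those listed:

```r
# Summary statistics of a sample (synthetic stand-in for V)
set.seed(4)
V <- rnorm(100, mean = 97.5, sd = 26)

m   <- mean(V)
s2  <- sum((V - m)^2) / length(V)               # population variance
iqr <- IQR(V)                                    # interquartile range
cs  <- sum((V - m)^3) / length(V) / sqrt(s2)^3   # coefficient of skewness
cv  <- sqrt(s2) / m                              # coefficient of variation
summary(V)                                       # min, quartiles, median, mean, max
```

For normally distributed data the coefficient of skewness stays near zero and the coefficient of variation stays well below 1.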

Variogram
In order to plot an empirical variogram, we need to set a proper distance for the lag (the x-axis on the plot). When the lag is too small, the variogram goes up and down despite its theoretically increasing trend before the range distance and constant trend for distances larger than the range. When we set the lag too large, we get just a small number of values (breaks) on the variogram curve and we do not see the important characteristics of the variogram such as the range, sill, etc.
In our example we set the lag to 10 m. A variogram cloud (all pairs of points) and an empirical variogram with the given lag for the V variable are in figure 7. These variograms were created by the variog function. The theoretical variogram is modeled with the lines.variogram function or with the interactive tool eyefit. In our example in figure 8 we set the maximal distance to 100 m, the covariance model to exponential, the range to 25 m, the sill to 65000, and the nugget to 34000.

Analysis of Multivariate Data
Since we wish to take advantage of the spatial dependency of a primary and a secondary variable, we need to analyze the data sets. The goal is to examine whether the covariables are dependent enough so that the secondary variable can improve the prediction of the primary variable. The first thing we can try is to compare the shapes of the histograms. Very similar shapes (i.e. similar distributions) indicate a certain degree of dependency.
By using the cor(U,V) function in R we get a correlation coefficient (in this case 0.837). Its value always lies within the interval [−1, 1]. The closer to zero, the less dependent the data sets are.
In order to compare two distributions, we can visualize a so-called q-q plot (the qqplot function in R). Each axis represents quantiles of one data set (see figure 9). If the plotted data are close to the y = x line, the variables are strongly dependent. If the data make a straight line with a different direction than y = x, the variables still have similar distributions but with a different mean and variance.
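A minimal q-q plot sketch, using two synthetic correlated variables rather than the U and V data:

```r
# Compare distributions of two dependent samples with a q-q plot (synthetic)
set.seed(5)
U <- rnorm(200, mean = 250, sd = 50)
V <- 0.9 * U + rnorm(200, sd = 20)       # made-up correlated covariable

qqplot(U, V, xlab = "U quantiles", ylab = "V quantiles")
abline(0, 1, col = "red")                # y = x reference line
r <- cor(U, V)                           # strength of the dependency
```

Here the quantile points form a straight line whose slope differs from 1, matching the situation of similar distributions with different mean and variance.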
Another graphical tool for testing the dependency of two spatial data sets is the so-called scatterplot. Pairs made of the primary variable value and the secondary variable value at the same location are visualized as points in this plot. The result is a cloud of points (see figure 10 for our example on the U and V data). The narrower the cloud, the higher the degree of dependency.
A scatterplot has another big advantage: outliers and measurement errors lie outside the cloud. We can then easily check these points and, in case they are wrong, take them out of the data set. The dependency of two variables can be approximated by linear regression. The plot from the previous example is in figure 11. An alternative to a linear regression can be a graph of conditional expectation, where one variable is divided into classes (such as when we create a histogram) and the mean of the other variable is calculated within these classes, see figure 12.
Since we have explored the data sets, done the basic geostatistical analysis, and determined the spatial continuity and the covariables' dependence, we may proceed to the prediction. From now on, we are going to use a new data set that is more suitable as an example for prediction by (co)kriging.

Example of Kriging and Cokriging in R
In the following part of this paper, we are going to make two predictions: one using only the primary variable on its own and the ordinary kriging method, and the other using the secondary variable and the ordinary cokriging method. We are going to compare these two methods using graphical and tabular outputs.

Data Description
The phenomena we use in this example are simulated random fields in a square region of the size of 50 by 50 pixels (i.e. 2500 pixels/values in total). We randomly select some values and declare them to be the measurement. After the prediction is made, we can easily compare the results with the original data set. This is not how it works in reality - we do not have values of the variable at each location of the region; that is why we do the prediction. However, for educational purposes, a comparison of predicted and real values is a good way to show how these methods work and how well they work.
A simulation of Gaussian random fields was chosen to create our phenomena, using the method grf in R. This method is able to create a random raster which can represent a continuous spatial phenomenon. Gaussianity of the spatial random process is an assumption common to most standard applications in geostatistics. However, non-Gaussian data are often encountered. How to deal with this sort of data is described in detail in [9].
In our paper, two fields were created by the function grf, each representing one variable (called A and B). A is our primary variable for which the prediction will be made. B is just an auxiliary variable for forming the secondary variable, C. C is strongly correlated with A; the correlation coefficient is about 0.93. All three fields are shown in figure 13. The R code for creating these fields begins as follows (the listing is truncated in the source):

library(geoR)
library(gstat)
set.seed(1)
# creates a regular grid of 50 by 50 pixels
# the covariance parameters are sigma^2 (partial sill)
# and phi (range parameter)
A = grf(50^2, grid="reg", cov.pars=c(1, 0.

The prediction grid is much denser (see later in figure 18). Let us have a look at some analytical graphical tools: histograms of the samples are shown in figure 14, q-q plots are shown in figure 15, and a scatterplot is shown in figure 16. According to these plots, we can conclude that the samples have a normal distribution and the distributions are quite similar, which confirms the strong correlation of the variables.

Prediction Using Ordinary Kriging
The use of ordinary kriging in R is very simple. Once we have determined the theoretical variogram, we can proceed to the prediction; see the accompanying code listing. This is all we need to do to get a prediction in unsampled locations when the input is only the primary variable A. The results are shown in figures 18 and 19. Let us have a look at how the process changes when we wish to include the secondary variable.
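As a self-contained sketch of the same workflow, the snippet below simulates a small sample, fits a variogram, and predicts on a grid with gstat's krige function. The variogram parameters, sample counts, and grid size are illustrative choices, not the ones used for figures 18 and 19:

```r
# Ordinary kriging with gstat on a toy data set (illustrative parameters)
library(gstat)
library(sp)

set.seed(6)
samples <- data.frame(x = runif(60, 0, 50), y = runif(60, 0, 50))
samples$z <- sin(samples$x / 8) + cos(samples$y / 8) + rnorm(60, sd = 0.1)
coordinates(samples) <- ~x + y           # promote to a spatial object

grid <- expand.grid(x = seq(0, 50), y = seq(0, 50))
coordinates(grid) <- ~x + y
gridded(grid) <- TRUE                    # regular prediction grid

v_emp <- variogram(z ~ 1, samples)                       # empirical variogram
v_mod <- fit.variogram(v_emp, vgm(1, "Exp", 15, 0.1))    # fitted model
ok <- krige(z ~ 1, samples, grid, model = v_mod)         # ordinary kriging
spplot(ok["var1.pred"])                                  # map of predictions
```

The krige output also contains var1.var, the kriging variance used for the error maps in figure 19.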

Prediction Using Ordinary Cokriging
A detailed description of how to perform an ordinary cokriging prediction in R is given in [3].
We have already concluded that the variables A and C are spatially dependent. The most difficult step in prediction by ordinary cokriging is to set up a linear model of coregionalization (in other words, to describe the spatial dependence between the covariables). We need to fit the samples into proper variogram and cross-variogram models; see the accompanying code listing. For comparing the predictions we use RMSE, the root mean square error:

    RMSE = sqrt( (1/n) Σ_{i=1}^{n} (Ẑ(x_i) − Z(x_i))² ).

The best presentation of the results is a visualization of the predictions (figure 18) and the prediction errors (figure 19). It is obvious that the cokriging prediction describes the regions with extreme values more precisely. However, we can see that the kriging prediction did a good job too. This is thanks to the relatively sufficient number of samples and (more importantly) their proper layout. It is up to us to decide whether this prediction is accurate enough or not.
If not, we need to provide the prediction with samples of another variable that is highly correlated with the primary one and that has denser sampling. The question is whether the improvement is worth the cost of the secondary variable data set. Let us pay attention to the errors figure, particularly to the middle map with the real errors. We can see that in the case of ordinary cokriging a red cloud of errors appeared in the middle. This is a somewhat negative impact of the C samples. Let us recall that the C variable is derived not only from A but also from the B variable (figure 13), which has a large region of negative values exactly in the place where the red cloud of errors appeared. This region affected the C samples as well as the final prediction of A. This may have a dangerous impact on the prediction when using a secondary variable. This is why the degree of dependency of the covariables has to be really high.
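When the true field is known, as in this simulated example, the RMSE comparison reduces to one line per prediction. The vectors below are made up; in the real workflow they would be the true values of A and the two prediction outputs:

```r
# RMSE comparison of two predictions against known true values (made-up data)
rmse <- function(pred, truth) sqrt(mean((pred - truth)^2))

truth   <- c(1.0, 2.0, 3.0, 4.0)
ok_pred <- c(1.2, 1.8, 3.3, 3.9)   # hypothetical ordinary kriging output
ck_pred <- c(1.1, 1.9, 3.1, 4.0)   # hypothetical ordinary cokriging output

rmse(ok_pred, truth)
rmse(ck_pred, truth)
```

A lower RMSE for the cokriging output quantifies the gain from the secondary variable in a single number.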

Conclusion
Both methods, ordinary kriging and ordinary cokriging, were shown to lead to a successful prediction. As we expected, the gain from the secondary variable was obvious. However, we always need to consider the cost of obtaining it and the quality of the prediction without it. We tried many more combinations of covariables during this project that are not mentioned in the paper. We worked with yet another variable that was not so correlated with the primary one. The results in that case were not good, as we expected. We tried different sample layouts for the primary and secondary variables. The biggest gain in prediction was achieved when the primary data set was so sparse that the prediction by ordinary kriging was almost impossible to process (we could not create the variogram). By adding the secondary variable, the prediction gave us quite decent results. We also tried to use the same primary variable as in this paper and the secondary variable with a difference only in the sample locations: they did not overlap with the primary variable samples (their count was still about three times higher than the number of samples of the primary variable). This is the case where we cannot tell how good the spatial dependency of the covariables is, so it is harder to create the linear model of coregionalization. The results of such a prediction were not as good as in the case presented in this paper; however, we still managed to enhance the prediction of the primary variable.
This paper was originally written for educational purposes. It shows how to do basic spatial data analysis and how to predict values of some phenomenon at unsampled locations. Two methods were described: ordinary kriging and ordinary cokriging. Readers of this paper were provided with a step-by-step prediction process in the R environment.
The results are shown in the R console. The # symbol is used for comments. The following brief list of examples is a basic overview of commands to get you started in R. Please refer to the countless internet tutorials for more advanced examples.

# Help and packages
help.start()               # Load online HTML help
help(function)             # Show online help for "function"
?function                  # Show online help for "function"
help.search("keyword")     # Open RGui dialog for packages/functions
                           # ...or classes connected to "keyword"
q()                        # Quit R
library()                  # List downloaded libraries
library(package)           # Load "package"
install.packages(package)  # Download "package"

# Assigning values
a <- 3
b = 8
x <- c(5, 2, 7)            # Vector
y <- 2 * x^2               # Evaluate elements of vector x 1 by 1,
                           # ...dimension of x and y matches
y                          # Print content of a variable
[1] 50 8 98                # ...result
1:5                        # Create sequence
x[1]                       # Print 1st element of x
x[1:4]                     # Print 1st through 4th element of x
5 2 7 NA                   # ...result (when out of range,
                           # ...NA value is printed)
x[length(x)]               # Print last element of vector

# Input table from a file
# --- file data.txt ---
# Id Price Brand
# 1  3.5   Goldfish
# ---------------------
# Read table, a relative path can be used
# \ needs to be doubled
T = read.table("C:\\Data\\data.txt", header=T)
T                          # Print table T
T["Id"]                    # Print column Id

Volfová A., Šmejkal M.: Geostatistical Methods in R

Figure 3: Location of primary and secondary variables. The order of the examples corresponds with chapter Sampling Density and Location of Primary and Secondary Variables; red dots stand for primary variable samples, blue dots represent secondary variable samples.

Figure 12: Conditional expectation of V within classes defined on U values.

# create a grid for the prediction
gr = data.frame(Coord1=A$coords[,"x"], Coord2=A$coords[,"y"])
gridded(gr) = ~Coord1+Coord2
# assign coordinates to variable A

# create a gstat object g
# (necessary for correct use in following methods)
# variables A and C are saved in class data.frame
# add A and C to object g
g <- gstat(NULL, id = "A", form = data ~ 1, data=dataFrameA)
g <- gstat(g, id = "C", form = data ~ 1, data=dataFrameC)
# empirical variogram and cross-variogram
v.cross <- variogram(g)
plot(v.cross, pl=T)
# add variogram to object g
# vmA_fit is a previously created variogram model
g <- gstat(g, id = "A", model = vmA_fit, fill.all=T)
# create linear model of coregionalization
g <- fit.lmc(v.cross, g)

Figure 18: Ordinary kriging and ordinary cokriging for A and C (upper left: real values of A, upper right: samples (red: A, blue: C), lower left: ordinary kriging, lower right: ordinary cokriging).

Figure 19: Prediction errors. Upper row for ordinary kriging, lower row for ordinary cokriging. Left: variance of the prediction, middle: real estimation errors, right: absolute values of estimation errors (circle: A, plus: C).

Table 1: Frequency table for V.
The normal-probability plot was created in R by calling the qqnorm function. Outliers and erroneous data, if any, can be observed in this plot.

Table 2: Basic statistics of the primary and secondary variables.