An Algorithm for Investigating the Structure of Material Surfaces

The aim of this paper is to summarize the algorithm and the experience gained in the investigation of the grain structure of surfaces of certain materials, particularly of gold samples. The main parts of the algorithm to be discussed are: 1. acquisition of input data, 2. localization of grain regions, 3. representation of grain size, 4. representation of outputs (post-processing).


1 Introduction
Data describing the grain structure and shape of the surface of a material sample are obtained using a scanning tunneling microscope (STM); see Section 3. STM was selected on the basis of experience reported in many publications on this topic.
For example, in the scanning of gold islands grown on MoS2 surfaces, STM proved to be a powerful approach [7], [9], [10].
The input data for this algorithm are taken from the STM in the form of a gray-scale image. The aims of the algorithm are:
- to localize boundaries between adjacent grains and to localize grain regions (one and only one region for each grain),
- to depict a graph of the found probability distribution of grain size.
The localization of grain boundaries may be applied to the investigation of any material with a surface structure similar to a grain structure. The studied materials basically have to be conductive, as STM is based on the principle of a tunnel current between the microscope tip and the sample. For the localization, segmentation techniques are used; such techniques are applied in a wide range of classical image analysis problems. For example, segmenting cell nuclei from cytoplasm is a classical image analysis problem, which may prove crucial to the development of successful systems that automate the analysis for the detection of cancer of the cervix [4], [5], [6].
As soon as the boundaries between grains are localized, they may be used for various calculations. Within the scope of the algorithm described in this paper, the data describing the boundaries are used for calculating the probability distribution of grain size. The gray-scale images providing information on the grain structure of the sample surface are often very noisy and have to be smoothed (i.e., their quality has to be improved) with various filters. Unfortunately, such filters lead to a partial loss of important detailed information. This loss of information, together with high computational demands, leads to an inaccurate description of individual grain boundaries. For this reason, the data representing the grain boundaries are used for calculating the probability distribution of the investigated parameters of the grain structure, a statistical description that is less sensitive to errors in individual boundaries. If the quality of the gray-scale images were improved, the results of the algorithm described in the following text would be enhanced.
The grain size is expressed by the grain radius (it is assumed that grains can be approximated by spheres). The outputs of the algorithm are represented by a graph. The gray-scale images suffer from technical deficiencies of the microscope, but these deficiencies are not relevant for the calculations.
The images of the grain structure of gold samples are used for illustrating the algorithm.
2 Transformations applied to the gray-scale images
This section describes important transformations, such as the square median filter, the Gaussian low-frequency filter, and local equalization, which are widely applied to gray-scale images. Further information about transformations and digital image processing in general is presented in many books, e.g., [1-2].
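2.1 Square median filter
When applying a square median filter (further in the text referred to as the "square median filter") to the image, one has to carry out the following steps for each point (pixel) in the image:
- get the intensities in the square neighborhood (of the given size) of the point,
- find the median value of these intensities,
- store this median value in a buffer at the coordinate that belongs to the point.
The entire buffer then represents the image after transformation.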
The result of the application of the square median filter is shown in Fig. 1.
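The three steps above can be sketched as follows. This is a minimal illustration, not the authors' C++ code: the image is assumed to be a list of rows of integer intensities, border pixels are assumed to use only the part of the neighborhood that lies inside the image, and `repeated_median` with its default of two passes is a hypothetical helper reflecting the later observation (Section 5.2) that several passes in a small neighborhood work better than one pass in a large one.

```python
from statistics import median_low


def square_median_filter(image, size=3):
    """One pass of the square median filter over a gray-scale image.

    image: list of rows of ints; size: odd side of the square neighborhood.
    Near the border only the part of the neighborhood inside the image is
    used (an assumption; the text does not say how borders are treated).
    median_low keeps the result an existing intensity when the count is even.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    height, width = len(image), len(image[0])
    buffer = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Get the intensities in the square neighborhood of the point.
            values = [image[j][i]
                      for j in range(max(0, y - half), min(height, y + half + 1))
                      for i in range(max(0, x - half), min(width, x + half + 1))]
            # Store their median in the buffer at the point's coordinate.
            buffer[y][x] = median_low(values)
    return buffer


def repeated_median(image, size=3, passes=2):
    """Hypothetical helper: several passes of a small square median filter."""
    for _ in range(passes):
        image = square_median_filter(image, size)
    return image
```

A single bright outlier such as `[[10, 10, 10], [10, 255, 10], [10, 10, 10]]` is replaced by 10 everywhere, which is the kind of isolated artefact the filter is meant to remove.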

2.2 Gaussian low-frequency filter
The application of the Gaussian filter to the gray-scale image involves convolution of the image with a matrix filled with the values given by the following function:
$$G[x, y] = \exp\!\left(-0.5\,\frac{x^2 + y^2}{d \cdot d}\right),$$
where d is the value of diffusion, [x, y] indicate indices to the matrix, and [0, 0] is the center of the matrix.
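A minimal sketch of this filter, assuming the matrix is square with odd size so that [0, 0] is its center, and that `diffusion` plays the role of the value of diffusion d. Two further assumptions are not in the text: the kernel entries are normalized to sum to 1 (so the mean brightness is preserved), and border pixels are handled by clamping coordinates to the image (edge replication).

```python
import math


def gaussian_kernel(size, diffusion):
    """Square kernel with entries exp(-0.5 * (x*x + y*y) / (d*d)), [0, 0] at the center.

    The normalization to a unit sum is an assumption.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so that [0, 0] is the center")
    half = size // 2
    kernel = [[math.exp(-0.5 * (x * x + y * y) / (diffusion * diffusion))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def gaussian_filter(image, size=5, diffusion=1.0):
    """Convolve a gray-scale image (list of rows) with the Gaussian kernel.

    The kernel is symmetric, so convolution and correlation coincide.
    Coordinates outside the image are clamped to the nearest edge (assumption).
    """
    kernel = gaussian_kernel(size, diffusion)
    half = size // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for ky in range(-half, half + 1):
                for kx in range(-half, half + 1):
                    yy = min(max(y + ky, 0), height - 1)
                    xx = min(max(x + kx, 0), width - 1)
                    acc += kernel[ky + half][kx + half] * image[yy][xx]
            out[y][x] = acc
    return out
```

A larger `diffusion` spreads the weights further from the center and smooths more strongly; a constant image is left unchanged because the weights sum to 1.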
The Gaussian filter removes the high frequencies formed due to mistakes in scanning. The result of the application of the Gaussian filter is illustrated in Fig. 2.

2.3 Local equalization
If n × n is the size of the neighborhood of a given point for local equalization (or the image size for global equalization), then we substitute the intensity of the center of the neighborhood (or each intensity ranging from 0 to 255 for global equalization) by an intensity A obtained from a formula in which I is the given intensity, h and a are given real numbers, A is the sought intensity for the above-mentioned substitution, and the sum on the right-hand side is the discretized form of the integral on the left-hand side.
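The paper's own equalization formula, with its parameters h and a, is not reproduced in this text, so the sketch below substitutes standard histogram equalization towards a uniform distribution: an intensity is mapped through the empirical cumulative distribution of its n × n neighborhood (local equalization) or of the whole image (global equalization). It illustrates the mechanism only and is not the authors' mapping; the image is assumed to hold integers 0 to 255, and `size=7` echoes the neighborhood 7 used for categorization in Section 5.1.

```python
def equalize_value(values, intensity, levels=256):
    """Map `intensity` through the empirical CDF of `values` onto 0..levels-1."""
    at_most = sum(1 for v in values if v <= intensity)
    return round((levels - 1) * at_most / len(values))


def global_equalization(image):
    """Equalize using the histogram of the whole image (one lookup table)."""
    flat = [v for row in image for v in row]
    lut = [equalize_value(flat, i) for i in range(256)]
    return [[lut[v] for v in row] for row in image]


def local_equalization(image, size=7):
    """Replace each pixel by its equalized value within its size x size neighborhood."""
    half = size // 2
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            window = [image[j][i]
                      for j in range(max(0, y - half), min(height, y + half + 1))
                      for i in range(max(0, x - half), min(width, x + half + 1))]
            out[y][x] = equalize_value(window, image[y][x])
    return out
```

For a 2 × 2 image that fits inside one neighborhood, both functions give the same result, e.g. `[[0, 100], [200, 50]]` becomes `[[64, 191], [255, 128]]`.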
The result of applying local equalization is shown in Fig. 4.

3 Technology used for acquisition of gray-scale images
The gray-scale images (input data to the algorithm) of a grain structure are collected by STM in the case of conductive materials. For other materials, a different microscope has to be used. The basic parts of the STM are a standard scanner with two piezoceramic tubes having outer and inner electrodes and with a scanning tip, and the carriage with the material sample.
The scanning of the sample surface works on the principle of a tunnel current between the sample and the scanning tip of the microscope.
The voltage is fed to the carriage. The level of the tunnel current depends on the voltage and on the distance between the sample surface and the scanning tip. The closer a point on the sample surface is to the scanning tip, the stronger the current. The magnitude of the current is stored in the gray-scale image that serves as the input to the algorithm.
The scanned area is fixed for each scan. The voltage on the outer electrodes of the piezoceramic tubes of the STM is set according to the size of the scanned area.
Detailed information about the technology and the microscope used is presented in many journals and books, see, e.g., [7], [9], [10].

4 Characteristics of the gray-scale images (input data)
Technical imperfections of the STM result in relatively noisy images with undesired artefacts: for example, real grain boundaries vanish and unreal boundaries arise, and many artificial local extremes are generated in the image (because of the resulting roughness of the intensity distribution). The rough boundaries are formed particularly by vibrations of the microscope's scanning tip.
Assuming that grains can be approximated by ovals, the local intensity peaks correspond to the grain peaks, and the local intensity minima coincide with the grain boundaries. The intensity values gradually decrease from the local intensity peaks to the local intensity minima.
The average grain size in the image depends on the size of the scanned area. The smaller the scanned area of the sample, the bigger the grains in the image and the fewer grains it contains. The larger the scanned area, the smaller the grains in the image and the greater their number. The images are distributed into the following three categories:
- Category 1 (big scanned area, small grains),
- Category 2 (the intermediate case),
- Category 3 (small scanned area, big grains).
The boundaries between the categories are defined experimentally, as illustrated in Fig. 5, Fig. 6, and Fig. 7.
The local average intensity near the points varies with their position in the image and with the grain size (see ...).
The part of the algorithm that calculates the probability distribution of grain size assumes that the grains in the sample have a non-ideal sphere-like shape. If this shape cannot be assumed for any reason, the method of description of the local curvature established in Section 6 cannot be used.

5 The algorithm
The input to the algorithm is a gray-scale image (input image), which represents the surface of a trial sample, see Fig. 8, for example.
The first step of the algorithm is to determine the parameters for the pre-processing of the input image. This involves assigning the input image to category 1, 2, or 3, as described in Section 5.1.
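The categorization in Section 5.1 rests on a smoothness coefficient of the histogram of the locally equalized image, defined there as the sum of the differences of neighboring histogram values. The sketch below assumes absolute differences and a smaller coefficient meaning a smoother histogram (smaller grains, Category 1). The thresholds and the neighborhood size assigned to each category are hypothetical placeholders, because the text determines them experimentally and does not list them.

```python
def histogram(image, levels=256):
    """Counts of each intensity 0..levels-1 in a gray-scale image."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts


def smoothness_coefficient(hist):
    """Sum of the absolute differences of neighboring histogram values."""
    return sum(abs(b - a) for a, b in zip(hist, hist[1:]))


# Hypothetical values: the category boundaries and neighborhood sizes are
# set experimentally in the text and not given there.
THRESHOLDS = (1000, 5000)
NEIGHBORHOOD_SIZE = {1: 3, 2: 5, 3: 7}


def categorize(equalized_image, thresholds=THRESHOLDS):
    """Return (category, neighborhood size) for a locally equalized image.

    A smoother histogram (smaller coefficient) indicates smaller grains,
    hence Category 1; a rougher one indicates Category 3.
    """
    c = smoothness_coefficient(histogram(equalized_image))
    if c < thresholds[0]:
        category = 1
    elif c < thresholds[1]:
        category = 2
    else:
        category = 3
    return category, NEIGHBORHOOD_SIZE[category]
```

The coefficient is not normalized by the number of pixels, so in this sketch the thresholds would have to be tuned for a fixed image size.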
The next step of the algorithm is to investigate the local intensity extremes of the input image and use them for the localization of grain boundaries. The localization begins at all local intensity peaks. Intensity minima serve as stop conditions of the localization.
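Finding the starting points amounts to a search for local maxima in a square neighborhood whose size comes from the categorization. A minimal sketch, assuming that a pixel counts as a peak when no pixel in its neighborhood is brighter (ties allowed, so a flat top can yield several peaks, which the linking step described later can merge) and that zero-intensity pixels are background:

```python
def local_peaks(image, size=3):
    """Positions (row, col) of local intensity peaks in square neighborhoods."""
    half = size // 2
    height, width = len(image), len(image[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = image[y][x]
            if value == 0:
                continue  # zero intensity is background
            window_max = max(image[j][i]
                             for j in range(max(0, y - half), min(height, y + half + 1))
                             for i in range(max(0, x - half), min(width, x + half + 1)))
            if value >= window_max:  # no brighter pixel nearby
                peaks.append((y, x))
    return peaks
```

A larger `size` suppresses nearby secondary maxima, which matches the idea of choosing a bigger neighborhood for images with big grains.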
The data of the grain boundaries are used for calculating the probability distribution of grain size in the final step.
There are two methods defined for the calculations: the method of the two most distant points and the method of description of the local curvature of grain boundaries.
Output data with the probability distribution may be represented by a graph, or by a simple data summary.
The overview of the algorithm is shown in Fig. 9.

5.1 Automatic categorization
The aim of this algorithm step is to match the input image to one of the above-defined categories (recall that we consider three categories, cf. Figs. 5-7) and to select the size of the square neighborhood for subsequent use in the pre-processing of the input image and in the search for local peaks in this image.
The category is determined from the histogram of the input image after applying "local equalization" (neighborhood 7; for more details see Section 2.3) to the input image. It has been shown experimentally that images with smaller grains have a smoother histogram. The coefficient of smoothness of the resulting histogram is calculated as the sum of the differences of neighboring values in the histogram. From this coefficient, the category of the image (and the necessary size of the square neighborhood used by subsequent algorithm steps) is determined.
The neighborhood size may also be determined manually from a value entered by the user; in this way, the estimate of the average size of grains in the image may be refined by hand.

5.2 Pre-processing of the input image
First, the input image is partially freed of noise by using the "square median filter" (for detailed information see Section 2.1). The image is then to some extent free of disturbed grain boundaries and of artificially generated boundaries. It has been shown experimentally that it is more convenient to run the "square median filter" several times in a small neighborhood than only once in a larger neighborhood.
After applying the "square median filter", "local equalization" is applied to equalize the local brightness changes in the image and to unify and emphasize information about the grain boundaries and grain peaks. Before "local equalization" is applied, the only possible claim is that potential peaks of grains (as starting points for the localization of grain boundaries) are local intensity maxima. After "local equalization", a stronger claim holds: local intensity peaks above a certain intensity are potential grain peaks.
It has been shown experimentally that "global equalization" (for more details see Section 2.3) applied to the image after "local equalization" delivers better results. Fig. 12 shows the image after application of the "square median filter" and "local equalization".
In the next step of the algorithm, local intensity peaks are searched for in the pre-processed image. The peaks are taken as the basis of grains for the segmentation of grains by the flood-growing process. Local peaks are searched for in a square neighborhood of the size defined within the categorization of the image.

Search for grain boundaries in the image
This section describes in detail the part of the algorithm for localizing grain boundaries. The transformations of images mentioned in this section are described in detail in Section 2. More information about algorithms for localizing boundaries between objects is presented in many books and journals, see, e.g., [1-2], [4-6], [8]. The search for boundaries consists of two steps. First, the local intensity peaks revealed in the previous algorithm step are taken as the basis for potential grains, and the grains are segmented by the flood-growing process with the criterion of decreasing intensity of neighboring points. This concept is applied because the grains are approximately oval. The tops of the ovals have the highest intensity in the region of the grain (as they are closest to the microscope tip during scanning). By contrast, points on the boundaries between grains have a locally minimal intensity (as they are furthest from the microscope tip).
In the second step, some areas must be linked together, as many undesirable intensity peaks may be due to artefacts that were not eliminated during pre-processing of the image. Because of such artificial intensity peaks, the segmentation may divide a grain into a large number of areas smaller than the real region of the grain. The areas are linked using experimentally defined criteria.
Segmentation of grains with flood-growing
In this step, the algorithm may be summarized as follows:
1. The local intensity peaks are taken as one-point areas of grains. These peaks have certain intensities (grainInt).
2. The global intensity counter, denoted globInt, is initialized at the maximum intensity (globInt = 255).
3. All areas of grains are cyclically tested: for a point ("trial point") in each area, a neighbor is sought that does not yet belong to any area and whose intensity is equal to the current globInt value. Let us denote such a neighbor as the "eligible point". This "eligible point" is added to the area of the "trial point", and the cycle is repeated until the end of one loop. The cycle is stopped when there is no "eligible point".

4. globInt is decreased by one. If the globInt counter is greater than the lowest intensity 0 (0 is taken as the image background), the algorithm continues from step 3; otherwise the flood-growing process is stopped.
Now let us show that the algorithm fulfills the following:
- each point in the image is added to a grain,
- the algorithm realizes the concept of the flood-growing process of the grains from the found peaks,
- no grain overlaps another grain in the actual implementation of the algorithm described in this paper, and partial areas of grains touch at the real boundaries of the grains.
Local intensity peaks are selected over the whole range of the image, so the intensities of the neighbors of the local peaks may only be lower or the same, but cannot be higher.
The points on the boundary between two grains are local minima. The points between the boundary of a grain and the peak of the grain belong to the region of the grain, and their intensities lie between the intensity of the peak and the intensities of the points on the boundary. Points not yet belonging to any area are added to an area when they are neighbors of that area and their intensity is equal to or greater than the globInt counter value. The value of the globInt counter is decreased in each loop according to the steps of the algorithm. These facts demonstrate that points with intensities equal to or lower than the intensities in the scope of the given grain are added to the area of the given grain, and that two grains are in contact at their common intensity boundary. Thus, the areas of the grains are segmented with the flood-growing process from peaks to intensity minima in the scope of the grain, with the criterion of decreasing intensity.
Now we will show that in an ideal image one area should not overlap another area. Let us take a particular grain (current grain). If the points of a neighboring grain were added (in an ideal image) to the current grain, the intensities of the attached points would not be lower than or equal to the intensities in the current grain. The intensity function between the peak and the boundary would then not decrease but rise, which would contradict the assumptions and the implementation of the algorithm. Another benefit for the algorithm is that two grain regions meet at the boundary where the intensities are equal to the value of globInt. All neighboring points have a higher intensity, so when the boundary is reached, the neighboring points have already been added to an area and cannot be added to another area (at least in an ideal image). A real input image can contain small mistakes in the course of the intensity function.
As globInt ranges from 255 to 0, and as in each phase of the cycle every point that is not yet a member of a region is tested for being a neighbor of an existing grain region with an intensity equal to or greater than the globInt value, each point with an intensity different from 0 must have been added to the region of some grain at the latest when globInt = 1. Points with zero intensity are taken for the background of the image.
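The four steps can be sketched as follows, under stated assumptions: "neighbor" is taken as 4-connectivity (the text does not define it), `glob_int` stands for the counter globInt, and a point joins an area when its intensity is equal to or greater than the counter, as in the correctness argument above, so that a point not reachable at its own level is still picked up later. This is an illustrative Python version, not the authors' implementation.

```python
from collections import deque

# 4-connectivity is an assumption; the text does not define "neighbor".
NEIGHBORS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def flood_grow(image, peaks):
    """Segment grains by flood-growing from the given peaks.

    image: list of rows of ints 0..255 (0 = background).
    peaks: list of (row, col) positions of local intensity peaks.
    Returns a label image: 0 for background, k for the area grown from the
    k-th peak (1-based).
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    # Step 1: every peak starts as a one-point area.
    for k, (y, x) in enumerate(peaks, start=1):
        labels[y][x] = k
    # Steps 2 and 4: the counter runs from 255 down to 1; 0 is background.
    for glob_int in range(255, 0, -1):
        # Step 3: grow all areas until no eligible point is left at this level.
        queue = deque((y, x) for y in range(height) for x in range(width)
                      if labels[y][x])
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBORS:
                ny, nx = y + dy, x + dx
                if (0 <= ny < height and 0 <= nx < width
                        and labels[ny][nx] == 0
                        and image[ny][nx] >= glob_int):  # eligible point
                    labels[ny][nx] = labels[y][x]
                    queue.append((ny, nx))
    return labels
```

For the one-row profile `[[5, 9, 4, 8, 6]]` with peaks at columns 1 and 3, the two areas meet at the minimum of intensity 4, which goes to whichever area reaches it first in the scan order.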
Fig. 13 shows the output of this step of the algorithm.

Linking of segments of grains (in the sense of specified criteria)
Due to technical imperfections of the scanning microscope, there are many artificial local intensity peaks in the input image. For this reason, the algorithm finds more areas than desired (than correspond to reality): the calculations can divide the region of a grain into several smaller areas. From the previous text we know that, within one grain, the average intensity of the points inside the region of the grain is greater than the average intensity on the boundary of the grain and lower than the intensity of the peak. Because of the assumed oval shape of the grains and the application of "local equalization" to the image, it is possible to determine from the intensities on the boundaries of the areas whether the boundary between two areas is artificial. The algorithm and its criteria can be summarized as follows:
1. First, the two areas that comply best with the following criterion are found. This is the basic criterion for linking the segment areas of the grains. For each two areas, the intensities of all points on their shared boundary (on the boundary belonging to one grain and on the boundary belonging to the other grain) are collected, and the average of all these intensities is calculated. The two areas that have the highest average comply best with the criterion. This highest average is compared with an experimentally given constant limit, which decides between linking two areas (boundary average above the limit) and not linking them (boundary average below the limit). If the highest average is lower than this limit, the algorithm is stopped, as all the other, lower averages are also below the limit. As long as the average is greater than the limit, the algorithm passes to the next step. Determining the limit from experimental observations causes no loss of generality, as the unification (with the help of "local equalization") of important intensities in the scope of the grain does not greatly depend on the given image.
2. Finally, the auxiliary criterion for linking the segment areas of the grains is tested. The points on the common boundary of the two selected areas are considered, and the averages of the intensities of their boundary points are now calculated separately for each area. As each grain has its own boundary, the boundary between two grains is two pixels wide: one boundary belongs to one grain and the second to the other grain. In step 1, the average is calculated over both together. The two averages are compared. If the values of the averages are similar, the two areas are linked into a single area, and the algorithm returns to step 1. If the values of the averages are different, "local equalization" has failed for this pair and these areas should not be linked; the fact that these two areas are not to be selected in the following stages is marked in memory, and the algorithm returns to step 1.
Fig. 14 is a sketch of the final grain boundaries.
6 Calculation of the probability distribution of grain size
Data on the grain boundaries are used for calculations of the probability distribution of grain size. Two methods are suggested and implemented for these calculations:
- method of description of the local curvature of grain boundaries,
- method of the two most distant points.
The results of these calculations may be illustrated in the form of a graph or written to standard output.

6.1 Method of description of the local curvature of grain boundaries
This method assumes that grains can be approximated by spheres. In this method, the size of a grain is measured by its radius.
If this assumption is fulfilled, we can suppose that in vertical scanning of sample surfaces the resulting boundaries are almost circular in shape. If the grains overlap, the resulting boundaries are composed of concave and convex curves and form a closed curve (at least in the case of an ideal image). The concave parts of the curves are formed by cuts of the mutual overlaps. Consequently, only the points of the convex parts of the boundaries are considered for calculations of the radius of the sphere grain. These are points on the real surface of the grain, and not points generated by overlapping.

Theory for calculating the radius of a sphere (circle)
If the boundaries between grains are approximated by an ideal circle with radius r, then the points of this circle can be parametrically described by the function
$$f(t) = [x(t),\, y(t)], \qquad x(t) = r\cos(t/r), \quad y(t) = r\sin(t/r), \qquad t \in [0, 2\pi r],$$
where the parameter t is the arc length.
The curvature curv at the point of the function f(t) is defined as
$$\operatorname{curv}[f(t)] = \frac{d^2 f}{dt^2}(t) = \left[\frac{d^2 x}{dt^2}(t),\ \frac{d^2 y}{dt^2}(t)\right].$$
The extent of the curvature at the points f(t) for t = 0 to 2πr is
$$\left\|\operatorname{curv}[f(t)]\right\| = \sqrt{\frac{\cos^2(t/r)}{r^2} + \frac{\sin^2(t/r)}{r^2}} = \frac{1}{r}, \qquad \text{hence} \qquad r = \frac{1}{\left\|\operatorname{curv}[f(t)]\right\|}.$$

Algorithm for calculating the radius of a sphere (circle)
For each point on the boundary of a grain, the following steps are carried out:
1. The boundary curve is differentiated twice, and the resulting derivatives are used to determine whether the trial point lies in the convex or the concave part of the boundary curve.
2. If the point lies in the convex part of the boundary curve, the curvature value is calculated using numerical methods based on n neighboring points (Lagrange, Chebyshev). The radius then follows from the formula above and is stored in an auxiliary buffer.
The most extensive value of the radii is then determined from the auxiliary buffer. This value is declared the resulting radius of the grain and is put into the resulting graph of the probability distribution of grain size.
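A minimal sketch of steps 1 and 2 for a closed boundary given as an ordered list of points. It swaps in three-point central differences for the n-point Lagrange/Chebyshev schemes of the text, assumes the boundary runs counter-clockwise so that convex points have positive signed curvature, and uses the signed curvature of a general parametrization, which does not require the points to be spaced by arc length. It returns the per-point radii r = 1/curvature; reducing them to one grain radius follows the rule given in the text and is left to the caller.

```python
import math


def local_radii(points):
    """Radii 1/curvature at the convex points of a closed boundary curve.

    points: list of (x, y) tuples ordered counter-clockwise (assumption);
    the parameter step between consecutive points is taken as 1.
    """
    n = len(points)
    radii = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = points[i - 1], points[i], points[(i + 1) % n]
        dx, dy = (x2 - x0) / 2, (y2 - y0) / 2          # first derivative
        ddx, ddy = x2 - 2 * x1 + x0, y2 - 2 * y1 + y0  # second derivative
        speed_sq = dx * dx + dy * dy
        if speed_sq == 0:
            continue  # repeated points carry no direction
        # Signed curvature; positive on convex parts of a counter-clockwise curve.
        curvature = (dx * ddy - dy * ddx) / speed_sq ** 1.5
        if curvature > 0:
            radii.append(1 / curvature)  # r = 1 / |curv|
    return radii


# A circle of radius 25 sampled at 360 points (cf. the ideal circles of Fig. 15).
circle = [(25 * math.cos(2 * math.pi * i / 360), 25 * math.sin(2 * math.pi * i / 360))
          for i in range(360)]
```

On this sampled circle every point is convex and each radius comes out within about 0.002 of 25; on real, noisy boundaries the second differences are far less stable, which is why the boundary must be smoothed first.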
Remarks on implementation: It is necessary to smooth the boundary curve of the grain before calculating the local curvatures, as the calculations of the local curvature are based on the second derivative (computed from n neighboring points), which is numerically ill-posed.
For a general parametrization h(t) = [x(t), y(t)] of the curve f, the curvature at the point h(t) is given by
$$\operatorname{curv}[h(t)] = \frac{\left|x'(t)\,y''(t) - y'(t)\,x''(t)\right|}{\left(x'(t)^2 + y'(t)^2\right)^{3/2}}.$$
The parametric curve is described in n-dimensional space by x(t) = [x_1(t), ..., x_n(t)], where t is the arc-length parameter of the curve. The principle of smoothing involves multiplying the second derivative at the trial point x(t) by a parameter λ and adding the result to the point itself:
$$\tilde{x}(t) = x(t) + \lambda\,\frac{d^2 x}{dt^2}(t),$$
where λ is entered as a very small number (0.01).

Evaluation of this method
This method provides good results on ideally generated circles, but in practice, in many cases, the results are worse.
The input images are often very noisy, and calculations of the second derivative fail, as second differentiation is numerically ill-posed.
6.2 Method of the two most distant points
This method determines the size of a grain from the maximum of the distances between each two points in the domain of the grain. If the grains are approximated by spheres (circles), this maximum distance is divided by two to get the sought radius, which is stored in the graph of the probability distribution of grain size.
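A minimal sketch of this method, assuming each grain is given as a list of (x, y) pixel coordinates (for example collected from a label image with the hypothetical helper `grains_from_labels`). The binning of radii into relative frequencies for the distribution graph uses a hypothetical `bin_width`; the text does not specify how the graph is binned. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter
from itertools import combinations


def grains_from_labels(labels):
    """Group pixel coordinates (x, y) by nonzero label, in order of first appearance."""
    grains = {}
    for y, row in enumerate(labels):
        for x, k in enumerate(row):
            if k:
                grains.setdefault(k, []).append((x, y))
    return list(grains.values())


def radius_two_most_distant(points):
    """Half of the largest distance between any two points of a grain region."""
    if len(points) < 2:
        return 0.0
    return max(math.dist(p, q) for p, q in combinations(points, 2)) / 2


def radius_distribution(grains, bin_width=1.0):
    """Relative frequency of grain radii per bin (lower bin edge -> frequency)."""
    radii = [radius_two_most_distant(g) for g in grains]
    bins = Counter(int(r // bin_width) for r in radii)
    return {k * bin_width: c / len(radii) for k, c in sorted(bins.items())}
```

The brute-force search is quadratic in the number of pixels; since the two most distant points of a set always lie on its convex hull, restricting the search to boundary or hull points gives the same result much faster for large grains.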

Evaluation of this method
This method provides good results in practice. It may fail only when the grains overlap excessively. This is not a frequent case, especially because of the assumption that the grain surfaces are spherical.

7 Results of tests
To test the algorithm, a C++ program has been created.
Images with the grain structure of the surface of gold samples are used as input images for the algorithm. The program can also generate images of ideal circles. When implementing the algorithm, emphasis was put on the functionality of the program and on the speed of calculation. The processing of one image takes about 20 seconds, and the boundaries between grains are found to be almost correct. Worse results are obtained when calculating the probability distribution of grain size. The tests were performed with approximately 100 different images.
In the final evaluation, attention should be paid mainly to the following stages of the algorithm:
- categorization of the input images,
- pre-processing of the input images,
- search for grain boundaries,
- calculation of the probability distribution of grain size.
As the input images are very different, they are distributed into the three categories (see Section 4). Each category is assigned coefficients for pre-processing the input image, local transformations, and searching for the grain boundaries. The input image can be assigned to one of the categories manually by the user or automatically by the algorithm.
Pre-processing of the input images is performed by means of standard transformations. In this stage of the algorithm, emphasis is put on enhancing the quality of the input images.
The input images are very noisy and include many artefacts. Without this stage, the algorithm would probably fail.
The stage of searching for boundaries between grains works very well, even with very poor-quality input images. The boundaries correspond to the real boundaries that can be seen. This is important for the algorithm, because the following stage of calculating the probability distribution depends strongly on the results of this stage, especially in the case of the method of description of the local curvature.
The graph of the probability distribution of grain size given by the method of the two most distant points is shown in Fig. 16. Fig. 17 shows the same graph for the method of description of the local curvature. The input image for the algorithm is taken from Fig. 15, with ideal circles. Figs. 16 and 17 show that the method of the two most distant points

Fig. 1: Image after applying the square median filter

Fig. 8: View of the gray-scale input image
Fig. 9: Chart of the algorithm


Fig. 11: Histogram of an input image in Category 3

Fig. 15: Image with ideal circles (radius 25 pixels)