MODIFICATION OF HIERARCHICAL AGGLOMERATIVE CLUSTERING WITH COMPLETE LINKAGE FOR OPTIMAL USAGE WITH PASSIVE OPTICAL NETWORKS

Today, passive optical networks (PONs) represent a modern solution for high-speed FTTx subscriber lines and networks. Due to the fact that optical fibres and cables are deployed in last-mile network segments. Unfortunately, the costs of trenching and installation of optical cables in cities are still very high today. However, the application of mathematical methods and algorithms for designing optimum network solutions and topologies might significantly decrease the overall capital expenditures (CAPEX) of the whole process. This article introduces an efficient method based on agglomerative clustering for an optimization of deployment of PONs in practice. The paper describes its application and proposes the necessary modifications of the clustering method for the environment of PONs. The proposed algorithm uses the technique to cluster and connect the adjacent optical terminals and splitters of PONs and to calculate the optimum locations of these units in order to achieve the most optimum network solution and to minimize the summary capital expenditures.


Introduction
Nowadays, the demands for transmission speed and overall transmission capacity of access networks are rapidly increasing [1,2]. One of the most promising network solutions for building modern FTTx lines and networks are passive optical networks (PONs) [3]. A typical PON consists of one Optical Line Termination (OLT), which acts as a central unit, and Optical Network Terminals (ONTs) or Units (ONUs), which are located at the end-point users and connect these users to the PON itself [3,4]. The Optical Distribution Network (ODN) is then composed of all optical elements between OLT and all ONTs including optical fibres, patch cords, connectors, and especially passive optical splitters [3,4]. Since all these components are typically passive (no power consumption, no management), no optical signal regeneration, routing nor switching is possible between OLT and ONTs, therefore, the ODN must always be properly designed and optimized [3,4]. However, there are also economical aspects and criteria, which need to be addressed during the process of planning of the ODN and its topology in practice [5,6]. Today, the costs of deployment of optical fibres and cables, especially the trenching costs and the installation costs, are still very high. The last aspect, which needs to be addressed during the designing and planning of realistic ODNs and PONs in practice, are the locations and placements of all optical components in real networks and situations [7,8]. Generally, the ODNs are built mainly in cities, suburbs, residential areas, industrial and commercial objects, because the optical fibres are mostly trenched or installed along the roads, pavements [9], in drains and subways, using existing street lights, telephone poles, etc. It is also necessary to adequately place passive optical splitters and the rest of the network elements. It is evident that the designing of passive ODN in order to meet all the network criteria for specific PON version is an important process, which needs specific calculations and designing methods [10]. There are various existing techniques described in numerous scientific papers focused on the optimization and planning of PON topologies and networks, the most important ones will be briefly discussed in the following section. This paper, however, is focused especially on clustering of ONTs in order to obtain optimum locations of passive optical splitters and optimum network topology. Through performing a correct cluster analysis of ONT units and by designing their proper clusters, the optimum placement of passive splitters can be achieved and the overall network topology planned. The solution described in this paper is based on a hierarchical agglomerative clustering algorithm with implemented modifications and innovations. The proposed algorithm is primarily focused on the number of ONTs in formed clusters in order to optimize the splitting ratio of splitters. As the passive splitters greatly increase the summary attenuation in the PONs, the optimization of their positions and splitting ratios also play an important role. The proposed algorithm was also tested through typical realistic scenarios and the results are included in this paper as well.

Literature review
This section brings a short review and comparison of existing and proposed techniques and algorithms for designing optimum PON topologies and networks. Various existing studies have investigated methods for the optimization of network deployment including optical network and PON deployment. One of the very first methods was to apply a heuristic algorithm together with a minimum spanning tree algorithm in order to optimize the fibre length in the PON deployment, as presented in [11] and further improved in [12]. However, this solution does not use real map data and the computation time of heuristic method is also quite long, therefore, large scenarios must be divided into smaller ones, as illustrated in [8]. Another approach is to use ant-colony algorithms, as described in [13], and their combination with a metaheuristic method and updated techniques in [14]. In [14] and [15], an antcolony method is used to find optimum locations of passive splitters to solve the optimization task of passive optical networks. However, the complex solution of the whole network topology is not provided, and moreover, no further optimization criteria are implemented for a multi-parametric optimization process. Similarly, [16] and [17] propose the application of genetic algorithms and genetic oriented techniques, however, the optimization is oriented towards optimum optical topology without the possibility of CAPEX minimization option or attenuation minimization decision.
In order to offer a complex PON optimization, [9] and [18] provide the combination of various algorithm techniques and methods to solve this problem. In [9], all end-users in the designed PON topology are first separated into clusters, then the clusters are connected to the OLT using splitters. Similarly, in [18], the hierarchical cascading technique of passive splitters is presented together with the application of clustering algorithms. In [19] and [20], surveys dealing with the recent PON evolution and innovations can be obtained. Nevertheless, none of the above referenced papers deals with the optimization of splitting ratios and with the clustering of the ONT units based on their optimum number in order to obtain an optimized splitting ratio solution. The solution presented here is partially based on initial ideas already presented in [21], however, this paper is focused exclusively on the optimization of splitting ratio performed through a hierarchical agglomerative clustering technique and on proposing an innovative algorithm.

Proposed hierarchical agglomerative clustering algorithm
The main contribution of this paper is the designing of an innovative hierarchical agglomerative clustering algorithm with a modified complete linkage. These connections are updated throughout the running of an algorithm resulting in updating clusters of ONT units in order to optimize the splitting ratios of passive splitters in these clusters. A passive optical splitter is a key component of PONs, providing splitting of optical signals in downstream direction towards all its outputs, while it combines the optical signals together in the upstream direction. The internal structure of each passive splitter is composed of basic Y-junctions, while each Yjunction forms a passive splitter with a ratio 1:2. By cascading these junctions, a resulting splitting ratio can be achieved. Due to that, the splitting ratios of passive splitters are typically powers of 2, e.g., 1 : 2, 1 : 4, 1 : 8, 1 : 16, 1 : 32, 1 : 64, 1 : 128 and 1 : 256. Due to that, the optimum number of ONT units in each cluster should be close to the power of 2 if possible. The following section brings the description and proposal of an innovative algorithm technique. The basic hierarchical agglomerative clustering algorithm is modified and amended with a complete linkage functionality.

Description of existing hierarchical agglomerative clustering method
The hierarchical agglomerative clustering algorithm is a hierarchical method of a gradual merging of existing clusters until the desired number of clusters is reached. There are several methods of the process, which clusters should be merged, these methods are called linkage criterion. This criterion is used to determine the distance between two clusters. The following list contains some of the best-known criterions, the list is far from complete, as there are many existing types of linkage criterions: Complete linkage. It uses the maximum possible distance between two clusters, i.e., distance between two farthest points in two sets.
Single linkage. in contrast to the complete linkage, single linkage uses the smallest distance between two clusters.
Centroid. This linkage uses distances between two centres of the two clusters.
Average linkage. It uses the average distance between each point in one cluster to each point in the other cluster.
Another important parameter for hierarchical clustering is the metric. This means, which distance function will be used for evaluating the distance between points, most commonly used are Euclidean or Hamming. The algorithm for the hierarchical agglomerative clustering starts with all points being single clusters each. In each step, the clusters with the smallest distance according to the selected linkage and metric criterions are merged. This process continues until the initial limit of number of clusters is reached.

Modified linkage for hierarchical agglomerative clustering algorithm
This section introduces the main contribution of this paper. Our goal was to optimize the resulting numbers of ONT units in all clusters, so that the number of units would be close to the number of ports of passive optical splitter (splitting ratio). Since the splitting ratio is usually the power of 2, the optimal number of ONT units including a reserve in the form of one port for potential updates and topology modifications in each cluster should be as close as possible to the nearest higher power of 2. The whole algorithm is described in the following flowchart in Figure 1. The modified linkage for hierarchical agglomerative clustering algorithm in Figure 1 is recursive so that it is possible to have a feedback allowing for changing the parameters in the previous step. This includes mainly the parameters s and β, which will be described later.
The first step of the algorithm is the decision, whether the current number of clusters corresponds to the total number of points to merge. The algorithm starts with the number of clusters corresponding with the number of ONT units in the entire PON, the recursive function is called with the current number of clusters plus one. This process stops when the current number of clusters is the same as the number of ONU units to merge. When this number of clusters is reached, the main body of an algorithm begins. First, all possible cluster merges are sorted based on the distance between two farthest points in clusters to be merged (complete linkage). The list of these merges is further filtered, based on the metrics calculated through the following set of equations. In the first recursive step, the value β is set to 1 (meaning that the merging probability is 100 %). The threshold value τ is calculated in each recursive step as The filtering process returns a boolean value s, which holds the result of the filtering, if no value is left after the filtering, the s is false, otherwise, it is true. If the filtering was not efficient and s holds false, then the algorithm returns, decrements the value β by t and again checks the value of current clusters. This mechanism returns the algorithm to its beginning until β reaches the threshold value τ or s holds the false value. The decision of merging two clusters (filtering process) is constrained by the threshold γ calculated as γ = µ(τ + 1), where µ is the minimum distance between two points in the whole dataset. Constraining the number of elements in the list according to the threshold τ as well as according to the number of units in merged cluster so that they only contain allowed amount of ONT units plus the reserve based on the bounds. This lower bound (lb) and upper bound (ub) are calculated as ub = 2 n − 1, where n is the nearest greater power of 2 and the number of units in a cluster; and where β is an acceptance criterion from the input of the process and the part 2 n − 1 represents the greatest smaller power of two to the number of ONT units in the cluster. Finally, the result of an algorithm (number of clusters, number of ONT units in clusters, cluster distances and β) is then used to calculate the resulting vol.

no. /
Modification of Hierarchical Agglomerative Clustering  attenuation of the topology A, the summary length of optical fibres l f and the total trenching distance l t . Based on these values, the solutions obtained during the recursive runs of proposed algorithm can easily be compared and the best one identified. Once the solution with the desired values A, l f , l t is reached, or the best solution based on their comparison is achieved, the algorithm ends. Both the threshold value τ as well as parameter t influence the results of the algorithm. The parameter t represents a granularity of the algorithm steps and through numerous testing and calculations, its value was set to 0.05.

Results and discussion
The following section contains a discussion of the results and their comparison with the existing hierarchical agglomerative clustering algorithm. For the following comparison, the realistic data and maps from OpenStreetMaps.org were exported and two different examples were selected. The first one in Figure 2 illus-trates a typical situation in the historical city centre (urban area), while Figure 3 contains a typical scenario for a rural area with 3 villages. In both figures, the position of Optical Line Termination (OLT) is illustrated using a violet square, the positions of passive splitters are marked using black rounds, while ONT units are illustrated by red circles. All paths (roads, streets, walkways, etc.) are represented by blue lines, while the resulting optical topology (optical fibres, cables) by red lines. These topologies were obtained by previously presented methods and algorithms described in a previous article [21] with the implementation of the presented innovative hierarchical agglomerative clustering with a complete linkage. The PON topology designing presented in [21] is based on innovative techniques using the Dijkstra's algorithm, metrics and proposed hierarchical clustering method with the complete optimum linkage described in this paper.
In order to provide a comparison with existing methods for different scenarios and topologies, the proposed algorithm and methods were used for several situations including rural, urban and city areas and were compared with the existing methods [5,21,22]. Based on these results, a statistical processing and comparison was performed, and its results are presented within the following Table 1. Rural areas represent typical village regions, urban maps illustrate typical city parts containing block of flats and houses, while centre areas consist of historical city centres. The existing methods presented in [5,21,22] are based on existing agglomerative clustering techniques, while the proposed method presented within this paper is referred to as Proposed in Table 1. The summary length of optical fibres l f and the total trenching distance l t [m] are obtained directly as the results of the algorithm in Figure 1. The maximum attenuation A max [dB] is calculated as the maximum of attenuation A between the central OLT unit and the ONT units in the designed topology. Generally, the attenuation A in the PON can be calculated as where the sum A S [dB] represents the sum of the attenuation of all passive splitters located between the OLT unit and the selected ONT unit, the sum n pe A pe [dB] represents the sum of the attenuation of n pe passive elements each with an attenuation A pe (connectors, splices, passive filters, etc.), the sum αl total represents the summary attenuation of optical fibres with an attenuation factor α [dB/km] and length l total [km] and A res [dB] stands for the attenuation reserve, which covers the attenuation compensation of aging, temperature fluctuations, deformations of optical fibres, etc. The attenuation of a symmetrical splitter A S can be calculated according to where N is the number of output ports of a splitter (1 : N is its splitting ratio), A U [dB] represents the uniformity of one single Y-junction cascaded in the internal structure of a splitter and A C [dB] is the additional attenuation of connectors or splices used to connect the splitter in the PON topology. In all calculations, A U is considered as 0.3 dB. In (5), the attenuation factor α is considered as 0.4 dB/km meeting the ITU-T G.652D optical fibre type, A res as 1 dB and A pe in (5) and (6) as 0.4 dB, which represents an attenuation of one standard optical connector of a type SC/PC or SC/APC. Based on the results in Table 1, it is evident that the proposed algorithm provided the best solution in vol.

no. /
Modification of Hierarchical Agglomerative Clustering almost all scenarios. The only situation, in which the existing method [21] provided a better result, was one of the urban scenarios. Evidently, the proposed hierarchical agglomerative clustering method with modifications presented in § 3.2 performed the lowest summary fibre length, total trenching distance and the maximum attenuation in almost all the situations. Due to that, the CAPEX of this solution should be the lowest compared with all existing methods. Moreover, the proposed algorithm provided the best solution for all three different scenario types (city centre, urban, rural), therefore, it can be used for any of these situations.

Conclusion
This paper is focused on planning and designing an optimum network topology of passive optical networks by using advanced algorithms and techniques. The article presents an innovative hierarchical agglomerative clustering algorithm with a modified complete linkage ability. The algorithm is presented by a flowchart in § 3.2 (Figure 1), while § 4 contains its comparison with several existing methods and techniques. These results were obtained for various different scenarios and situations based on realistic map data obtained from OpenStreetMap.org and processed in Matlab environment. The performance of all methods and algorithms is compared through elementary output parameters and results, especially the maximum attenuation, summary fibre length and the total trenching distance. These parameters form the resulting capital expenditures (CAPEX) when building the network in practice. It is evident that the presented algorithm always provided the best solution, except for a single scenario. The presented algorithm provides the best results for all three typical types of scenarios in practice, the city centre area, the urban area and the rural area. The algorithm will be further evaluated and modified within the following research. For example, the comparison in Table 1 was performed for a fixed number of ONT units in a network, therefore, the following step is to implement the variable number of units in network topology.