COMPARISON OF FETI-BASED DOMAIN DECOMPOSITION METHODS FOR TOPOLOGY OPTIMIZATION PROBLEMS

. We critically assess the performance of several variants of dual and dual-primal domain decomposition strategies in problems with ﬁxed subdomain partitioning and high heterogeneity in stiﬀness coeﬃcients typically arising in topology optimization of modular structures. Our study considers Total FETI and FETI Dual-Primal methods along with three enhancements: k -scaling, full orthogonalization of the search directions, and considering multiple search-direction at once, which gives us twelve variants in total. We test these variants both on academic examples and snapshots of topology optimization iterations. Based on the results, we conclude that (i) the original methods exhibit very slow convergence in the presence of severe heterogeneity in stiﬀness coeﬃcients, which makes them practically useless, (ii) the full orthogonalization enhancement helps only for mild heterogeneity, and (iii) the only robust method is FETI Dual-Primal with multiple search direction and k -scaling.


Introduction
Topology optimization has became an indispensable tool in the design process, allowing for optimal distribution of a material within a provided space with respect to given criteria (such as minimization of compliance, maximization of output response of mechanisms, or increasing the lowest eigenfrequency) under given constraints (typically limiting the available material volume).In Solid Isotropic Material with Penalization (SIMP) method [1]-one of the most common approaches to topology optimization-the material distribution is parameterized with a scalar relative density field ρ(x), with 0 ≤ ρ ≤ 1, which consequently governs the distribution of stiffness parameters in a domain such that where p is a given penalization coefficient (usually p ≥ 3), helping the optimization procedure to achieve the desired "0-1" design without regions of intermediate densities, and E 0 and E min are stiffness tensors of the solid material and voids, respectively.Note that while the stiffness tensor of the bulk material is physical, void stiffness serves only numerical purposes of avoiding an indefinite Hessian matrix.On one hand, it should be small enough in order to model voids properly, on the other hand too small values lead to ill-posed state problem.The optimal design is then usually sought-for in an iterative manner, alternately solving the state equations with a fixed densities and updating design density variables according to the sensitivities computed for a current design.
Being a matured technology, topology optimization is now widely used in industrial applications with millions of unknowns, and first applications reaching billions of unknowns are emerging [2].Consequently, such scales of optimization problems often prohibit the use of direct solvers in favour of iterative ones, which are further combined with parallelisation.Domain decomposition (DD) techniques thus seem promising candidates; however, the presence of high contrast in coefficients due to the stiffness parameterization, recall Eq. ( 1), may cause slower convergence and consequently poor performance of iterative solvers based on DD, as has been already shown in [3].
To make the task even more challenging, in this contribution, we focus in particular on modular topology optimization problems, recently introduced in [4,5], in which the main domain is composed of a limited number of square modules.The multiple occurrence of individual modules can thus be readily exploited to accelerate calculations; on the other hand, the modularity automatically introduces fixed, regular partitioning, which-as will be shown later-is detrimental for the performance of DD-based solvers.Admittedly, the modular grid constitutes only the coarsest partitioning and more refined divisions of each module following e.g. the distribution of a material with the module can be performed, but we did not follow this possibility in our current study.
After recalling the fundamentals of two Finite Element Tearing and Interconnecting methods in Section 2, we review several modifications aimed at improving the performance and robustness of the solu-PREPRINT tion strategy; see Section 3. Finally, in Section 4, the performance of the original methods and their modifications is assessed on two test sets: three academic problems from the literature and three geometries arising from three distinct snapshots of modular topology optimization of the emblematic Messerschmitt-Bölkow-Blohm beam.

Dual and dual-primal domain decomposition methods
Characteristically for domain decomposition methods, we assume that the original domain Ω is partitioned into N s mutually disjoint subdomains Ω s with s = 1 . . .N s .The solution to the system of linear equations which arises from e.g.numerically discretized partial differential equations valid in Ω, is then sought for via a series of subdomain-wise problems with an additional constraint ensuring continuity of the solution {u s } s=1...Ns across subdomain boundaries.Two major branches of domain decomposition methods can be distinguished based on the manner in which the aforementioned continuity is enforced: primal decomposition methods, such as the Schur Complement method [6], satisfy the continuity by construction of an approximation space, while the dual methods, e.g.Finite Element Tearing and Interconnecting (FETI) method [7], impose the continuity requirement as an equality constraint enforced via Lagrange multipliers.Finally, as its name suggests, the Finite Element Tearing and Interconnecting Dual-Primal (FETI-DP) method [8] adopted in our work combines both approaches.
In the following two subsections, we briefly summarize the essentials of both dual and dual-primal methods.

Total-FETI
The Total FETI method (T-FETI), introduced by Dostál and co-workers [9] and adopted here as a representative of the dual domain decomposition methods, is closely related to the original FETI method [7] with the main difference being the way the Dirichlet boundary conditions (BC) are imposed.In T-FETI, these BC are handled similarly to enforcing the continuity across subdomain boundaries, i.e. the constraints posed on the domain-wise displacements u s read as where u collects the domain-wise unknowns such that u = u 1 T , u 2 T , . . ., u Ns T T (5) and B combines the individual B s 's accordingly.Thus, in contrast to the standard FETI, constraint (4) features a right-hand side vector c that is in general nontrivial; in addition to the zero entries pertinent to the cross-boundary continuity, it also contains values related to the Dirichlet BC.As a result, the number of rigid body modes of each domain is only problem specific (e.g. three for two-dimensional mechanics problems) and factorization can be recycled for all occurrences of one module irrespective of whether the occurrence is supported or floating.Combining Eqs. ( 3) and (4) leads to where K is a block-diagonal matrix composed of K s and f arises from f s concatenated similarly to u in Eq. ( 5).
Assuming the λ is known and provided that f − B T λ is orthogonal to the nullspace of K, u can be uniquely determined up to rigid body modes of individual domains expressed as R α, where R is a matrix containing the nullspace vectors of each domain and α is a vector of rigid body modes coefficients.Without diving into technical details, for which we refer an interested reader to e.g.[10], the original formulation can be recast into the dual form with where K † denotes the Moore-Penrose pseudo-inverse of K. Problem ( 7) is traditionally solved with a projected preconditioned conjugate gradient method.

FETI-DP
In Finite Element Tearing and Interconnecting Dual-Primal (FETI-DP) method [8], the domain-wise displacement field u s is decomposed into two parts: (1.) boundary degrees of freedom u s c whose continuity is directly enforced in the primal unknowns, and (2.) remaining degrees of freedom u s r that contain both internal degrees of freedom, which do not directly communicate with other domains, and boundary degrees of freedom whose continuity is enforced via equality constraints, similarly to Eq. ( 4).
Without loss of generality, we assume that u s is ordered such that Furthermore, we consider a global vector u c that stores the shared boundary degrees of freedom (DOFs) and a Boolean matrix B s c such that PREPRINT vol.no./ FETI methods for topology optimization problems The equality constraint ensuring continuity of the remaining u s b then reads where-analogously to Eqs. ( 4) and ( 5)-u r and B r are concatenations of u s r and B s r , respectively.Expressing the equilibrium (3) and the constraint (10) in terms of the quantities introduced in Eqs. ( 8)-( 10) leads to with Finally, due to the domain-wise nature of the second row in Eq. ( 11), u s r can be computed in parallel from the known u c and λ; hence, plugging the expression for u s r in terms of the remaining quantities into the first and the third rows of Eq. ( 11) leads to the final dual-primal form where Note that DOFs in u c are chosen such that there is no need for a pseudo-inverse of K rr in contrast to T-FETI.
In FETI-DP, DOFs related to corner nodes in a regular subdomain partitioning typically constitute u c .Problem ( 12) is traditionally solved iteratively with a preconditioned conjugate gradient method with u c being condensed out, resulting in a problem dependent solely on λ [8].

k-scaling
The merit of both formulations described above is that they can be conveniently parallelized thanks to their additive structure.In both, the solution is sought with an iterative conjugate gradient method considering only the first block of the corresponding equations ( 7) and ( 12), while the remaining blocks are incorporated either via projection (T-FETI) or by condensating out the primal unknowns (FETI-DP).Distinctively for domain decomposition methods, the iterative procedure also benefits from a coarse problem preconditioner F−1 of either F, recall Eq. ( 7), or F rr , Eq. ( 12), constructed as a sum of local inverses where Ss either Based on a mechanical reasoning, Rixen and Farhat [12] proposed k-scaling, which compensates for different stiffness coefficients across subdomain boundaries.In their approach, Bs = W s B s with the entries of the diagonal matrix W s obtained as where K s ii denotes the component of the stiffness matrix pertinent to the ith row of B s and, similarly, K r ii is the corresponding coefficient of the stiffness matrix of the adjacent Ω r .In the case of multi-domain constraints (which typically happen for corner nodes in T-FETI), the authors of [12] advocate for the use of pair-wise constraints despite the introduced redundancies.Consequently, only the denominator in Eq. ( 15) changes such that it sums the stiffness contributions from all related domains; see [12,Eq. 68].Also note that Eq. (15) naturally falls back to multiplicity scaling in the case of domains composed of the same homogeneous material.

Full orthogonalization
For practical applications, F-orthogonalization of the current (projected) preconditioned search direction PREPRINT w, which appears in the classical Preconditioned Conjugate Gradient method used to solve problems (7) or (12), with respect to the search directions w from the previous iterations is often recommended, e.g.[11].Consequently, this modification necessitates storing the previous directions and their F-norms.However, the faster convergence caused by this modification usually compensate for both the increased memory requirements and the orthogonalization-related calculations.

Rank-Revealing Simultaneous FETI
Finally, the last enhancement investigated in this work, is the simultaneous variant of both T-FETI and FETI-DP [11].This modification was particularly developed to handle heterogeneous problems by exploiting the additive structure of the local preconditioners.Instead of considering only one preconditioned search direction w in each iteration, it defines an individual search direction for each subdomain.Note that the original search direction can be restored with w = W 1, where 1 = 1, 1, . . . 1 T ∈ R Ns and W collecting these individual directions as its columns.
As a result, each iteration i requires solving a system of N s × N s equations related to minimization and conjugation steps; however, the matrix ∆ i as defined in Algorithm 1 may be ill-conditioned, e.g.due to a linear dependence of the search directions, and its pseudo-inverse might need to be constructed.Here, we employ the rank revealing Cholesky factorization [13] with permutation matrix N to extract only the linearly independent search directions, which are subsequently F-orthonormalized; see lines 9-12 in Algorithm 1, which summarizes the strategy for T-FETI.
With this modification, we also automatically employ the full orthogonalization described in the previous subsection; lines 18-21 in Algorithm 1.Note that in Algorithm 1, P is the projection matrix P = I − G T GG T −1 G enforcing the increments of λ to satisfy G λ = e.

Academic problems
We test both approaches on two sets of illustrative examples.The first suite comprises three test problems studied, e.g., in [11] and shown in Fig. 1.Unlike Fig. 1, however, we keep the regular partitioning and do not investigate the effect of irregularity and bad aspect ratios of the decomposition.We included these problems primarily to verify our implementation against the convergence reported in [11] and also to compare the performance of all considered variants in these academic problems against the problems arising in topology optimization.
The first test is a laminated beam consisting of nine repeating square subdomains, each composed of seven alternating layers of stiff (blue) and compliant Algorithm 1: Rank-Revealing Simultaneous FETI [11] 1 r 0 = P T (d − Fλ 0 ) 2 Z 0 = . . ., Bs Ss ( Bs ) T r 0 , . . ., ∀s ∈ {1, . . ., N s } (green) material.The second test problem is a square grid of 3 × 3 subdomains, again, with alternating layers.This problem features the corner cross-point and hence different variants of enforcing the continuity of solution is expect to deliver different performance results.Finally, the last test problem is a 4 × 4 grid of subdomains with a stiff inclusion immersed in each subdomain.Unlike the previous two problem, there are no jumps in stiffness coefficients and hence the convergence should be the fastest.All structures were clamped on the left-hand side and subjected to tangential and tensile normal traction on their righthand sides.
The two solution strategies (T-FETI vs. FETI-DP) in combination with the scaling options (stiffness vs. multiplicity) and the option for full orthogonalization or simultaneous search directions yield 12 variants in total.Figure 2 plots the history of the residual error defined as where the r i and z i are the projected residual and its preconditioned counterpart from Algorithm 1, respectively.
In accordance with expectations, the third problem with stiff square inclusions was indeed the easiest to solve and the performance of all variants did not vary significantly except for the most advanced si-PREPRINT vol.
no. / FETI methods for topology optimization problems multaneous variants of both T-FETI and FETI-DP which converged in fewer than half of the iterations needed by the other variants.On the other hand, once the heterogeneity in stiffness coefficients reached subdomain boundaries (the first and the second test problems), the performance of the unmodified variants deteriorated rapidly.However, the importance of k-scaling (solid lines) in contrast to the simple multiplicity scaling (dotted lines) arises only in the second example due to the presence of the cross-points; in the third problem for instance, the k-scaling falls back to multiplicity scaling as mentioned in Section 3.1.

Topology optimization problems
The second test suite comprises several snapshots from the modular topology optimization [4] of a beam fixed at both bottom corners and loaded by a unit force acting at the midspan of the top edge; see Fig. 3.The beam is composed of 96 square modules of 16 types (depicted with distinct colours in Fig. 3).The modular composition was directly reflected in the beam's partitioning into 96 subdomains.Each subdomain was discretized with 30 × 30 bilinear quadrilateral finite elements, which resulted in 184,512 DOFs in total and over 10,000 DOFs in the interface problem of both T-FETI and FETI-DP.
In particular, we tested the described solution strategies on the 4 th , 8 th , and 30 th iteration snapshots of the topology optimization.Distribution of the base material, parameterized by the relative density ρ (recall Eq. ( 1)), in these snapshots is shown in Fig. 4. As the topology optimization progresses, the initially uniform density gradually concentrates to the most efficient locations, which leads to a highly heterogeneous problem with significant jumps in stiffness coefficients along subdomain boundaries, making the problem difficult to solve with iterative methods.
Similarly to Section 4.1, Figure 5 summarizes the convergence history of all 12 variants for the three topology optimization snapshots from Fig. 4. The later stages of the topology optimization in particular  render challenging problems for T-FETI and FETI-DP solvers; while in the first snapshot, all methods except for T-FETI with full orthogonalization successfully converged, in the last snapshot, only the simultaneous FETI-DP with rank-revealing factorization succeeded.

Discussion
The numerical tests clearly demonstrate that the linear systems of equations arising in modular topology optimization are indeed challenging for domaindecomposition based iterative methods.The main culprit is the potentially unfavorable partitioning into T. Medřický, M. Doškář, I. Pultarová, J. Zeman  subdomains, which leads to coefficients jumps across subdomain boundaries, and most importantly the high contrast in stiffness coefficients.These effects are partly present in the academic test problems, however, they are significantly more pronounced in the topology optimization problems up to the point at which most of the considered solution variants break or do not converge in a reasonable time.
The only variant successful in all tests was the simultaneous FETI-DP with rank-revealing Cholesky decomposition.With incorporated k-scaling, this variant was also the fastest to converge, even though the convergence rate of T-FETI with rank-revealing was usually comparable when it converged.
On the other side of the convergence spectrum were the original variants of T-FETI and FETI-DP.However, while featuring the slowest convergence, those methods proved to be relatively robust, unlike the modifications with full orthogonalization, which reduced the needed iterations only for mildly heterogeneous problems and broke completely for the final test problem.We conjecture that the rapid increase in the monitored residual in Fig. 5c is caused by the loss of numerical accuracy due to round-off errors during the orthonormalization.

Summary
In this work, we investigated twelve variants of dual and dual-primal domain decomposition methods with particular emphasis on applications to highly heterogeneous problems with predefined partitioning which arise in modular topology optimization problems [4,5].We considered Total-FETI [9] and FETI-DP [8] as the base methods, for which we tested two scaling PREPRINT vol.
no. / FETI methods for topology optimization problems approaches (k-scaling and multiplicity scaling), full orthogonalization, and simultaneous search directions with rank-revealing decomposition.We tested the variants on two suites of two-dimensional elasticity problems: first one composed of three academic problems, the second suite comprising snapshots of modular topology optimization.All variants were compared in terms of the number of iterations needed to reach a desired residual .We are aware that such a comparison does not paint the whole picture and comparing the total wall-clock time would be optimal; however, such a comparison would be heavily biased by the implementation and available hardware.Based on our results, we conclude that the k-scaling should be a default setting when dealing with nonhomogeneous problems thanks to its significant impact on the convergence rate and negligible computational overhead.Our results also show that the original formulations of both T-FETI and FETI-DP exhibit very slow convergence in the presence of severe heterogeneity in stiffness coefficients and hence performance enhancements are needed.The simultaneous search directions with the rank-revealing Cholesky decomposition reduced the number of needed iterations the most and should be therefore preferred to full orthogonalization alone.Finally, FETI-DP proved to be more robust than T-FETI as it was the only method capable of reaching the desired residual in all test problems.
Following these conclusions, our current research interest lies in (i) an adaptive choice of the nodes whose compatibility should be enforced directly in primal variables (i.e., not necessarily choosing only the corner nodes for the primal part of FETI-DP) and (ii) identification of boundary modes responsible for the bad convergence and eliminating from the iterative process in the spirit of [14].

Figure 1 .
Figure 1.Three academic problems taken from [11]: a) the laminated beam, b) the 3 × 3 grid of periodic, layered subdomains, and, c) the 4 × 4 grid of peridodic subdomains with immersed stiff square inclusion.

Figure 2 .
Figure 2. Convergence history of 12 variants of the solution strategies for the three academic problems shown in Fig. 1: a) a laminated beam, b) a 3 × 3 grid of laminated subdomains, and, c) a 4 × 4 grid of subdomains with stiff square inclusions.FO denotes full orthogonalization enhancement, RRS stands for the simultaneous variant with rank-revealing decomposition.In all graphs, results with the k-scaling enhancement are plotted in solid lines while the dotted lines represent the multiplicity scaling.The dashdotted line mark the desired residual threshold from Algorithm 1.

Figure 3 .
Figure 3.An illustration of the beam problem arising in modular topology optimization.The distinct colours represent 16 different types of 96 modules, which constitute the beam's subdomains Ω s .

Figure 4 .
Figure 4. Three snapshots of modular topology optimization constituting the second suite of test problems: a) 4 th iteration, b) 8 th iteration, and c) 30 th iteration.

Figure 5 .
Figure 5. Convergence history of 12 variants of the solution strategies for the three snapshots of modular topology optimization from Fig. 4: a) 4 th , b) 8 th , and c) 30 th iteration of the topology optimization.Similarly to Fig. 2, FO denotes full orthogonalization enhancement, RRS stands for the simultaneous variant with rank-revealing decomposition.In all graphs, results with the k-scaling enhancement are plotted in solid lines while the dotted lines represent the multiplicity scaling.The dash-dotted line mark the desired residual threshold from Algorithm 1.
[11]optimal Dirichlet preconditioner is used, or• Ss = K bb or Ss = diag K bb if a computationallycheaper approximations of S s is used, denoted as a lumped or super-lumped preconditioner, respectively,where diag • extracts only the diagonal part of •.Note that in FETI-DP the unknown index sets i and b are subsets of r only, since the unknowns in set c are handled directly.In the simplest setting, one can assume Bs = B s ; however, Bs can be arbitrarily scaled provided that[11]