Generic Platform for Failure Recovery in Survivable Trees

V. Dynda

Abstract


Failure recovery is a fundamental task of the dependable systems needed to achieve fault-tolerant communications, smooth operation of system components and a comfortable user interface. Tree topologies are fragile, yet they are quite popular structures in computer systems. The term survivable tree denotes the capability of the tree network to deliver messages even in the presence of failures. In this paper, we analyze the characteristics of large-scale overlay survivable trees and identify the requirements for general-purpose failure recovery mechanisms in such an environment. We outline a generic failure recovery platform for preplanned tree restoration which meets those requirements, and we focus primarily on its completeness and correctness properties. The platform is based on bypass rings and it uses a bypass routing algorithm to ensure completeness, and specialized leader election to guarantee correctness. The platform supports multiple, on-line and on-the-fly recovery, provides an optional level of fault-tolerance, protection selectivity and optimization capability. It is independent of the the protected tree type (regarding traffic direction, number of sources, etc.) and forms a basis for application-specific fragment reconnection. 

Keywords


fault tolerance; failure recovery; tree restoration; distributed algorithms

Full Text: PDF

Refbacks

  • There are currently no refbacks.


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN 1210-2709 (Print)
ISSN 1805-2363 (Online)
Published by the Czech Technical University in Prague