LADC 2005

SUMMARY OF SELECTED TUTORIALS

Tutorial 1:
Software Architecture for Dependable Systems

Authors: Rogerio de Lemos (University of Kent at Canterbury, United Kingdom)
Paulo Guerra

Summary: Architectural representations of systems have been shown to be effective in assisting the understanding of broader system concerns by abstracting away from the details of the system. The dependability of a system is the reliance that can justifiably be placed on the service it delivers. Dependability has become an important aspect of computer systems because everyday life increasingly depends on software. Although there is a large body of research on dependability, architectural-level reasoning about dependability is only just emerging as an important theme in software development. This is because dependability concerns are often left until too late in the development process. In addition, the complexity of emerging applications and the trend of building trustworthy systems from existing, untrustworthy components make it urgent to consider dependability concerns at the architectural level. Hence the questions that the software architecture and dependability communities currently face: What are the architectural principles involved in building dependable systems? How should these architectures be evaluated?

Tutorial 2:
Fault-tolerant Techniques for Concurrent Objects

Authors: Rachid Guerraoui (EPFL, Switzerland)
Michel Raynal (IRISA, Université de Rennes, France)

Summary: Devising wait-free resilient implementations of concurrent objects from fault-prone base objects is a fundamental challenge of computer science. Wait-free means that any process that invokes an operation eventually receives a reply after executing a finite number of its own steps, even if other processes are arbitrarily slow or have failed. Resilience means that the implementation of the concurrent object behaves correctly despite the failure of up to t base objects (t being a threshold parameter defined a priori). The tutorial surveys different techniques to build wait-free resilient implementations of concurrent objects. Three complementary classes of techniques are presented: (1) fault-tolerance "by replication", (2) fault-tolerance "by diversity", and (3) fault-tolerance "by oracle". The first is the well-known redundancy technique, and its applicability depends on the kinds of faults that the objects can suffer. The second consists of combining the base objects with objects of other types (type refers here to a programming language notion: the type has to be powerful enough to allow implementing resilient objects). This technique basically relies on the universality of consensus objects. The third technique relies on the information that can be obtained about the operational status of the processes.
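As a rough illustration of fault-tolerance "by replication", the following Python sketch builds a register that survives the crash of up to t base registers. It is a sketch under stated assumptions, not an algorithm taken from the tutorial: it assumes a "responsive" crash model in which a crashed base register answers with a recognizable FAILED marker, it ignores concurrency between readers and writers, and all names (BaseRegister, ResilientRegister, FAILED) are hypothetical.

```python
FAILED = object()  # hypothetical marker returned by a crashed base register


class BaseRegister:
    """Fault-prone base object: once crashed, every operation returns FAILED."""

    def __init__(self):
        self._value = None
        self._crashed = False

    def crash(self):
        self._crashed = True

    def write(self, value):
        if self._crashed:
            return FAILED
        self._value = value
        return None

    def read(self):
        return FAILED if self._crashed else self._value


class ResilientRegister:
    """Register built from t + 1 base registers, tolerating up to t crashes.

    Assumption: crashes are responsive, so one surviving copy is enough.
    """

    def __init__(self, t):
        self.copies = [BaseRegister() for _ in range(t + 1)]

    def write(self, value):
        # Write to every copy; crashed copies simply report FAILED.
        for copy in self.copies:
            copy.write(value)

    def read(self):
        # Every surviving copy holds the last written value (no concurrency).
        for copy in self.copies:
            value = copy.read()
            if value is not FAILED:
                return value
        raise RuntimeError("more than t base registers have crashed")
```

Both operations finish after at most t + 1 steps of the invoking process and never wait for another process, which is the sense in which this toy construction is wait-free. Under other fault models, for example base objects that stop answering altogether, this simple scheme would no longer be adequate, which is why the applicability of replication depends on the kinds of faults considered.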
The aim of the tutorial is to familiarize participants with the practical and theoretical fault-tolerance techniques and concepts used to build resilient concurrent objects. To illustrate the techniques, the tutorial uses algorithms from the literature or devises new ones. A simple framework for deriving a family of consensus algorithms that tolerate process crash failures and asynchronous periods will be presented. This framework is based on two independent abstractions, Alpha and Omega, that cleanly address orthogonal issues: Alpha is devoted to consensus safety, while Omega is devoted to consensus liveness. Implementations of the Alpha abstraction in shared memory, storage area network, message passing and active disk systems will be presented, from which consensus algorithms suited to these communication media can be derived directly. (Interestingly, the algorithms derived from the framework can be viewed as variants of Lamport's seminal Paxos consensus algorithm. In this sense, this part of the tutorial can be seen as a guided tour of the variants of Paxos that have appeared recently in the literature.)
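The separation between Alpha (safety) and Omega (liveness) can be sketched as follows. This is a minimal single-threaded simulation, not the tutorial's own algorithm: the register layout (lre, lrww, val), the round numbering, and the names Alpha, Consensus and step are assumptions made for illustration, in the style of round-based shared-memory Paxos variants. A real implementation would need atomic shared registers and truly concurrent processes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Reg:
    lre: int = 0                 # last round entered by the owner
    lrww: int = 0                # last round in which the owner wrote a value
    val: Optional[str] = None    # value written in round lrww


class Alpha:
    """Safety part: returns a value or None (abort), never two different values."""

    def __init__(self, n: int):
        self.regs: List[Reg] = [Reg() for _ in range(n)]

    def invoke(self, i: int, r: int, v: str) -> Optional[str]:
        regs = self.regs
        regs[i].lre = r                       # announce entry into round r
        if any(x.lre > r for x in regs):      # a higher round exists: abort
            return None
        best = max(regs, key=lambda x: x.lrww)
        val = best.val if best.lrww > 0 else v    # adopt the latest written value
        regs[i] = Reg(lre=r, lrww=r, val=val)
        if any(x.lre > r for x in regs):      # overtaken while writing: abort
            return None
        return val


class Consensus:
    """Liveness part: only the process currently named by Omega invokes Alpha."""

    def __init__(self, n: int, omega: Callable[[], int]):
        self.n = n
        self.alpha = Alpha(n)
        self.omega = omega                    # assumed leader oracle
        self.decided: Optional[str] = None
        self.next_round = [i + 1 for i in range(n)]   # disjoint round numbers

    def step(self, i: int, v: str) -> Optional[str]:
        """One loop iteration of process i; returns the decision if known."""
        if self.decided is None and self.omega() == i:
            r = self.next_round[i]
            self.next_round[i] += self.n
            res = self.alpha.invoke(i, r, v)
            if res is not None:
                self.decided = res
        return self.decided
```

In this sketch a process whose round has been overtaken gets an abort from Alpha and retries with a higher round, adopting any value already written. Agreement therefore never depends on Omega being right; Omega only ensures that eventually a single leader runs undisturbed and decides.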

Tutorial 3:
Agreement Protocols in Environments with Temporal Uncertainties (in Portuguese)

Author: Fabíola G.P. Greve (UFBA, Brazil)

Summary: Agreement protocols are fundamental for the design of dependable systems. They ensure consistent cooperation among distributed entities, helping both to maintain the continuity of services in spite of failures and to enhance performance. Consensus is the greatest common denominator among all agreement problems. It allows a set of processes to agree on a common output value. Theoretical advances have been made thanks to solutions to the consensus problem based on unreliable failure detectors, which have proved essential in solving many other agreement problems in environments with temporal uncertainties. Such advances have been exploited in order to (i) find efficient solutions to agreement problems, (ii) identify minimal synchrony conditions for their solution and (iii) characterize more precisely their behavior (blocking or progress) in the presence of network disturbances. From a software engineering viewpoint, consensus-based protocols give rise to simple and modular solutions. Basic components (consensus, reliable broadcast, atomic broadcast, failure detector, etc.) are identified in order to construct richer ones (group membership, view synchrony, atomic commit, etc.). These components are in turn the fundamental pieces of middleware for reliable distributed programming.
This tutorial presents a survey of the latest advances in solving agreement in environments with temporal uncertainties. First, recent theoretical results regarding the solutions of agreement problems, as well as their algorithms, are presented. Afterwards, it is shown how these algorithms are combined to build services for fault-tolerant middleware, namely group and replication management systems. Finally, through an example of task allocation in a computational grid, it is shown how these protocols and middleware can be used in both the design and the implementation of dependable applications.
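The modular construction mentioned in the summary, where consensus serves as a building block for richer services, can be illustrated by the classic reduction of atomic broadcast to a sequence of consensus instances. The Python sketch below is a single-threaded simulation under stated assumptions and is not code from the tutorial: ConsensusStub is a hypothetical stand-in in which the first proposal wins, message dissemination by reliable broadcast is modelled by calling receive directly, and all names are illustrative.

```python
from typing import Dict, FrozenSet, List, Optional, Set


class ConsensusStub:
    """Stand-in for a real consensus object: the first proposal is decided."""

    def __init__(self):
        self._decided: Optional[FrozenSet[str]] = None

    def propose(self, batch: FrozenSet[str]) -> FrozenSet[str]:
        if self._decided is None:
            self._decided = batch
        return self._decided


class Process:
    """Delivers messages in the same total order at every process."""

    def __init__(self, instances: Dict[int, ConsensusStub]):
        self.instances = instances      # consensus instances shared by all processes
        self.pending: Set[str] = set()  # received but not yet delivered
        self.delivered: List[str] = []
        self.k = 0                      # index of the next consensus instance

    def receive(self, msg: str) -> None:
        # Called when msg arrives through the (assumed) reliable broadcast.
        if msg not in self.delivered:
            self.pending.add(msg)

    def step(self) -> None:
        # Propose the current pending batch to instance k, then deliver the
        # decided batch in a deterministic (sorted) order.
        instance = self.instances.setdefault(self.k, ConsensusStub())
        batch = instance.propose(frozenset(self.pending))
        for msg in sorted(batch):
            if msg not in self.delivered:
                self.delivered.append(msg)
        self.pending -= batch
        self.k += 1
```

Because every process adopts the batch decided by instance k and orders it with the same deterministic rule, two processes that receive messages in different orders still deliver them in the same order. Replacing ConsensusStub with a genuine fault-tolerant consensus implementation leaves the Process logic unchanged, which is the modularity the summary refers to.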

Contact address:
LaSiD/DCC - Distributed Systems Laboratory / Department of Computer Science
UFBA - Federal University of Bahia
Av. Adhemar de Barros, s/n - Campus de Ondina, Prédio do CPD
Salvador, Bahia, Brazil - CEP 40170-110
Tel.: +55 71 3263 6142 Fax: +55 71 3263 6145 Email: ladc2005@ufba.br