







SUMMARY OF SELECTED TUTORIALS
Tutorial 1: Software Architecture for Dependable Systems
Authors: Rogerio de Lemos (University of Kent at Canterbury, United Kingdom), Paulo Guerra
Summary: Architectural representations of systems have been shown to be effective in
assisting the understanding of broader system concerns by abstracting
away from details of the system. The dependability of a system is defined
as the reliance that can justifiably be placed on the service it
delivers. Dependability has become an important aspect of computer
systems since everyday life increasingly depends on software. Although
there is a large body of research in dependability, architectural level
reasoning about dependability is only just emerging as an important
theme in software development. This is because
dependability concerns are often left until too late in the development
process. In addition, the complexity of emerging applications and
the trend of building trustworthy systems from existing, untrustworthy
components are urging that dependability concerns be considered at the
architectural level. Hence the questions that the software architecture
and dependability communities are currently facing: what are the
architectural principles involved in building dependable systems? How
should these architectures be evaluated?

Tutorial 2: Fault-tolerant Techniques for Concurrent Objects
Authors: Rachid Guerraoui (EPFL, Switzerland), Michel Raynal (IRISA, Université de Rennes, France)
Summary: Devising wait-free resilient implementations of concurrent
objects from fault-prone base objects is a fundamental challenge of computer science.
Wait-free means that any process that invokes an operation eventually receives a
reply after executing a finite number of its own steps, even if other processes are
arbitrarily slow or have failed. Resilience means that the implementation of the
concurrent object behaves correctly despite the failure of up to t base objects
(t being a threshold parameter defined a priori). The tutorial surveys different
techniques to build wait-free resilient implementations of concurrent objects.
Three complementary classes of techniques are presented: (1) fault-tolerance
"by replication", (2) fault-tolerance "by diversity", and (3) fault-tolerance
"by oracle". The first is the well-known redundancy technique;
its applicability depends on the kinds of faults that the objects can suffer.
The second consists in combining the base objects with objects of other types
(type refers here to the programming-language notion: the type has to be powerful
enough to allow implementing resilient objects). This technique basically relies
on the universality of consensus objects. The third technique relies on the information
that can be obtained about the operational status of the processes.
The tutorial aims to familiarize attendees with practical and theoretical
fault-tolerance techniques and concepts for building resilient concurrent objects.
To illustrate the techniques, the tutorial uses algorithms from the literature or
devises new ones. A simple framework for deriving a family of consensus algorithms
that tolerate process crash failures and asynchronous periods will be presented.
This framework is based on two independent abstractions, Alpha and Omega, that cleanly
address orthogonal issues: Alpha is devoted to consensus safety, while Omega is
devoted to consensus liveness. Implementations of the Alpha abstraction in shared
memory, storage area network, message-passing and active disk systems will be
presented, from which consensus algorithms suited to these communication media
can be directly derived. (Interestingly, the algorithms derived from the framework
can be viewed as variants of Lamport's seminal Paxos consensus algorithm.
In this sense, this part of the tutorial can be seen as a guided tour of the
Paxos variants that have appeared recently in the literature.)
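
To illustrate how the Alpha and Omega abstractions separate safety from liveness, the following Python sketch simulates a round-based Alpha object over shared registers and a consensus loop driven by an Omega stub. It is a sequential, single-threaded toy written for this summary, not the tutorial's own algorithm: the names `Entry`, `Alpha`, `Consensus`, `lre`, `lrww` and `step`, the round-numbering scheme, and the fixed-leader oracle are all assumptions of the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Entry:
    """Register owned by one process (field names are hypothetical)."""
    lre: int = 0                 # last round entered by the owner
    lrww: int = 0                # last round in which the owner wrote a value
    val: Optional[str] = None    # value written in round lrww


class Alpha:
    """Safety part: returns a value or None (abort); non-None results all agree."""

    def __init__(self, n: int) -> None:
        self.reg: List[Entry] = [Entry() for _ in range(n)]

    def alpha(self, i: int, r: int, v: str) -> Optional[str]:
        me = self.reg[i]
        me.lre = r                                   # announce round r
        snapshot = [Entry(e.lre, e.lrww, e.val) for e in self.reg]
        if any(e.lre > r for e in snapshot):
            return None                              # a higher round exists: abort
        latest = max(snapshot, key=lambda e: e.lrww)
        value = latest.val if latest.val is not None else v
        me.lrww, me.val = r, value                   # adopt the most recent value
        # In a concurrent run another process may have entered a higher round
        # in the meantime; this sequential simulation never triggers the check.
        if any(e.lre > r for e in self.reg):
            return None
        return value


class Consensus:
    """Liveness part: only the process designated by Omega invokes Alpha."""

    def __init__(self, n: int, leader: Callable[[], int]) -> None:
        self.n = n
        self.alpha = Alpha(n)
        self.leader = leader                 # Omega stub (assumed already stable)
        self.decided: Optional[str] = None   # shared decision register
        self.round = list(range(n))          # process i uses rounds i+n, i+2n, ...

    def step(self, i: int, v: str) -> Optional[str]:
        """One loop iteration of process i proposing v; returns the decision or None."""
        if self.decided is None and self.leader() == i:
            self.round[i] += self.n
            res = self.alpha.alpha(i, self.round[i], v)
            if res is not None:
                self.decided = res
        return self.decided
```

In this sketch, agreement rests entirely on `Alpha` (a lower-round invocation aborts and a later one adopts the value already written), while termination rests on the oracle eventually naming a single leader. A real Omega only guarantees this eventually, which is why the loop must tolerate aborted rounds.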

Tutorial 3: Agreement Protocols in Environments with Temporal Uncertainties (in Portuguese)
Author: Fabíola G.P. Greve (UFBA, Brazil)
Summary: Agreement protocols are fundamental for the design of dependable
systems. They ensure consistent cooperation among distributed
entities, helping both to maintain continuity of service despite
failures and to enhance performance. Consensus is the greatest
common denominator among all agreement problems. It allows a set
of processes to agree on a common output value. Theoretical
advances have been achieved thanks to solutions to the consensus
problem based on unreliable failure detectors, which
have proved essential in solving many other agreement
problems in environments with temporal uncertainties. Such
advances have been exploited in order to (i) find efficient
solutions to agreement problems, (ii) identify the minimal synchrony
conditions for their solution and (iii) characterize more
precisely their behavior (blocking or progress) in the presence of
network disturbances. From a software engineering viewpoint,
consensus-based protocols give rise to simple and modular
solutions. Basic components (consensus, reliable broadcast,
atomic broadcast, failure detector, etc.) are identified in order to
construct richer ones (group membership, view synchrony, atomic
commit, etc.). These components are in turn the fundamental pieces of
middleware for reliable distributed programming.
This tutorial presents a survey of the latest advances in solving
agreement in environments with temporal uncertainties. First,
recent theoretical results regarding the solutions of agreement
problems as well as their algorithms are presented. Next, it
is shown how these algorithms are combined to build services for
fault-tolerant middleware, namely group and replication
management systems. Finally, through an example of task allocation
in a computational grid, it is shown how these protocols and
middleware could be used in both the design and the implementation
of dependable applications.
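
The modular construction mentioned in this summary, in which basic components are combined into richer ones, can be illustrated with a small Python sketch that layers atomic broadcast on a sequence of consensus instances. It follows the general idea of the classic reduction of atomic broadcast to consensus, but it is a sequential toy written for this summary and not taken from the tutorial: `FirstProposalConsensus` is a stand-in that simply keeps the first proposal (it is not a fault-tolerant consensus algorithm), `receive` stands in for reliable broadcast, and all class and method names are assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple


class FirstProposalConsensus:
    """Stand-in for one consensus instance: the first proposal wins."""

    def __init__(self) -> None:
        self.decision: Optional[Tuple[str, ...]] = None

    def propose(self, value: Tuple[str, ...]) -> Tuple[str, ...]:
        if self.decision is None:
            self.decision = value
        return self.decision


class AtomicBroadcast:
    """Orders messages by agreeing on one batch per consensus instance."""

    def __init__(self, n: int) -> None:
        self.pending: List[Set[str]] = [set() for _ in range(n)]
        self.delivered: List[List[str]] = [[] for _ in range(n)]
        self.next_instance = [0] * n
        self.instances: Dict[int, FirstProposalConsensus] = {}

    def receive(self, i: int, msg: str) -> None:
        """Models reliable-broadcast delivery of msg at process i, in any order."""
        if msg not in self.delivered[i]:
            self.pending[i].add(msg)

    def step(self, i: int) -> None:
        """Process i runs its next consensus instance and delivers the agreed batch."""
        k = self.next_instance[i]
        inst = self.instances.setdefault(k, FirstProposalConsensus())
        proposal = tuple(sorted(self.pending[i]))
        if not proposal and inst.decision is None:
            return                          # nothing to order yet
        batch = inst.propose(proposal)
        for msg in batch:                   # same batch, same order at every process
            if msg not in self.delivered[i]:
                self.delivered[i].append(msg)
            self.pending[i].discard(msg)
        self.next_instance[i] = k + 1
```

Because every process delivers the batch decided by instance k before moving on to instance k+1, all processes deliver the same messages in the same order even when the underlying messages arrive in different orders. A group membership or replication service could be layered on top in the same way.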