Fault Tolerance in Distributed Systems

Graduate Program in Informatics (Programa de Pós-Graduação em Informática, PPGI)


Topics:

    1. Fundamentals of Distributed Systems

    2. Fundamentals of Fault Tolerance

    3. Reliable Communication

    4. Replication Management

    5. Distributed Agreement

    6. Recovery

    7. Fault-Tolerant Systems and Platforms

Class schedule:
         Thursday, 16:30-20:30
Recommended bibliography:
  1. Papers from the following conferences and journals:
    DSN, LADC, SRDS, SBRC, DISC
    IEEE Computer, IEEE Trans. on Parallel and Distributed Systems, IEEE Trans. on Dependable and Secure Computing
  2. Pankaj Jalote
    Fault Tolerance in Distributed Systems
    Prentice Hall, 1994
  3. Kenneth P. Birman
    Building Secure and Reliable Network Applications
    Manning Publications, 1996

Supporting material:

Slides

    Course presentation
    Fundamentals
    Slides on Schneider's paper

Introduction to the course on fault tolerance in distributed systems

  1. A. Avizienis, J.-C. Laprie and B. Randell, Fundamental Concepts of Dependability. Research Report N01-145, LAAS-CNRS, April 2001. 6 pages.
    Comments: This paper is a review; read it to brush up on terminology and re-orient yourself to the big picture.
  2. Kishor S. Trivedi, Dong Seong Kim, Arpan Roy and Deep Medhi, Dependability and Security Models. Keynote paper at the 7th International Workshop on the Design of Reliable Communication Networks (DRCN 2009), Washington, DC, October 2009.

Fault-Tolerant Services

  1. Fred B. Schneider, Implementing Fault-Tolerant Services using the State Machine Approach: A Tutorial. New York: Cornell University, Department of Computer Science, TR 86-800, Nov 1986 (revised Jan 1990). 32 pages.
    Comments: This paper is a good introduction to how fault tolerance can be built into distributed systems.
  2. Rachid Guerraoui and André Schiper, Software-Based Replication for Fault Tolerance. Los Alamitos: IEEE, IEEE Computer, April 1997. 7 pages.
    Comments: This paper explains, briefly and directly, what it means to design for fault tolerance in distributed systems.
  3. Michel Raynal and Mukesh Singhal, Mastering Agreement Problems in Distributed Systems. Los Alamitos: IEEE, IEEE Software, 18(4), July/August 2001, pp. 40-47.
    Comments: This paper discusses agreement problems in a short, direct text. In a few pages it covers the main problems in this area.
  4. R. Bharath, M. Dumas and M. E. Kurul, Adaptive Fault Tolerance in Distributed Systems. March 2001. 10 pages.
    Comments: This paper gives a good review of adaptive fault tolerance and may serve as a starting point for further study in this direction.
