Fault Tolerance

Fault tolerance

Fault tolerance is the ability of a system to continue operating properly in the event of component failure. It allows for graceful degradation, where the system can still operate at a reduced level, and can be achieved through anticipating exceptional conditions and duplication. Reversion to a safe mode may also be necessary if the consequences of a system failure are catastrophic.

1 courses cover this concept

15-440 Distributed Systems

Carnegie Mellon University

Fall 2020

A course offering both theoretical understanding and practical experience in distributed systems. Key themes include concurrency, scheduling, network communication, and security. Real-world protocols and paradigms like distributed filesystems, RPC, MapReduce are studied. Course utilizes C and Go programming languages.

No concepts data

+ 34 more concepts