Designing Resilient Distributed Systems with TypeScript: Handling System-Level Failures
Learn how to design distributed systems that recover gracefully from system-level failures using TypeScript. This beginner-friendly guide covers error handling best practices and patterns to build resilient distributed applications.
Distributed systems consist of multiple independent computers (nodes) working together. Because nodes communicate over networks and run independently, system-level failures like network glitches, hardware crashes, or service unavailability can happen anytime. To build resilient distributed systems in TypeScript, it's crucial to anticipate these failures and handle them gracefully.
In this article, we'll cover beginner-friendly error handling techniques in TypeScript that help distributed systems keep working even when some parts fail.
### 1. Use Try-Catch for Safe Error Handling
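When calling remote services or making network requests, wrap the call in `try`/`catch` so that a single failed request cannot crash the whole program. Inside an `async` function, `catch` also receives the rejection of any promise you `await`. Note that `fetch` rejects only on network-level failures; an HTTP error status such as 500 still resolves, which is why the example checks `response.ok` explicitly.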
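Before logging or inspecting a caught error, it helps to narrow its type. With `strict` enabled (TypeScript 4.4 and later), the `catch` variable is typed `unknown`, because JavaScript allows any value to be thrown. The sketch below is one possible way to handle that; `describeError` is a hypothetical helper name, not part of any library.

```typescript
// Hypothetical helper: turn whatever was thrown into a readable message.
// The parameter is `unknown`, so it must be narrowed before use.
function describeError(error: unknown): string {
  if (error instanceof Error) {
    // Real Error objects carry a name (e.g. "TypeError") and a message.
    return `${error.name}: ${error.message}`;
  }
  // Strings, numbers, or plain objects can be thrown too.
  return `Non-Error value thrown: ${String(error)}`;
}

// Usage: JSON.parse throws a SyntaxError on malformed input.
try {
  JSON.parse('{not valid json');
} catch (error) {
  console.error('Parsing failed:', describeError(error));
}
```

The same helper can be used in any `catch` block in this article where the error is logged.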
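```typescript
// Uses the global `fetch` (browsers, Node.js 18 and later).
async function fetchData(url: string): Promise<string> {
  try {
    const response = await fetch(url);
    // fetch resolves even for 4xx/5xx responses, so check the status explicitly.
    if (!response.ok) {
      throw new Error(`HTTP error: ${response.status}`);
    }
    return await response.text();
  } catch (error) {
    // Log for diagnostics, then rethrow so callers can retry or fall back.
    console.error('Failed to fetch data:', error);
    throw error;
  }
}
```

### 2. Implement Retries with Exponential Backoff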
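Transient failures, such as a dropped connection or a briefly overloaded service, are common in distributed systems. Rather than failing on the first error, retry the request a limited number of times and double the wait between attempts (exponential backoff), so a struggling service gets progressively more time to recover before you give up.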
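To make the doubling concrete, here is a minimal sketch that only computes the wait times, without making any requests. The helper name `backoffDelays` and the optional `maxDelayMs` cap are assumptions added for illustration; the retry function in this article does not cap its delays.

```typescript
// Hypothetical helper: list the waits exponential backoff would produce.
// `maxDelayMs` is an assumed optional cap so delays cannot grow without bound.
function backoffDelays(
  retries: number,
  baseDelayMs: number,
  maxDelayMs: number = Infinity
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    // The delay doubles on every attempt: base, 2x base, 4x base, ...
    delays.push(Math.min(baseDelayMs * 2 ** attempt, maxDelayMs));
  }
  return delays;
}

console.log(backoffDelays(3, 500));       // [500, 1000, 2000]
console.log(backoffDelays(5, 500, 4000)); // [500, 1000, 2000, 4000, 4000]
```

With three retries and a 500 ms starting delay, a caller waits at most 3.5 seconds in total before the final error surfaces.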
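```typescript
// `retries` is the number of additional attempts; `delay` is the first wait in ms.
async function retryFetch(url: string, retries = 3, delay = 500): Promise<string> {
  try {
    return await fetchData(url);
  } catch (error) {
    // Out of attempts (or a non-positive count was passed): surface the last error.
    if (retries <= 0) throw error;
    console.log(`Retrying in ${delay}ms... (${retries} left)`);
    await new Promise(resolve => setTimeout(resolve, delay));
    // Double the wait before the next attempt.
    return retryFetch(url, retries - 1, delay * 2);
  }
}
```

### 3. Use Circuit Breaker Pattern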
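If a service is failing consistently, retrying wastes resources and adds load to a service that is already struggling. A circuit breaker counts failures and, once they reach a threshold, 'opens': further requests are rejected immediately for a cooldown period, giving the service time to recover. After the cooldown, requests are allowed through again.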
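```typescript
class CircuitBreaker {
  private failures = 0;
  private threshold: number; // failures allowed before the breaker opens
  private cooldown: number; // how long the breaker stays open, in ms
  private nextAttempt: number = 0; // timestamp after which requests may resume

  constructor(threshold: number, cooldown: number) {
    this.threshold = threshold;
    this.cooldown = cooldown;
  }

  // True while the breaker is closed, or once the cooldown has elapsed.
  canRequest(): boolean {
    return Date.now() > this.nextAttempt;
  }

  // A successful call resets the failure count.
  success() {
    this.failures = 0;
  }

  // The count is not reset when the breaker opens, so a failure right after
  // the cooldown reopens it immediately.
  failure() {
    this.failures++;
    if (this.failures >= this.threshold) {
      this.nextAttempt = Date.now() + this.cooldown;
    }
  }
}
```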
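```typescript
async function protectedFetch(url: string, cb: CircuitBreaker): Promise<string> {
  // Fail fast while the breaker is open instead of calling the service.
  if (!cb.canRequest()) {
    throw new Error('Circuit breaker is open. Skipping request');
  }
  try {
    const result = await fetchData(url);
    cb.success();
    return result;
  } catch (error) {
    // Record the failure so the breaker can open once the threshold is reached.
    cb.failure();
    throw error;
  }
}
```

### 4. Graceful Degradation and Fallbacks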
Sometimes the best way to handle failure is to provide cached or default data instead of failing completely.
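For the "cached" half of that idea, one possible approach is to remember the last successful response and serve it when the live call fails. The sketch below rests on assumptions: `fetchWithCache`, the in-memory `lastKnownGood` map, and the `load` callback are hypothetical names for illustration, and a real system would likely also need expiry and size limits.

```typescript
// Hypothetical in-memory cache of the last successful result per key.
const lastKnownGood = new Map<string, string>();

// `load` stands in for any async operation, such as a network request.
async function fetchWithCache(
  key: string,
  load: () => Promise<string>,
  defaultData: string
): Promise<string> {
  try {
    const fresh = await load();
    lastKnownGood.set(key, fresh); // refresh the cache on every success
    return fresh;
  } catch {
    // Prefer stale-but-real data; use the static default only if nothing is cached.
    return lastKnownGood.get(key) ?? defaultData;
  }
}
```

Serving stale data is a trade-off: users see something useful during an outage, but the caller can no longer tell fresh results from cached ones unless you return that information as well.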
```typescript
async function fetchWithFallback(url: string, fallbackData: string): Promise<string> {
  try {
    return await retryFetch(url);
  } catch {
    // All retries failed: degrade gracefully instead of propagating the error.
    console.warn('Returning fallback data');
    return fallbackData;
  }
}
```

### Conclusion
Handling system-level failures in distributed systems is crucial for reliability. In TypeScript, use try-catch blocks, retries with exponential backoff, circuit breakers, and fallback strategies to keep your system resilient. These patterns help your system continue working smoothly, even in the face of unpredictable failures.