Handling Distributed System Failures Gracefully with TypeScript

Learn how to handle distributed system failures gracefully using TypeScript, focusing on error handling techniques and retries to build resilient applications.

Distributed systems involve multiple services or components working together, often across different servers or networks. Because of this complexity, failures such as network issues, timeouts, or service crashes can happen. As a developer, it's important to handle these failures gracefully to build resilient and user-friendly applications. In this article, we will explore some beginner-friendly techniques to handle errors in distributed systems using TypeScript.

First, let's understand a common pattern: retries with exponential backoff. When a request to a service fails, instead of giving up immediately, you can retry it a few times, waiting longer before each new attempt. Spacing the retries out gives a struggling service time to recover instead of adding to its load, and it raises the chance that a transient failure clears before you give up.

Here’s a simple example of how to implement retries with exponential backoff in TypeScript:

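```typescript
// Fetch a URL as text, retrying failed attempts with exponentially growing delays.
async function fetchWithRetry(url: string, retries = 3, delay = 500): Promise<string> {
  try {
    const response = await fetch(url);
    if (!response.ok) {
      // fetch resolves on HTTP error statuses, so turn them into exceptions here.
      throw new Error(`HTTP error! status: ${response.status}`);
    }
    return await response.text();
  } catch (error) {
    if (retries > 0) {
      console.log(`Retrying in ${delay} ms... attempts left: ${retries}`);
      await new Promise(res => setTimeout(res, delay));
      return fetchWithRetry(url, retries - 1, delay * 2); // Exponential backoff
    }
    // Out of retries: report the underlying failure to the caller.
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`Failed after retries: ${reason}`);
  }
}
```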

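In this function, we try to fetch data from a URL. If the request throws (for example, on a network failure) or returns a non-2xx status, we catch the error and retry up to `retries` more times, so the defaults allow four attempts in total. The delay doubles before each retry, which helps prevent overwhelming the server. Note that this simple version retries every failure, including client errors such as 404 that will not succeed on a second try; in a real application you would usually retry only the failures that are likely to be temporary.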
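To make the doubling concrete, here is a minimal sketch that lists the waits `fetchWithRetry` would use with its default arguments. The helper name `backoffDelays` is hypothetical and exists only for this illustration.

```typescript
// Hypothetical helper (not part of the code above): lists the delay, in
// milliseconds, that precedes each retry when the wait doubles every time.
function backoffDelays(retries: number, initialDelay: number): number[] {
  const delays: number[] = [];
  let delay = initialDelay;
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(delay);
    delay *= 2;
  }
  return delays;
}

// Same defaults as fetchWithRetry: 3 retries, starting at 500 ms.
const schedule = backoffDelays(3, 500);
const totalWait = schedule.reduce((sum, ms) => sum + ms, 0);

console.log(schedule);  // [ 500, 1000, 2000 ]
console.log(totalWait); // 3500
```

With the defaults, a caller waits at most 3.5 seconds in backoff delays (plus the time of the requests themselves) before the final error is thrown.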

Next, it’s helpful to create custom error classes to better handle different failure scenarios. This makes it easier to recognize and respond to specific errors in your system.

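```typescript
// Custom error type for a dependency that is temporarily unreachable.
class ServiceUnavailableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ServiceUnavailableError';
    // Keeps instanceof working if the code is compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

async function callService(): Promise<void> {
  // Simulated failure
  throw new ServiceUnavailableError('Service is down');
}

async function main(): Promise<void> {
  try {
    await callService();
  } catch (error) {
    if (error instanceof ServiceUnavailableError) {
      // Known, temporary failure: tell the user to retry later.
      console.error('Please try again later:', error.message);
    } else {
      // Anything else is unexpected, so log the full error.
      console.error('An unexpected error occurred:', error);
    }
  }
}

main();
```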

This way, you can distinguish between temporary service failures and unexpected errors, allowing your application to respond accordingly, such as showing user-friendly messages or triggering fallback logic.
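As one possible shape for the fallback logic mentioned above, the sketch below returns a default value when the service is unavailable and rethrows everything else. The names `withFallback` and `loadRecommendations` are hypothetical, and the error class is repeated only so the snippet runs on its own.

```typescript
// Same custom error as above, repeated so this sketch is self-contained.
class ServiceUnavailableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ServiceUnavailableError';
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Hypothetical helper: run `primary`, and if it fails with a
// ServiceUnavailableError, return `fallback` instead. Any other error is
// rethrown so that unexpected failures stay visible.
async function withFallback<T>(primary: () => Promise<T>, fallback: T): Promise<T> {
  try {
    return await primary();
  } catch (error) {
    if (error instanceof ServiceUnavailableError) {
      console.warn('Service unavailable, using fallback:', error.message);
      return fallback;
    }
    throw error;
  }
}

// Hypothetical service call that always fails, standing in for a real request.
async function loadRecommendations(): Promise<string[]> {
  throw new ServiceUnavailableError('Service is down');
}

withFallback(loadRecommendations, ['default-item'])
  .then(items => console.log(items)); // [ 'default-item' ]
```

Limiting the fallback to one error type is a deliberate choice in this sketch: a programming mistake such as a `TypeError` should surface rather than be hidden behind default data.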

Finally, log each error with enough context to debug it later, such as the error type, the service being called, and the attempt number, but keep sensitive data such as passwords and tokens out of the logs. Monitoring and alerting based on these logs can help detect system issues early.
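A minimal sketch of that idea follows. It builds a structured log entry and masks sensitive values before anything is written. The key list, the `LogEntry` shape, and the `toLogEntry` helper are all assumptions made for illustration, not a standard API.

```typescript
// Hypothetical list of context keys whose values must never reach the logs.
const SENSITIVE_KEYS = ['password', 'token', 'authorization'];

interface LogEntry {
  timestamp: string;
  errorName: string;
  message: string;
  context: Record<string, unknown>;
}

// Build a structured log entry from an error plus request context. Values of
// sensitive top-level keys are replaced with a placeholder; nested objects
// are not inspected in this sketch.
function toLogEntry(error: unknown, context: Record<string, unknown>): LogEntry {
  const safeContext: Record<string, unknown> = {};
  for (const key of Object.keys(context)) {
    const isSensitive = SENSITIVE_KEYS.indexOf(key.toLowerCase()) !== -1;
    safeContext[key] = isSensitive ? '[REDACTED]' : context[key];
  }
  return {
    timestamp: new Date().toISOString(),
    errorName: error instanceof Error ? error.name : 'UnknownError',
    message: error instanceof Error ? error.message : String(error),
    context: safeContext,
  };
}

// Example: the token value is masked, the rest of the context is kept.
const entry = toLogEntry(new Error('Request timed out'), {
  service: 'orders',
  attempt: 3,
  token: 'abc123',
});
console.error(JSON.stringify(entry));
```

Emitting one JSON object per error keeps the logs easy to search and to feed into monitoring tools, which is what makes alerting on them practical.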

In conclusion, handling failures in distributed systems using TypeScript involves implementing retries with exponential backoff, defining custom error types, and logging errors thoughtfully. Applying these practices will make your applications more robust and reliable.