Designing Resilient TypeScript Systems: Handling Edge Cases in Distributed Architectures

Learn how to design resilient TypeScript applications by effectively handling edge cases and errors in distributed systems, ensuring your apps stay robust and reliable.

Distributed architectures, such as microservices or serverless systems, offer scalability and flexibility but also introduce challenges like network failures, data inconsistencies, and race conditions. When building these types of systems with TypeScript, it’s crucial to anticipate and handle edge cases to create resilient applications.

This article will introduce beginners to common edge cases in distributed systems and demonstrate practical TypeScript patterns and error-handling techniques to mitigate these risks.

### Common Edge Cases in Distributed Systems

1. **Network Failures:** Requests can time out or fail due to unreliable network connectivity.

2. **Partial Failures:** Some services succeed while others fail, leaving the system in an inconsistent state.

3. **Race Conditions:** Simultaneous operations result in unexpected data overwrites or corruption.

4. **Duplicate Messages:** Retries or network issues can cause duplicated requests or events.
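Of the four cases above, duplicate messages are commonly handled by making the consumer idempotent: processing the same message twice has the same effect as processing it once. The following is a minimal sketch, assuming every message carries a unique `id`. The names `Message` and `IdempotentProcessor` are invented for this illustration, and the in-memory `Set` stands in for a durable store.

```typescript
// Hypothetical message shape: assumes the producer attaches a unique id.
interface Message {
  id: string;
  payload: string;
}

class IdempotentProcessor {
  // In-memory stand-in for a durable store of processed message ids.
  private readonly seen = new Set<string>();
  private readonly handler: (msg: Message) => void;

  constructor(handler: (msg: Message) => void) {
    this.handler = handler;
  }

  // Returns true if the message was handled, false if it was a duplicate.
  process(msg: Message): boolean {
    if (this.seen.has(msg.id)) return false;
    this.handler(msg);
    // Record the id only after the handler succeeds, so a failed attempt can be retried.
    this.seen.add(msg.id);
    return true;
  }
}

// Usage: the second delivery of 'evt-1' is ignored.
const handled: string[] = [];
const processor = new IdempotentProcessor(msg => handled.push(msg.payload));
processor.process({ id: 'evt-1', payload: 'order created' });
processor.process({ id: 'evt-1', payload: 'order created' });
console.log(handled.length); // 1
```

In a real deployment the set of seen ids would need to live in shared storage, since each service instance otherwise keeps its own memory of what it has processed.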

### Practical Error Handling in TypeScript

The following example demonstrates a simple retry mechanism with exponential backoff to handle network errors when calling an external service.

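```typescript
// Fetches a URL as text, retrying failed attempts with exponential backoff.
async function fetchWithRetry(url: string, retries = 3, delay = 500): Promise<string> {
  try {
    const response = await fetch(url);
    // fetch only rejects on network errors, so HTTP error statuses are turned into exceptions here.
    if (!response.ok) throw new Error(`HTTP error! status: ${response.status}`);
    return await response.text();
  } catch (error) {
    if (retries > 0) {
      console.warn(`Request failed. Retrying in ${delay}ms...`);
      await new Promise(res => setTimeout(res, delay));
      // Retry with one fewer attempt remaining and double the delay.
      return fetchWithRetry(url, retries - 1, delay * 2);
    } else {
      throw error;
    }
  }
}
```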

In this function, we recursively retry fetching data, doubling the wait time after each failure. Backing off this way reduces the load placed on a service that is already struggling.
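One limitation of `fetchWithRetry` is that it retries every failure, including ones such as a 404 that will not succeed on a second attempt. The same idea can be generalized so the caller decides which errors are worth retrying. This is a sketch rather than a definitive implementation; `retryWithBackoff`, `RetryOptions`, and `shouldRetry` are names invented for this illustration.

```typescript
interface RetryOptions {
  retries: number; // how many times to retry after the first attempt
  delayMs: number; // initial wait, doubled after each failed attempt
  shouldRetry: (error: unknown) => boolean; // true if the failure looks transient
}

async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  { retries, delayMs, shouldRetry }: RetryOptions
): Promise<T> {
  let wait = delayMs;
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up when out of retries or when the error is not transient.
      if (attempt >= retries || !shouldRetry(error)) throw error;
      await new Promise(res => setTimeout(res, wait));
      wait *= 2;
    }
  }
}

// Usage with a simulated flaky operation that succeeds on the third call.
let calls = 0;
const flaky = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error('temporary failure');
  return 'ok';
};

retryWithBackoff(flaky, { retries: 3, delayMs: 10, shouldRetry: () => true })
  .then(result => console.log(result, calls)); // logs "ok 3"
```

Passing the operation as a function keeps the retry logic independent of `fetch`, so the same helper can wrap a database call or a message publish.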

### Handling Partial Failures

When performing multiple dependent operations across services, use try-catch blocks and compensating actions to maintain system consistency.

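```typescript
async function processOrder(orderId: string) {
  // Track which steps completed so that only those are undone.
  let inventoryReserved = false;
  let paymentCharged = false;
  try {
    await reserveInventory(orderId);
    inventoryReserved = true;
    await chargePayment(orderId);
    paymentCharged = true;
    await confirmOrder(orderId);
  } catch (error) {
    console.error('Error processing order:', error);
    // Compensate in reverse order. refundPayment is a hypothetical helper
    // assumed here as the counterpart of chargePayment.
    if (paymentCharged) await refundPayment(orderId);
    if (inventoryReserved) await cancelInventoryReservation(orderId); // Undo reserved inventory
    throw error;
  }
}
```

This example undoes only the steps that actually completed: the inventory reservation is released if payment or confirmation fails, and a payment that was already charged is refunded if confirmation fails. This avoids inconsistent order states.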
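When a workflow grows beyond a couple of steps, one way to keep rollback logic manageable is to pair each step with its compensating action and undo the completed steps in reverse order. The sketch below uses hypothetical names (`Step`, `runWithCompensation`) and assumes every step can be undone. It illustrates the idea and is not a complete saga implementation.

```typescript
interface Step {
  name: string;
  run: () => Promise<void>;
  compensate: () => Promise<void>; // undoes the effect of run
}

async function runWithCompensation(steps: Step[]): Promise<void> {
  const completed: Step[] = [];
  try {
    for (const step of steps) {
      await step.run();
      completed.push(step);
    }
  } catch (error) {
    // Undo only the steps that finished, most recent first.
    for (const step of completed.reverse()) {
      try {
        await step.compensate();
      } catch (compensationError) {
        // A failed compensation cannot be fixed here; log it and keep undoing the rest.
        console.error(`Compensation failed for ${step.name}:`, compensationError);
      }
    }
    throw error;
  }
}
```

With a helper like this, `processOrder` could be expressed as a list of steps, pairing `reserveInventory` with `cancelInventoryReservation` and the payment step with whatever refund operation the payment service offers.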

### Preventing Race Conditions

Use concurrency-safe mechanisms like database transactions, optimistic locking, or distributed locks to avoid race conditions.

```typescript
async function updateUserProfile(userId: string, update: Partial<UserProfile>) {
  const maxRetries = 3;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    // Re-read the record on every attempt so the update is based on the latest version.
    const user = await getUser(userId);
    const currentVersion = user.version ?? 0; // Treat a missing version as 0
    const updatedProfile = { ...user, ...update, version: currentVersion + 1 };

    // The write succeeds only if the stored version still equals the one we read.
    const success = await updateUserIfVersionMatches(userId, updatedProfile, currentVersion);
    if (success) return updatedProfile;
    // If not successful, another update happened concurrently, retry
  }
  throw new Error('Failed to update profile due to concurrent modifications');
}
```

This optimistic locking approach re-reads the record and retries the update whenever the stored version no longer matches the one that was read, which means another process modified the data in the meantime.
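The helper `updateUserIfVersionMatches` is not defined in this article. To illustrate what such a compare-and-set check does, here is a minimal in-memory sketch. `VersionedStore` and `putIfVersionMatches` are hypothetical names; a real system would perform the same check atomically in the database, for example with a conditional `UPDATE ... WHERE version = ?`.

```typescript
interface Versioned {
  version: number;
}

class VersionedStore<T extends Versioned> {
  private readonly records = new Map<string, T>();

  get(id: string): T | undefined {
    return this.records.get(id);
  }

  // Writes `next` only if the stored version still equals `expectedVersion`.
  // A missing record is treated as version 0.
  putIfVersionMatches(id: string, next: T, expectedVersion: number): boolean {
    const currentVersion = this.records.get(id)?.version ?? 0;
    if (currentVersion !== expectedVersion) return false;
    this.records.set(id, next);
    return true;
  }
}

// Usage: two writers both read version 0; only the first write succeeds.
const store = new VersionedStore<{ name: string; version: number }>();
const first = store.putIfVersionMatches('u1', { name: 'Ada', version: 1 }, 0);
const second = store.putIfVersionMatches('u1', { name: 'Bob', version: 1 }, 0);
console.log(first, second); // true false
```

The losing writer learns about the conflict from the `false` return value and can re-read the record before trying again, which is what the retry loop in `updateUserProfile` does.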

### Summary

Distributed systems fail in ways that single-process applications do not: networks drop requests, services fail independently, and concurrent writers collide. By anticipating these edge cases and using robust error handling, retries, rollbacks, and concurrency controls, you can increase your TypeScript application's resilience and reliability. Start simple with try-catch blocks and gradually implement more advanced patterns as your system grows.