Resilience Patterns

Goal: Master the four resilience primitives (retry, timeout, cache, fallback) and their combinations to build production-grade LLM pipelines.

Overview

Resilience in Constellation is built on four declarative primitives:

Retry - Automatic retry on transient failures
Timeout - Time limits to prevent hanging operations
Cache - Result storage to avoid redundant work
Fallback - Graceful degradation when all else fails

These primitives compose cleanly and are specified via the `with` clause on module calls. No custom error-handling code is required.

Why this matters for LLMs:

LLM APIs are inherently unreliable (rate limits, timeouts, transient errors)
Token costs make redundant calls expensive
User experience demands fast responses and graceful degradation
Production systems need predictable latency and failure modes

Decision Matrix

Start here: Choose your resilience strategy based on the operation characteristics.

Operation Type	Recommended Pattern	Rationale
LLM API call	`retry: 3, timeout: 30s, cache: 1h, fallback: simpler_model`	High latency, rate limits, expensive tokens
Embedding generation	`cache: 7d, timeout: 10s, retry: 2`	Deterministic, expensive, stable over time
Vector search	`timeout: 5s, retry: 2, fallback: []`	Fast expected, degradable to empty results
Prompt validation	`timeout: 1s, cache: 5min`	Fast, deterministic, frequently repeated
Token counting	`cache: 30min, timeout: 500ms`	Pure function, cacheable, must be fast
Model metadata fetch	`cache: 1d, timeout: 3s, retry: 2`	Static data, infrequent changes
User context lookup	`cache: 15min, timeout: 2s, fallback: default_context`	Session data, degradable
Safety check	`timeout: 5s, retry: 2, fallback: reject`	Critical, must complete, fail-safe
RAG retrieval	`timeout: 3s, cache: 10min, retry: 2, fallback: []`	Latency-sensitive, degradable
Response streaming	`timeout: 60s, retry: 0, fallback: cached_response`	Long-running, non-retriable

Decision Flowchart

Start: What kind of operation?
│
├─ Network call?
│  ├─ External API? → timeout + retry + fallback
│  └─ Internal service? → timeout + retry
│
├─ Expensive (tokens/compute)?
│  ├─ Deterministic? → cache (long TTL)
│  └─ Non-deterministic? → cache (short TTL) + retry
│
├─ Critical for correctness?
│  ├─ Must not fail? → retry + fallback (conservative default)
│  └─ Can degrade? → retry + fallback (empty/degraded)
│
└─ Latency-sensitive?
   ├─ User-facing? → timeout (short) + cache + fallback
   └─ Background? → timeout (long) + retry + cache

Retry Patterns

1. Basic Retry

Use when: Transient failures are common (network glitches, rate limits).

in prompt: String

response = GenerateText(prompt) with retry: 3
out response

Behavior:

1 initial attempt + 3 retries = 4 total attempts
No delay between retries (immediate retry)
Fails if all 4 attempts fail

When to use:

Quick operations where immediate retry makes sense
Internal services with fast recovery

When to avoid:

Rate-limited APIs (immediate retry will hit rate limit again)
Expensive operations (wastes resources on rapid retries)

2. Retry with Fixed Delay

Use when: Service needs brief recovery time between attempts.

in prompt: String

response = GenerateText(prompt) with
    retry: 3,
    delay: 1s,
    backoff: fixed

out response

Behavior:

Waits 1s between each retry
Attempt timing: 0s, 1s, 2s, 3s
Total time: 3s delay + 4 × execution time

When to use:

Services with brief recovery periods
Internal APIs with predictable failure patterns

Best practices:

Use 200ms-1s delay for internal services
Use 1s-3s delay for external APIs

3. Retry with Exponential Backoff

Use when: Service needs increasing recovery time (rate limits, overload).

in prompt: String

response = GenerateText(prompt) with
    retry: 5,
    delay: 1s,
    backoff: exponential

out response

Behavior:

Delay doubles each retry: 1s, 2s, 4s, 8s, 16s
Capped at 30s per delay
Attempt timing: 0s, 1s (wait) → 1s, 2s (wait) → 3s, 4s (wait) → 7s, 8s (wait) → 15s, 16s (wait) → 31s

Why exponential:

Gives recovering services progressively more time
Avoids overwhelming a rate-limited API
Standard practice for external APIs (AWS, OpenAI, Anthropic)

When to use:

Rate-limited LLM APIs
Cloud services with backoff recommendations
Any external API with transient overload

Configuration guide:

Initial Delay	Max Retries	Total Wait Time	Use Case
100ms	3	~700ms	Fast internal services
500ms	4	~7.5s	Standard APIs
1s	5	~31s	Rate-limited LLM APIs
2s	5	~60s (last delay capped at 30s)	Slow external services

4. Retry with Linear Backoff

Use when: Service needs predictable, increasing recovery time.

in prompt: String

response = GenerateText(prompt) with
    retry: 5,
    delay: 1s,
    backoff: linear

out response

Behavior:

Delay increases by base amount: 1s, 2s, 3s, 4s, 5s
Attempt timing: 0s, 1s → 1s, 2s → 3s, 3s → 6s, 4s → 10s, 5s → 15s

When to use:

Internal services with linear recovery characteristics
When you want more aggressive retry than exponential but not immediate

Comparison:

Retry	Exponential	Linear	Fixed
1 → 2	1s	1s	1s
2 → 3	2s	2s	1s
3 → 4	4s	3s	1s
4 → 5	8s	4s	1s
Total	15s	10s	4s
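
The totals in the comparison can be reproduced with a few lines of arithmetic. The sketch below is in Python rather than Constellation; `backoff_delays` is a hypothetical helper, and applying the 30s cap to every strategy is an assumption (this guide states the cap only for exponential backoff).

```python
def backoff_delays(strategy, base, retries, cap=30.0):
    """Return the wait (in seconds) before each retry.

    strategy: "fixed", "linear", or "exponential"
    base:     the configured `delay`, in seconds
    retries:  the configured `retry` count (one delay per retry)
    cap:      assumed upper bound on any single delay
    """
    delays = []
    for n in range(retries):              # n = 0 is the wait before the first retry
        if strategy == "fixed":
            wait = base
        elif strategy == "linear":
            wait = base * (n + 1)
        elif strategy == "exponential":
            wait = base * 2 ** n
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        delays.append(min(wait, cap))
    return delays


# Four delays cover the transitions from attempt 1 through attempt 5.
for strategy in ("exponential", "linear", "fixed"):
    delays = backoff_delays(strategy, base=1.0, retries=4)
    print(strategy, delays, "total:", sum(delays))
```

Calling the helper with a larger `retries` value shows where the cap starts to flatten the exponential schedule.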

5. Retry Budget Pattern

Use when: You want to limit total retry time, not just attempts.

in prompt: String

response = GenerateText(prompt) with
    retry: 10,
    delay: 1s,
    backoff: exponential,
    timeout: 5s  # Timeout per attempt

out response

Effective behavior:

Each attempt has 5s timeout
Exponential backoff between attempts
Total possible time: 11 attempts (1 initial + 10 retries) × 5s = 55s + backoff delays
But: if every attempt times out, the backoff adds up to 181s (1 + 2 + 4 + 8 + 16, then 5 × 30s capped), so the worst case is about 236s

Why this works:

Timeout bounds per-attempt time
Retry count bounds total attempts
Backoff strategy bounds retry frequency

Total time formula:

Total = (retries + 1) × timeout + sum(backoff_delays)

Example calculation:

retry: 5, timeout: 10s, delay: 1s, backoff: exponential
Attempts: 6 × 10s = 60s
Backoff: 1 + 2 + 4 + 8 + 16 = 31s
Total: ~91s maximum
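
The formula is easy to check in code before committing to a configuration. This is a minimal Python sketch, not Constellation API: `worst_case_seconds` is a hypothetical helper that assumes exponential backoff doubling from `delay` with a 30s cap per wait.

```python
def worst_case_seconds(retries, timeout, delay, cap=30):
    """Upper bound on wall-clock time when every attempt times out.

    Assumes exponential backoff that doubles from `delay` and is capped
    at `cap` seconds per wait.
    """
    attempts = retries + 1                      # 1 initial attempt + retries
    backoff = sum(min(delay * 2 ** n, cap) for n in range(retries))
    return attempts * timeout + backoff


print(worst_case_seconds(retries=5, timeout=10, delay=1))   # 91
```

Running the same helper against a latency budget makes it clear when a retry count is too generous for a user-facing call.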

6. Idempotency-Safe Retry

Critical: Only retry idempotent operations.

Idempotent operations (safe to retry):

# Reading - safe
profile = FetchUserProfile(userId) with retry: 3

# Pure computation - safe
embeddings = GenerateEmbeddings(text) with retry: 3

# GET requests - safe
data = HttpGet(url) with retry: 3

Non-idempotent operations (NEVER retry without safeguards):

# Writing - NOT SAFE
# Don't do this:
result = CreateUser(userData) with retry: 3  # May create duplicates!

# Payment - NOT SAFE
payment = ChargeCard(amount) with retry: 3  # May double-charge!

# Incrementing - NOT SAFE
count = IncrementCounter() with retry: 3  # May over-count!

How to handle non-idempotent operations:

Use idempotency keys (if the service supports it):

in paymentData: Record
in idempotencyKey: String

payment = ChargeCard(paymentData, idempotencyKey) with retry: 3

Check before retry:

in userData: Record

# No retry on creation
userId = CreateUser(userData)

# Retry-safe lookup
profile = FetchUserProfile(userId) with retry: 3

Use fallback instead of retry:

in userData: Record

userId = CreateUser(userData) with
    timeout: 10s,
    fallback: LookupExistingUser(userData)
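
To see why an idempotency key makes retry safe, consider what the service does with it. The Python sketch below uses a hypothetical in-memory `PaymentService`; real payment APIs differ in detail, and none of these names come from Constellation.

```python
import uuid


class PaymentService:
    """Hypothetical in-memory stand-in for a payment API that honours idempotency keys."""

    def __init__(self):
        self.charges = {}                # idempotency key -> recorded charge
        self.fail_next_response = False

    def charge(self, amount, idempotency_key):
        # A repeated key returns the original charge instead of charging again.
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = {"id": str(uuid.uuid4()), "amount": amount}
        if self.fail_next_response:
            # Simulate the charge succeeding but the response being lost.
            self.fail_next_response = False
            raise ConnectionError("response lost")
        return self.charges[idempotency_key]


def charge_with_retry(service, amount, idempotency_key, retries=3):
    """Retry the charge, reusing the same key on every attempt."""
    for attempt in range(retries + 1):
        try:
            return service.charge(amount, idempotency_key)
        except ConnectionError:
            if attempt == retries:
                raise


service = PaymentService()
service.fail_next_response = True
receipt = charge_with_retry(service, 49.99, idempotency_key="order-1234")
print(len(service.charges))   # 1: the retry did not create a second charge
```

The key must be generated once by the caller and passed unchanged to every attempt; generating a fresh key per attempt would defeat the safeguard.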

Timeout Patterns

1. Basic Timeout

Use when: Operation has a known maximum reasonable duration.

in prompt: String

response = GenerateText(prompt) with timeout: 30s
out response

Behavior:

If execution exceeds 30s, operation is cancelled
Raises ModuleTimeoutException
No retry (unless explicitly added)

When to use:

Any network operation
Database queries
External API calls
File I/O operations

When to avoid:

Pure computations (unless you want to bound CPU time)
Operations with unpredictable duration (use timeout + fallback instead)

2. Timeout Per Attempt (with Retry)

Use when: Retry is needed, but each attempt should fail fast.

in prompt: String

response = GenerateText(prompt) with
    timeout: 10s,
    retry: 3

out response

Behavior:

Each of 4 attempts has a 10s timeout
Total possible time: 4 × 10s = 40s
If attempt 1 times out, retry immediately (or with delay if configured)

Key insight:

Timeout is per attempt, not total
Without timeout, a hanging first attempt blocks forever
With timeout, hanging attempts fail fast and allow retry

Example timeline:

0s    : Start attempt 1
10s   : Attempt 1 times out
10s   : Start attempt 2
15s   : Attempt 2 succeeds
Total : 15s
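
The timeline can be reproduced at small scale with a per-attempt timeout wrapped in a retry loop. This is a Python `asyncio` sketch of the idea, not how Constellation implements it; `call_with_retry` and `flaky_generate` are hypothetical names, and the durations are shortened so the example runs quickly.

```python
import asyncio


async def call_with_retry(make_call, retries, timeout):
    """Run make_call(attempt) up to retries + 1 times.

    Each attempt gets its own `timeout` (seconds); a timed-out attempt is
    cancelled and the next one starts immediately.
    """
    for attempt in range(1, retries + 2):
        try:
            result = await asyncio.wait_for(make_call(attempt), timeout)
            return attempt, result
        except asyncio.TimeoutError:
            if attempt == retries + 1:
                raise                     # out of attempts: surface the timeout


async def flaky_generate(attempt):
    # Hypothetical module: hangs on the first attempt, answers quickly afterwards.
    await asyncio.sleep(1.0 if attempt == 1 else 0.01)
    return "ok"


attempts_used, response = asyncio.run(
    call_with_retry(flaky_generate, retries=3, timeout=0.1)
)
print(attempts_used, response)
```

Without the `wait_for` bound, the first attempt would hold the loop for its full duration and the retry would never start.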

3. Deadline Propagation Pattern

Use when: Multiple operations must complete within a total time budget.

in query: String
in maxLatency: Duration

@example(30s)
in deadline: Duration

# Fast retrieval
docs = VectorSearch(query) with
    timeout: when deadline < 10s then 2s else 5s

# Main LLM call
response = GenerateWithContext(query, docs) with
    timeout: when deadline < 10s then 5s else 20s

out response

Behavior:

Adjust timeouts based on total deadline
Earlier operations get shorter timeouts to leave budget for later ones
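The timeouts sum to 7s (2s + 5s) when deadline < 10s and 25s (5s + 20s) otherwise, so total latency stays under the deadline only if the deadline exceeds those sums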

Total budget allocation example:

Total deadline: 30s
├─ VectorSearch: 5s (17%)
├─ GenerateWithContext: 20s (67%)
└─ Buffer: 5s (16%)

Best practices:

Allocate 10-20% buffer for overhead
Earlier operations get smaller share
Critical operations get larger share
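
One way to derive per-step timeouts from a deadline is to reserve the buffer first and split the rest by weight. The Python sketch below is illustrative only; `allocate_timeouts` and the 1:4 weighting are assumptions chosen to reproduce the allocation example above.

```python
def allocate_timeouts(deadline, weights, buffer):
    """Split `deadline` seconds across steps in proportion to `weights`.

    `buffer` seconds are held back for overhead before the split.
    """
    usable = deadline - buffer
    if usable <= 0:
        raise ValueError("buffer leaves no time for the steps")
    total_weight = sum(weights.values())
    return {step: usable * w / total_weight for step, w in weights.items()}


budget = allocate_timeouts(
    deadline=30,
    weights={"VectorSearch": 1, "GenerateWithContext": 4},
    buffer=5,
)
print(budget)
```

The resulting values would be supplied to the pipeline as inputs, since the split is computed outside it.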

4. Adaptive Timeout Pattern

Use when: Timeout should adjust based on recent performance.

in prompt: String
in recentAvgLatency: Duration

# Use 2× recent average, capped at 60s
adaptiveTimeout = Min(recentAvgLatency * 2, 60s)

response = GenerateText(prompt) with
    timeout: adaptiveTimeout,
    retry: 2

out response

Why adaptive:

Accounts for model performance variations
Prevents false timeouts during slow periods
Prevents excessive waits during fast periods

Implementation note: This requires tracking metrics outside the pipeline. Use this pattern when:

You have latency monitoring
Model performance varies significantly
False timeouts are costly
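
The external tracking can be as small as a rolling window of recent latencies. The following Python sketch is one possible shape for it; `LatencyTracker`, the window size, and the 30s default used before any samples exist are all assumptions.

```python
from collections import deque


class LatencyTracker:
    """Keeps a rolling window of recent latencies (seconds)."""

    def __init__(self, window=20):
        self.samples = deque(maxlen=window)   # oldest samples drop off automatically

    def record(self, seconds):
        self.samples.append(seconds)

    def adaptive_timeout(self, multiplier=2.0, cap=60.0, default=30.0):
        # With no history yet, fall back to a fixed default.
        if not self.samples:
            return default
        average = sum(self.samples) / len(self.samples)
        return min(average * multiplier, cap)


tracker = LatencyTracker()
for latency in (4.0, 6.0):
    tracker.record(latency)
print(tracker.adaptive_timeout())   # 10.0
```

The value returned by `adaptive_timeout()` is what would be passed in as `recentAvgLatency * 2` has been computed for you, already capped.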

5. Timeout with Graceful Degradation

Use when: Timeout is expected, and you have a degraded alternative.

in query: String

# Try full LLM generation
fullResponse = GenerateDetailed(query) with
    timeout: 10s,
    fallback: GenerateSimple(query) with timeout: 3s

out fullResponse

Behavior:

Try detailed response (10s timeout)
If timeout, fall back to simple response (3s timeout)
If simple also times out, raise error (or add another fallback)

Latency guarantee:

Best case: detailed response in <10s
Degraded case: simple response in <13s (10s + 3s)
Worst case: error at 13s

Cache Patterns

1. Basic Result Caching

Use when: Same inputs produce same outputs (pure functions).

in text: String

embeddings = GenerateEmbeddings(text) with cache: 1h
out embeddings

Behavior:

First call: execute module, store result
Subsequent calls (within 1h): return cached result
After 1h: cache expires, re-execute module

Cache key:

Computed from: module name + input values
GenerateEmbeddings("hello") and GenerateEmbeddings("world") have different keys
GenerateEmbeddings("hello") always has the same key
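
The keying and expiry behaviour can be sketched outside the runtime. The Python below is an illustration under assumptions: the actual hashing scheme Constellation uses is not specified in this guide, and `cache_key` and `TTLCache` are hypothetical names.

```python
import hashlib
import json
import time


def cache_key(module_name, inputs):
    """Hash the module name together with its input values."""
    payload = json.dumps({"module": module_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}              # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None or entry[0] <= self._clock():
            return None                 # miss or expired
        return entry[1]

    def put(self, key, value, ttl_seconds):
        self._entries[key] = (self._clock() + ttl_seconds, value)


cache = TTLCache()
key = cache_key("GenerateEmbeddings", {"text": "hello"})
cache.put(key, [0.1, 0.2], ttl_seconds=3600)
print(cache.get(key))                                               # [0.1, 0.2]
print(key == cache_key("GenerateEmbeddings", {"text": "world"}))    # False
```

Passing the clock in as a parameter keeps expiry testable without waiting for real time to pass.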

When to use:

Deterministic operations (same input → same output)
Expensive operations (LLM calls, embeddings, expensive compute)
Frequently repeated inputs

When to avoid:

Non-deterministic operations (e.g., GetCurrentTime)
Operations with side effects (writes, increments)
Rapidly changing data

2. TTL Selection Guide

Choose cache TTL based on data freshness requirements and change frequency.

Data Type	Recommended TTL	Rationale
Embeddings	7d - 30d	Text embeddings are deterministic and stable
Model responses	1h - 24h	Balance freshness vs. cost; depends on prompt stability
User context	5min - 30min	Session data changes moderately
Prompt validation	1h - 6h	Validation rules change infrequently
Token counts	1h	Deterministic but rules may change
Model metadata	1d - 7d	Model configs change rarely
Safety checks	1h	Balance safety vs. performance
RAG retrieval	10min - 1h	Documents change, but not constantly

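Formula for choosing TTL:

TTL = min(
  Max_acceptable_staleness,
  Typical_reuse_window,
  TTL_needed_for_cost_target
)

Here TTL_needed_for_cost_target is the shortest TTL that yields Cost_saving_target / Request_cost cache hits over the target period.

Example:

Max staleness: 6h (data updates every 6h)
Reuse window: 2h (users repeat queries within 2h)
Cost: $0.01/call, target $100/day savings → need 10k cached calls/day → 1h TTL sufficient

Chosen TTL: 1h (the smallest of 6h, 2h, and 1h; the cost target is already met at 1h, so a longer TTL only adds staleness)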

3. Cache-Aside Pattern

Use when: Cache is optional, and you want explicit control over cache hits/misses.

in userId: String
in useCache: Boolean

# Conditional caching
profile = when useCache
  then LoadUserProfile(userId) with cache: 15min
  else LoadUserProfile(userId)

out profile

Behavior:

If useCache = true: use cache (15min TTL)
If useCache = false: bypass cache, always fetch fresh

When to use:

Admin operations need fresh data
User has option to "refresh"
Testing cache behavior

4. Write-Through Cache Pattern

Use when: Writes should update cache immediately.

in userId: String
in newData: Record

# Update storage
result = UpdateUser(userId, newData)

# Invalidate cache by re-fetching with cache
fresh = LoadUserProfile(userId) with cache: 15min

out fresh

Behavior:

Write operation (no cache)
Re-fetch with cache (stores fresh value in cache)
Subsequent reads get updated cached value

Limitation in Constellation:

No explicit cache invalidation API
Must re-fetch to update cache
Alternative: use cache_backend with external invalidation

5. Multi-Level Cache Pattern

Use when: Different cache durations for different operations.

in query: String

# Short TTL for retrieval (data changes)
docs = VectorSearch(query) with cache: 10min

# Long TTL for embeddings (deterministic)
queryEmbedding = GenerateEmbedding(query) with cache: 7d

# Medium TTL for generation (balance cost vs. freshness)
response = GenerateWithContext(query, docs) with cache: 1h

out response

Rationale:

Embeddings are pure → cache long
Retrieval results change → cache short
LLM responses are expensive → cache medium

6. Distributed Cache Pattern

Use when: Multiple instances need to share cached results.

in text: String

embeddings = GenerateEmbeddings(text) with
    cache: 1d,
    cache_backend: "redis"

out embeddings

Behavior:

Results stored in Redis (or Memcached)
All instances share the same cache
Cache persists across restarts

When to use:

Multi-instance deployments
Horizontal scaling
Cache needs to survive restarts

Setup required:

// Configure cache backend
ConstellationBuilder[IO]
  .withCacheBackend("redis", RedisCacheBackend(redisClient))
  .build

See also: cache_backend option

7. Cache Hit Rate Monitoring

Monitor cache effectiveness to tune TTLs.

# Get cache statistics
curl http://localhost:8080/metrics | jq .cache

# Expected output:
# {
#   "hits": 8543,
#   "misses": 1234,
#   "hitRate": 0.87,
#   "size": 456
# }

Target hit rates:

>80% - Excellent (most requests cached)
60-80% - Good (decent cache utilization)
40-60% - Fair (consider increasing TTL)
<40% - Poor (cache not effective; lengthen TTL or drop caching)

Tuning based on hit rate:

Low hit rate + repeated inputs → TTL too short, increase TTL
High hit rate + stale data → TTL too long, decrease TTL
Low hit rate + unique inputs → caching won't help, remove cache
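
The hit rate and its band can be computed directly from the counters. A small Python sketch, using the sample `hits` and `misses` values shown earlier in this section; `hit_rate` and `rate_band` are hypothetical helpers.

```python
def hit_rate(hits, misses):
    total = hits + misses
    return hits / total if total else 0.0


def rate_band(rate):
    """Map a hit rate to the target bands listed in this section."""
    if rate > 0.80:
        return "excellent"
    if rate >= 0.60:
        return "good"
    if rate >= 0.40:
        return "fair"
    return "poor"


metrics = {"hits": 8543, "misses": 1234}    # sample values from the metrics output
rate = hit_rate(metrics["hits"], metrics["misses"])
print(round(rate, 2), rate_band(rate))      # 0.87 excellent
```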

Fallback Patterns

1. Static Default Fallback

Use when: You have a sensible default value.

in stockSymbol: String

price = GetStockPrice(stockSymbol) with
    retry: 2,
    timeout: 5s,
    fallback: 0.0

out price

Behavior:

Try to fetch price (with retry)
If all attempts fail, return 0.0
No error raised

When to use:

Non-critical data with obvious defaults
Degraded mode is acceptable
Errors would block the pipeline unnecessarily

When to avoid:

Critical data (where 0.0 would be misleading)
Operations where failure should be visible
Cases where caller needs to distinguish success from failure

2. Alternative Service Fallback

Use when: You have a backup service.

in prompt: String

response = PrimaryLLM(prompt) with
    timeout: 10s,
    retry: 2,
    fallback: BackupLLM(prompt) with timeout: 10s

out response

Behavior:

Try primary LLM (with retry)
If exhausted, try backup LLM
If backup fails, raise error (or add another fallback)

Use cases:

Primary: expensive, high-quality model
Backup: cheaper, lower-quality model
Ensures response even if primary is down

Example: Model cascade

in prompt: String

response = GPT4(prompt) with
    timeout: 30s,
    retry: 2,
    fallback: GPT35(prompt) with
        timeout: 15s,
        retry: 2,
        fallback: Claude(prompt) with timeout: 15s

out response

Latency worst-case (each level makes 1 initial attempt plus its retries):

GPT-4: 3 attempts × 30s = 90s
GPT-3.5: 3 attempts × 15s = 45s
Claude: 1 attempt × 15s = 15s
Total: ~150s maximum

Best practice:

Limit cascade depth to 2-3 levels
Use progressively shorter timeouts
Consider total latency budget

3. Cached Fallback Pattern

Use when: Stale data is better than no data.

in endpoint: String

# Try fresh fetch
data = FetchData(endpoint) with
    timeout: 5s,
    retry: 2,
    fallback: GetCachedData(endpoint)

out data

Behavior:

Try to fetch fresh data
If failed, return last known cached value (even if expired)
Requires GetCachedData module that returns last cached value

Implementation note: This requires a module that explicitly reads from cache without TTL checks.
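
What such a module does internally can be sketched as a last-known-good store. The Python below is an assumption about one reasonable design, not the behaviour of any built-in Constellation module; `LastKnownGood` is a hypothetical name.

```python
class LastKnownGood:
    """Remembers the most recent successful result per key, with no expiry."""

    def __init__(self):
        self._values = {}

    def fetch(self, key, fetch_fresh):
        try:
            value = fetch_fresh(key)
        except Exception:
            if key in self._values:
                return self._values[key], "stale"
            raise                          # nothing remembered: surface the failure
        self._values[key] = value
        return value, "fresh"


def healthy(endpoint):
    return {"endpoint": endpoint, "status": "up"}


def broken(endpoint):
    raise ConnectionError("upstream unavailable")


store = LastKnownGood()
print(store.fetch("/prices", healthy))     # fresh value
print(store.fetch("/prices", broken))      # same value, marked stale
```

Returning a freshness marker alongside the value lets downstream steps decide whether stale data is acceptable for their purpose.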

4. Degraded Mode Fallback

Use when: Partial results are better than complete failure.

in query: String

# Try full context generation
fullContext = GenerateContext(query) with
    timeout: 10s,
    retry: 2,
    fallback: { summary: "", relevance: 0.0 }

# Try document retrieval
docs = VectorSearch(query) with
    timeout: 5s,
    retry: 2,
    fallback: []

# Generate with whatever context is available
response = GenerateWithContext(query, docs, fullContext)
out response

Behavior:

Context generation fails → use empty context
Vector search fails → use empty docs
Generation still proceeds with degraded inputs

When to use:

Non-critical features (enrichment, metadata)
Operations where partial data is useful
User experience prioritizes response over completeness

5. Fallback with Logging

Use when: Fallback is used, but you want visibility into failures.

in userId: String
in defaultData: Record

profile = LoadUserProfile(userId) with
    retry: 3,
    timeout: 5s,
    fallback: defaultData,
    on_error: log

out profile

Behavior:

Try to load profile (with retry)
If failed, return defaultData
Log error before using fallback

Why this matters:

Fallback masks errors from users
Logging ensures engineers see failures
Allows monitoring of fallback usage rate

Monitoring:

# Count fallback usage
grep "LoadUserProfile fallback" logs/*.log | wc -l

# If >5% of requests use fallback, investigate

6. Conditional Fallback

Use when: Fallback should vary based on context.

in query: String
in userTier: String

response = GeneratePremium(query) with
    timeout: 30s,
    retry: 2,
    fallback: when userTier == "free"
        then GenerateBasic(query)
        else RaiseError("Premium service unavailable")

out response

Behavior:

Free users: fallback to basic generation
Premium users: error raised (no degraded experience)

Use cases:

Tiered service levels
A/B testing (fallback for control group only)
Conditional degradation based on load

Combined Patterns

1. Full Resilience Stack

Use when: Operation is critical, expensive, and unreliable.

in prompt: String
in defaultResponse: String

response = GenerateText(prompt) with
    cache: 1h,
    retry: 3,
    delay: 1s,
    backoff: exponential,
    timeout: 30s,
    fallback: defaultResponse,
    on_error: log

out response

Execution order:

Check cache (if hit, return immediately)
On cache miss, execute module
If an attempt fails or times out, retry (with exponential backoff)
Retry up to 3 times
If all retries fail, log error and return fallback
If any attempt succeeds, cache result
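
The ordering above can be expressed as a short control-flow sketch. It is written in Python for illustration and is not the Constellation runtime: `run_resilient` is hypothetical, backoff sleeps and per-attempt timeouts are omitted, and whether a fallback value is cached is not modelled.

```python
def run_resilient(call, key, cache, retries, fallback, log=print):
    """Cache check, then attempts, then fallback, in the order listed above."""
    if key in cache:                          # cache hit: return immediately
        return cache[key]
    last_error = None
    for attempt in range(retries + 1):        # initial attempt plus retries
        try:
            result = call()
        except Exception as err:              # backoff sleep omitted for brevity
            last_error = err
            continue
        cache[key] = result                   # a successful attempt is cached
        return result
    log(f"all {retries + 1} attempts failed: {last_error}")
    return fallback                           # every attempt failed


calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("attempt timed out")
    return "generated text"


cache = {}
print(run_resilient(flaky, "prompt-1", cache, retries=3, fallback="default"))
print(run_resilient(flaky, "prompt-1", cache, retries=3, fallback="default"))
print(calls["count"])   # 3: the second request was served from the cache
```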

Latency analysis:

Best case: <5ms (cache hit)
Typical case: 10-30s (first attempt succeeds, cached)
Degraded case: ~127s (4 attempts × 30s timeout + 1s + 2s + 4s backoff) → fallback

Cost analysis:

Cache hit: $0 (no LLM call)
Cache miss: $0.01/call × 1-4 attempts = $0.01-$0.04
With 80% hit rate: $0.002-$0.008 average per request

2. Read-Through Cache with Retry

Use when: Cache is primary optimization, retry is backup.

in text: String

embeddings = GenerateEmbeddings(text) with
    cache: 7d,
    timeout: 10s,
    retry: 2,
    delay: 500ms,
    backoff: exponential

out embeddings

Behavior:

First call: execute (with retry), cache result
Subsequent calls: return cached (no retry needed)
Retry only happens on cache miss

Why this ordering:

Cache eliminates most retry needs (80%+ hit rate)
Retry provides resilience for the 20% cache misses
No wasted retries on cache hits

3. Lazy Evaluation with Cache

Use when: Result may not be needed, but if needed, should be cached.

in query: String
in includeMetadata: Boolean

# Only compute if needed
metadata = when includeMetadata
    then GenerateMetadata(query) with
        cache: 1h,
        lazy: true
    else {}

response = GenerateResponse(query, metadata)
out response

Behavior:

If includeMetadata = false: metadata never computed
If includeMetadata = true: compute once, cache for 1h
Subsequent requests with same query: use cached metadata

Use cases:

Optional expensive operations
Conditional features based on user tier
Operations with variable necessity

4. Circuit Breaker Pattern

Use when: Repeated failures should skip attempts entirely.

in endpoint: String
in circuitOpen: Boolean

data = when circuitOpen
    then FallbackData(endpoint)
    else FetchData(endpoint) with
        retry: 2,
        timeout: 5s,
        fallback: FallbackData(endpoint)

out data

Behavior:

If circuit open: immediately use fallback (no attempt)
If circuit closed: try fetch (with retry/timeout)
Circuit state managed externally

Implementation note: This requires external circuit breaker logic tracking failure rates. The pattern shows how to integrate it into pipelines.

Typical circuit breaker logic:

if failure_rate > 50% in last 1min:
    circuitOpen = true
    wait 30s
    try one request (half-open)
    if success: circuitOpen = false
    if failure: circuitOpen = true, wait 30s again
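
The logic above could live in a small class outside the pipeline. The Python sketch below is one way to write it, with assumed defaults taken from the pseudocode (50% threshold, 1min window, 30s cooldown); `CircuitBreaker` and the `min_samples` guard are additions for illustration.

```python
import time


class CircuitBreaker:
    """Opens when the recent failure rate crosses a threshold; probes after a cooldown."""

    def __init__(self, threshold=0.5, window=60.0, cooldown=30.0,
                 min_samples=3, clock=time.monotonic):
        self.threshold, self.window, self.cooldown = threshold, window, cooldown
        self.min_samples = min_samples
        self._clock = clock
        self._results = []          # (timestamp, succeeded) pairs inside the window
        self._opened_at = None      # None means the circuit is closed

    def allow_request(self):
        if self._opened_at is None:
            return True
        # Half-open: let a probe through once the cooldown has elapsed.
        return self._clock() - self._opened_at >= self.cooldown

    def record(self, succeeded):
        now = self._clock()
        if self._opened_at is not None:
            # Probe result: close on success, restart the cooldown on failure.
            self._opened_at = None if succeeded else now
            self._results.clear()
            return
        self._results = [(t, ok) for t, ok in self._results if now - t <= self.window]
        self._results.append((now, succeeded))
        failures = sum(1 for _, ok in self._results if not ok)
        if (len(self._results) >= self.min_samples
                and failures / len(self._results) > self.threshold):
            self._opened_at = now


now = [0.0]
breaker = CircuitBreaker(clock=lambda: now[0])
for ok in (True, False, False):
    breaker.record(ok)
print(breaker.allow_request())   # False: 2 of 3 recent calls failed
now[0] = 31.0
print(breaker.allow_request())   # True: cooldown elapsed, probe allowed
```

The pipeline's `circuitOpen` input would then be the negation of `allow_request()` at the time the request is made.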

5. Priority + Timeout + Fallback

Use when: Resource-constrained environment with varying criticality.

in criticalQuery: String
in optionalQuery: String

# High priority, longer timeout, no fallback (must succeed)
critical = ProcessCritical(criticalQuery) with
    priority: high,
    timeout: 30s,
    retry: 3

# Low priority, short timeout, immediate fallback
optional = ProcessOptional(optionalQuery) with
    priority: low,
    timeout: 5s,
    retry: 1,
    fallback: {}

out critical
out optional

Behavior (under load):

Critical requests get priority scheduling
Critical requests get more retry budget
Optional requests fail fast (5s timeout, 1 retry)
Optional requests don't block critical ones

Use cases:

Multi-tenant systems (paid tier = high priority)
Background vs. real-time workloads
Critical path vs. enrichment

6. Retry with Timeout and Dynamic Backoff

Use when: Backoff should adapt to failure types.

in prompt: String
in recentFailureRate: Float

delay = when recentFailureRate > 0.5 then 5s else 1s
backoff = when recentFailureRate > 0.5 then exponential else linear

response = GenerateText(prompt) with
    retry: 3,
    delay: delay,
    backoff: backoff,
    timeout: 30s

out response

Behavior:

High failure rate → longer delay, exponential backoff (back off aggressively)
Low failure rate → shorter delay, linear backoff (retry faster)

Use cases:

Adaptive resilience during incidents
Multi-region deployments with varying health
APIs with variable rate limits

Anti-Patterns and Pitfalls

Anti-Pattern 1: Retry Without Timeout

Problem: Hanging calls never fail, retry never kicks in.

Bad:

response = SlowAPI(input) with retry: 3
# If SlowAPI hangs, retry never happens (still waiting on attempt 1)

Good:

response = SlowAPI(input) with
    timeout: 10s,
    retry: 3
# Timeout ensures hanging calls fail, allowing retry

Rule: Always pair retry with timeout.

Anti-Pattern 2: Cache Non-Deterministic Operations

Problem: Caching random/time-dependent results gives stale/incorrect data.

Bad:

time = GetCurrentTime() with cache: 1h
# Returns same time for 1 hour!

Good:

time = GetCurrentTime()
# No cache - always fresh

Rule: Only cache pure functions (same input → same output).

Anti-Pattern 3: Aggressive Retry on Non-Transient Errors

Problem: Retrying validation errors, auth errors, or malformed requests wastes resources.

Bad:

result = ValidateInput(input) with retry: 5
# If input is malformed, retrying won't help (will fail 5 times)

Good:

result = ValidateInput(input) with timeout: 1s
# Validation should be fast, no retry needed

Rule: Only retry transient errors (network, rate limits, timeouts). Don't retry permanent errors (validation, auth, not-found).

Implementation note: Constellation retries all errors. To avoid retrying permanent errors, ensure modules distinguish error types and raise non-retriable exceptions.
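
The transient-versus-permanent distinction looks like this when written out as a retry loop. The Python sketch is a generic illustration of the rule, for example inside a module's own client code; it does not describe Constellation's retry mechanism, and both exception classes are hypothetical.

```python
class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (network, rate limit, timeout)."""


class PermanentError(Exception):
    """Hypothetical marker for failures that will not change on retry (validation, auth)."""


def call_with_selective_retry(call, retries):
    """Retry only TransientError; let PermanentError propagate immediately."""
    for attempt in range(retries + 1):
        try:
            return call()
        except TransientError:
            if attempt == retries:
                raise
        # PermanentError is not caught, so it surfaces after a single attempt.


attempts = {"count": 0}


def validate():
    attempts["count"] += 1
    raise PermanentError("malformed input")


try:
    call_with_selective_retry(validate, retries=5)
except PermanentError:
    print(attempts["count"])   # 1: no retries were spent on a permanent error
```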

Anti-Pattern 4: Long Timeout Without Fallback

Problem: Users wait too long for failed operations.

Bad:

response = SlowAPI(input) with timeout: 120s, retry: 3
# User waits up to 8 minutes for a response!

Good:

response = SlowAPI(input) with
    timeout: 30s,
    retry: 2,
    fallback: CachedResponse(input)
# User waits max 90s, gets cached response if failed

Rule: If timeout × attempts (retries + 1) > 60s, add fallback for graceful degradation.

Anti-Pattern 5: Caching Errors

Problem: Errors don't get cached, but you might expect them to.

Behavior:

result = FlakyAPI(input) with cache: 1h, retry: 0
# If FlakyAPI fails, error is raised (not cached)
# Next call with same input will try again (no cache)

Clarification:

Only successful results are cached
Errors are never cached
This is correct behavior (caching errors would propagate failures)

If you want error caching:

result = FlakyAPI(input) with
    cache: 1h,
    fallback: { error: "Service unavailable" }
# Fallback value gets cached

Rule: Understand that cache only caches successful results.

Anti-Pattern 6: Insufficient Cache TTL

Problem: Cache expires too soon, defeating the purpose.

Bad:

embeddings = GenerateEmbeddings(text) with cache: 5min
# Embeddings are deterministic, why only 5min cache?

Good:

embeddings = GenerateEmbeddings(text) with cache: 7d
# Embeddings don't change, cache longer

Rule: Match TTL to data stability. Deterministic operations can cache for days.

Anti-Pattern 7: Fallback Without Logging

Problem: Fallback masks errors, making debugging impossible.

Bad:

result = ImportantAPI(input) with fallback: default
# If API is failing 100% of the time, you'll never know

Good:

result = ImportantAPI(input) with
    fallback: default,
    on_error: log
# Errors are logged, you can monitor failure rate

Rule: Always log when using fallback, or you'll be blind to failures.

Pitfall 1: Misunderstanding Timeout Scope

Misconception: timeout applies to total execution (including retries).

Reality: timeout applies per attempt.

Example:

result = API(input) with timeout: 10s, retry: 3
# Total time: 4 attempts × 10s = 40s (not 10s!)

Correct interpretation:

Each attempt has 10s timeout
4 total attempts (1 initial + 3 retries)
Maximum total time: 40s + backoff delays

Pitfall 2: Cache Key Collisions

Misconception: Module name is sufficient for cache key.

Reality: Cache key includes module name AND input values.

Example:

# These have DIFFERENT cache keys:
emb1 = Embed("hello") with cache: 1h  # Key: hash(Embed, "hello")
emb2 = Embed("world") with cache: 1h  # Key: hash(Embed, "world")

# These have the SAME cache key:
emb3 = Embed("hello") with cache: 1h  # Key: hash(Embed, "hello")
emb4 = Embed("hello") with cache: 1h  # Key: hash(Embed, "hello") - cache hit!

Pitfall 3: Forgetting Exponential Backoff Cap

Misconception: Exponential backoff grows forever.

Reality: Exponential backoff is capped at 30s per delay.

Example:

result = API(input) with
    retry: 10,
    delay: 1s,
    backoff: exponential

Actual delays:

Attempt 1 → 2: 1s
Attempt 2 → 3: 2s
Attempt 3 → 4: 4s
Attempt 4 → 5: 8s
Attempt 5 → 6: 16s
Attempt 6 → 7: 30s (capped, would be 32s)
Attempt 7 → 8: 30s (capped)
Attempt 8 → 9: 30s (capped)
Attempt 9 → 10: 30s (capped)
Attempt 10 → 11: 30s (capped)
Total delay: ~181s across 10 retries (11 attempts)

Testing Resilience

1. Unit Testing Retry Logic

Test that retry actually retries:

// Mock module that fails N times then succeeds
val flakyModule = new FlakyModule(failureCount = 2)

// Execute with retry
val pipeline = """
  in input: String
  result = FlakyModule(input) with retry: 3
  out result
"""

val result = execute(pipeline, Map("input" -> "test"))
// Assert: succeeded after 3 attempts
assert(flakyModule.attemptCount == 3)

2. Unit Testing Timeout

Test that timeout actually fires:

// Mock module that hangs
val slowModule = new SlowModule(delay = 60.seconds)

val pipeline = """
  in input: String
  result = SlowModule(input) with timeout: 1s
  out result
"""

val result = execute(pipeline, Map("input" -> "test"))
// Assert: timed out after 1s
assert(result.isFailure)
assert(result.error.contains("timeout"))

3. Unit Testing Cache

Test that cache hits work:

var callCount = 0
val expensiveModule = new ExpensiveModule {
  override def execute(input: Input): Output = {
    callCount += 1
    Output(input.value)
  }
}

val pipeline = """
  in input: String
  result = ExpensiveModule(input) with cache: 1h
  out result
"""

// First call
execute(pipeline, Map("input" -> "test"))
assert(callCount == 1)

// Second call (same input)
execute(pipeline, Map("input" -> "test"))
assert(callCount == 1) // Still 1, cache hit!

// Third call (different input)
execute(pipeline, Map("input" -> "other"))
assert(callCount == 2) // Cache miss

4. Integration Testing Fallback

Test that fallback is used on failure:

val pipeline = """
  in input: String
  result = UnreliableAPI(input) with
    retry: 2,
    fallback: "default"
  out result
"""

// Simulate API down
setAPIStatus(down = true)

val result = execute(pipeline, Map("input" -> "test"))
assert(result("result") == "default")

5. Load Testing Resilience

Test behavior under load:

# Simulate high load
for i in {1..1000}; do
  curl -X POST http://localhost:8080/execute \
    -H "Content-Type: application/json" \
    -d '{"source": "...", "inputs": {...}}' &
done
wait

# Check metrics
curl http://localhost:8080/metrics | jq

# Verify:
# - Cache hit rate >80%
# - Retry rate <10%
# - Timeout rate <5%
# - Fallback rate <5%

6. Chaos Testing

Test resilience under failure conditions:

# Simulate network failures
# (using toxiproxy or similar)

# 1. Test timeout under latency
add_latency 5s
# Verify: requests timeout correctly

# 2. Test retry under packet loss
add_packet_loss 50%
# Verify: retry succeeds after multiple attempts

# 3. Test fallback under total outage
block_traffic
# Verify: fallback is used

# 4. Test cache under intermittent failures
add_intermittent_failures 20%
# Verify: cache hit rate increases (reduces failures)

Configuration Recipes

Recipe 1: Standard LLM API Call

Use case: Call OpenAI, Anthropic, Cohere, etc.

in prompt: String

response = LLMGenerate(prompt) with
    timeout: 30s,
    retry: 3,
    delay: 1s,
    backoff: exponential,
    cache: 1h,
    fallback: SimplerModel(prompt)

out response

Why:

30s timeout: generous for LLM calls
3 retries: handles transient rate limits
Exponential backoff: respects rate limits
1h cache: balances cost vs. freshness
Fallback: degrades to simpler model

Recipe 2: Embedding Generation

Use case: Generate text embeddings.

in text: String

embeddings = GenerateEmbeddings(text) with
    timeout: 10s,
    retry: 2,
    delay: 500ms,
    backoff: exponential,
    cache: 7d

out embeddings

Why:

10s timeout: embeddings are fast
2 retries: sufficient for transient errors
7d cache: embeddings are deterministic
No fallback: embeddings are critical, errors should surface

Recipe 3: Vector Search

Use case: Search vector database (Pinecone, Weaviate, etc.)

in query: String

results = VectorSearch(query) with
    timeout: 5s,
    retry: 2,
    delay: 200ms,
    fallback: []

out results

Why:

5s timeout: vector search should be fast
2 retries: brief retry for transient errors
200ms delay: fast retry (internal service)
Empty fallback: degradable to no results
No cache: results change as data updates

Recipe 4: Prompt Validation

Use case: Check a prompt for safety and policy violations.

in prompt: String

validation = ValidatePrompt(prompt) with
    timeout: 2s,
    cache: 1h

out validation

Why:

2s timeout: validation should be fast
1h cache: same prompts repeat
No retry: validation errors are deterministic
No fallback: validation must complete

Recipe 5: RAG Pipeline

Use case: Retrieval-augmented generation.

in query: String

# Fast retrieval with short cache
docs = VectorSearch(query) with
    timeout: 3s,
    retry: 2,
    cache: 10min,
    fallback: []

# Expensive generation with long cache
response = GenerateWithContext(query, docs) with
    timeout: 30s,
    retry: 3,
    delay: 1s,
    backoff: exponential,
    cache: 1h,
    fallback: GenerateWithoutContext(query)

out response

Why:

Retrieval: fast timeout, short cache (data changes)
Generation: long timeout, long cache (expensive)
Fallback chain: retrieval fails → empty docs, generation fails → no-context generation
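
The two degradation paths are independent, which is easy to miss. The sketch below mimics the fallback chain in plain Python with stub functions; `with_fallback`, `rag_pipeline`, and the stubs are hypothetical stand-ins for the modules in the recipe, and retry, timeout, and cache are left out to keep the focus on fallback.

```python
def with_fallback(primary, fallback):
    """Run primary(); on any exception, return fallback() instead."""
    try:
        return primary()
    except Exception:
        return fallback()


def rag_pipeline(query, search, generate_with_context, generate_without_context):
    # Retrieval failure degrades to an empty document list.
    docs = with_fallback(lambda: search(query), lambda: [])
    # Generation failure degrades to a no-context answer.
    return with_fallback(
        lambda: generate_with_context(query, docs),
        lambda: generate_without_context(query),
    )


# Stub modules (hypothetical stand-ins for VectorSearch and the two generators).
def search_ok(query):
    return ["doc-a", "doc-b"]


def search_down(query):
    raise TimeoutError("vector store unreachable")


def generate_ok(query, docs):
    return f"answer to {query!r} using {len(docs)} docs"


def generate_down(query, docs):
    raise TimeoutError("model unavailable")


def generate_plain(query):
    return f"answer to {query!r} without context"


healthy = rag_pipeline("q", search_ok, generate_ok, generate_plain)
no_docs = rag_pipeline("q", search_down, generate_ok, generate_plain)
no_context = rag_pipeline("q", search_ok, generate_down, generate_plain)
print(healthy, no_docs, no_context, sep="\n")
```

Note that a retrieval failure still routes through the context-aware generator with an empty document list; only a generation failure triggers the no-context generator.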

Recipe 6: Multi-Model Cascade

Use case: Try the expensive model first, then fall back to a cheaper one.

in prompt: String

response = GPT4(prompt) with
    timeout: 60s,
    retry: 2,
    delay: 2s,
    backoff: exponential,
    cache: 2h,
    fallback: GPT35(prompt) with
        timeout: 30s,
        retry: 2,
        cache: 1h

out response

Why:

GPT-4: long timeout (slow), long cache (expensive)
GPT-3.5: shorter timeout (faster), shorter cache (cheaper)
Single fallback tier: best-effort quality from GPT-4, with GPT-3.5 as the backstop; if both fail, the call still fails

Best Practices Summary

General Principles

Always pair retry with timeout - Without a timeout, a hanging call never fails, so the retry never fires
Only cache pure functions - Caching non-deterministic operations returns stale or inconsistent data
Log when using fallback - A fallback masks the underlying error; logging keeps it visible
Match TTL to data stability - The more stable the data, the longer it can be cached
Use exponential backoff for external APIs - It eases pressure on rate-limited and recovering services
Test resilience under failure - Unit test each resilience primitive, then verify combinations with load and chaos tests
Monitor cache hit rates - Tune TTLs based on observed hit rates

Resilience Checklist

Use this checklist for every external module call:

Quick Reference Table

Operation	Timeout	Retry	Backoff	Cache	Fallback
LLM API call	30s	3	exponential	1h	simpler model
Embeddings	10s	2	exponential	7d	-
Vector search	5s	2	linear	-	[]
Validation	2s	0	-	1h	-
Token count	1s	0	-	30min	-
Metadata fetch	3s	2	linear	1d	-
User context	2s	2	linear	15min	default
Safety check	5s	2	linear	1h	reject

Related Documentation

Module Options Reference - Complete option syntax
retry option - Detailed retry documentation
timeout option - Detailed timeout documentation
cache option - Detailed cache documentation
fallback option - Detailed fallback documentation
Retry and Fallback Cookbook - Working examples
Caching Cookbook - Caching examples
Resilient Pipeline Cookbook - Full pipeline example

