Performance Tuning Guide

This guide covers production tuning for Constellation Engine: scheduler configuration, timeouts, circuit breakers, caching, object pooling, JVM settings, and monitoring.

Scheduler Configuration

Unbounded vs Bounded

Mode	Factory	Behavior	Use Case
Unbounded	`GlobalScheduler.unbounded`	Every task runs immediately on the Cats Effect thread pool	Development, low-concurrency
Bounded	`GlobalScheduler.bounded(...)`	Priority queue with concurrency limit	Production

The default is unbounded. For production deployments, use bounded:

import io.constellation.execution.GlobalScheduler
import scala.concurrent.duration._

GlobalScheduler.bounded(
  maxConcurrency    = 16,
  maxQueueSize      = 1000,
  starvationTimeout = 30.seconds
).use { scheduler =>
  val constellation = ConstellationImpl.builder()
    .withScheduler(scheduler)
    .build()
  // ...
}

maxConcurrency Sizing

Workload Type	Recommended	Rationale
CPU-bound modules (ML inference, computation)	cores x 1-2	Avoid context-switch overhead
IO-bound modules (HTTP calls, DB queries)	cores x 4-10	Threads block on IO, need more
Mixed	cores x 2-4	Balance between CPU and IO

To find the right value: start with cores x 2, monitor queue depth, and adjust.

maxQueueSize

Controls how many tasks can wait when all concurrency slots are busy.

Setting	Trade-off
Too small (e.g., 10)	Rejects requests quickly under load — good for fail-fast
Too large (e.g., 100000)	Absorbs bursts but increases memory and latency — requests wait longer
Recommended: 100-1000	Balances burst absorption with bounded memory

When the queue is full, new submissions throw QueueFullException.

starvationTimeout

Low-priority tasks get promoted after waiting this long, preventing indefinite starvation:

Setting	Behavior
Short (5-10s)	Low-priority tasks run sooner, less priority differentiation
Long (60s+)	Strong priority ordering, risk of low-priority starvation under sustained load
Default (30s)	Balanced

Monitor SchedulerStats.starvationPromotions — if this grows steadily, your system is under sustained high-priority load.

Environment Variables

For the HTTP server, scheduler configuration is available via environment variables:

CONSTELLATION_SCHEDULER_ENABLED=true
CONSTELLATION_SCHEDULER_MAX_CONCURRENCY=16
CONSTELLATION_SCHEDULER_STARVATION_TIMEOUT=30s

Timeout Strategy

Global DAG Timeout

Set a default timeout for all DAG executions:

ConstellationImpl.builder()
  .withDefaultTimeout(60.seconds)
  .build()

Any pipeline exceeding this duration is cancelled and modules receive Timed status.

Per-Module Timeout

Individual modules can have their own timeouts via ModuleConfig:

ModuleBuilder
  .metadata("SlowModule", "May take a while", 1, 0)
  .config(ModuleConfig(timeout = Some(30.seconds)))
  .implementation[In, Out] { ... }
  .build

Interaction Rules

Global Timeout	Module Timeout	Effective
60s	None	Module has 60s (inherits global)
60s	30s	Module has 30s (module wins)
None	30s	Module has 30s
None	None	No timeout

The global timeout applies to the entire DAG execution. If a single module's timeout fires first, that module fails but the DAG continues (other modules may still complete).

Circuit Breaker Tuning

Configuration

import io.constellation.execution.CircuitBreakerConfig

CircuitBreakerConfig(
  failureThreshold  = 5,          // Consecutive failures to trip
  resetDuration     = 30.seconds, // Time in Open before trying HalfOpen
  halfOpenMaxProbes = 1           // Probes allowed in HalfOpen
)

State Machine

     success
  ┌───────────┐
  │           │
  ▼           │
Closed ──(threshold failures)──> Open ──(resetDuration)──> HalfOpen
  ▲                                                          │
  │                                                          │
  └──────────────────(probe success)─────────────────────────┘
                     (probe failure) ──> Open

When to Adjust

Symptom	Action
Circuit opens too easily	Increase `failureThreshold` (e.g., 5 → 10)
Slow recovery after outage	Decrease `resetDuration` (e.g., 30s → 10s)
Flapping (open/close/open)	Increase `halfOpenMaxProbes` to allow more test requests
External service has cold start	Increase `resetDuration` to give it time to warm up

Monitoring

val stats: IO[CircuitStats] = circuitBreaker.stats

// CircuitStats fields:
// state: CircuitState (Closed, Open, HalfOpen)
// consecutiveFailures: Int
// totalSuccesses: Long
// totalFailures: Long
// totalRejected: Long

Cache Configuration

Compilation Caching

CachingLangCompiler wraps any LangCompiler and caches compilation results:

import io.constellation.lang.CachingLangCompiler

val compiler = CachingLangCompiler.withDefaults(baseCompiler)

Default settings: LRU eviction with reasonable max entries and TTL.

CacheBackend (SPI)

For runtime caching (module results, intermediate data), implement the CacheBackend trait:

trait CacheBackend {
  def get[A](key: String): IO[Option[CacheEntry[A]]]
  def set[A](key: String, value: A, ttl: FiniteDuration): IO[Unit]
  def delete(key: String): IO[Boolean]
  def clear: IO[Unit]
  def stats: IO[CacheStats]
}

See CacheBackend Integration Guide for Redis and Caffeine implementations.

Monitoring Cache Performance

curl http://localhost:8080/metrics | jq .cache

Key metric: hitRatio — should be >0.8 for stable workloads with unchanged sources.

val stats: IO[CacheStats] = cache.stats
// CacheStats fields:
// hits: Long
// misses: Long
// evictions: Long
// size: Int
// maxSize: Option[Int]
// hitRatio: Double (0.0 to 1.0)

If hitRatio is low:

Increase max cache size
Increase TTL if data changes infrequently
Check if cache keys are too specific (include timestamps, etc.)

Object Pool Tuning

Constellation uses object pooling internally to reduce allocation overhead:

DeferredPool — pools Deferred[IO, _] instances used for inter-module data flow
RuntimeStatePool — pools Runtime.State objects

Pooling reduces allocation by approximately 90% in high-throughput scenarios. These pools are managed automatically and do not require tuning in most deployments.

JVM Tuning

Garbage Collector

GC	Flag	Best For
G1GC	`-XX:+UseG1GC`	General-purpose, balanced latency/throughput (default on JDK 17+)
ZGC	`-XX:+UseZGC`	Low-latency, sub-millisecond pauses

Recommendation: Use G1GC (default) unless you need sub-millisecond pause times, in which case use ZGC.

Heap Sizing

Base formula: 512MB + ~1MB per concurrent DAG execution

Concurrent DAGs	Recommended Heap
10	512MB - 1GB
100	1GB - 2GB
1000	2GB - 4GB

java -Xms512m -Xmx2g -jar constellation.jar

Recommended JVM Flags

java \
  -Xms512m \
  -Xmx2g \
  -XX:+UseG1GC \
  -XX:MaxGCPauseMillis=100 \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/constellation/ \
  -jar constellation.jar

For ZGC (JDK 17+):

java \
  -Xms512m \
  -Xmx2g \
  -XX:+UseZGC \
  -XX:+HeapDumpOnOutOfMemoryError \
  -jar constellation.jar

Monitoring

MetricsProvider SPI

Instrument Constellation with your metrics system by implementing MetricsProvider:

trait MetricsProvider {
  def counter(name: String, tags: Map[String, String]): IO[Unit]
  def histogram(name: String, value: Double, tags: Map[String, String]): IO[Unit]
  def gauge(name: String, value: Double, tags: Map[String, String]): IO[Unit]
}

See MetricsProvider Integration Guide for Prometheus and Datadog examples.

Built-in Instrumentation Points

Metric	Type	Tags	Description
`execution.started`	counter	`dag_name`	DAG execution started
`execution.completed`	counter	`dag_name`, `success`	DAG execution completed
`execution.duration_ms`	histogram	`dag_name`	End-to-end execution time
`module.duration_ms`	histogram	`module_name`	Per-module execution time
`module.failed`	counter	`module_name`	Module execution failures
`scheduler.active`	gauge	—	Currently running tasks
`scheduler.queued`	gauge	—	Tasks waiting in queue

Health Endpoints

Endpoint	Purpose	Access
`GET /health/live`	Liveness probe — is the process alive?	Public
`GET /health/ready`	Readiness probe — can it serve traffic?	Public
`GET /health/detail`	Component diagnostics	Opt-in, auth-gated

/health/detail returns scheduler stats, lifecycle state, custom check results, and compiler status.

SchedulerStats

val stats: IO[SchedulerStats] = scheduler.stats

// SchedulerStats fields:
// activeCount: Int         — currently executing tasks
// queuedCount: Int         — tasks waiting in queue
// totalSubmitted: Long     — lifetime submissions
// totalCompleted: Long     — lifetime completions
// highPriorityCompleted: Long
// lowPriorityCompleted: Long
// starvationPromotions: Long — low-priority tasks promoted due to starvation timeout

Diagnostic Checklist

Pipelines are slow

Check per-module execution times via MetricsProvider histograms
Identify the bottleneck module — is it CPU-bound or IO-bound?
For IO-bound: increase maxConcurrency, optimize the external call
For CPU-bound: verify maxConcurrency is not too high (context switching)
Check if compilation caching is active (hitRatio > 0.8)

Queue is growing

Monitor SchedulerStats.queuedCount over time
If steadily increasing: your submission rate exceeds processing capacity
Increase maxConcurrency (if resources allow)
Implement upstream backpressure (reject or queue at the API layer)
Check for a single slow module blocking the pipeline

High rejection rate

Check CircuitStats.totalRejected for circuit breaker rejections
If circuit is Open: the downstream module is failing — investigate root cause
Check for QueueFullException — the scheduler queue is saturated
Increase maxQueueSize for burst absorption, or increase maxConcurrency

Cache misses

Monitor CacheStats.hitRatio
If low: check if scripts change frequently (cache is invalidated on source change)
Increase max cache size if evictions are high
Increase TTL if compiled results are stable

Memory growing

Check execution storage: ExecutionStorage retains execution history in memory by default
Reduce maxExecutions in storage config (default: 1000)
Reduce maxValueSizeBytes to truncate large output values
Check for leaked CancellableExecution handles — ensure they are deregistered
Verify GC is functioning: enable GC logging with -Xlog:gc

Performance Targets

Reference numbers from the benchmark suite (see docs/dev/performance-benchmarks.md):

Operation	Target	Notes
Parse (small pipeline)	<5ms	3-5 line scripts
Full pipeline (medium)	<100ms	Parse + type-check + compile + execute
Cache hit	<5ms	Cached compilation result lookup
Cache speedup	>5x	Cached vs uncached compilation
Autocomplete response	<50ms	LSP completion request
Orchestration overhead per node	~0.15ms	Runtime scheduling overhead
p99 sustained throughput	<0.5ms/node	Under continuous load
Object pool allocation reduction	~90%	Pooled vs unpooled Deferred allocations

Next Steps

Scheduler — Detailed scheduler configuration and troubleshooting
Configuration — Environment variables and server settings
Runbook — Operational procedures and common issue resolution
Deployment — Production deployment with resource limits
Clustering — Distributed caching and shared state

Scheduler Configuration​

Unbounded vs Bounded​

maxConcurrency Sizing​

maxQueueSize​

starvationTimeout​

Environment Variables​

Timeout Strategy​

Global DAG Timeout​

Per-Module Timeout​

Interaction Rules​

Circuit Breaker Tuning​

Configuration​

State Machine​

When to Adjust​

Monitoring​

Cache Configuration​

Compilation Caching​

CacheBackend (SPI)​

Monitoring Cache Performance​

Object Pool Tuning​

JVM Tuning​

Garbage Collector​

Heap Sizing​

Recommended JVM Flags​

Monitoring​

MetricsProvider SPI​

Built-in Instrumentation Points​

Health Endpoints​

SchedulerStats​

Diagnostic Checklist​

Pipelines are slow​

Queue is growing​

High rejection rate​

Cache misses​

Memory growing​

Performance Targets​

Next Steps​

Scheduler Configuration

Unbounded vs Bounded

maxConcurrency Sizing

maxQueueSize

starvationTimeout

Environment Variables

Timeout Strategy

Global DAG Timeout

Per-Module Timeout

Interaction Rules

Circuit Breaker Tuning

Configuration

State Machine

When to Adjust

Monitoring

Cache Configuration

Compilation Caching

CacheBackend (SPI)

Monitoring Cache Performance

Object Pool Tuning

JVM Tuning

Garbage Collector

Heap Sizing

Recommended JVM Flags

Monitoring

MetricsProvider SPI

Built-in Instrumentation Points

Health Endpoints

SchedulerStats

Diagnostic Checklist

Pipelines are slow

Queue is growing

High rejection rate

Cache misses

Memory growing

Performance Targets

Next Steps