Deployment Guide
How to deploy Constellation Engine in production environments.
Docker
Build the Image
```bash
# Build fat JAR first
make assembly

# Build Docker image
make docker-build
```
The Dockerfile uses a multi-stage build:
- Build stage — sbt assembly in a JDK image
- Runtime stage — JRE-only image with non-root user, health check built in
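A rough sketch of the shape of that Dockerfile (base images, JAR paths, and the health-check command here are assumptions for illustration; the repository's actual Dockerfile is authoritative):

```dockerfile
# Build stage: run sbt assembly inside a JDK image.
# sbtscala/scala-sbt tags pin JDK/sbt/Scala versions; the tag below is an
# assumption -- pick one matching the project.
FROM sbtscala/scala-sbt:eclipse-temurin-17.0.4_1.7.1_3.2.0 AS build
WORKDIR /build
COPY . .
RUN sbt assembly

# Runtime stage: JRE only, non-root user, health check built in
FROM eclipse-temurin:17-jre
RUN useradd --system constellation
USER constellation
WORKDIR /app
# The assembly JAR path is an assumption
COPY --from=build /build/target/scala-*/constellation-engine-assembly-*.jar app.jar
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=5s \
  CMD curl -fsS http://localhost:8080/health/live || exit 1
ENTRYPOINT ["sh", "-c", "java $JAVA_OPTS -jar app.jar"]
```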
Run with Docker
```bash
docker run -d \
  -p 8080:8080 \
  -e CONSTELLATION_PORT=8080 \
  -e JAVA_OPTS="-Xms256m -Xmx1g" \
  --name constellation \
  constellation-engine:latest
```
If you use the Module Provider Protocol, also expose the gRPC port:
```bash
docker run -d \
  -p 8080:8080 \
  -p 9090:9090 \
  -e CONSTELLATION_PORT=8080 \
  -e CONSTELLATION_PROVIDER_PORT=9090 \
  -e JAVA_OPTS="-Xms256m -Xmx1g" \
  --name constellation \
  constellation-engine:latest
```
Docker Compose
```bash
docker-compose up -d
```
The docker-compose.yml includes health check configuration and environment variable defaults. Edit it to enable the scheduler, rate limiting, or authentication.
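The relevant environment block might look like this (the service name and values are illustrative; the variable names match the hardened example later in this guide):

```yaml
services:
  constellation:
    environment:
      CONSTELLATION_SCHEDULER_ENABLED: "true"
      CONSTELLATION_RATE_LIMIT_RPM: "200"
      CONSTELLATION_RATE_LIMIT_BURST: "40"
      CONSTELLATION_API_KEYS: "sk-prod-your-admin-key-here-24chars:Admin"
```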
Kubernetes
Manifests are in deploy/k8s/.
Apply Manifests
```bash
kubectl apply -f deploy/k8s/namespace.yaml
kubectl apply -f deploy/k8s/configmap.yaml
kubectl apply -f deploy/k8s/deployment.yaml
kubectl apply -f deploy/k8s/service.yaml
```
What's Included
| Manifest | Description |
|---|---|
| namespace.yaml | constellation namespace |
| configmap.yaml | Scheduler, rate limit, JVM settings |
| deployment.yaml | 2 replicas, probes, resource limits, security context |
| service.yaml | ClusterIP service on port 8080 (add port 9090 for module providers) |
Health Probes
The deployment configures:
- Liveness: GET /health/live — restarts the pod if the process is unresponsive. Initial delay: 30s, period: 10s, timeout: 5s, failure threshold: 3.
- Readiness: GET /health/ready — removes the pod from the service during draining/startup. Initial delay: 10s, period: 5s, timeout: 3s, failure threshold: 3.
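In deployment.yaml those settings correspond to probe blocks along these lines:

```yaml
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
```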
Resource Limits
Default limits (adjust for your workload):
```yaml
resources:
  requests:
    memory: "512Mi"
    cpu: "250m"
  limits:
    memory: "1Gi"
    cpu: "1000m"
```
Scaling
Constellation Engine is stateless at the HTTP layer. Scale horizontally by increasing replicas:
```bash
kubectl scale deployment constellation-engine --replicas=4 -n constellation
```
Considerations:
- Each instance has its own in-memory compilation cache (no shared cache)
- Pipeline state (compiled DAGs) is per-instance
- For shared state, use a load balancer with session affinity or an external store
Ingress / Load Balancer
Expose the service via your preferred ingress controller:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: constellation
  namespace: constellation
spec:
  rules:
    - host: constellation.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: constellation-engine
                port:
                  number: 8080
```
For WebSocket (LSP) support, ensure your ingress supports WebSocket upgrades.
The LSP WebSocket endpoint (/lsp) requires session affinity. Without sticky sessions, each new WebSocket connection lands on a random pod, and per-connection LSP state is lost whenever a client reconnects to a different instance. See Session Affinity Requirements below.
Graceful Shutdown
The server supports graceful shutdown:
- Pod receives SIGTERM
- Server enters Draining state
- Readiness probe returns 503 (removed from service)
- In-flight requests complete (up to 30s grace period)
- Remaining requests are cancelled
- Process exits
The K8s deployment sets terminationGracePeriodSeconds: 30 to match.
Set terminationGracePeriodSeconds to at least your drain timeout plus a 15-second buffer. If Kubernetes kills the pod before the drain completes, in-flight requests are terminated abruptly.
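Applied to the default 30s drain timeout, that guidance gives a pod spec along these lines (trimmed to the relevant field):

```yaml
spec:
  template:
    spec:
      # 30s drain timeout + 15s buffer
      terminationGracePeriodSeconds: 45
```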
Hardened Production Example
```bash
docker run -d \
  -p 8080:8080 \
  -p 9090:9090 \
  -e CONSTELLATION_PORT=8080 \
  -e CONSTELLATION_PROVIDER_PORT=9090 \
  -e CONSTELLATION_API_KEYS="sk-prod-your-admin-key-here-24chars:Admin" \
  -e CONSTELLATION_CORS_ORIGINS="https://your-app.example.com" \
  -e CONSTELLATION_RATE_LIMIT_RPM=200 \
  -e CONSTELLATION_RATE_LIMIT_BURST=40 \
  -e CONSTELLATION_SCHEDULER_ENABLED=true \
  -e CONSTELLATION_SCHEDULER_MAX_CONCURRENCY=8 \
  -e JAVA_OPTS="-Xms512m -Xmx2g -XX:+UseG1GC" \
  --name constellation \
  constellation-engine:latest
```
Monitoring
Metrics Endpoint
GET /metrics returns JSON with:
- server — uptime, total request count
- cache — compilation cache hit rate, entries, evictions
- scheduler — active tasks, queue depth, starvation promotions
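A representative response looks roughly like this (field names beyond those in the Key Metrics table below are illustrative):

```json
{
  "server": { "uptimeSeconds": 86400, "requests_total": 120345 },
  "cache": { "hitRate": 0.92, "entries": 410, "evictions": 12 },
  "scheduler": { "activeCount": 3, "queuedCount": 7, "starvationPromotions": 0 }
}
```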
Health Endpoints
| Endpoint | Purpose | Use For |
|---|---|---|
| GET /health | Basic check | Simple uptime monitoring |
| GET /health/live | Liveness | K8s liveness probe |
| GET /health/ready | Readiness | K8s readiness probe, load balancer |
| GET /health/detail | Diagnostics | Deep health inspection (auth-gated) |
Use /health/live for liveness probes (should the pod be restarted?) and /health/ready for readiness probes (should the pod receive traffic?). Never use /health/ready for liveness probes as it returns 503 during graceful shutdown, which would cause unnecessary pod restarts.
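To verify the distinction by hand:

```bash
# Liveness: 200 unless the process is wedged
curl -i http://localhost:8080/health/live

# Readiness: 200 when serving traffic, 503 while draining
curl -i http://localhost:8080/health/ready
```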
Key Metrics to Monitor
| Metric | Healthy Range | Action if Outside |
|---|---|---|
| cache.hitRate | > 0.8 | Check for cache invalidation issues |
| scheduler.queuedCount | < maxConcurrency | Increase concurrency or scale out |
| scheduler.starvationPromotions | Low, stable | Increase starvation timeout or concurrency |
| server.requests_total | Increasing | If flat, check connectivity |
Log Output
All logging uses SLF4J/Logback. Configure it via a logback.xml on the classpath.
Default log format includes timestamps, log level, logger name, and message.
Request IDs are included in error logs via the X-Request-ID header (auto-generated if not provided by client).
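A minimal logback.xml producing that default format might look like this (the exact pattern shipped with the engine may differ):

```xml
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <!-- timestamp, level, logger name, message -->
      <pattern>%d{ISO8601} %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```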
Multi-Instance Deployments
Constellation Engine supports horizontal scaling with multiple instances. Each instance is stateless at the HTTP layer, making it straightforward to run behind a load balancer.
Running Multiple Instances
Docker Compose
```yaml
services:
  constellation-1:
    build: .
    ports:
      - "8080:8080"
    environment:
      CONSTELLATION_PORT: "8080"
      CONSTELLATION_SCHEDULER_ENABLED: "true"
      JAVA_OPTS: "-Xms512m -Xmx1g"

  constellation-2:
    build: .
    ports:
      - "8081:8080"
    environment:
      CONSTELLATION_PORT: "8080"
      CONSTELLATION_SCHEDULER_ENABLED: "true"
      JAVA_OPTS: "-Xms512m -Xmx1g"

  nginx:
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
    depends_on:
      - constellation-1
      - constellation-2
```
Kubernetes
Increase replicas in the deployment:
```bash
kubectl scale deployment constellation-engine --replicas=4 -n constellation
```
Or set replicas in the manifest:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: constellation-engine
  namespace: constellation
spec:
  replicas: 4  # Multiple instances
  # ...
```
Shared State Considerations
Each Constellation instance maintains its own in-memory state:
| State Type | Scope | Sharing Strategy |
|---|---|---|
| Compilation cache | Per-instance | Each instance has independent cache; consider distributed cache backend |
| Execution history | Per-instance | Use external storage (PostgreSQL, Redis) via ExecutionStorage SPI |
| Module registry | Per-instance | Modules are registered at startup; ensure identical configuration |
| Scheduler queue | Per-instance | Each instance schedules independently |
| Circuit breaker state | Per-instance | Each instance tracks failures independently |
For workloads requiring shared state across instances:
- Shared compilation cache — Use a distributed CacheBackend like Memcached or Redis:

```scala
import io.constellation.cache.memcached.{MemcachedCacheBackend, MemcachedConfig}

MemcachedCacheBackend.resource(MemcachedConfig.cluster(
  "memcached-1:11211,memcached-2:11211"
)).use { cache =>
  val compiler = LangCompilerBuilder()
    .withCacheBackend(cache)
    .build()
  // All instances share the same compilation cache
}
```

- Shared execution history — Implement ExecutionStorage backed by a database:

```scala
val storage = new PostgresExecutionStorage(dataSource)

ConstellationImpl.builder()
  .withBackends(ConstellationBackends(storage = Some(storage)))
  .build()
```

- Distributed circuit breakers — For coordinated failure handling across instances, implement a custom circuit breaker backed by Redis or a coordination service, as sketched below.
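As a sketch of that last point, a shared failure counter in Redis might look like this (the trait and method names are illustrative, not Constellation's actual SPI; the example uses the Jedis client):

```scala
import redis.clients.jedis.JedisPooled

// Hypothetical interface: Constellation's real circuit-breaker SPI may differ.
trait CircuitBreakerStore {
  def recordFailure(key: String): Long
  def isOpen(key: String): Boolean
}

// Failure counts live in Redis, so every instance sees the same state.
final class RedisCircuitBreakerStore(
    redis: JedisPooled,
    threshold: Long,
    windowSeconds: Long
) extends CircuitBreakerStore {

  // Increment the shared counter and refresh the sliding window
  def recordFailure(key: String): Long = {
    val count = redis.incr(s"cb:$key:failures")
    redis.expire(s"cb:$key:failures", windowSeconds)
    count
  }

  // The breaker is open once the shared count crosses the threshold
  def isOpen(key: String): Boolean =
    Option(redis.get(s"cb:$key:failures")).exists(_.toLong >= threshold)
}
```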
Load Balancing Strategies
Round-Robin (Default)
Suitable for stateless workloads where any instance can handle any request:
```nginx
upstream constellation {
    server constellation-1:8080;
    server constellation-2:8080;
    server constellation-3:8080;
}
```
Kubernetes Services use round-robin by default.
Least Connections
Better for variable-duration requests (long-running pipelines):
```nginx
upstream constellation {
    least_conn;
    server constellation-1:8080;
    server constellation-2:8080;
    server constellation-3:8080;
}
```
Weighted Distribution
For heterogeneous instances (different CPU/memory):
```nginx
upstream constellation {
    server constellation-1:8080 weight=3;  # More powerful
    server constellation-2:8080 weight=1;
}
```
Session Affinity Requirements
Session affinity (sticky sessions) is required for:
| Feature | Reason | Affinity Type |
|---|---|---|
| LSP WebSocket | Connection state is per-instance | IP hash or cookie |
| Step-through debugging | Execution state is per-instance | IP hash or cookie |
| Suspendable executions | Resume must hit same instance | Execution ID routing |
Enabling Session Affinity in Kubernetes
```yaml
apiVersion: v1
kind: Service
metadata:
  name: constellation-engine
  namespace: constellation
spec:
  type: ClusterIP
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800  # 3 hours
  selector:
    app.kubernetes.io/name: constellation-engine
  ports:
    - name: http
      port: 8080
      targetPort: 8080
    # Add this if using the Module Provider Protocol:
    # - name: grpc-provider
    #   port: 9090
    #   targetPort: 9090
```
Nginx IP Hash
```nginx
upstream constellation {
    ip_hash;
    server constellation-1:8080;
    server constellation-2:8080;
}
```
Ingress Controller (nginx-ingress)
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: constellation
  namespace: constellation
  annotations:
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "CONSTELLATION_AFFINITY"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "10800"
spec:
  rules:
    - host: constellation.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: constellation-engine
                port:
                  number: 8080
```
WebSocket Load Balancing
The LSP endpoint (/lsp) uses WebSocket. Ensure your load balancer:
- Supports WebSocket upgrades — Most modern load balancers do
- Has appropriate timeouts — WebSocket connections are long-lived
- Uses session affinity — WebSocket state is per-connection
Nginx WebSocket Configuration
```nginx
upstream constellation {
    ip_hash;  # Required for WebSocket affinity
    server constellation-1:8080;
    server constellation-2:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://constellation;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 3600s;  # 1 hour for long-lived connections
    }
}
```
AWS ALB
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: constellation
  annotations:
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/target-group-attributes: stickiness.enabled=true,stickiness.lb_cookie.duration_seconds=10800
spec:
  ingressClassName: alb
  rules:
    - host: constellation.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: constellation-engine
                port:
                  number: 8080
```
Health Check Configuration for Load Balancers
Configure your load balancer to use the appropriate health endpoints:
| Endpoint | Use For | Expected Response |
|---|---|---|
| /health/live | Liveness — is the process running? | 200 always (unless process crashed) |
| /health/ready | Readiness — should it receive traffic? | 200 when ready, 503 when draining |
```nginx
upstream constellation {
    server constellation-1:8080;
    server constellation-2:8080;
}

server {
    location / {
        proxy_pass http://constellation;
        # Active health checks (NGINX Plus / commercial only;
        # the health_check directive belongs in the location context)
        health_check uri=/health/ready interval=5s fails=2 passes=1;
    }
}
```
For open-source nginx, use a sidecar or external health checker.
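Open-source nginx does, however, support passive failure detection, which can complement an external checker:

```nginx
upstream constellation {
    # Mark a server unavailable for 10s after 2 failed requests
    server constellation-1:8080 max_fails=2 fail_timeout=10s;
    server constellation-2:8080 max_fails=2 fail_timeout=10s;
}
```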
Autoscaling
Kubernetes Horizontal Pod Autoscaler
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: constellation-hpa
  namespace: constellation
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: constellation-engine
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
Custom Metrics Scaling
For more sophisticated scaling based on scheduler queue depth or request latency, expose Prometheus metrics and use the Prometheus Adapter:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: constellation-hpa-custom
  namespace: constellation
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: constellation-engine
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: constellation_scheduler_queued_count
        target:
          type: AverageValue
          averageValue: "50"  # Scale up when avg queue > 50
```
Next Steps
- Configuration — Environment variables, auth, CORS, and rate limiting
- Graceful Shutdown — Drain behavior and Kubernetes integration
- Clustering — Distributed deployment with shared state
- JSON Logging — Structured logging for log aggregation
- Runbook — Operational procedures and troubleshooting