Skip to main content

Architecture

This document explains how Constellation Engine works at a high level, helping you understand what happens when you run your pipelines.

Overview

Constellation Engine processes your pipelines through four stages:

Source (.cst) → Parse → Type Check → Compile to DAG → Execute

Each stage catches different errors, so problems are found early before execution.

How Your Pipeline Runs

1. Parsing

Your .cst source code is parsed into an abstract syntax tree (AST). Syntax errors are caught here - things like missing parentheses, invalid characters, or malformed statements.

Example syntax error:

Error at line 3, column 15: Expected ')' but found end of line

2. Type Checking

The compiler validates all field accesses and type operations. Type errors appear here - mismatched types, undefined variables, or invalid field access.

Example type error:

Error at line 5: Cannot merge String with Int

3. DAG Compilation

Your pipeline is converted to a directed acyclic graph (DAG) where each node is a module call. The compiler analyzes dependencies and identifies which operations can run in parallel.

          ┌─────────┐
│ Input A │
└────┬────┘

┌────▼────┐
│ Module1 │
└────┬────┘

┌──────────┼──────────┐
│ │
┌───▼───┐ ┌─────▼─────┐
│Module2│ │ Module3 │ ← These run in parallel
└───┬───┘ └─────┬─────┘
│ │
└──────────┬──────────┘

┌────▼────┐
│ Output │
└─────────┘

4. Execution

The runtime executes your DAG on Cats Effect fibers. Independent branches run in parallel automatically - you don't need to manage concurrency.

Key Concepts

ConceptWhat It Means
DAGDirected graph of module calls, enabling automatic parallelism
FiberLightweight thread for concurrent execution (much cheaper than OS threads)
Hot reloadChange .cst files without restarting the server
Content addressingCompiled DAGs are cached by source hash for instant reuse
Synthetic modulesLanguage constructs (merge, project, conditional) become real DAG nodes

Pipeline Stages at a Glance

StageInputOutputErrors Caught
ParseSource codeASTSyntax errors
Type CheckASTTyped ASTType mismatches, undefined variables
IR GenerationTyped ASTIntermediate representationDependency analysis
DAG CompileIRExecutable DAGInvalid module references
ExecuteDAG + inputsResultsRuntime failures

Extension Points

Custom Modules

You can add your own processing modules using the ModuleBuilder API:

val myModule = ModuleBuilder
.metadata("MyModule", "Description", 1, 0)
.implementationPure[MyInput, MyOutput] { input =>
MyOutput(transform(input))
}
.build

Modules are automatically available in your .cst pipelines once registered.

Cross-Process Modules (Module Provider Protocol)

External services can contribute pipeline modules via gRPC using the Module Provider Protocol. This enables modules written in any language, running in separate processes, with independent scaling.

Provider (Python, Go, etc.) ──gRPC──> ModuleProviderManager ──> ExternalModule (in pipeline)

The server validates schemas, manages connection lifecycle via heartbeats, and load-balances across provider groups. See Module Provider Integration.

Backend Integrations

Constellation Engine supports pluggable backends for observability and resilience:

BackendPurpose
MetricsProviderExport metrics to Prometheus, StatsD, etc.
TracerProviderDistributed tracing with OpenTelemetry
ExecutionListenerEvent streaming to Kafka, webhooks, etc.
CacheBackendCache compiled pipelines in Redis, Memcached
CircuitBreakerRegistryPer-module circuit breakers for resilience
ModuleProviderManagerCross-process module registration via gRPC

All backends default to no-op implementations with zero overhead.

Error Messages

Constellation Engine provides detailed error messages with source positions:

TypeError at line 12, column 5:
Cannot access field 'email' on type Record(name: String, age: Int)
Available fields: name, age

Execution Features

Cancellation and Timeouts

Pipelines can be cancelled or timed out:

constellation.run(pipeline, inputs)
.timeout(30.seconds)

Graceful Shutdown

The lifecycle manager ensures in-flight executions complete before shutdown:

Running → Draining → Stopped

Circuit Breakers

Per-module circuit breakers prevent cascading failures when external services are down.

For Contributors

For implementation details including code structure and internal APIs, see the LLM documentation in the repository's docs/ directory:

  • Compiler Internals (docs/components/compiler/) - Parser, type checker, IR generation, DAG compilation
  • Runtime Execution (docs/components/runtime/) - Module system, parallel execution, data flow
  • Type System (docs/components/core/) - CType, CValue, type algebra
  • HTTP API (docs/components/http-api/) - Server configuration, middleware, routes
  • SPI Integration - See Integrations for implementing custom backends

The source code is organized by module:

ModulePathPurpose
coremodules/core/Type system and specifications
runtimemodules/runtime/Module execution and DAG runtime
lang-parsermodules/lang-parser/constellation-lang parser
lang-compilermodules/lang-compiler/Type checking and DAG compilation
module-provider-sdkmodules/module-provider-sdk/Client library for cross-process providers
module-providermodules/module-provider/Server-side gRPC registration and lifecycle
http-apimodules/http-api/HTTP server and WebSocket LSP

Next Steps