
Key Concepts

Goal: Understand the essential terms used throughout Constellation Engine (33 of them are summarized in the table at the end).

Core Architecture

Constellation

The main orchestration engine that manages modules and execution.

What it does:

  • Stores registered modules
  • Provides module lookup
  • Manages execution state

Example:

// create returns an effect; bind it with <- as in the Embedded API example
val constellation = Constellation.create[IO]

See also: Embedded API


Module

A reusable processing unit with typed inputs and outputs.

What it is:

  • Defined in Scala using ModuleBuilder
  • Has a unique name (case-sensitive)
  • Specifies input/output types
  • Contains implementation logic

Example:

val uppercase = ModuleBuilder
  .metadata("Uppercase", "Converts text to uppercase", 1, 0)
  .implementationPure[TextInput, TextOutput] { input =>
    TextOutput(input.text.toUpperCase)
  }
  .build

Key properties:

  • Name: "Uppercase" (must match usage in .cst files)
  • Version: 1.0 (major.minor)
  • Type signature: TextInput => TextOutput
  • Implementation: Pure function or IO

See also: Module Development


Pipeline

A DAG of module invocations defined in constellation-lang (.cst files).

What it contains:

  • Input declarations: in x: String
  • Module calls: result = Uppercase(x)
  • Output declarations: out result

Example:

in text: String
trimmed = Trim(text)
result = Uppercase(trimmed)
out result

Key properties:

  • Must be a valid DAG (no cycles)
  • All variables must be defined before use
  • All outputs must reference defined variables
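
The properties above can be checked with a single pass over the statements. The following is a minimal sketch, not the engine's validator: the `Stmt` types and `validate` function are hypothetical stand-ins for whatever the real AST looks like. Because every variable must be defined before use, a pipeline that passes this check cannot contain a cycle.

```scala
object PipelineCheck {
  // Hypothetical, simplified statement forms; not the engine's real AST.
  sealed trait Stmt
  final case class In(name: String) extends Stmt
  final case class Call(target: String, module: String, args: List[String]) extends Stmt
  final case class Out(name: String) extends Stmt

  /** Returns human-readable problems in source order; empty means the pipeline passed. */
  def validate(stmts: List[Stmt]): List[String] = {
    val start = (Set.empty[String], Vector.empty[String])
    val (_, errors) = stmts.foldLeft(start) {
      case ((defined, errs), In(name)) =>
        if (defined(name)) (defined, errs :+ s"duplicate definition: $name")
        else (defined + name, errs)
      case ((defined, errs), Call(target, _, args)) =>
        // Every argument must already be defined by an earlier statement.
        val undefinedArgs = args.filterNot(defined).map(a => s"undefined variable: $a")
        val duplicate =
          if (defined(target)) Vector(s"duplicate definition: $target") else Vector.empty[String]
        (defined + target, errs ++ undefinedArgs ++ duplicate)
      case ((defined, errs), Out(name)) =>
        if (defined(name)) (defined, errs)
        else (defined, errs :+ s"undefined output: $name")
    }
    errors.toList
  }
}
```

Calling `PipelineCheck.validate` on the four statements of the example pipeline returns an empty list.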

See also: Pipeline Lifecycle


DAG (Directed Acyclic Graph)

The execution plan compiled from a pipeline.

What it represents:

  • Nodes = module invocations
  • Edges = data dependencies
  • Layers = parallel execution groups

Example visualization:

Layer 0: [Trim(text)]

Layer 1: [Uppercase(trimmed)]

Key properties:

  • No cycles allowed
  • Topologically sorted into layers
  • Nodes in same layer execute in parallel

See also: DAG Execution


Type System

CType

Type representation at compile time.

Hierarchy:

CType
├─ CPrimitive
│ ├─ CString
│ ├─ CInt
│ ├─ CDouble
│ └─ CBoolean
├─ CRecord(Map[String, CType])
├─ CUnion(Set[CType])
├─ CList(CType)
└─ COptional(CType)

Example:

val stringType: CType = CString
val recordType: CType = CRecord(Map("name" -> CString, "age" -> CInt))

See also: Type System, Type Syntax


CValue

Value representation at runtime.

Hierarchy:

CValue
├─ CPrimitive
│ ├─ CString("text")
│ ├─ CInt(42)
│ ├─ CDouble(3.14)
│ └─ CBoolean(true)
├─ CRecord(Map[String, CValue])
├─ CUnion(CValue, CType)
├─ CList(List[CValue])
├─ COptional(Option[CValue])
└─ CNone

Example:

val stringValue: CValue = CString("hello")
val recordValue: CValue = CRecord(Map("name" -> CString("Alice"), "age" -> CInt(30)))

See also: Type System


Type Compatibility

Rules for when one type can be used where another is expected.

Key rules:

  1. Exact match: CString matches CString
  2. Subtyping: CNone matches COptional(T) for any T
  3. Record subtyping: {a: String, b: Int} matches {a: String} (extra fields OK)
  4. Union subtyping: String matches String | Int

Example:

in x: String | Int     # Union type
result = Process(x) # OK if Process accepts String | Int
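
The four rules can be read as one recursive check. The sketch below uses its own small `Ty` hierarchy (hypothetical names, deliberately different from `CType`) and implements only the rules listed above; the engine's actual checker may cover more cases.

```scala
object TypeCompat {
  // Illustrative stand-ins for the engine's types; all names are hypothetical.
  sealed trait Ty
  case object TString extends Ty
  case object TInt extends Ty
  case object TNone extends Ty
  final case class TRecord(fields: Map[String, Ty]) extends Ty
  final case class TUnion(members: Set[Ty]) extends Ty
  final case class TOptional(inner: Ty) extends Ty

  /** True when something of type `actual` may be used where `expected` is required. */
  def isCompatible(actual: Ty, expected: Ty): Boolean = (actual, expected) match {
    case (a, e) if a == e => true                 // rule 1: exact match
    case (TNone, TOptional(_)) => true            // rule 2: none fits any optional
    case (TRecord(have), TRecord(need)) =>        // rule 3: extra fields are allowed
      need.forall { case (name, t) => have.get(name).exists(isCompatible(_, t)) }
    case (a, TUnion(members)) =>                  // rule 4: fits one member of the union
      members.exists(isCompatible(a, _))
    case _ => false
  }
}
```

Record compatibility is checked per required field, so a nested record with extra fields is also accepted.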

See also: Type System


Semantic Type

Type information during compilation (before final CType).

What it tracks:

  • Type constraints from usage
  • Type inference state
  • Error locations

Key semantic types:

  • ConcreteType(CType) - Fully resolved
  • UnionType(Set[SemanticType]) - Union of types
  • OptionalType(SemanticType) - Optional wrapper

See also: Pipeline Lifecycle


Compilation Stages

Parser

Converts .cst text to AST.

Input: String source code

Output: AST (Abstract Syntax Tree)

Example:

Source:

in x: Int
result = Double(x)
out result

Resulting AST:

AST(
  inputs = Map("x" -> CInt),
  calls = List(Call("result", "Double", Map("x" -> Var("x")))),
  outputs = Set("result")
)

See also: Pipeline Lifecycle


Type Checker

Validates types and resolves inference.

Input: AST

Output: Typed IR (Intermediate Representation)

What it checks:

  • All variables are defined
  • Types are compatible
  • Module signatures match calls
  • No type errors

Example error:

Error: Type mismatch at line 3
Expected: CString
Got: CInt

See also: Error Handling


DAG Compiler

Converts typed IR to executable DAG.

Input: Typed IR

Output: Execution DAG

What it does:

  • Builds dependency graph
  • Detects cycles
  • Sorts into execution layers
  • Optimizes for parallelism
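
Cycle detection and layer sorting can be done in one loop. The sketch below is one common approach (repeatedly peel off every node whose dependencies are already satisfied), not necessarily how the DAG Compiler does it. The `layers` function and its `Map`-based graph encoding are assumptions made for illustration.

```scala
object LayeringSketch {
  /** `deps` maps each node to the names it depends on. */
  def layers(deps: Map[String, Set[String]]): Either[String, List[List[String]]] = {
    // Names that are depended on but are not nodes are treated as pipeline inputs.
    val inputs = deps.values.flatten.toSet -- deps.keySet

    @annotation.tailrec
    def loop(
        remaining: Map[String, Set[String]],
        done: Set[String],
        acc: List[List[String]]
    ): Either[String, List[List[String]]] =
      if (remaining.isEmpty) Right(acc.reverse)
      else {
        // A node is ready once everything it depends on has been produced.
        val ready = remaining.collect { case (node, ds) if ds.subsetOf(done) => node }.toList.sorted
        if (ready.isEmpty)
          Left("cycle detected among: " + remaining.keys.toList.sorted.mkString(", "))
        else loop(remaining -- ready, done ++ ready, ready :: acc)
      }

    loop(deps, inputs, Nil)
  }
}
```

If no node is ready while some remain, the remaining nodes must depend on each other, which is how the sketch reports a cycle.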

See also: DAG Execution


IR (Intermediate Representation)

Typed AST before final compilation.

What it contains:

  • Type-checked expressions
  • Resolved module references
  • Variable bindings with types

Example:

IR(
  inputs = Map("x" -> CInt),
  bindings = Map("result" -> ModuleCall("Double", CInt -> CInt, Map("x" -> Var("x", CInt)))),
  outputs = Set("result")
)

See also: Pipeline Lifecycle


Execution

Hot Execution

Precompiled DAG with fast execution.

Characteristics:

  • DAG compiled once
  • Reused for multiple inputs
  • Minimal startup latency
  • Used by HTTP API

Example:

// Compile once
val dag = compiler.compile(source)

// Execute many times
val result1 = dag.execute(inputs1)
val result2 = dag.execute(inputs2)

See also: Execution Modes


Cold Execution

Compile and execute on-demand.

Characteristics:

  • Compilation per execution
  • More flexible (can change source)
  • Higher latency
  • Used in development/testing

Example:

// Compile and execute together
val result = compiler.compileAndExecute(source, inputs)

See also: Execution Modes


Layer

Group of modules that can execute in parallel.

What it represents:

  • All nodes with same topological distance from inputs
  • No dependencies between nodes in same layer
  • Executes before next layer

Example:

# Layer 0 (both parallel)
a = ProcessA(input)
b = ProcessB(input)

# Layer 1 (waits for layer 0)
result = Merge(a, b)

See also: DAG Execution


Execution Context

Runtime state during pipeline execution.

What it contains:

  • Input values
  • Intermediate results
  • Module instances
  • Error state

See also: DAG Execution


Resilience

Retry

Automatic retry on failure.

Syntax:

result = UnreliableAPI(input) with {
  retry: 3
}

Behavior:

  • Retries up to N times
  • Exponential backoff (configurable)
  • Fails if all retries exhausted
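
To make the behavior concrete, here is a minimal sketch of retry with exponential backoff in plain Scala. It is not the engine's implementation. It assumes that `retry: N` means N retries after the first attempt (so up to N + 1 calls) and that the delay doubles each time; `withRetry` and its parameters are hypothetical names.

```scala
import scala.util.{Failure, Success, Try}

object RetrySketch {
  /** Runs `op`; on failure, retries up to `maxRetries` more times, doubling the delay each time.
    * `sleep` is injectable so callers (and tests) can avoid real waiting.
    */
  def withRetry[A](
      maxRetries: Int,
      initialDelayMs: Long,
      sleep: Long => Unit = ms => Thread.sleep(ms)
  )(op: () => A): Try[A] = {
    @annotation.tailrec
    def attempt(retriesLeft: Int, delayMs: Long): Try[A] =
      Try(op()) match {
        case s @ Success(_) => s
        case f @ Failure(_) if retriesLeft <= 0 => f // all retries exhausted
        case Failure(_) =>
          sleep(delayMs)
          attempt(retriesLeft - 1, delayMs * 2)
      }
    attempt(maxRetries, initialDelayMs)
  }
}
```

The last failure is returned unchanged once the retries run out, which matches the "fails if all retries exhausted" rule.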

See also: Resilience Patterns, Module Options


Timeout

Maximum execution time.

Syntax:

result = SlowAPI(input) with {
  timeout: 10s
}

Behavior:

  • Cancels execution after timeout
  • Raises timeout error
  • Can combine with retry
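
The cancel-then-fail behavior can be sketched with the JDK's `ExecutorService`. This is an illustration under stated assumptions rather than the engine's mechanism: `withTimeout` is a hypothetical helper, and cancellation here relies on thread interruption, which only stops work that responds to interrupts.

```scala
import java.util.concurrent.{Callable, ExecutionException, Executors, TimeUnit, TimeoutException}
import scala.util.{Failure, Success, Try}

object TimeoutSketch {
  /** Runs `op` on a worker thread and gives up after `timeoutMs` milliseconds. */
  def withTimeout[A](timeoutMs: Long)(op: () => A): Try[A] = {
    val executor = Executors.newSingleThreadExecutor()
    val future = executor.submit(new Callable[A] { def call(): A = op() })
    try {
      Success(future.get(timeoutMs, TimeUnit.MILLISECONDS))
    } catch {
      case e: TimeoutException =>
        future.cancel(true) // interrupt the worker so it stops running
        Failure(e)          // surface the timeout as an error
      case e: ExecutionException =>
        Failure(e.getCause) // unwrap the exception thrown by `op`
    } finally {
      executor.shutdownNow()
    }
  }
}
```

Because the result is a `Try`, a timeout looks like any other failure to the caller, so a retry wrapper can treat it the same way.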

See also: Resilience Patterns


Cache

Store and reuse results.

Syntax:

result = ExpensiveComputation(input) with {
  cache: 1h
}

Behavior:

  • Caches result by input hash
  • Returns cached value if available
  • Expires after TTL
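
A TTL cache keyed by input can be sketched in a few lines. The class below is hypothetical and single-threaded: it relies on the key's `hashCode`/`equals` through a mutable map, and it takes the clock as a parameter so expiry can be exercised without waiting. The engine's cache may differ in all of these respects.

```scala
import scala.collection.mutable

object TtlCacheSketch {
  /** Not thread-safe; `now` returns the current time in milliseconds. */
  final class TtlCache[K, V](ttlMs: Long, now: () => Long = () => System.currentTimeMillis()) {
    // Each entry stores the value together with its expiry timestamp.
    private val entries = mutable.Map.empty[K, (V, Long)]

    def getOrCompute(key: K)(compute: => V): V =
      entries.get(key) match {
        case Some((value, expiresAt)) if now() < expiresAt => value // still fresh
        case _ =>
          val value = compute // missing or expired: recompute and store
          entries.update(key, (value, now() + ttlMs))
          value
      }
  }
}
```

The `compute` argument is by-name, so the expensive work runs only on a miss or after expiry.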

See also: Resilience Patterns


Fallback

Alternative value on failure.

Syntax:

result = UnreliableAPI(input) with {
  fallback: DefaultValue(input)
}

Behavior:

  • Executes fallback on error
  • Fallback must return compatible type
  • Can combine with retry
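
In plain Scala, the same idea is a `Try` with an alternative. The sketch below is illustrative only; `withFallback` is a hypothetical helper. The shared type parameter `A` stands in for the "compatible type" rule: the fallback must produce the same type as the primary call.

```scala
import scala.util.Try

object FallbackSketch {
  /** Runs `primary`; if it throws, runs `fallback` instead. */
  def withFallback[A](primary: () => A)(fallback: () => A): Try[A] =
    // `orElse` takes its argument by name, so the fallback runs only on failure.
    Try(primary()).orElse(Try(fallback()))
}
```

If the fallback also throws, the result is the fallback's failure.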

See also: Resilience Patterns


Cross-Process Modules

Module Provider

An external service that contributes pipeline modules to Constellation via gRPC.

What it does:

  • Runs in a separate process (can be any language)
  • Registers modules with a Constellation server
  • Receives ExecuteRequest RPCs when pipelines call its modules
  • Maintains a heartbeat-based control plane

Example:

import io.constellation.provider.sdk._

// `channel` and `myModule` are placeholders assumed to be defined elsewhere.
val provider = ConstellationProvider.create(
  namespace = "ml",
  instances = List("localhost:9090"),
  config = SdkConfig(),
  transportFactory = addr => new GrpcProviderTransport(channel),
  executorServerFactory = new GrpcExecutorServerFactory(),
  serializer = JsonCValueSerializer
)
provider.register(myModule)
provider.start.useForever

Key properties:

  • Decoupled from JVM (Python, Go, Rust modules possible)
  • Independent scaling and deployment
  • Higher latency than in-process modules (network round-trip)

See also: Module Provider


Provider Namespace

A dot-separated identifier (e.g., ml, data.transform) that groups modules from one provider.

Rules:

  • Each segment: starts with letter, alphanumeric + underscores only
  • Exclusively owned by one provider (or one provider group)
  • Cannot use reserved prefixes (e.g., stdlib)
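
The syntactic rules can be expressed as a small validator. This is a sketch, not the server's check: `validate` is a hypothetical function, the reserved set contains only `stdlib` because that is the only prefix named here, and the ownership rule is not covered since it needs registry state.

```scala
object NamespaceCheck {
  // A segment starts with a letter, followed by letters, digits, or underscores.
  private val SegmentPattern = "[A-Za-z][A-Za-z0-9_]*"
  // Assumption: only `stdlib` is listed; the real reserved set may be larger.
  private val ReservedPrefixes = Set("stdlib")

  def validate(namespace: String): Either[String, String] = {
    // A limit of -1 keeps empty trailing segments, so "ml." is rejected.
    val segments = namespace.split("\\.", -1).toList
    if (segments.exists(s => !s.matches(SegmentPattern)))
      Left(s"invalid segment in namespace: $namespace")
    else if (ReservedPrefixes.contains(segments.head))
      Left(s"reserved prefix: ${segments.head}")
    else Right(namespace)
  }
}
```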

Usage in constellation-lang:

result = ml.Analyze(text)
enriched = data.transform.Enrich(record)

See also: Module Provider


Provider Group

Multiple provider instances sharing a group ID to serve the same namespace with load balancing.

What it enables:

  • Horizontal scaling of external modules
  • Round-robin load balancing across group members
  • Resilient operation (remaining members continue if one disconnects)
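
Round-robin selection with member loss can be sketched as follows. The `Group` class is hypothetical and not thread-safe; it only illustrates how calls keep rotating across the remaining members after one disconnects.

```scala
object RoundRobinSketch {
  final class Group(members: Vector[String]) {
    private var connected: Vector[String] = members
    private var next: Int = 0

    /** Removes a member, for example after missed heartbeats. */
    def disconnect(member: String): Unit =
      connected = connected.filterNot(_ == member)

    /** Picks the next connected member, or None if nobody is left. */
    def pick(): Option[String] =
      if (connected.isEmpty) None
      else {
        // The modulo keeps the index valid even after the group shrinks.
        val chosen = connected(next % connected.size)
        next = (next + 1) % connected.size
        Some(chosen)
      }
  }
}
```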

Example:

val config = SdkConfig(groupId = Some("ml-pool"))

See also: Module Provider


API Concepts

HTTP API

REST interface for pipeline execution.

Key endpoints:

  • POST /execute - Execute pipeline
  • GET /health - Health check
  • GET /modules - List modules
  • POST /validate - Validate pipeline

Example:

curl -X POST http://localhost:8080/execute \
  -H "Content-Type: application/json" \
  -d '{"source": "...", "inputs": {...}}'

See also: HTTP API Reference


Embedded API

Programmatic usage in Scala applications.

Key components:

  • Constellation[F] - Module registry
  • DagCompiler[F] - Compilation
  • ExecutionDag[F] - Execution

Example:

for {
  constellation <- Constellation.create[IO]
  compiler <- DagCompiler.create[IO](constellation)
  dag <- compiler.compile(source)
  result <- dag.execute(inputs)
} yield result

See also: Embedded API


LSP (Language Server Protocol)

Editor integration for .cst files.

Features:

  • Syntax highlighting
  • Autocomplete
  • Error diagnostics
  • Hover documentation

Used by: VSCode extension

See also: Project Structure


Module Builder

ModuleBuilder

DSL for defining modules.

Methods:

  • .metadata(name, description, major, minor) - Basic info
  • .implementationPure[I, O](f) - Pure function
  • .implementation[I, O](f) - IO function
  • .withRetry(config) - Add retry
  • .withTimeout(duration) - Add timeout
  • .withCache(ttl) - Add caching
  • .build - Create module

Example:

val module = ModuleBuilder
  .metadata("Process", "Processes data", 1, 0)
  .implementationPure[Input, Output] { input =>
    Output(process(input))
  }
  .withRetry(RetryConfig(maxAttempts = 3))
  .build

See also: Module Development, Module Options


Implementation Types

Pure Implementation

No side effects, deterministic.

.implementationPure[Input, Output] { input =>
  Output(compute(input))
}

Use when:

  • No IO needed
  • Deterministic result
  • No state changes

IO Implementation

Side effects allowed.

.implementation[Input, Output] { input =>
  IO {
    // Perform side effect
    Output(result)
  }
}

Use when:

  • Need to call external API
  • File I/O
  • Database access
  • Non-deterministic

See also: Module Development


constellation-lang Syntax

Input Declaration

Declare pipeline inputs.

Syntax:

in variableName: Type

Example:

in text: String
in count: Int
in user: {name: String, age: Int}

Rules:

  • Must appear before first module call
  • Variable names must be unique
  • Types must be valid CTypes

See also: Type Syntax


Module Call

Invoke a registered module.

Syntax:

variableName = ModuleName(arg1, arg2, ...) with { options }

Example:

result = Uppercase(text)
cached = ExpensiveAPI(input) with { cache: 1h }

Rules:

  • Module name must match registered module (case-sensitive)
  • Arguments must match module input type
  • Variable name must be unique

See also: Module Options


Output Declaration

Declare pipeline outputs.

Syntax:

out variableName

Example:

out result
out processedData

Rules:

  • Must reference a defined variable
  • Can have multiple outputs
  • Must appear after variable definition

With Clause

Attach resilience options to module call.

Syntax:

result = Module(input) with {
  retry: 3,
  timeout: 10s,
  cache: 1h,
  fallback: DefaultModule(input)
}

Available options:

  • retry: Int - Max retry attempts
  • timeout: Duration - Max execution time
  • cache: Duration - Cache TTL
  • fallback: ModuleCall - Fallback on error

See also: Resilience Patterns, Module Options


Testing

Test Fixtures

Predefined test data for benchmarks and tests.

Available:

  • Small program (5 lines, 2 modules)
  • Medium program (20 lines, 8 modules)
  • Large program (50 lines, 20 modules)

Location: modules/lang-compiler/src/test/scala/.../TestFixtures.scala

See also: Project Structure


Glossary Summary Table

| Term | Category | One-Line Definition |
| --- | --- | --- |
| Constellation | Core | Module registry and orchestration engine |
| Module | Core | Reusable processing unit with typed I/O |
| Pipeline | Core | DAG of modules in .cst syntax |
| DAG | Core | Directed acyclic graph execution plan |
| CType | Type System | Type at compile time |
| CValue | Type System | Value at runtime |
| Type Compatibility | Type System | Rules for type matching |
| Semantic Type | Type System | Type during compilation |
| Parser | Compilation | Text → AST conversion |
| Type Checker | Compilation | AST → Typed IR validation |
| DAG Compiler | Compilation | Typed IR → Execution DAG |
| IR | Compilation | Intermediate representation |
| Hot Execution | Execution | Precompiled DAG, fast execution |
| Cold Execution | Execution | On-demand compilation |
| Layer | Execution | Parallel execution group |
| Execution Context | Execution | Runtime state during execution |
| Retry | Resilience | Automatic retry on failure |
| Timeout | Resilience | Maximum execution time |
| Cache | Resilience | Store and reuse results |
| Fallback | Resilience | Alternative on failure |
| Module Provider | Cross-Process | External service contributing modules via gRPC |
| Provider Namespace | Cross-Process | Dot-separated module group owned by one provider |
| Provider Group | Cross-Process | Multiple providers sharing a namespace with load balancing |
| HTTP API | API | REST interface for execution |
| Embedded API | API | Programmatic Scala usage |
| LSP | API | Language server for editors |
| ModuleBuilder | Module Builder | DSL for defining modules |
| Pure Implementation | Module Builder | No side effects |
| IO Implementation | Module Builder | Side effects allowed |
| Input Declaration | Syntax | in x: Type |
| Module Call | Syntax | var = Module(args) |
| Output Declaration | Syntax | out var |
| With Clause | Syntax | Module options |

Next Steps

Now that you know the terminology:

  1. Type System - Deep dive into CType/CValue
  2. Module Development - Create your first module
  3. Pipeline Lifecycle - How compilation works
