Key Concepts
Goal: Understand the 33 essential terms used throughout Constellation Engine.
Core Architecture
Constellation
The main orchestration engine that manages modules and execution.
What it does:
- Stores registered modules
- Provides module lookup
- Manages execution state
Example:
val constellation = Constellation.create[IO]
See also: Embedded API
Module
A reusable processing unit with typed inputs and outputs.
What it is:
- Defined in Scala using ModuleBuilder
- Has a unique name (case-sensitive)
- Specifies input/output types
- Contains implementation logic
Example:
val uppercase = ModuleBuilder
.metadata("Uppercase", "Converts text to uppercase", 1, 0)
.implementationPure[TextInput, TextOutput] { input =>
TextOutput(input.text.toUpperCase)
}
.build
Key properties:
- Name: "Uppercase" (must match usage in .cst files)
- Version: 1.0 (major.minor)
- Type signature: TextInput => TextOutput
- Implementation: Pure function or IO
See also: Module Development
Pipeline
A DAG of module invocations defined in constellation-lang (.cst files).
What it contains:
- Input declarations: in x: String
- Module calls: result = Uppercase(x)
- Output declarations: out result
Example:
in text: String
trimmed = Trim(text)
result = Uppercase(trimmed)
out result
Key properties:
- Must be a valid DAG (no cycles)
- All variables must be defined before use
- All outputs must reference defined variables
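All three key properties can be checked in one pass over the statements in source order. The sketch below is a minimal illustration that assumes a hypothetical Stmt model (Input, Call, Output) rather than the engine's real AST. Because a call may only read variables defined on earlier lines, and every name must be unique, a pipeline that passes this check cannot contain a cycle.

```scala
object PipelineCheck {
  // Hypothetical statement model for a .cst pipeline (not the engine's real AST)
  sealed trait Stmt
  final case class Input(name: String, tpe: String) extends Stmt
  final case class Call(target: String, module: String, args: List[String]) extends Stmt
  final case class Output(name: String) extends Stmt

  /** Returns error messages in source order; an empty list means the pipeline is valid. */
  def validate(stmts: List[Stmt]): List[String] = {
    val (_, errors) = stmts.foldLeft((Set.empty[String], List.empty[String])) {
      case ((defined, errs), Input(name, _)) =>
        val dup = if (defined.contains(name)) List(s"'$name' is defined twice") else Nil
        (defined + name, errs ++ dup)
      case ((defined, errs), Call(target, module, args)) =>
        // Arguments must already be defined, which also rules out cycles
        val undefinedArgs =
          args.filterNot(defined.contains).map(a => s"$module uses undefined variable '$a'")
        val dup = if (defined.contains(target)) List(s"'$target' is defined twice") else Nil
        (defined + target, errs ++ undefinedArgs ++ dup)
      case ((defined, errs), Output(name)) =>
        val missing = if (defined.contains(name)) Nil else List(s"output '$name' is not defined")
        (defined, errs ++ missing)
    }
    errors
  }
}
```

For the Trim/Uppercase example above, validate returns an empty list; dropping the in text: String line makes it report the undefined variable instead.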
See also: Pipeline Lifecycle
DAG (Directed Acyclic Graph)
The execution plan compiled from a pipeline.
What it represents:
- Nodes = module invocations
- Edges = data dependencies
- Layers = parallel execution groups
Example visualization:
Layer 0: [Trim(text)]
↓
Layer 1: [Uppercase(trimmed)]
Key properties:
- No cycles allowed
- Topologically sorted into layers
- Nodes in same layer execute in parallel
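One common way to assign layers is by longest dependency chain: a node with no module dependencies sits in layer 0, and every other node sits one layer after its deepest dependency. The sketch below assumes the graph is given as a map from each module invocation to the invocations it reads from (pipeline inputs excluded) and that the graph is already acyclic. The function name layers is illustrative, not Constellation's API.

```scala
import scala.collection.mutable

object Layering {
  /** Groups nodes into layers by longest-path depth; assumes `deps` has no cycles. */
  def layers(deps: Map[String, Set[String]]): List[Set[String]] = {
    val memo = mutable.Map.empty[String, Int]

    // Depth 0 for nodes without dependencies, otherwise 1 + deepest dependency
    def depth(node: String): Int = memo.get(node) match {
      case Some(d) => d
      case None =>
        val ds = deps.getOrElse(node, Set.empty[String])
        val d = if (ds.isEmpty) 0 else ds.map(depth).max + 1
        memo(node) = d
        d
    }

    val byDepth: Map[Int, Set[String]] = deps.keySet.groupBy(depth)
    if (byDepth.isEmpty) Nil
    else (0 to byDepth.keys.max).toList.map(d => byDepth.getOrElse(d, Set.empty[String]))
  }
}
```

Applied to the visualization above, Map("Trim" -> Set(), "Uppercase" -> Set("Trim")) yields List(Set("Trim"), Set("Uppercase")).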
See also: DAG Execution
Type System
CType
Type representation at compile time.
Hierarchy:
CType
├─ CPrimitive
│ ├─ CString
│ ├─ CInt
│ ├─ CDouble
│ └─ CBoolean
├─ CRecord(Map[String, CType])
├─ CUnion(Set[CType])
├─ CList(CType)
└─ COptional(CType)
Example:
val stringType: CType = CString
val recordType: CType = CRecord(Map("name" -> CString, "age" -> CInt))
See also: Type System, Type Syntax
CValue
Value representation at runtime.
Hierarchy:
CValue
├─ CPrimitive
│ ├─ CString("text")
│ ├─ CInt(42)
│ ├─ CDouble(3.14)
│ └─ CBoolean(true)
├─ CRecord(Map[String, CValue])
├─ CUnion(CValue, CType)
├─ CList(List[CValue])
├─ COptional(Option[CValue])
└─ CNone
Example:
val stringValue: CValue = CString("hello")
val recordValue: CValue = CRecord(Map("name" -> CString("Alice"), "age" -> CInt(30)))
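To make the split between compile-time types and runtime values concrete, here is a minimal sketch that recovers a CType from a CValue. It models only primitives and records, nests the two families under CType and CValue companions so the shared names do not collide, and assumes CInt holds a 64-bit Long. The engine's actual definitions may differ.

```scala
object ValueTypes {
  // Compile-time types (subset of the hierarchy above)
  sealed trait CType
  object CType {
    case object CString extends CType
    case object CInt extends CType
    case object CDouble extends CType
    case object CBoolean extends CType
    final case class CRecord(fields: Map[String, CType]) extends CType
  }

  // Runtime values (subset of the hierarchy above)
  sealed trait CValue
  object CValue {
    final case class CString(value: String) extends CValue
    final case class CInt(value: Long) extends CValue // 64-bit is an assumption
    final case class CDouble(value: Double) extends CValue
    final case class CBoolean(value: Boolean) extends CValue
    final case class CRecord(fields: Map[String, CValue]) extends CValue
  }

  /** Recovers the compile-time type that describes a runtime value. */
  def typeOf(value: CValue): CType = value match {
    case CValue.CString(_)      => CType.CString
    case CValue.CInt(_)         => CType.CInt
    case CValue.CDouble(_)      => CType.CDouble
    case CValue.CBoolean(_)     => CType.CBoolean
    case CValue.CRecord(fields) => CType.CRecord(fields.map { case (k, v) => k -> typeOf(v) })
  }
}
```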
See also: Type System
Type Compatibility
Rules for when one type can be used where another is expected.
Key rules:
- Exact match: CString matches CString
- Subtyping: CNone matches COptional(T) for any T
- Record subtyping: {a: String, b: Int} matches {a: String} (extra fields OK)
- Union subtyping: String matches String | Int
Example:
in x: String | Int # Union type
result = Process(x) # OK if Process accepts String | Int
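The key rules can be sketched as a single recursive check. This is an illustration under assumptions, not the engine's type checker: it models only primitives, records, and unions, requires shared record fields to be compatible rather than identical, and leaves out the CNone rule because CNone is a runtime value rather than a type.

```scala
object TypeCompat {
  // Minimal stand-in for the CType hierarchy (lists and optionals omitted)
  sealed trait CType
  case object CString extends CType
  case object CInt extends CType
  case object CDouble extends CType
  case object CBoolean extends CType
  final case class CRecord(fields: Map[String, CType]) extends CType
  final case class CUnion(members: Set[CType]) extends CType

  /** True if a value of type `actual` may be used where `expected` is required. */
  def isCompatible(actual: CType, expected: CType): Boolean =
    (actual, expected) match {
      case (a, e) if a == e => true // exact match
      case (CRecord(have), CRecord(want)) =>
        // Extra fields are fine; every wanted field must exist with a compatible type
        want.forall { case (name, t) => have.get(name).exists(isCompatible(_, t)) }
      case (CUnion(members), e) =>
        // A union fits only if each of its members fits
        members.forall(isCompatible(_, e))
      case (a, CUnion(options)) =>
        // A single type fits a union if it fits at least one member
        options.exists(isCompatible(a, _))
      case _ => false
    }
}
```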
See also: Type System
Semantic Type
Type information during compilation (before final CType).
What it tracks:
- Type constraints from usage
- Type inference state
- Error locations
Key semantic types:
- ConcreteType(CType) - Fully resolved
- UnionType(Set[SemanticType]) - Union of types
- OptionalType(SemanticType) - Optional wrapper
See also: Pipeline Lifecycle
Compilation Stages
Parser
Converts .cst text to AST.
Input: String source code
Output: AST (Abstract Syntax Tree)
Example:
in x: Int
result = Double(x)
out result
↓
AST(
inputs = Map("x" -> CInt),
calls = List(Call("result", "Double", Map("x" -> Var("x")))),
outputs = Set("result")
)
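A parser for just these three statement forms fits in a few regular expressions. The sketch below is a hypothetical, line-based illustration: it keeps call arguments as a positional list instead of the named map shown above, accepts only simple type names such as Int or String, and strips # comments. The real parser also has to handle record types, with clauses, and namespaced calls.

```scala
object MiniParser {
  // Simplified, hypothetical AST; arguments are kept positionally
  final case class Call(target: String, module: String, args: List[String])
  final case class Ast(inputs: Map[String, String], calls: List[Call], outputs: Set[String])

  private val InputLine  = """in\s+(\w+)\s*:\s*(\w+)""".r
  private val CallLine   = """(\w+)\s*=\s*(\w+)\((.*)\)""".r
  private val OutputLine = """out\s+(\w+)""".r

  def parse(source: String): Either[String, Ast] = {
    val lines = source.linesIterator.zipWithIndex
      .map { case (text, idx) => (text.takeWhile(_ != '#').trim, idx + 1) } // drop # comments
      .filter { case (text, _) => text.nonEmpty }

    lines.foldLeft[Either[String, Ast]](Right(Ast(Map.empty, Nil, Set.empty))) {
      case (Left(err), _) => Left(err) // keep the first error
      case (Right(ast), (InputLine(name, tpe), _)) =>
        Right(ast.copy(inputs = ast.inputs + (name -> tpe)))
      case (Right(ast), (CallLine(target, module, rawArgs), _)) =>
        val args = rawArgs.split(",").map(_.trim).filter(_.nonEmpty).toList
        Right(ast.copy(calls = ast.calls :+ Call(target, module, args)))
      case (Right(ast), (OutputLine(name), _)) =>
        Right(ast.copy(outputs = ast.outputs + name))
      case (Right(_), (text, lineNo)) =>
        Left(s"Cannot parse line $lineNo: $text")
    }
  }
}
```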
See also: Pipeline Lifecycle
Type Checker
Validates types and resolves inference.
Input: AST
Output: Typed IR (Intermediate Representation)
What it checks:
- All variables are defined
- Types are compatible
- Module signatures match calls
- No type errors
Example error:
Error: Type mismatch at line 3
Expected: CString
Got: CInt
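The signature check behind an error like this can be sketched as follows. It assumes a hypothetical Signature model with positional parameter types, represents types as plain strings such as "CString", and compares them by equality; a fuller checker would apply the Type Compatibility rules instead.

```scala
object CallChecker {
  // Hypothetical signature model: parameter types in order, plus the result type
  final case class Signature(params: List[String], result: String)

  /** Checks one call against its signature; returns the result type or an error message. */
  def checkCall(
      module: String,
      args: List[String],
      line: Int,
      signatures: Map[String, Signature],
      env: Map[String, String] // types of variables defined so far
  ): Either[String, String] =
    signatures.get(module) match {
      case None => Left(s"Error: Unknown module $module at line $line")
      case Some(sig) if sig.params.length != args.length =>
        Left(s"Error: $module expects ${sig.params.length} argument(s) at line $line, got ${args.length}")
      case Some(sig) =>
        val problem = sig.params.zip(args).collectFirst {
          case (_, arg) if !env.contains(arg) => s"Error: Undefined variable $arg at line $line"
          case (expected, arg) if env(arg) != expected =>
            s"Error: Type mismatch at line $line\nExpected: $expected\nGot: ${env(arg)}"
        }
        problem.toLeft(sig.result)
    }
}
```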
See also: Error Handling
DAG Compiler
Converts typed IR to executable DAG.
Input: Typed IR
Output: Execution DAG
What it does:
- Builds dependency graph
- Detects cycles
- Sorts into execution layers
- Optimizes for parallelism
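Cycle detection is typically a depth-first search that tracks the current path: reaching a node that is already on the path means a back edge, and therefore a cycle. The sketch below assumes the dependency graph is given as a map from each node to the nodes it reads from, and returns one offending cycle so it can be reported. The findCycle name is illustrative only.

```scala
import scala.collection.mutable

object CycleCheck {
  /** Returns Some(path) for one cycle (first node repeated at the end), or None if acyclic. */
  def findCycle(deps: Map[String, Set[String]]): Option[List[String]] = {
    val done = mutable.Set.empty[String] // nodes fully explored without finding a cycle

    // `stack` is the current DFS path, most recent node first
    def visit(node: String, stack: List[String]): Option[List[String]] =
      if (stack.contains(node)) Some((node :: stack.takeWhile(_ != node).reverse) :+ node)
      else if (done.contains(node)) None
      else {
        val found = deps.getOrElse(node, Set.empty[String]).iterator
          .map(visit(_, node :: stack))
          .collectFirst { case Some(cycle) => cycle }
        done += node
        found
      }

    deps.keys.iterator.map(visit(_, Nil)).collectFirst { case Some(cycle) => cycle }
  }
}
```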
See also: DAG Execution
IR (Intermediate Representation)
Typed AST before final compilation.
What it contains:
- Type-checked expressions
- Resolved module references
- Variable bindings with types
Example:
IR(
inputs = Map("x" -> CInt),
bindings = Map("result" -> ModuleCall("Double", CInt -> CInt, Map("x" -> Var("x", CInt)))),
outputs = Set("result")
)
See also: Pipeline Lifecycle
Execution
Hot Execution
Precompiled DAG with fast execution.
Characteristics:
- DAG compiled once
- Reused for multiple inputs
- Minimal startup latency
- Used by HTTP API
Example:
// Compile once, then execute many times
for {
dag <- compiler.compile(source)
result1 <- dag.execute(inputs1)
result2 <- dag.execute(inputs2)
} yield (result1, result2)
See also: Execution Modes
Cold Execution
Compile and execute on-demand.
Characteristics:
- Compilation per execution
- More flexible (can change source)
- Higher latency
- Used in development/testing
Example:
// Compile and execute together
val result = compiler.compileAndExecute(source, inputs)
See also: Execution Modes
Layer
Group of modules that can execute in parallel.
What it represents:
- All nodes at the same topological depth (distance from the inputs)
- No dependencies between nodes in the same layer
- Completes before the next layer starts
Example:
# Layer 0 (both parallel)
a = ProcessA(input)
b = ProcessB(input)
# Layer 1 (waits for layer 0)
result = Merge(a, b)
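Executing layer by layer means starting every node in a layer concurrently and waiting for all of them before moving on. This sketch uses plain Scala Futures rather than the engine's IO, and a hypothetical Node type that computes an Int from the values produced so far; it only shows the scheduling shape.

```scala
import scala.concurrent.{ExecutionContext, Future}

object LayerRunner {
  // Hypothetical node: computes its value from inputs and earlier-layer results
  type Node = Map[String, Int] => Int

  def run(layers: List[Map[String, Node]], inputs: Map[String, Int])(
      implicit ec: ExecutionContext): Future[Map[String, Int]] =
    layers.foldLeft(Future.successful(inputs)) { (accF, layer) =>
      accF.flatMap { acc =>
        // Start every node in this layer at once; they only read earlier results
        val started = layer.toList.map { case (name, f) => Future(name -> f(acc)) }
        // The next layer begins only after all of these have completed
        Future.sequence(started).map(acc ++ _)
      }
    }
}
```

With input = 2, a node a = input + 1 and a node b = input * 10 run together in the first layer, and a merge node a + b in the second layer sees both results (23).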
See also: DAG Execution
Execution Context
Runtime state during pipeline execution.
What it contains:
- Input values
- Intermediate results
- Module instances
- Error state
See also: DAG Execution
Resilience
Retry
Automatic retry on failure.
Syntax:
result = UnreliableAPI(input) with {
retry: 3
}
Behavior:
- Retries up to N times
- Exponential backoff (configurable)
- Fails if all retries exhausted
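Retry with exponential backoff can be sketched as a small loop. The sketch assumes that retry: N means N retries after the first attempt (so up to N + 1 calls) and that delays double from a base value; this document only says backoff is configurable, so both the interpretation and the doubling schedule are assumptions. The sleep function is injectable so the behavior can be tested without waiting.

```scala
import scala.util.{Failure, Success, Try}

object RetrySketch {
  /** Runs `op`, retrying up to `retries` more times; the delay doubles after each failure. */
  def retry[A](retries: Int, baseDelayMs: Long, sleep: Long => Unit = ms => Thread.sleep(ms))(
      op: () => A): Try[A] = {
    @annotation.tailrec
    def loop(attempt: Int): Try[A] = Try(op()) match {
      case success @ Success(_) => success
      case failure @ Failure(_) if attempt >= retries => failure // all retries exhausted
      case Failure(_) =>
        sleep(baseDelayMs * (1L << attempt)) // waits 1x, 2x, 4x, ... the base delay
        loop(attempt + 1)
    }
    loop(0)
  }
}
```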
See also: Resilience Patterns, Module Options
Timeout
Maximum execution time.
Syntax:
result = SlowAPI(input) with {
timeout: 10s
}
Behavior:
- Cancels execution after timeout
- Raises timeout error
- Can combine with retry
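A timeout can be sketched by waiting on a Future with a deadline. As the comment notes, this sketch only stops waiting; it does not cancel the work, which differs from the cancellation described above. Effect systems such as cats-effect (IO#timeout) can cancel the running computation. Calling withTimeout inside a retry loop is one way to combine the two options.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.FiniteDuration
import scala.util.Try

object TimeoutSketch {
  /**
   * Waits at most `limit` for `op`; on expiry the result is a Failure(TimeoutException).
   * Caveat: a plain Future cannot be cancelled, so `op` keeps running in the background.
   */
  def withTimeout[A](limit: FiniteDuration)(op: => A)(implicit ec: ExecutionContext): Try[A] =
    Try(Await.result(Future(op), limit))
}
```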
See also: Resilience Patterns
Cache
Store and reuse results.
Syntax:
result = ExpensiveComputation(input) with {
cache: 1h
}
Behavior:
- Caches result by input hash
- Returns cached value if available
- Expires after TTL
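A TTL cache keyed by input can be sketched with a map from inputs to values and expiry timestamps. This is an illustration with assumptions: keys rely on the input's equals/hashCode rather than an explicit hash, the clock is injectable for testing, and the lock is held while computing, which keeps the sketch simple but serializes cache misses.

```scala
import scala.collection.mutable

object TtlCacheSketch {
  /** Memoizes `compute` per key for `ttlMillis`; `now` is injectable for testing. */
  final class TtlCache[K, V](ttlMillis: Long, now: () => Long = () => System.currentTimeMillis()) {
    private val entries = mutable.Map.empty[K, (V, Long)] // value and its expiry time

    def getOrCompute(key: K)(compute: K => V): V = synchronized {
      val t = now()
      entries.get(key) match {
        case Some((value, expiresAt)) if t < expiresAt => value // fresh hit
        case _ =>
          val value = compute(key) // miss or expired: recompute and restamp
          entries(key) = (value, t + ttlMillis)
          value
      }
    }
  }
}
```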
See also: Resilience Patterns
Fallback
Alternative value on failure.
Syntax:
result = UnreliableAPI(input) with {
fallback: DefaultValue(input)
}
Behavior:
- Executes fallback on error
- Fallback must return a compatible type
- Can combine with retry
See also: Resilience Patterns
Cross-Process Modules
Module Provider
An external service that contributes pipeline modules to Constellation via gRPC.
What it does:
- Runs in a separate process (can be any language)
- Registers modules with a Constellation server
- Receives ExecuteRequest RPCs when pipelines call its modules
- Maintains a heartbeat-based control plane
Example:
import io.constellation.provider.sdk._
val provider = ConstellationProvider.create(
namespace = "ml",
instances = List("localhost:9090"),
config = SdkConfig(),
transportFactory = addr => new GrpcProviderTransport(channel), // channel: a gRPC channel for addr, created elsewhere (not shown)
executorServerFactory = new GrpcExecutorServerFactory(),
serializer = JsonCValueSerializer
)
provider.register(myModule)
provider.start.useForever
Key properties:
- Decoupled from JVM (Python, Go, Rust modules possible)
- Independent scaling and deployment
- Higher latency than in-process modules (network round-trip)
See also: Module Provider
Provider Namespace
A dot-separated identifier (e.g., ml, data.transform) that groups modules from one provider.
Rules:
- Each segment starts with a letter and contains only letters, digits, and underscores
- Exclusively owned by one provider (or one provider group)
- Cannot use reserved prefixes (e.g., stdlib)
Usage in constellation-lang:
result = ml.Analyze(text)
enriched = data.transform.Enrich(record)
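The naming rules above can be expressed as a short validator. This is a sketch under assumptions: stdlib is the only reserved prefix this document names, so the reserved set is illustrative, and a reserved prefix is taken to mean the first segment of the namespace.

```scala
object NamespaceRules {
  private val Segment = "[A-Za-z][A-Za-z0-9_]*"
  // Illustrative reserved list; only "stdlib" is named in this document
  val reservedPrefixes: Set[String] = Set("stdlib")

  def validate(namespace: String): Either[String, String] = {
    val segments = namespace.split("\\.", -1).toList // -1 keeps empty trailing segments
    if (segments.exists(s => !s.matches(Segment)))
      Left(s"invalid segment in '$namespace'")
    else if (reservedPrefixes.contains(segments.head))
      Left(s"'${segments.head}' is reserved")
    else Right(namespace)
  }
}
```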
See also: Module Provider
Provider Group
Multiple provider instances sharing a group ID to serve the same namespace with load balancing.
What it enables:
- Horizontal scaling of external modules
- Round-robin load balancing across group members
- Resilient operation (remaining members continue if one disconnects)
Example:
val config = SdkConfig(groupId = Some("ml-pool"))
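Round-robin selection within a group can be sketched as a rotating index over the live members. The ProviderGroup class and its methods are hypothetical; they only illustrate how requests rotate across members and how the remaining members keep serving after one disconnects.

```scala
object RoundRobinSketch {
  /** Rotates requests across live members; removing one leaves the rest serving. */
  final class ProviderGroup(initial: Vector[String]) {
    private var members = initial
    private var next = 0

    def pick(): Option[String] = synchronized {
      if (members.isEmpty) None
      else {
        val chosen = members(next % members.size)
        next = (next + 1) % members.size
        Some(chosen)
      }
    }

    def disconnect(member: String): Unit = synchronized {
      members = members.filterNot(_ == member)
      next = if (members.nonEmpty) next % members.size else 0 // keep the index in range
    }
  }
}
```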
See also: Module Provider
API Concepts
HTTP API
REST interface for pipeline execution.
Key endpoints:
- POST /execute - Execute pipeline
- GET /health - Health check
- GET /modules - List modules
- POST /validate - Validate pipeline
Example:
curl -X POST http://localhost:8080/execute \
-H "Content-Type: application/json" \
-d '{"source": "...", "inputs": {...}}'
See also: HTTP API Reference
Embedded API
Programmatic usage in Scala applications.
Key components:
- Constellation[F] - Module registry
- DagCompiler[F] - Compilation
- ExecutionDag[F] - Execution
Example:
for {
constellation <- Constellation.create[IO]
compiler <- DagCompiler.create[IO](constellation)
dag <- compiler.compile(source)
result <- dag.execute(inputs)
} yield result
See also: Embedded API
LSP (Language Server Protocol)
Editor integration for .cst files.
Features:
- Syntax highlighting
- Autocomplete
- Error diagnostics
- Hover documentation
Used by: VSCode extension
See also: Project Structure
Module Builder
ModuleBuilder
DSL for defining modules.
Methods:
- .metadata(name, description, major, minor) - Basic info
- .implementationPure[I, O](f) - Pure function
- .implementation[I, O](f) - IO function
- .withRetry(config) - Add retry
- .withTimeout(duration) - Add timeout
- .withCache(ttl) - Add caching
- .build - Create module
Example:
val module = ModuleBuilder
.metadata("Process", "Processes data", 1, 0)
.implementationPure[Input, Output] { input =>
Output(process(input))
}
.withRetry(RetryConfig(maxAttempts = 3))
.build
See also: Module Development, Module Options
Implementation Types
Pure Implementation
No side effects, deterministic.
.implementationPure[Input, Output] { input =>
Output(compute(input))
}
Use when:
- No IO needed
- Deterministic result
- No state changes
IO Implementation
Side effects allowed.
.implementation[Input, Output] { input =>
IO {
// Perform the side effect, e.g. call an external API
val result = callExternalApi(input) // hypothetical helper
Output(result)
}
}
Use when:
- Need to call external API
- File I/O
- Database access
- Non-deterministic
See also: Module Development
constellation-lang Syntax
Input Declaration
Declare pipeline inputs.
Syntax:
in variableName: Type
Example:
in text: String
in count: Int
in user: {name: String, age: Int}
Rules:
- Must appear before the first module call
- Variable names must be unique
- Types must be valid CTypes
See also: Type Syntax
Module Call
Invoke a registered module.
Syntax:
variableName = ModuleName(arg1, arg2, ...) with { options }
Example:
result = Uppercase(text)
cached = ExpensiveAPI(input) with { cache: 1h }
Rules:
- Module name must match registered module (case-sensitive)
- Arguments must match module input type
- Variable name must be unique
See also: Module Options
Output Declaration
Declare pipeline outputs.
Syntax:
out variableName
Example:
out result
out processedData
Rules:
- Must reference a defined variable
- Can have multiple outputs
- Must appear after the variable's definition
With Clause
Attach resilience options to module call.
Syntax:
result = Module(input) with {
retry: 3,
timeout: 10s,
cache: 1h,
fallback: DefaultModule(input)
}
Available options:
- retry: Int - Max retry attempts
- timeout: Duration - Max execution time
- cache: Duration - Cache TTL
- fallback: ModuleCall - Fallback on error
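Duration values such as 10s and 1h in a with clause have to be turned into real durations before they can drive timeouts or cache expiry. The sketch below uses scala.concurrent.duration; only the s and h suffixes appear in this document, so the ms and m suffixes are assumptions about the grammar.

```scala
import scala.concurrent.duration._

object WithClauseDurations {
  // Number immediately followed by a unit suffix, e.g. "10s" or "1h"
  private val Literal = """(\d+)(ms|s|m|h)""".r

  /** Parses a duration literal; returns None for anything it does not recognize. */
  def parseDuration(text: String): Option[FiniteDuration] = text.trim match {
    case Literal(n, "ms") => Some(n.toLong.millis)  // assumed suffix
    case Literal(n, "s")  => Some(n.toLong.seconds)
    case Literal(n, "m")  => Some(n.toLong.minutes) // assumed suffix
    case Literal(n, "h")  => Some(n.toLong.hours)
    case _                => None
  }
}
```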
See also: Resilience Patterns, Module Options
Testing
Test Fixtures
Predefined test data for benchmarks and tests.
Available:
- Small program (5 lines, 2 modules)
- Medium program (20 lines, 8 modules)
- Large program (50 lines, 20 modules)
Location: modules/lang-compiler/src/test/scala/.../TestFixtures.scala
See also: Project Structure
Glossary Summary Table
| Term | Category | One-Line Definition |
|---|---|---|
| Constellation | Core | Module registry and orchestration engine |
| Module | Core | Reusable processing unit with typed I/O |
| Pipeline | Core | DAG of modules in .cst syntax |
| DAG | Core | Directed acyclic graph execution plan |
| CType | Type System | Type at compile time |
| CValue | Type System | Value at runtime |
| Type Compatibility | Type System | Rules for type matching |
| Semantic Type | Type System | Type during compilation |
| Parser | Compilation | Text → AST conversion |
| Type Checker | Compilation | AST → Typed IR validation |
| DAG Compiler | Compilation | Typed IR → Execution DAG |
| IR | Compilation | Intermediate representation |
| Hot Execution | Execution | Precompiled DAG, fast execution |
| Cold Execution | Execution | On-demand compilation |
| Layer | Execution | Parallel execution group |
| Execution Context | Execution | Runtime state during execution |
| Retry | Resilience | Automatic retry on failure |
| Timeout | Resilience | Maximum execution time |
| Cache | Resilience | Store and reuse results |
| Fallback | Resilience | Alternative on failure |
| Module Provider | Cross-Process | External service contributing modules via gRPC |
| Provider Namespace | Cross-Process | Dot-separated module group owned by one provider |
| Provider Group | Cross-Process | Multiple providers sharing a namespace with load balancing |
| HTTP API | API | REST interface for execution |
| Embedded API | API | Programmatic Scala usage |
| LSP | API | Language server for editors |
| ModuleBuilder | Module Builder | DSL for defining modules |
| Pure Implementation | Module Builder | No side effects |
| IO Implementation | Module Builder | Side effects allowed |
| Input Declaration | Syntax | in x: Type |
| Module Call | Syntax | var = Module(args) |
| Output Declaration | Syntax | out var |
| With Clause | Syntax | Module options |
Next Steps
Now that you know the terminology:
- Type System - Deep dive into CType/CValue
- Module Development - Create your first module
- Pipeline Lifecycle - How compilation works