Text Analysis

A multi-step pipeline that cleans text and computes several metrics in parallel.

Use Case

You receive a document and need to normalize it, count words, measure length, and split into lines — all from a single input.

The Pipeline

# text-analysis.cst
# Analyzes text input and produces multiple metrics

@example("The quick brown fox jumps over the lazy dog.\nThis is a sample document for analysis.")
in document: String

# Clean the input
cleaned = Trim(document)
normalized = Lowercase(cleaned)

# Analyze the text
words = WordCount(normalized)
chars = TextLength(normalized)

# Split into lines for further processing
lines = SplitLines(cleaned)

out cleaned
out normalized
out words
out chars
out lines

Explanation

Step	Module	Purpose
1	`Trim`	Remove leading/trailing whitespace
2	`Lowercase`	Normalize to lowercase
3	`WordCount`	Count words in the normalized text
4	`TextLength`	Count characters
5	`SplitLines`	Split into a list of lines

Steps 3, 4, and 5 are independent of each other: steps 3 and 4 read only normalized, and step 5 reads only cleaned. The runtime executes them in parallel automatically.
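To make the independence claim concrete, the sketch below groups the pipeline's values into batches whose members do not depend on each other. It is written in Python with the standard-library graphlib module (Python 3.9+) and only illustrates the idea; it is not how Constellation itself is implemented. The DEPENDENCIES mapping and the parallel_batches helper are hypothetical names introduced for this example.

```python
from graphlib import TopologicalSorter

# Each value maps to the set of values it reads (names taken from the pipeline).
# This mapping is an illustration, not something the Constellation runtime exposes.
DEPENDENCIES = {
    "cleaned": {"document"},
    "normalized": {"cleaned"},
    "words": {"normalized"},
    "chars": {"normalized"},
    "lines": {"cleaned"},
}


def parallel_batches(deps):
    """Group values into batches whose members do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Everything returned by get_ready() has all of its inputs available.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


BATCHES = parallel_batches(DEPENDENCIES)
for batch in BATCHES:
    print(batch)
```

In this sketch, words and chars land in the same batch, and lines becomes ready as soon as cleaned is available, at the same time as normalized.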

Automatic Parallelization

Constellation analyzes dependencies and runs independent operations in parallel. You write sequential-looking code; the runtime optimizes execution.
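The source does not describe how the runtime schedules work, so the following is only a minimal sketch of dependency-driven execution, written in Python with a thread pool. The STEPS table, the run_steps function, and the Python stand-ins for each module (for example str.strip for Trim) are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# value name -> (function, name of the single input it reads).
# The Python functions are assumed stand-ins for the pipeline's modules.
STEPS = {
    "cleaned": (str.strip, "document"),
    "normalized": (str.lower, "cleaned"),
    "words": (lambda text: len(text.split()), "normalized"),
    "chars": (len, "normalized"),
    "lines": (str.splitlines, "cleaned"),
}


def run_steps(steps, inputs):
    """Run every step, submitting all steps whose input is ready at once."""
    values = dict(inputs)
    pending = dict(steps)
    with ThreadPoolExecutor() as pool:
        while pending:
            ready = [name for name, (_, dep) in pending.items() if dep in values]
            if not ready:
                raise ValueError("unresolvable dependency in steps")
            # Steps in the same round do not depend on each other,
            # so they can run concurrently.
            futures = {
                name: pool.submit(pending[name][0], values[pending[name][1]])
                for name in ready
            }
            for name, future in futures.items():
                values[name] = future.result()
                del pending[name]
    return values


if __name__ == "__main__":
    print(run_steps(STEPS, {"document": "  Hello World\nBye  "}))
```

The caller lists the steps in any order; the loop works out which ones can start, which mirrors the "write sequential-looking code" idea described above. A production scheduler would likely start each step as soon as its own input finishes rather than waiting for a whole round.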

Running the Example

Input

{
  "document": "The quick brown fox jumps over the lazy dog.\nThis is a sample document for analysis."
}

Expected Output

{
  "cleaned": "The quick brown fox jumps over the lazy dog.\nThis is a sample document for analysis.",
  "normalized": "the quick brown fox jumps over the lazy dog.\nthis is a sample document for analysis.",
  "words": 16,
  "chars": 84,
  "lines": ["The quick brown fox jumps over the lazy dog.", "This is a sample document for analysis."]
}
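One way to sanity-check these values is to recompute them outside the runtime. The Python sketch below assumes that Trim strips surrounding whitespace, WordCount counts whitespace-separated tokens, TextLength counts every character including the newline, and SplitLines splits on line breaks. These assumptions are not confirmed by the source, so the counts can differ if the modules behave differently. The analyze function is a hypothetical helper.

```python
import json

SAMPLE = (
    "The quick brown fox jumps over the lazy dog.\n"
    "This is a sample document for analysis."
)


def analyze(document: str) -> dict:
    """Recompute the pipeline's outputs using assumed module semantics."""
    cleaned = document.strip()           # assumed equivalent of Trim
    normalized = cleaned.lower()         # assumed equivalent of Lowercase
    return {
        "cleaned": cleaned,
        "normalized": normalized,
        "words": len(normalized.split()),  # whitespace-separated tokens
        "chars": len(normalized),          # includes the newline character
        "lines": cleaned.splitlines(),
    }


RESULT = analyze(SAMPLE)
print(json.dumps(RESULT, indent=2))
```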

Variations

With uppercase comparison

in document: String

cleaned = Trim(document)
lower = Lowercase(cleaned)
upper = Uppercase(cleaned)
length = TextLength(cleaned)

out lower
out upper
out length

Note

Outputting intermediate values like cleaned is useful for debugging. In production, remove unnecessary outputs to reduce response size.

Best Practices

Clean first, analyze second — normalize input before computing metrics
Fan out from a common base — multiple analyses from one cleaned input run in parallel
Output intermediate results — expose cleaned alongside metrics for debugging

Related Examples

Simple Transform — single transformation
Hello World — basic string operations
Data Pipeline — numeric data analysis
