Text Analysis

A multi-step pipeline that cleans text and computes several metrics in parallel.

Use Case

You receive a document and need to normalize it, count words, measure length, and split into lines — all from a single input.

The Pipeline

# text-analysis.cst
# Analyzes text input and produces multiple metrics

@example("The quick brown fox jumps over the lazy dog.\nThis is a sample document for analysis.")
in document: String

# Clean the input
cleaned = Trim(document)
normalized = Lowercase(cleaned)

# Analyze the text
words = WordCount(normalized)
chars = TextLength(normalized)

# Split into lines for further processing
lines = SplitLines(cleaned)

out cleaned
out normalized
out words
out chars
out lines

Explanation

| Step | Module | Purpose |
|------|--------|---------|
| 1 | Trim | Remove leading/trailing whitespace |
| 2 | Lowercase | Normalize to lowercase |
| 3 | WordCount | Count words in the normalized text |
| 4 | TextLength | Count characters |
| 5 | SplitLines | Split into a list of lines |

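Steps 3, 4, and 5 are independent of each other: steps 3 and 4 depend only on normalized, and step 5 depends only on cleaned. The runtime executes them in parallel automatically. Because step 5 needs only cleaned, nothing in the dependency graph forces it to wait for step 2.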

Automatic Parallelization

Constellation analyzes dependencies and runs independent operations in parallel. You write sequential-looking code; the runtime optimizes execution.
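
To make the idea concrete, here is a minimal Python sketch of how a dependency graph can be grouped into waves of steps that are safe to run together. It is an illustration only: the `deps` dictionary is a hand-written model of the pipeline above, and `execution_levels` is a hypothetical helper, not part of Constellation or a description of its actual scheduler.

```python
# Hand-written model of the pipeline's dependencies (assumption: each value
# depends only on the names that appear as arguments in text-analysis.cst).
deps = {
    "cleaned": ["document"],
    "normalized": ["cleaned"],
    "words": ["normalized"],
    "chars": ["normalized"],
    "lines": ["cleaned"],
}


def execution_levels(deps, inputs):
    """Group steps into waves; every step in a wave has all its dependencies done."""
    done = set(inputs)
    remaining = dict(deps)
    levels = []
    while remaining:
        # A step is ready once everything it needs has already been produced.
        ready = sorted(
            name for name, needs in remaining.items()
            if all(n in done for n in needs)
        )
        if not ready:
            raise ValueError("cycle or missing input in dependency graph")
        levels.append(ready)
        done.update(ready)
        for name in ready:
            del remaining[name]
    return levels


levels = execution_levels(deps, inputs=["document"])
print(levels)
# [['cleaned'], ['lines', 'normalized'], ['chars', 'words']]
```

Steps that land in the same wave have no dependency on one another, which is the property a runtime needs in order to run them concurrently.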

Running the Example

Input

{
"document": "The quick brown fox jumps over the lazy dog.\nThis is a sample document for analysis."
}

Expected Output

{
"cleaned": "The quick brown fox jumps over the lazy dog.\nThis is a sample document for analysis.",
"normalized": "the quick brown fox jumps over the lazy dog.\nthis is a sample document for analysis.",
"words": 16,
"chars": 84,
"lines": ["The quick brown fox jumps over the lazy dog.", "This is a sample document for analysis."]
}
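
If you want to sanity-check the output by hand, the following Python sketch mirrors the pipeline with plain string operations. The function names and their semantics are assumptions made for illustration: words are taken to be whitespace-delimited, and every character (including the newline) is counted. The real Trim, Lowercase, WordCount, TextLength, and SplitLines modules may behave differently.

```python
import json


# Plain-Python stand-ins for the modules (assumed semantics, not the real ones).
def trim(s):
    return s.strip()


def lowercase(s):
    return s.lower()


def word_count(s):
    return len(s.split())  # assumes words are separated by whitespace


def text_length(s):
    return len(s)  # assumes every character counts, including "\n"


def split_lines(s):
    return s.splitlines()


def analyze(document):
    """Mirror the data flow of text-analysis.cst step by step."""
    cleaned = trim(document)
    normalized = lowercase(cleaned)
    return {
        "cleaned": cleaned,
        "normalized": normalized,
        "words": word_count(normalized),
        "chars": text_length(normalized),
        "lines": split_lines(cleaned),
    }


result = analyze(
    "The quick brown fox jumps over the lazy dog.\n"
    "This is a sample document for analysis."
)
print(json.dumps(result, indent=2))
```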

Variations

With uppercase comparison

in document: String

cleaned = Trim(document)
lower = Lowercase(cleaned)
upper = Uppercase(cleaned)
length = TextLength(cleaned)

out lower
out upper
out length

Note: Outputting intermediate values like cleaned is useful for debugging. In production, remove unnecessary outputs to reduce response size.

Best Practices

  1. Clean first, analyze second — normalize input before computing metrics
  2. Fan out from a common base — multiple analyses from one cleaned input run in parallel
  3. Output intermediate results — expose cleaned alongside metrics for debugging
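
The "clean first, fan out second" shape can also be sketched in ordinary Python, where the parallelism has to be written by hand. This is only an analogy for what the runtime is described as doing automatically. `analyze_parallel` is a hypothetical function, and the built-in string operations stand in for the Trim, Lowercase, Uppercase, and TextLength modules.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_parallel(document):
    # Clean once, up front, so every analysis starts from the same base value.
    cleaned = document.strip()

    # Fan out: the three analyses share no state and depend only on `cleaned`.
    with ThreadPoolExecutor() as pool:
        lower = pool.submit(str.lower, cleaned)
        upper = pool.submit(str.upper, cleaned)
        length = pool.submit(len, cleaned)
        return {
            "lower": lower.result(),
            "upper": upper.result(),
            "length": length.result(),
        }


result = analyze_parallel("  Hello World \n")
print(result)
# {'lower': 'hello world', 'upper': 'HELLO WORLD', 'length': 11}
```

For string operations this small, threads buy nothing in CPython. The point is the structure: one shared cleaned value, several independent consumers, and results collected at the end.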