Semantic & Ontology Architecture
Last updated: March 17, 2026 18:00 UTC
How the system understands — the Comprehension Engine
Vitruvyan's cognitive pipeline must answer two fundamental questions about every user query:
- What is this about? — Ontology (structure, entities, intent, domain classification)
- How is it said? — Semantics (sentiment, emotion, linguistic register, irony)
These two dimensions are architecturally separated but computationally unified.
The design principle
Separation where it matters:
- Contracts → OntologyPayload ≠ SemanticPayload (different schemas)
- Ownership → Pattern Weavers owns ontology, Babel Gardens owns semantics
- Evolution → Each schema evolves independently per-domain
- Fusion → Only semantics participates in signal fusion (L2/L3)
Unification where it matters:
- Processing → Single LLM call produces both sections
- Context → Shared context (knowing "AAPL" is a ticker affects sentiment analysis)
- Latency → One call instead of two = ~50% less latency and cost
- Output → ComprehensionResult = single container, two distinct sections
This is not a compromise: it is the recognition that ontology and semantics are different lenses on the same text. Separating the lenses (contracts) while unifying the observation (LLM call) gives us the best of both worlds.
Architecture overview
The three layers
The Comprehension Engine operates on a three-layer architecture where each layer has a distinct responsibility:
Layer 1 — LLM Comprehension (core, domain-agnostic)
A single LLM call produces the full ComprehensionResult. The prompt is assembled from two independent sections provided by the active domain plugin:
| Section | Owner | Produces | Example |
|---|---|---|---|
| Ontology prompt | Pattern Weavers | OntologyPayload — gate, entities, intent, topics | Entity types, intent vocabulary, gate rules |
| Semantics prompt | Babel Gardens | SemanticPayload — sentiment, emotion, linguistic | Sentiment labels, emotion taxonomy, register detection |
Why a single call? Because context flows between sections. Knowing that "Apple" is a ticker (ontology) informs that "crashed" means a stock decline, not a physical impact (semantics). Two separate calls would lose this cross-domain awareness.
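A minimal sketch of how the two prompt sections might be combined into one request. Only the two `get_*_prompt_section` method names come from the plugin interface described in this document; `DemoPlugin`, `build_comprehension_prompt`, the section wording, and the JSON shape hint are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DemoPlugin:
    """Hypothetical stand-in for a domain plugin (not the real implementation)."""

    domain: str = "finance"

    def get_ontology_prompt_section(self) -> str:
        return "Entity types: ticker, company. Gate: in_domain if the query is about markets."

    def get_semantics_prompt_section(self) -> str:
        return "Sentiment labels: positive, negative, neutral, mixed."


def build_comprehension_prompt(plugin, query: str) -> str:
    """Assemble one prompt so both sections see the same query and context."""
    return "\n\n".join(
        [
            "Return one JSON object with two top-level keys: 'ontology' and 'semantics'.",
            "## Ontology\n" + plugin.get_ontology_prompt_section(),
            "## Semantics\n" + plugin.get_semantics_prompt_section(),
            "## Query\n" + query,
        ]
    )


prompt = build_comprehension_prompt(DemoPlugin(), "Apple crashed after earnings")
print(prompt)
```

Because both sections travel in the same prompt, the model can use the ontology instructions (for example, that "Apple" may be a ticker) while it fills in the semantics section.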
Layer 2 — Domain-Specific Models (vertical responsibility)
Specialized models add domain-calibrated signals that complement the LLM's general comprehension. These are external to the core — each vertical provides its own:
| Vertical | Model | Signals | Interface |
|---|---|---|---|
| Finance | FinBERT | sentiment_valence, market_fear_index, volatility_perception | ISignalContributor |
| Security | SecBERT | (to be defined) | ISignalContributor |
| Healthcare | BioBERT | (to be defined) | ISignalContributor |
Layer 2 models are registered via SignalContributorRegistry and produce SignalEvidence objects — typed, confidence-scored, with full extraction traces.
Layer 3 — Signal Fusion (core, domain-configurable)
All signals from Layer 1 (LLM) and Layer 2 (domain models) are fused into a unified assessment:
| Strategy | When to use | How it works |
|---|---|---|
| Weighted | Default; stable, interpretable | Confidence-weighted average with per-source weight overrides |
| Bayesian | Highly conflicting signals | Log-odds posterior update; high-confidence sources dominate |
| LLM Arbitrated | Complex disagreements | LLM resolves conflicting signals with reasoning |
Fusion weights are domain-configurable. Finance example: LLM: 0.45, FinBERT: 0.35, multilingual: 0.20.
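The first two strategies can be sketched in a few lines. This is an illustration under stated assumptions, not the fusion consumer's actual code: the `Signal` type, the convention that `value` is a probability of positive sentiment, the default weight of 1.0, and the exact log-odds formula are all hypothetical. Only the finance weights are taken from the text above.

```python
import math
from dataclasses import dataclass


@dataclass
class Signal:
    source: str        # e.g. "llm", "finbert"
    value: float       # assumed: probability of positive sentiment, strictly in (0, 1)
    confidence: float  # in [0, 1]


def fuse_weighted(signals, weights):
    """Confidence-weighted average; `weights` overrides the default weight of 1.0."""
    num = den = 0.0
    for s in signals:
        w = weights.get(s.source, 1.0) * s.confidence
        num += w * s.value
        den += w
    return num / den if den else 0.5  # neutral fallback when nothing contributes


def fuse_bayesian(signals, prior=0.5):
    """Log-odds update: each source shifts the prior, scaled by its confidence."""

    def logit(p):
        return math.log(p / (1.0 - p))

    z = logit(prior) + sum(s.confidence * logit(s.value) for s in signals)
    return 1.0 / (1.0 + math.exp(-z))


signals = [
    Signal("llm", 0.80, 0.9),
    Signal("finbert", 0.30, 0.6),
    Signal("multilingual", 0.60, 0.5),
]
finance_weights = {"llm": 0.45, "finbert": 0.35, "multilingual": 0.20}

print(round(fuse_weighted(signals, finance_weights), 3))
print(round(fuse_bayesian(signals), 3))
```

In the weighted variant a source's influence is the product of its configured weight and its own confidence, so a low-confidence FinBERT reading pulls the result less than its 0.35 weight alone would suggest. In the log-odds variant, confident sources far from 0.5 move the posterior the most.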
Why two Sacred Orders, not one
A natural question: if ontology and semantics are produced together, why keep Pattern Weavers and Babel Gardens as separate Sacred Orders?
Different mandates
| | Pattern Weavers | Babel Gardens |
|---|---|---|
| Question | "What is this about?" | "How is it expressed?" |
| Epistemic layer | Reason | Perception |
| Output | Structure (entities, types, intent) | Signals (sentiment, emotion, register) |
| Downstream | Entity resolution, intent routing | Signal fusion, risk scoring |
| Fusion | Does not participate | Core participant (L2/L3) |
| Evolution | New entity types, new intents | New emotions, new signal models |
Independent evolution per domain
A security vertical needs different entity types (CVE, IP address, malware family) but the same emotion taxonomy. A healthcare vertical needs different sentiment calibration (clinical vs. emotional) but the same entity resolution pattern. Separating the contracts lets domains evolve each dimension independently.
The "lenses" metaphor
Ontology is a structural lens: it sees categories, types, relationships. Semantics is an affective lens: it sees tone, intensity, intent behind the words.
Both lenses observe the same text simultaneously (single LLM call), but they produce fundamentally different kinds of knowledge. Mixing them into one schema would conflate structure with affect — a category error in the epistemic sense.
Contract architecture
OntologyPayload (Pattern Weavers)
Defined in contracts/pattern_weavers.py. Strict schema (extra="forbid").
OntologyPayload:
    gate: DomainGate            # verdict (in_domain/out_of_domain/ambiguous), domain, confidence, reasoning
    entities: [OntologyEntity]  # raw, canonical, entity_type, confidence, relations
    intent_hint: str            # domain-specific intent
    topics: [str]               # topic tags
    sentiment_hint: str         # coarse sentiment (positive/negative/neutral/mixed)
    temporal_context: str       # real_time/historical/forward_looking
    language: str               # ISO 639-1
    complexity: str             # simple/compound/multi_intent
SemanticPayload (Babel Gardens)
Defined in contracts/comprehension.py. Strict schema (extra="forbid").
SemanticPayload:
    sentiment: SentimentPayload    # label, score, confidence, magnitude, aspects, reasoning
    emotion: EmotionPayload        # primary, secondary, intensity, confidence, cultural_context, reasoning
    linguistic: LinguisticPayload  # text_register, irony_detected, ambiguity_score, code_switching
ComprehensionResult (unified container)
ComprehensionResult:
    ontology: OntologyPayload     # ← Pattern Weavers schema
    semantics: SemanticPayload    # ← Babel Gardens schema
    raw_query: str
    language: str
    comprehension_metadata: dict  # timing, model, domain plugin used
The two payloads live side by side but never merge. Each can be consumed independently by downstream components.
Plugin system
IComprehensionPlugin (domain-wide)
Each domain registers one plugin that shapes both the ontology and semantics sections of the prompt:
from abc import ABC
from typing import Dict, List

class IComprehensionPlugin(ABC):
    def get_domain_name(self) -> str: ...
    def get_ontology_prompt_section(self) -> str: ...   # → shapes OntologyPayload
    def get_semantics_prompt_section(self) -> str: ...  # → shapes SemanticPayload
    def get_entity_types(self) -> List[str]: ...
    def get_gate_keywords(self) -> List[str]: ...
    def get_signal_schemas(self) -> Dict[str, Dict]: ...  # → Layer 2 signal definitions
    def validate_result(self, result) -> ComprehensionResult: ...
Built-in: GenericComprehensionPlugin (domain-agnostic, always available).
Finance: FinanceComprehensionPlugin (11 entity types, multilingual keywords, ticker normalization, FinBERT signal schemas).
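To make the interface concrete, here is a sketch of what a plugin for a new vertical might look like. Everything in `SecurityComprehensionPlugin` is hypothetical, including its entity types, keywords, and validation policy. The interface is re-declared locally so the example runs on its own, with two assumptions: the methods are marked abstract, and `result` is a plain dict rather than the real `ComprehensionResult`.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class IComprehensionPlugin(ABC):
    """Local copy of the interface so this sketch is self-contained."""

    @abstractmethod
    def get_domain_name(self) -> str: ...
    @abstractmethod
    def get_ontology_prompt_section(self) -> str: ...
    @abstractmethod
    def get_semantics_prompt_section(self) -> str: ...
    @abstractmethod
    def get_entity_types(self) -> List[str]: ...
    @abstractmethod
    def get_gate_keywords(self) -> List[str]: ...
    @abstractmethod
    def get_signal_schemas(self) -> Dict[str, Dict]: ...
    @abstractmethod
    def validate_result(self, result: dict) -> dict: ...


class SecurityComprehensionPlugin(IComprehensionPlugin):
    """Hypothetical security-domain plugin; all values are illustrative."""

    def get_domain_name(self) -> str:
        return "security"

    def get_ontology_prompt_section(self) -> str:
        return "Entity types: " + ", ".join(self.get_entity_types())

    def get_semantics_prompt_section(self) -> str:
        return "Sentiment labels: positive, negative, neutral, mixed."

    def get_entity_types(self) -> List[str]:
        return ["cve", "ip_address", "malware_family"]

    def get_gate_keywords(self) -> List[str]:
        return ["exploit", "vulnerability", "ransomware"]

    def get_signal_schemas(self) -> Dict[str, Dict]:
        return {}  # no Layer 2 signals declared in this sketch

    def validate_result(self, result: dict) -> dict:
        # Illustrative policy: drop entities whose type this domain does not declare.
        allowed = set(self.get_entity_types())
        entities = result["ontology"]["entities"]
        result["ontology"]["entities"] = [e for e in entities if e["entity_type"] in allowed]
        return result


plugin = SecurityComprehensionPlugin()
checked = plugin.validate_result(
    {"ontology": {"entities": [
        {"raw": "CVE-0000-0000", "entity_type": "cve"},   # placeholder identifier
        {"raw": "AAPL", "entity_type": "ticker"},         # not a security entity type
    ]}}
)
print([e["raw"] for e in checked["ontology"]["entities"]])
```

The plugin touches only prompt text, vocabularies, and post-validation, which is why a new domain can be added without editing the core engine.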
ISignalContributor (Layer 2 models)
Domain-specific models implement this interface to contribute signals to the fusion pipeline:
from abc import ABC
from typing import List

class ISignalContributor(ABC):
    def get_contributor_name(self) -> str: ...
    def get_signal_names(self) -> List[str]: ...
    def contribute(self, text: str, context: dict) -> List[SignalEvidence]: ...
    def is_available(self) -> bool: ...
Contributors are lazy-loaded and availability-checked at runtime. If the transformers library that FinBERT depends on is not installed, the contributor degrades gracefully and the system runs on LLM-only signals.
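The availability check and graceful degradation could look roughly like this. It is a sketch, not the FinBERT contributor itself: `OptionalModelContributor`, `collect_signals`, and the reduced `SignalEvidence` fields are assumptions, and the contributed value is a placeholder rather than model output.

```python
import importlib.util
from dataclasses import dataclass
from typing import List


@dataclass
class SignalEvidence:
    """Simplified stand-in; the real SignalEvidence carries more fields."""

    signal_name: str
    value: float
    confidence: float
    source: str


class OptionalModelContributor:
    """Hypothetical contributor that depends on an optional package."""

    def __init__(self, name: str, required_module: str):
        self._name = name
        self._required_module = required_module

    def get_contributor_name(self) -> str:
        return self._name

    def get_signal_names(self) -> List[str]:
        return ["sentiment_valence"]

    def is_available(self) -> bool:
        # find_spec reports importability without importing the package itself.
        return importlib.util.find_spec(self._required_module) is not None

    def contribute(self, text: str, context: dict) -> List[SignalEvidence]:
        # Placeholder output: a real contributor would lazy-load its model here.
        return [SignalEvidence("sentiment_valence", 0.0, 0.5, self._name)]


def collect_signals(contributors, text: str, context: dict) -> List[SignalEvidence]:
    """Skip unavailable contributors so the pipeline degrades instead of failing."""
    evidence: List[SignalEvidence] = []
    for c in contributors:
        if c.is_available():
            evidence.extend(c.contribute(text, context))
    return evidence


contributors = [
    OptionalModelContributor("stdlib_demo", "json"),                  # importable
    OptionalModelContributor("missing_demo", "no_such_package_xyz"),  # not importable
]
evidence = collect_signals(contributors, "AAPL crashed", {})
print([e.source for e in evidence])
```

Checking importability up front keeps the heavy import out of startup and means a missing dependency shows up as an absent signal, not as an exception in the request path.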
Feature flags
| Flag | Default | Effect |
|---|---|---|
| BABEL_COMPREHENSION_V3 | 0 | Enables /v2/comprehend + /v2/fuse endpoints |
| PATTERN_WEAVERS_V3 | 0 | Enables /compile endpoint (ontology-only, pre-Comprehension) |
| BABEL_DOMAIN | generic | Which domain plugins to auto-register at startup |
When BABEL_COMPREHENSION_V3=1, the Comprehension Engine supersedes both PATTERN_WEAVERS_V3 and the separate emotion detection endpoint. The graph node (comprehension_node) replaces both pattern_weavers_node and emotion_detector_node with full backward compatibility.
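The precedence rule above can be expressed as a small selection function. This is a sketch: the flag names, defaults, and node names come from this section, while `load_flags`, `select_graph_nodes`, and the convention that only the string "1" enables a flag are assumptions.

```python
import os


def load_flags(env=None) -> dict:
    """Read the three flags, applying the defaults from the table above."""
    env = os.environ if env is None else env
    return {
        "comprehension_v3": env.get("BABEL_COMPREHENSION_V3", "0") == "1",
        "pattern_weavers_v3": env.get("PATTERN_WEAVERS_V3", "0") == "1",
        "domain": env.get("BABEL_DOMAIN", "generic"),
    }


def select_graph_nodes(flags: dict) -> list:
    """The comprehension node supersedes the two legacy nodes when enabled."""
    if flags["comprehension_v3"]:
        return ["comprehension_node"]
    return ["pattern_weavers_node", "emotion_detector_node"]


both_on = load_flags({"BABEL_COMPREHENSION_V3": "1", "PATTERN_WEAVERS_V3": "1"})
defaults = load_flags({})

print(select_graph_nodes(both_on))
print(select_graph_nodes(defaults))
```

Passing a dict instead of reading `os.environ` directly keeps the selection logic easy to test.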
Code map
| Layer | Component | Location |
|---|---|---|
| Contracts | ComprehensionResult, IComprehensionPlugin, ISignalContributor | contracts/comprehension.py |
| Contracts | OntologyPayload, ISemanticPlugin | contracts/pattern_weavers.py |
| Level 1 | ComprehensionConsumer (JSON→result parser) | core/cognitive/babel_gardens/consumers/comprehension_consumer.py |
| Level 1 | SignalFusionConsumer (weighted/bayesian) | core/cognitive/babel_gardens/consumers/signal_fusion_consumer.py |
| Level 1 | ComprehensionPluginRegistry + SignalContributorRegistry | core/cognitive/babel_gardens/governance/signal_registry.py |
| Level 2 | ComprehensionAdapter (LLM orchestration) | services/api_babel_gardens/adapters/comprehension_adapter.py |
| Level 2 | SignalFusionAdapter (fusion + LLM arbitration) | services/api_babel_gardens/adapters/signal_fusion_adapter.py |
| Level 2 | /v2/comprehend, /v2/fuse routes | services/api_babel_gardens/api/routes_comprehension.py |
| Graph | comprehension_node (replaces PW + emotion nodes) | core/orchestration/langgraph/node/comprehension_node.py |
| Finance | FinanceComprehensionPlugin | domains/finance/babel_gardens/finance_comprehension_plugin.py |
| Finance | FinBERTContributor | services/api_babel_gardens/plugins/finbert_contributor.py |
Tests
| Suite | Count | What it covers |
|---|---|---|
| Core comprehension | 49 | Contracts, consumers, registries, cross-domain scenarios |
| Finance comprehension | 29 | Finance plugin, FinBERT contributor, fusion with finance signals |
| PW v3 (ontology-only) | 25 | Pre-Comprehension ontology compilation |
| Finance PW v3 | 12 | Finance semantic plugin |
| Total | 115 | |
Evolution path
The Comprehension Engine is designed to grow with the system:
- New domains add plugins (IComprehensionPlugin): no core changes needed
- New models register as contributors (ISignalContributor): plug-and-play
- New signals extend SemanticPayload or the get_signal_schemas() return value: backward compatible
- New fusion strategies extend the FusionStrategy enum: the consumer handles dispatch
The architecture guarantees that adding a security domain (SecBERT, CVE entities, threat emotions) requires zero changes to the core comprehension engine — only new domain files.
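One way a consumer can handle dispatch over an enum is a lookup table, sketched below. The source names the `FusionStrategy` enum but not its members, so the member names, the `MEDIAN` strategy, and the handler functions here are all hypothetical; `fmean` stands in for the real weighted fusion.

```python
from enum import Enum
from statistics import fmean, median
from typing import Callable, Dict, List


class FusionStrategy(Enum):
    # Member names and values are assumptions made for this example.
    WEIGHTED = "weighted"
    MEDIAN = "median"  # hypothetical new strategy


# Dispatch table: a new strategy needs one enum member plus one entry here.
_HANDLERS: Dict[FusionStrategy, Callable[[List[float]], float]] = {
    FusionStrategy.WEIGHTED: fmean,  # placeholder for the real weighted fusion
    FusionStrategy.MEDIAN: median,
}


def fuse(values: List[float], strategy: FusionStrategy) -> float:
    """Route to the handler registered for the requested strategy."""
    try:
        handler = _HANDLERS[strategy]
    except KeyError:
        raise NotImplementedError(f"no handler for {strategy}") from None
    return handler(values)


print(fuse([0.2, 0.4, 0.9], FusionStrategy.MEDIAN))
print(fuse([0.2, 0.4, 0.9], FusionStrategy.WEIGHTED))
```

Callers depend only on `fuse` and the enum, so existing call sites keep working when a strategy is added.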
Entity Relations — Ontological Structure (v3.1, March 2026)
The relational layer
v3.1 adds a structural cognition layer to Pattern Weavers: typed, directed relationships between entities. This extends the OntologyPayload contract with EntityRelation edges, creating the first formal relational primitive in the system.
Why not GraphRAG?
Vitruvyan's architecture is fundamentally multi-turn epistemic, not document-graph. The relational layer achieves structural cognition through:
- Inline relations in OntologyEntity (LLM-extracted, per-query)
- Persisted relations in the PostgreSQL entity_relations table (accumulated from Codex ingestion)
- Relation-aware enrichment via EntityResolverRegistry (pre-loaded at query time)
This provides 90%+ of GraphRAG's value (entity disambiguation, cross-source linking, structural context) without the operational complexity of a dedicated graph database.
Relation vocabulary (CORE_RELATION_TYPES)
| Type | Direction | Use case |
|---|---|---|
| owns | directed | Ownership, subsidiary, parent company |
| part_of | directed | Organizational or conceptual membership |
| competes_with | bidirectional | Direct competitor or alternative |
| depends_on | directed | Supply chain, operational, logical dependency |
| affects | directed | Causal or consequential influence |
| represents | directed | Cross-source identity link (Odoo/Zendesk/HubSpot record → canonical entity) |
| related_to | weak | General association (low specificity fallback) |
Domain plugins MAY extend this vocabulary with domain-specific types.
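A closed vocabulary with domain extensions might be enforced along these lines. The seven type names and the bidirectional marking of `competes_with` come from the table above. The `EntityRelation` field names, the helper functions, and the `exploits` extension are assumptions for illustration, not the real contract.

```python
from dataclasses import dataclass

CORE_RELATION_TYPES = frozenset(
    {"owns", "part_of", "competes_with", "depends_on", "affects", "represents", "related_to"}
)
BIDIRECTIONAL_TYPES = frozenset({"competes_with"})


@dataclass(frozen=True)
class EntityRelation:
    """Simplified stand-in for the real contract; field names are assumptions."""

    from_entity: str
    relation_type: str
    to_entity: str
    confidence: float
    source: str = "llm"  # provenance: "llm" or "codex"


def allowed_types(domain_extensions=()):
    """Core vocabulary plus any types a domain plugin chooses to add."""
    return CORE_RELATION_TYPES | frozenset(domain_extensions)


def validate_relation(rel: EntityRelation, vocabulary) -> EntityRelation:
    """Reject relation types outside the closed vocabulary."""
    if rel.relation_type not in vocabulary:
        raise ValueError(f"unknown relation type: {rel.relation_type!r}")
    return rel


def normalize(rel: EntityRelation) -> EntityRelation:
    """Order the endpoints of bidirectional relations so A-B equals B-A."""
    if rel.relation_type in BIDIRECTIONAL_TYPES and rel.to_entity < rel.from_entity:
        return EntityRelation(
            rel.to_entity, rel.relation_type, rel.from_entity, rel.confidence, rel.source
        )
    return rel


vocab = allowed_types(["exploits"])  # hypothetical security-domain extension
a = normalize(validate_relation(EntityRelation("Globex", "competes_with", "Acme", 0.9), vocab))
b = normalize(validate_relation(EntityRelation("Acme", "competes_with", "Globex", 0.9), vocab))
print(a == b)
```

Normalizing bidirectional edges at write time avoids storing the same competitor pair twice under opposite directions.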
Two provenance paths
Online (LLM comprehension): Every query where the LLM detects entity relationships produces EntityRelation objects with source="llm". These flow through the existing comprehension pipeline and enrich the CAN node's narrative context.
Offline (Codex ingestion): When Oculus Prime ingests data from structured sources (APIs with FK relationships — e.g., Odoo invoices referencing customer records), Codex Hunters' normalizers can populate BoundEntity.relations with source="codex". These are persisted to entity_relations and available for subsequent query enrichment.
Ingestion pipeline (Oculus → Codex → Relations)
External Source (Zendesk, Odoo, HubSpot)
↓ API call
Oculus Prime (Perception) → Evidence Pack (immutable, no interpretation)
↓ oculus_prime.evidence.created
Codex Hunters Track → Restore (normalizer extracts FK relations) → Bind
↓ BoundEntity.relations populated
BusAdapter.process_bind()
↓ stores entity data + calls store_entity_relations()
PostgreSQL: codex_entities + entity_relations tables
Query enrichment pipeline
User query → Comprehension Engine → OntologyPayload with inline relations
↓
entity_resolver_node → EntityResolverRegistry.resolve()
↓ _relation_enricher callback queries entity_relations
state["entity_known_relations"] populated
↓
CAN node renders both inline + persisted relations in narrative context
Key constraints
- Grounding, not reasoning: relations record facts, not inferences
- Oculus boundary: Oculus (Perception) preserves source metadata but never interprets it — relation extraction happens in Codex (Knowledge)
- No graph traversal (Phase 1): flat relational storage with indexed lookups
- Anti-hallucination: closed vocabulary, confidence >= 0.7, extra="forbid" on all Pydantic models
- Phase 2 (future): recursive CTE traversal, relationship scoring, ontology constraint enforcement
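The Phase 1 constraints (flat storage, indexed single-hop lookups, a confidence floor) can be illustrated with an in-memory SQLite table standing in for PostgreSQL. The column names, the index, the helper names, and the choice to apply the 0.7 threshold at write time are assumptions; the source specifies only the table name and the threshold.

```python
import sqlite3

MIN_CONFIDENCE = 0.7  # threshold from the constraints above

conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
conn.execute(
    """
    CREATE TABLE entity_relations (
        source_entity TEXT NOT NULL,
        relation_type TEXT NOT NULL,
        target_entity TEXT NOT NULL,
        confidence    REAL NOT NULL,
        provenance    TEXT NOT NULL
    )
    """
)
conn.execute("CREATE INDEX idx_rel_source ON entity_relations (source_entity)")


def store_relations(rows) -> int:
    """Persist only the relations that meet the confidence threshold."""
    kept = [r for r in rows if r[3] >= MIN_CONFIDENCE]
    conn.executemany("INSERT INTO entity_relations VALUES (?, ?, ?, ?, ?)", kept)
    return len(kept)


def known_relations(entity: str) -> list:
    """Flat single-hop lookup: only the entity's own outgoing edges."""
    cur = conn.execute(
        "SELECT relation_type, target_entity, confidence FROM entity_relations "
        "WHERE source_entity = ? ORDER BY confidence DESC",
        (entity,),
    )
    return cur.fetchall()


stored = store_relations(
    [
        ("invoice-001", "represents", "Acme", 0.95, "codex"),
        ("Acme", "owns", "Acme Labs", 0.90, "codex"),
        ("Acme", "related_to", "Globex", 0.40, "llm"),  # below threshold, not stored
    ]
)
print(stored, known_relations("Acme"))
```

A lookup like `known_relations` is the kind of query a relation enricher could run per resolved entity. Multi-hop questions would need the recursive traversal deferred to Phase 2.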