Semantic & Ontology Architecture

Last updated: March 17, 2026 18:00 UTC

How the system understands — the Comprehension Engine

Vitruvyan's cognitive pipeline must answer two fundamental questions about every user query:

  1. What is this about? — Ontology (structure, entities, intent, domain classification)
  2. How is it said? — Semantics (sentiment, emotion, linguistic register, irony)

These two dimensions are architecturally separated but computationally unified.


The design principle

Separation where it matters:
  - Contracts    →  OntologyPayload ≠ SemanticPayload  (different schemas)
  - Ownership    →  Pattern Weavers owns ontology, Babel Gardens owns semantics
  - Evolution    →  Each schema evolves independently per-domain
  - Fusion       →  Only semantics participates in signal fusion (L2/L3)

Unification where it matters:
  - Processing   →  Single LLM call produces both sections
  - Context      →  Shared context (knowing "AAPL" is a ticker affects sentiment analysis)
  - Latency      →  One call instead of two, which roughly halves latency and cost
  - Output       →  ComprehensionResult = single container, two distinct sections

This is not a compromise — it's the recognition that ontology and semantics are different lenses on the same text. Separating the lenses (contracts) while unifying the observation (LLM call) gives us the best of both worlds.


Architecture overview


The three layers

The Comprehension Engine operates on a three-layer architecture where each layer has a distinct responsibility:

Layer 1 — LLM Comprehension (core, domain-agnostic)

A single LLM call produces the full ComprehensionResult. The prompt is assembled from two independent sections provided by the active domain plugin:

| Section | Owner | Produces | Example |
|---|---|---|---|
| Ontology prompt | Pattern Weavers | OntologyPayload: gate, entities, intent, topics | Entity types, intent vocabulary, gate rules |
| Semantics prompt | Babel Gardens | SemanticPayload: sentiment, emotion, linguistic | Sentiment labels, emotion taxonomy, register detection |

Why a single call? Because context flows between sections. Knowing that "Apple" is a ticker (ontology) informs that "crashed" means a stock decline, not a physical impact (semantics). Two separate calls would lose this cross-domain awareness.
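As a rough illustration of how the two prompt sections could be combined into one call, here is a minimal sketch. The function name `build_comprehension_prompt`, the bracketed section markers, and the example section texts are assumptions made for illustration; the real prompt layout is not shown in this document.

```python
def build_comprehension_prompt(query: str, ontology_section: str, semantics_section: str) -> str:
    """Assemble one prompt so a single LLM call can fill both result sections."""
    parts = [
        # Hypothetical instruction header: one JSON object, two top-level keys.
        'Return one JSON object with exactly two keys: "ontology" and "semantics".',
        "[ONTOLOGY]\n" + ontology_section.strip(),    # provided by the domain plugin
        "[SEMANTICS]\n" + semantics_section.strip(),  # provided by the domain plugin
        "[QUERY]\n" + query.strip(),
    ]
    return "\n\n".join(parts)


prompt = build_comprehension_prompt(
    query="Apple crashed after earnings",
    ontology_section="Entity types: ticker, company. Intents: price_check, news_lookup.",
    semantics_section="Sentiment labels: positive, negative, neutral, mixed.",
)
print(prompt)
```

Because both sections sit in the same prompt, the model sees the entity vocabulary while judging sentiment, which is the context sharing described above.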

Layer 2 — Domain-Specific Models (vertical responsibility)

Specialized models add domain-calibrated signals that complement the LLM's general comprehension. These are external to the core — each vertical provides its own:

| Vertical | Model | Signals | Interface |
|---|---|---|---|
| Finance | FinBERT | sentiment_valence, market_fear_index, volatility_perception | ISignalContributor |
| Security | SecBERT | (to be defined) | ISignalContributor |
| Healthcare | BioBERT | (to be defined) | ISignalContributor |

Layer 2 models are registered via SignalContributorRegistry and produce SignalEvidence objects — typed, confidence-scored, with full extraction traces.

Layer 3 — Signal Fusion (core, domain-configurable)

All signals from Layer 1 (LLM) and Layer 2 (domain models) are fused into a unified assessment:

| Strategy | When to use | How it works |
|---|---|---|
| Weighted | Default; stable, interpretable | Confidence-weighted average with per-source weight overrides |
| Bayesian | Highly conflicting signals | Log-odds posterior update; high-confidence sources dominate |
| LLM Arbitrated | Complex disagreements | LLM resolves conflicting signals with reasoning |

Fusion weights are domain-configurable. Finance example: LLM: 0.45, FinBERT: 0.35, multilingual: 0.20.
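To make the Weighted strategy concrete, the following sketch computes a confidence-weighted average using the finance weights quoted above. The `Signal` dataclass, the `fuse_weighted` function, and the sample signal values are hypothetical; the actual SignalFusionConsumer may normalize or clamp differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Signal:
    source: str        # e.g. "llm", "finbert", "multilingual"
    value: float       # here: sentiment valence in [-1, 1]
    confidence: float  # the source's own confidence in [0, 1]


# Weights taken from the finance example in the text.
FINANCE_WEIGHTS: Dict[str, float] = {"llm": 0.45, "finbert": 0.35, "multilingual": 0.20}


def fuse_weighted(signals: List[Signal], weights: Dict[str, float]) -> float:
    """Each signal contributes in proportion to (source weight * confidence)."""
    total = sum(weights.get(s.source, 0.0) * s.confidence for s in signals)
    if total == 0:
        return 0.0  # no usable evidence: fall back to a neutral value
    weighted_sum = sum(weights.get(s.source, 0.0) * s.confidence * s.value for s in signals)
    return weighted_sum / total


# Made-up sample signals for one query.
signals = [
    Signal("llm", -0.6, 0.9),
    Signal("finbert", -0.8, 0.8),
    Signal("multilingual", -0.2, 0.5),
]
fused = fuse_weighted(signals, FINANCE_WEIGHTS)
print(round(fused, 3))
```

Dividing by the total effective weight means the result stays on the same scale when a source is missing, for example when a Layer 2 model is unavailable.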


Why two Sacred Orders, not one

A natural question: if ontology and semantics are produced together, why keep Pattern Weavers and Babel Gardens as separate Sacred Orders?

Different mandates

| | Pattern Weavers | Babel Gardens |
|---|---|---|
| Question | "What is this about?" | "How is it expressed?" |
| Epistemic layer | Reason | Perception |
| Output | Structure (entities, types, intent) | Signals (sentiment, emotion, register) |
| Downstream | Entity resolution, intent routing | Signal fusion, risk scoring |
| Fusion | Does not participate | Core participant (L2/L3) |
| Evolution | New entity types, new intents | New emotions, new signal models |

Independent evolution per domain

A security vertical needs different entity types (CVE, IP address, malware family) but the same emotion taxonomy. A healthcare vertical needs different sentiment calibration (clinical vs. emotional) but the same entity resolution pattern. Separating the contracts lets domains evolve each dimension independently.

The "lenses" metaphor

Ontology is a structural lens: it sees categories, types, relationships. Semantics is an affective lens: it sees tone, intensity, intent behind the words.

Both lenses observe the same text simultaneously (single LLM call), but they produce fundamentally different kinds of knowledge. Mixing them into one schema would conflate structure with affect — a category error in the epistemic sense.


Contract architecture

OntologyPayload (Pattern Weavers)

Defined in contracts/pattern_weavers.py. Strict schema (extra="forbid").

OntologyPayload:
  gate: DomainGate          # verdict (in_domain/out_of_domain/ambiguous), domain, confidence, reasoning
  entities: [OntologyEntity] # raw, canonical, entity_type, confidence, relations
  intent_hint: str           # domain-specific intent
  topics: [str]              # topic tags
  sentiment_hint: str        # coarse sentiment (positive/negative/neutral/mixed)
  temporal_context: str      # real_time/historical/forward_looking
  language: str              # ISO 639-1
  complexity: str            # simple/compound/multi_intent

SemanticPayload (Babel Gardens)

Defined in contracts/comprehension.py. Strict schema (extra="forbid").

SemanticPayload:
  sentiment: SentimentPayload   # label, score, confidence, magnitude, aspects, reasoning
  emotion: EmotionPayload       # primary, secondary, intensity, confidence, cultural_context, reasoning
  linguistic: LinguisticPayload # text_register, irony_detected, ambiguity_score, code_switching

ComprehensionResult (unified container)

ComprehensionResult:
  ontology: OntologyPayload       # ← Pattern Weavers schema
  semantics: SemanticPayload      # ← Babel Gardens schema
  raw_query: str
  language: str
  comprehension_metadata: dict    # timing, model, domain plugin used

The two payloads live side by side but never merge. Each can be consumed independently by downstream components.
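The side-by-side layout can be sketched with plain dataclasses. This is a heavily reduced stand-in: the documented contracts are Pydantic models with extra="forbid" and many more fields, and the helper functions `route_intent` and `fusion_input` are hypothetical consumers added only to show independent use of each payload.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class OntologyPayload:  # reduced: only two of the documented fields
    intent_hint: str
    topics: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class SemanticPayload:  # reduced: sentiment flattened into two fields
    sentiment_label: str
    sentiment_score: float


@dataclass(frozen=True)
class ComprehensionResult:
    ontology: OntologyPayload   # Pattern Weavers schema
    semantics: SemanticPayload  # Babel Gardens schema
    raw_query: str
    language: str = "en"
    comprehension_metadata: dict = field(default_factory=dict)


def route_intent(ontology: OntologyPayload) -> str:
    """Hypothetical downstream consumer that needs only the ontology section."""
    return ontology.intent_hint


def fusion_input(semantics: SemanticPayload) -> float:
    """Hypothetical downstream consumer that needs only the semantics section."""
    return semantics.sentiment_score


result = ComprehensionResult(
    ontology=OntologyPayload(intent_hint="price_check", topics=["earnings"]),
    semantics=SemanticPayload(sentiment_label="negative", sentiment_score=-0.7),
    raw_query="Apple crashed after earnings",
)
print(route_intent(result.ontology), fusion_input(result.semantics))
```

A dataclass constructor rejects unknown keyword arguments with a TypeError, which loosely mirrors the strictness that extra="forbid" gives the real models.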


Plugin system

IComprehensionPlugin (domain-wide)

Each domain registers one plugin that shapes both the ontology and semantics sections of the prompt:

from abc import ABC
from typing import Dict, List

class IComprehensionPlugin(ABC):
    def get_domain_name(self) -> str: ...
    def get_ontology_prompt_section(self) -> str: ...      # → shapes OntologyPayload
    def get_semantics_prompt_section(self) -> str: ...     # → shapes SemanticPayload
    def get_entity_types(self) -> List[str]: ...
    def get_gate_keywords(self) -> List[str]: ...
    def get_signal_schemas(self) -> Dict[str, Dict]: ...   # → Layer 2 signal definitions
    def validate_result(self, result) -> ComprehensionResult: ...

Built-in: GenericComprehensionPlugin (domain-agnostic, always available). Finance: FinanceComprehensionPlugin (11 entity types, multilingual keywords, ticker normalization, FinBERT signal schemas).
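A new domain plugin might look roughly like the sketch below. It uses a reduced copy of the interface (prompt-shaping methods only), and the `SecurityPlugin`, `GenericPlugin`, and `PluginRegistry` classes, their method bodies, and the registry's `register`/`resolve` methods are all hypothetical; the real ComprehensionPluginRegistry API is not documented here.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class IComprehensionPlugin(ABC):
    """Reduced stand-in for the interface above: prompt-shaping methods only."""

    @abstractmethod
    def get_domain_name(self) -> str: ...

    @abstractmethod
    def get_ontology_prompt_section(self) -> str: ...

    @abstractmethod
    def get_semantics_prompt_section(self) -> str: ...

    @abstractmethod
    def get_entity_types(self) -> List[str]: ...


class GenericPlugin(IComprehensionPlugin):
    def get_domain_name(self) -> str:
        return "generic"

    def get_entity_types(self) -> List[str]:
        return ["person", "organization", "location"]

    def get_ontology_prompt_section(self) -> str:
        return "Allowed entity types: " + ", ".join(self.get_entity_types())

    def get_semantics_prompt_section(self) -> str:
        return "Sentiment labels: positive, negative, neutral, mixed"


class SecurityPlugin(GenericPlugin):
    """Changes the ontology side only; the semantics section is inherited."""

    def get_domain_name(self) -> str:
        return "security"

    def get_entity_types(self) -> List[str]:
        return ["cve", "ip_address", "malware_family"]


class PluginRegistry:
    def __init__(self) -> None:
        self._plugins: Dict[str, IComprehensionPlugin] = {}
        self.register(GenericPlugin())  # the generic plugin is always available

    def register(self, plugin: IComprehensionPlugin) -> None:
        self._plugins[plugin.get_domain_name()] = plugin

    def resolve(self, domain: str) -> IComprehensionPlugin:
        # Unknown domains fall back to the generic plugin.
        return self._plugins.get(domain, self._plugins["generic"])


registry = PluginRegistry()
registry.register(SecurityPlugin())
print(registry.resolve("security").get_ontology_prompt_section())
```

The security plugin overrides entity types while reusing the generic semantics section, which matches the idea that each dimension can evolve independently per domain.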

ISignalContributor (Layer 2 models)

Domain-specific models implement this interface to contribute signals to the fusion pipeline:

from abc import ABC
from typing import List

class ISignalContributor(ABC):
    def get_contributor_name(self) -> str: ...
    def get_signal_names(self) -> List[str]: ...
    def contribute(self, text: str, context: dict) -> List[SignalEvidence]: ...
    def is_available(self) -> bool: ...

Contributors are lazy-loaded and availability-checked at runtime. If the transformers package that FinBERT depends on is not installed, the contributor degrades gracefully and the system runs on LLM-only signals.
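The availability check and graceful degradation could work along these lines. This is a sketch under assumptions: the `SignalEvidence` fields, the keyword-based toy contributor (standing in for a FinBERT-style model), the `required_module` attribute, and `collect_signals` are all hypothetical.

```python
import importlib.util
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SignalEvidence:  # hypothetical, reduced field set
    signal_name: str
    value: float
    confidence: float
    source: str


class KeywordFearContributor:
    """Toy contributor; a real one would wrap a model such as FinBERT."""

    required_module = "json"  # stdlib module, so this contributor is always available

    def get_contributor_name(self) -> str:
        return "keyword_fear"

    def get_signal_names(self) -> List[str]:
        return ["market_fear_index"]

    def is_available(self) -> bool:
        # find_spec returns None when a top-level module is not installed.
        return importlib.util.find_spec(self.required_module) is not None

    def contribute(self, text: str, context: dict) -> List[SignalEvidence]:
        hits = sum(word in text.lower() for word in ("crash", "panic", "selloff"))
        return [SignalEvidence("market_fear_index", min(1.0, hits / 3), 0.6,
                               self.get_contributor_name())]


class MissingDependencyContributor(KeywordFearContributor):
    required_module = "hypothetical_module_that_is_not_installed"

    def get_contributor_name(self) -> str:
        return "unavailable_model"


def collect_signals(contributors, text: str, context: dict) -> List[SignalEvidence]:
    evidence: List[SignalEvidence] = []
    for contributor in contributors:
        if not contributor.is_available():
            continue  # degrade gracefully: skip instead of failing the request
        evidence.extend(contributor.contribute(text, context))
    return evidence


evidence = collect_signals(
    [KeywordFearContributor(), MissingDependencyContributor()],
    "Apple crashed amid panic",
    {},
)
print(evidence)
```

Only the available contributor produces evidence; the one with a missing dependency is skipped without raising.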


Feature flags

| Flag | Default | Effect |
|---|---|---|
| BABEL_COMPREHENSION_V3 | 0 | Enables /v2/comprehend + /v2/fuse endpoints |
| PATTERN_WEAVERS_V3 | 0 | Enables /compile endpoint (ontology-only, pre-Comprehension) |
| BABEL_DOMAIN | generic | Which domain plugins to auto-register at startup |

When BABEL_COMPREHENSION_V3=1, the Comprehension Engine supersedes both PATTERN_WEAVERS_V3 and the separate emotion detection endpoint. The graph node (comprehension_node) replaces both pattern_weavers_node and emotion_detector_node with full backward compatibility.
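One way to picture how these flags resolve at startup is the sketch below. The function `resolve_comprehension_config` and the shape of its return value are assumptions for illustration; only the flag names, defaults, endpoints, and node names come from this page.

```python
import os
from typing import Mapping, Optional


def resolve_comprehension_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Translate feature flags into endpoints, graph nodes, and the active domain."""
    env = os.environ if env is None else env
    comprehension_v3 = env.get("BABEL_COMPREHENSION_V3", "0") == "1"
    pattern_weavers_v3 = env.get("PATTERN_WEAVERS_V3", "0") == "1"

    endpoints = []
    if comprehension_v3:
        endpoints += ["/v2/comprehend", "/v2/fuse"]
    if pattern_weavers_v3:
        endpoints.append("/compile")

    # With the Comprehension Engine enabled, one node replaces the two older nodes.
    if comprehension_v3:
        graph_nodes = ["comprehension_node"]
    else:
        graph_nodes = ["pattern_weavers_node", "emotion_detector_node"]

    return {
        "endpoints": endpoints,
        "graph_nodes": graph_nodes,
        "domain": env.get("BABEL_DOMAIN", "generic"),
    }


config = resolve_comprehension_config({"BABEL_COMPREHENSION_V3": "1", "BABEL_DOMAIN": "finance"})
print(config)
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.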


Code map

| Layer | Component | Location |
|---|---|---|
| Contracts | ComprehensionResult, IComprehensionPlugin, ISignalContributor | contracts/comprehension.py |
| Contracts | OntologyPayload, ISemanticPlugin | contracts/pattern_weavers.py |
| Level 1 | ComprehensionConsumer (JSON→result parser) | core/cognitive/babel_gardens/consumers/comprehension_consumer.py |
| Level 1 | SignalFusionConsumer (weighted/bayesian) | core/cognitive/babel_gardens/consumers/signal_fusion_consumer.py |
| Level 1 | ComprehensionPluginRegistry + SignalContributorRegistry | core/cognitive/babel_gardens/governance/signal_registry.py |
| Level 2 | ComprehensionAdapter (LLM orchestration) | services/api_babel_gardens/adapters/comprehension_adapter.py |
| Level 2 | SignalFusionAdapter (fusion + LLM arbitration) | services/api_babel_gardens/adapters/signal_fusion_adapter.py |
| Level 2 | /v2/comprehend, /v2/fuse routes | services/api_babel_gardens/api/routes_comprehension.py |
| Graph | comprehension_node (replaces PW + emotion nodes) | core/orchestration/langgraph/node/comprehension_node.py |
| Finance | FinanceComprehensionPlugin | domains/finance/babel_gardens/finance_comprehension_plugin.py |
| Finance | FinBERTContributor | services/api_babel_gardens/plugins/finbert_contributor.py |

Tests

| Suite | Count | What it covers |
|---|---|---|
| Core comprehension | 49 | Contracts, consumers, registries, cross-domain scenarios |
| Finance comprehension | 29 | Finance plugin, FinBERT contributor, fusion with finance signals |
| PW v3 (ontology-only) | 25 | Pre-Comprehension ontology compilation |
| Finance PW v3 | 12 | Finance semantic plugin |
| Total | 115 | |

Evolution path

The Comprehension Engine is designed to grow with the system:

  1. New domains add plugins (IComprehensionPlugin) — no core changes needed
  2. New models register as contributors (ISignalContributor) — plug-and-play
  3. New signals extend SemanticPayload or get_signal_schemas() return — backward compatible
  4. New fusion strategies extend FusionStrategy enum — consumer handles dispatch

The architecture guarantees that adding a security domain (SecBERT, CVE entities, threat emotions) requires zero changes to the core comprehension engine — only new domain files.
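The enum-plus-dispatch pattern for fusion strategies might be sketched as follows. The member names, the `(probability, confidence)` evidence shape, and both formulas are assumptions: in particular, the Bayesian variant here is a simple confidence-scaled log-odds pooling from a neutral prior, not necessarily what SignalFusionConsumer does.

```python
import math
from enum import Enum
from typing import Callable, Dict, List, Tuple

Evidence = Tuple[float, float]  # (probability in (0, 1), confidence in [0, 1])


class FusionStrategy(str, Enum):
    WEIGHTED = "weighted"
    BAYESIAN = "bayesian"


def _weighted(evidence: List[Evidence]) -> float:
    total = sum(conf for _, conf in evidence)
    return sum(p * conf for p, conf in evidence) / total if total else 0.5


def _bayesian(evidence: List[Evidence]) -> float:
    # Start from a neutral prior (log-odds 0) and add confidence-scaled log-odds.
    logit = 0.0
    for p, conf in evidence:
        p = min(max(p, 1e-6), 1 - 1e-6)  # keep the logarithm finite
        logit += conf * math.log(p / (1 - p))
    return 1 / (1 + math.exp(-logit))


# Adding a strategy means adding one enum member and one entry here.
_DISPATCH: Dict[FusionStrategy, Callable[[List[Evidence]], float]] = {
    FusionStrategy.WEIGHTED: _weighted,
    FusionStrategy.BAYESIAN: _bayesian,
}


def fuse(evidence: List[Evidence], strategy="weighted") -> float:
    return _DISPATCH[FusionStrategy(strategy)](evidence)


evidence = [(0.9, 1.0), (0.4, 0.2)]  # made-up: one confident source, one weak dissenter
print(fuse(evidence, "weighted"), fuse(evidence, "bayesian"))
```

In this toy setup the log-odds variant lets the confident source dominate more strongly than the plain average does, and an unknown strategy name fails fast with a ValueError.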


Entity Relations — Ontological Structure (v3.1, March 2026)

The relational layer

v3.1 adds a structural cognition layer to Pattern Weavers: typed, directed relationships between entities. This extends the OntologyPayload contract with EntityRelation edges, creating the first formal relational primitive in the system.

Why not GraphRAG?

Vitruvyan's architecture is fundamentally multi-turn epistemic, not document-graph. The relational layer achieves structural cognition through:

  1. Inline relations in OntologyEntity (LLM-extracted, per-query)
  2. Persisted relations in PostgreSQL entity_relations table (accumulated from Codex ingestion)
  3. Relation-aware enrichment via EntityResolverRegistry (pre-loaded at query time)

This provides 90%+ of GraphRAG's value (entity disambiguation, cross-source linking, structural context) without the operational complexity of a dedicated graph database.

Relation vocabulary (CORE_RELATION_TYPES)

| Type | Direction | Use case |
|---|---|---|
| owns | directed | Ownership, subsidiary, parent company |
| part_of | directed | Organizational or conceptual membership |
| competes_with | bidirectional | Direct competitor or alternative |
| depends_on | directed | Supply chain, operational, logical dependency |
| affects | directed | Causal or consequential influence |
| represents | directed | Cross-source identity link (Odoo/Zendesk/HubSpot record → canonical entity) |
| related_to | weak | General association (low specificity fallback) |

Domain plugins MAY extend this vocabulary with domain-specific types.
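A possible shape for the vocabulary and its extension mechanism is sketched below. Representing CORE_RELATION_TYPES as a dict from type to direction, the `build_vocabulary` and `expand_edges` helpers, and the example `exploits` type are assumptions; the document only lists the types and their directions.

```python
from typing import Dict, List, Optional, Tuple

# Assumed representation: relation type -> direction, as listed in the table above.
CORE_RELATION_TYPES: Dict[str, str] = {
    "owns": "directed",
    "part_of": "directed",
    "competes_with": "bidirectional",
    "depends_on": "directed",
    "affects": "directed",
    "represents": "directed",
    "related_to": "weak",
}


def build_vocabulary(domain_types: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Core vocabulary plus domain-specific types; core types cannot be redefined."""
    vocabulary = dict(CORE_RELATION_TYPES)
    for name, direction in (domain_types or {}).items():
        if name in CORE_RELATION_TYPES:
            raise ValueError(f"cannot redefine core relation type: {name}")
        vocabulary[name] = direction
    return vocabulary


def expand_edges(source: str, relation_type: str, target: str,
                 vocabulary: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return the edge, mirrored when the relation type is bidirectional."""
    if relation_type not in vocabulary:
        raise KeyError(f"relation type not in vocabulary: {relation_type}")
    edges = [(source, relation_type, target)]
    if vocabulary[relation_type] == "bidirectional":
        edges.append((target, relation_type, source))
    return edges


# "exploits" is a hypothetical security-domain extension.
security_vocabulary = build_vocabulary({"exploits": "directed"})
print(expand_edges("AAPL", "competes_with", "MSFT", security_vocabulary))
```

Keeping the vocabulary closed means an unrecognized relation type is rejected rather than silently stored.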

Two provenance paths

Online (LLM comprehension): Every query where the LLM detects entity relationships produces EntityRelation objects with source="llm". These flow through the existing comprehension pipeline and enrich the CAN node's narrative context.

Offline (Codex ingestion): When Oculus Prime ingests data from structured sources (APIs with FK relationships — e.g., Odoo invoices referencing customer records), Codex Hunters' normalizers can populate BoundEntity.relations with source="codex". These are persisted to entity_relations and available for subsequent query enrichment.

Ingestion pipeline (Oculus → Codex → Relations)

External Source (Zendesk, Odoo, HubSpot)
     ↓ API call
Oculus Prime (Perception) → Evidence Pack (immutable, no interpretation)
     ↓ oculus_prime.evidence.created
Codex Hunters Track → Restore (normalizer extracts FK relations) → Bind
     ↓ BoundEntity.relations populated
BusAdapter.process_bind()
     ↓ stores entity data + calls store_entity_relations()
PostgreSQL: codex_entities + entity_relations tables
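The Restore step, where a normalizer turns foreign-key fields into relations, could look something like this sketch. The `EntityRelation` and `BoundEntity` field sets, the `FK_RELATION_MAP` mapping, the identifier format, and the choice of relation types for each field are all hypothetical; only BoundEntity.relations and source="codex" come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EntityRelation:  # hypothetical, reduced field set
    target: str
    relation_type: str
    confidence: float
    source: str


@dataclass
class BoundEntity:  # hypothetical, reduced field set
    entity_id: str
    entity_type: str
    relations: List[EntityRelation] = field(default_factory=list)


# Illustrative mapping: FK field -> (target entity kind, relation type).
FK_RELATION_MAP: Dict[str, Tuple[str, str]] = {
    "partner_id": ("customer", "related_to"),
    "company_id": ("company", "part_of"),
}


def normalize_invoice(record: dict) -> BoundEntity:
    """Turn FK fields of one source record into relations with source='codex'."""
    entity = BoundEntity(entity_id=f"odoo:invoice:{record['id']}", entity_type="invoice")
    for fk_field, (target_kind, relation_type) in FK_RELATION_MAP.items():
        fk_value = record.get(fk_field)
        if fk_value is None:
            continue  # record only what the source states; never infer a missing link
        entity.relations.append(EntityRelation(
            target=f"odoo:{target_kind}:{fk_value}",
            relation_type=relation_type,
            confidence=1.0,  # a foreign key is a stated fact, not an inference
            source="codex",
        ))
    return entity


bound = normalize_invoice({"id": 7, "partner_id": 42, "company_id": None})
print(bound.relations)
```

Skipping absent foreign keys keeps the output aligned with the "grounding, not reasoning" constraint: a relation exists only if the source record states it.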

Query enrichment pipeline

User query → Comprehension Engine → OntologyPayload with inline relations
     ↓
entity_resolver_node → EntityResolverRegistry.resolve()
     ↓ _relation_enricher callback queries entity_relations
state["entity_known_relations"] populated
     ↓
CAN node renders both inline + persisted relations in narrative context
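The enrichment step can be illustrated with a flat, indexed lookup. This sketch uses an in-memory SQLite database as a stand-in for the PostgreSQL entity_relations table; the column names, the index, the helper functions, and the sample rows are assumptions made for illustration.

```python
import sqlite3
from typing import Dict, List

COLUMNS = ("source_entity", "relation_type", "target_entity", "confidence", "provenance")


def open_relation_store() -> sqlite3.Connection:
    """In-memory stand-in for the entity_relations table (hypothetical columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entity_relations ("
        "source_entity TEXT NOT NULL, relation_type TEXT NOT NULL, "
        "target_entity TEXT NOT NULL, confidence REAL NOT NULL, provenance TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_relations_source ON entity_relations (source_entity)")
    return conn


def lookup_known_relations(conn: sqlite3.Connection, entity_ids: List[str]) -> List[Dict]:
    """Flat indexed lookup: direct relations only, no graph traversal."""
    if not entity_ids:
        return []
    placeholders = ", ".join("?" for _ in entity_ids)
    sql = (
        f"SELECT {', '.join(COLUMNS)} FROM entity_relations "
        f"WHERE source_entity IN ({placeholders}) ORDER BY confidence DESC"
    )
    return [dict(zip(COLUMNS, row)) for row in conn.execute(sql, list(entity_ids))]


def enrich_state(state: dict, conn: sqlite3.Connection) -> dict:
    ids = [entity["canonical"] for entity in state.get("entities", [])]
    state["entity_known_relations"] = lookup_known_relations(conn, ids)
    return state


conn = open_relation_store()
conn.executemany(
    "INSERT INTO entity_relations VALUES (?, ?, ?, ?, ?)",
    [  # made-up sample rows
        ("AAPL", "competes_with", "MSFT", 0.9, "codex"),
        ("AAPL", "depends_on", "TSM", 0.95, "codex"),
        ("TSLA", "competes_with", "BYD", 0.8, "codex"),
    ],
)
state = enrich_state({"entities": [{"canonical": "AAPL"}]}, conn)
print(state["entity_known_relations"])
```

Only relations whose source is a resolved entity are returned, with values bound as query parameters rather than interpolated into the SQL.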

Key constraints

  • Grounding, not reasoning: relations record facts, not inferences
  • Oculus boundary: Oculus (Perception) preserves source metadata but never interprets it — relation extraction happens in Codex (Knowledge)
  • No graph traversal (Phase 1): flat relational storage with indexed lookups
  • Anti-hallucination: closed vocabulary, confidence >= 0.7, extra="forbid" on all Pydantic models
  • Phase 2 (future): recursive CTE traversal, relationship scoring, ontology constraint enforcement
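The anti-hallucination constraint can be pictured as a filter applied to LLM-extracted relations before they are kept. This is a sketch only: the `filter_relations` function, the dict-based relation shape, and the rejection reasons are hypothetical, while the closed vocabulary and the 0.7 threshold come from the list above.

```python
from typing import Dict, List, Tuple

CORE_RELATION_TYPES = frozenset({
    "owns", "part_of", "competes_with", "depends_on",
    "affects", "represents", "related_to",
})
MIN_CONFIDENCE = 0.7  # threshold stated in the constraints above


def filter_relations(candidates: List[Dict]) -> Tuple[List[Dict], List[Tuple[Dict, str]]]:
    """Split candidates into accepted relations and (relation, reason) rejections."""
    accepted: List[Dict] = []
    rejected: List[Tuple[Dict, str]] = []
    for relation in candidates:
        if relation.get("relation_type") not in CORE_RELATION_TYPES:
            rejected.append((relation, "unknown relation type"))
        elif relation.get("confidence", 0.0) < MIN_CONFIDENCE:
            rejected.append((relation, "confidence below threshold"))
        else:
            accepted.append(relation)
    return accepted, rejected


candidates = [  # made-up LLM output
    {"source": "AAPL", "relation_type": "competes_with", "target": "MSFT", "confidence": 0.92},
    {"source": "AAPL", "relation_type": "will_acquire", "target": "XYZ", "confidence": 0.95},
    {"source": "AAPL", "relation_type": "affects", "target": "QQQ", "confidence": 0.4},
]
accepted, rejected = filter_relations(candidates)
print(len(accepted), [reason for _, reason in rejected])
```

Returning the rejections with a reason, rather than dropping them silently, makes it easier to audit what the model proposed and why it was discarded.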