Medallion Architecture in Practice in the AI Era: Beyond the Theory

TL;DR

  • Medallion architecture still matters for AI, but only with strong data quality and trust boundaries.
  • Shift quality checks left to Bronze, and organize Silver by domain for scalable pipelines.
  • Certified Gold plus a semantic layer makes data usable for business and AI.

If you have read anything about modern data platforms in the last decade, you have encountered the medallion architecture: Bronze for raw data, Silver for cleansed and conformed data, Gold for business-ready data. It is an elegant conceptual model. But implementing it at enterprise scale — with hundreds of sources, dozens of domains, and AI models consuming the output — is a different story entirely.

Most articles about medallion architecture are theoretical or vendor-authored. I have built production medallion architectures that ingest data from over a hundred source systems across tens of business domains. I have implemented them on Snowflake, on Databricks, on cloud-provider-native stacks, and on combinations of these. In every case, the technology mattered less than the conventions, contracts, and quality gates we enforced. Here is what I learned about the decisions that actually matter, especially now that AI models depend on the data flowing through these layers.

Why Medallion Still Matters in the AI Era

AI systems are only as reliable as the data contracts that feed them. When models drift, audits happen, or incidents spike, medallion gating is what keeps issues contained. Without it, bad upstream changes cascade into features, predictions, and customer experiences before anyone notices.

Medallion layers serve distinct risk controls. Bronze protects ingestion and lineage and catches quality issues at the earliest possible point. Silver organizes data by domain with minimal transformations, making it MDM-ready and DQ-gate-passed. Gold creates decision-ready products (KPIs, metrics, aggregations, and cross-domain data products) with SLAs, DQ certification, and owners. Each layer is a trust boundary, and in the AI era, trust boundaries are non-negotiable.

Bronze: The Ingestion Layer — Left-Shift Quality Here

Bronze is your system of record for raw data. It should be an exact copy of the source, with metadata added — never transformed. But here is the critical insight most implementations miss: data quality checks should start at Bronze, not Silver.

The philosophy is to left-shift quality as close to the origin as possible. As soon as data lands in Bronze, quality checks should validate it. This means any downstream consumption from Bronze (Silver processing, ad hoc analysis, or operational feeds) already carries its quality results: you know when the data was validated, what passed, and what failed. The alternative, waiting until Silver to check quality, means you are already carrying bad data through your pipeline before anyone catches it.

  • Schema management: Source schemas change constantly. You need a configurable schema evolution policy per source: auto-evolve for non-critical sources, quarantine for critical ones, and alert-and-continue for everything in between. We default to auto-evolve for operational feeds and quarantine for financial and regulatory sources.
  • Audit infrastructure: Every Bronze table has companion audit tables: source metadata (system, timestamp, batch ID), record counts (source vs. target reconciliation), and error tables (records that failed ingestion with error codes and retry metadata). Non-negotiable for auditability and reproducibility.
  • CDC handling: For transactional sources, we use CDC (Change Data Capture). Store CDC events as-is in Bronze — append-only, preserving the complete change history. Do not attempt to merge or apply changes at this layer. This enables point-in-time reconstruction and the audit trails that AI model reproducibility demands.
  • DQ at Bronze: Embedded quality checks run as soon as data lands: completeness (did all expected records arrive, and are required fields non-null?), freshness (is the data within the expected time window?), schema conformance (did the schema change unexpectedly?), and basic validity (are critical fields within expected ranges?).
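
To make the point-in-time claim in the CDC bullet concrete, here is a minimal Python sketch. It assumes each Bronze CDC event carries a primary key, an operation code (`I`, `U`, `D`), a commit timestamp, and the full row image after the change. The `CdcEvent` shape and the `state_as_of` helper are hypothetical illustrations, not any CDC tool's API. Because Bronze stores events append-only and never merges them, a table's state at any past moment can be rebuilt by replaying events up to that moment.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class CdcEvent:
    """One change event stored as-is in Bronze (hypothetical shape)."""
    key: str                # primary key of the source row
    op: str                 # "I" = insert, "U" = update, "D" = delete
    committed_at: datetime  # source commit timestamp
    row: dict[str, Any]     # full row image after the change (empty for deletes)


def state_as_of(events: list[CdcEvent], as_of: datetime) -> dict[str, dict[str, Any]]:
    """Rebuild table state at `as_of` by replaying append-only CDC events."""
    state: dict[str, dict[str, Any]] = {}
    # Replay in commit order; stop at the first event after the cut-off.
    for event in sorted(events, key=lambda e: e.committed_at):
        if event.committed_at > as_of:
            break
        if event.op == "D":
            state.pop(event.key, None)
        elif event.op in ("I", "U"):
            state[event.key] = dict(event.row)
        else:
            raise ValueError(f"unknown CDC operation: {event.op!r}")
    return state
```

Replaying up to a model's training cut-off reproduces the rows that model saw, which is the reproducibility property the CDC bullet relies on.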

Failures quarantine the batch (or the offending records) and trigger alerts. This left-shift approach catches upstream issues hours or days earlier than waiting for Silver-layer checks.
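
The Bronze checks above, combined with the per-source schema evolution policy from the schema management bullet, can be sketched in Python as follows. This is a minimal, batch-level sketch under stated assumptions. Rows arrive as dictionaries, `SourceContract`, `SchemaPolicy`, and `check_bronze_batch` are hypothetical names, and any failure quarantines the whole batch. A record-level variant would route only the offending rows to the error table instead.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import Any


class SchemaPolicy(Enum):
    AUTO_EVOLVE = "auto_evolve"   # accept new columns silently
    ALERT_AND_CONTINUE = "alert"  # accept new columns, but raise an alert
    QUARANTINE = "quarantine"     # hold the batch until someone reviews it


@dataclass
class SourceContract:
    """Hypothetical per-source expectations for a Bronze landing table."""
    expected_columns: set[str]
    required_columns: set[str]
    max_age: timedelta
    valid_ranges: dict[str, tuple[float, float]] = field(default_factory=dict)
    schema_policy: SchemaPolicy = SchemaPolicy.ALERT_AND_CONTINUE


@dataclass
class BronzeCheckResult:
    checked_at: datetime  # when the batch was validated
    failures: list[str]   # any failure quarantines the batch
    alerts: list[str]     # informational, does not block

    @property
    def quarantined(self) -> bool:
        return bool(self.failures)


def check_bronze_batch(rows: list[dict[str, Any]], extracted_at: datetime,
                       contract: SourceContract, now: datetime) -> BronzeCheckResult:
    failures: list[str] = []
    alerts: list[str] = []

    # Schema conformance, resolved through the source's evolution policy.
    landed: set[str] = set().union(*(row.keys() for row in rows))
    missing = contract.expected_columns - landed
    added = landed - contract.expected_columns
    if missing:
        failures.append(f"missing columns: {sorted(missing)}")
    if added and contract.schema_policy is SchemaPolicy.QUARANTINE:
        failures.append(f"unexpected columns: {sorted(added)}")
    elif added and contract.schema_policy is SchemaPolicy.ALERT_AND_CONTINUE:
        alerts.append(f"new columns accepted: {sorted(added)}")

    # Completeness: the batch is non-empty and required fields are non-null.
    if not rows:
        failures.append("empty batch")
    for col in sorted(contract.required_columns):
        nulls = sum(1 for row in rows if row.get(col) is None)
        if nulls:
            failures.append(f"{col}: {nulls} null value(s)")

    # Freshness: the extract must fall within the expected time window.
    if now - extracted_at > contract.max_age:
        failures.append(f"stale batch: extracted {now - extracted_at} ago")

    # Basic validity: critical numeric fields within expected ranges.
    for col, (low, high) in contract.valid_ranges.items():
        bad = sum(1 for row in rows
                  if row.get(col) is not None and not low <= row[col] <= high)
        if bad:
            failures.append(f"{col}: {bad} value(s) outside [{low}, {high}]")

    return BronzeCheckResult(checked_at=now, failures=failures, alerts=alerts)
```

Keeping `checked_at`, `failures`, and `alerts` on the result is what lets downstream consumers see when the data was validated and what passed or failed.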

Silver: Domain-Organized, DQ-Gated, MDM-Ready

This is where my approach diverges from the conventional wisdom. Most medallion guides tell you to organize Silver by source system and defer domain integration to Gold. I have found that harmonizing by domain in Silver works better at enterprise scale.

Silver should be organized by business domain. Here is why: when you are designing domain-driven delivery, and especially when an MDM system will master data by domain using Silver output before it reaches Gold, the data needs to be organized by domain in Silver. Silver is where you perform minimal but essential transformations: data type standardization, null handling, deduplication, business key derivation, SCD processing, and cross-source alignment within a domain. The result is clean, domain-organized, DQ-gate-passed data that is ready for MDM processing.
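
Deduplication by business key and SCD processing can be sketched as follows. The article does not specify an SCD type, so this assumes SCD Type 2: when tracked attributes change, the current version is closed and a new one is opened. `DimRow`, `dedupe_latest`, `apply_scd2`, and the `business_key` / `updated_at` / `attrs` fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Any, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a Silver dimension record under SCD Type 2."""
    business_key: str
    attrs: tuple[Any, ...]           # tracked attributes used for change detection
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def dedupe_latest(records: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Keep the most recent record per business key (later `updated_at` wins)."""
    latest: dict[str, dict[str, Any]] = {}
    for rec in sorted(records, key=lambda r: r["updated_at"]):
        latest[rec["business_key"]] = rec
    return latest


def apply_scd2(dimension: list[DimRow], incoming: list[dict[str, Any]],
               load_date: date) -> list[DimRow]:
    """Expire changed current versions and append new ones; unchanged keys stay as-is."""
    result = list(dimension)  # work on a copy; the input list is not mutated
    current = {row.business_key: i for i, row in enumerate(result) if row.valid_to is None}
    for key, rec in dedupe_latest(incoming).items():
        attrs = tuple(rec["attrs"])
        idx = current.get(key)
        if idx is not None:
            if result[idx].attrs == attrs:
                continue  # no change: the existing current version stays valid
            # Close the old version on the day the new one takes effect.
            result[idx] = replace(result[idx], valid_to=load_date)
        result.append(DimRow(key, attrs, valid_from=load_date))
    return result
```

Deduplicating before applying SCD logic matters: without it, two updates for the same key in one batch would open two "current" versions.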

The quality gate between Bronze and Silver is now the second line of defense (after Bronze-level checks). At Silver, checks become more sophisticated: referential integrity across tables within a domain, consistency checks (do related fields agree?), accuracy validation against business rules, and distribution anomaly detection. If a dataset fails the hard gate, it does not promote to Silver — it stays in Bronze quarantine and triggers an alert.
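
Here is a minimal sketch of that Bronze-to-Silver hard gate. It assumes a hypothetical orders/customers domain and illustrates two of the four check families named above: referential integrity and consistency. The check functions and `run_gate` are illustrative names, not a library API. Accuracy and distribution-anomaly checks would plug into the same `checks` mapping.

```python
from typing import Any, Callable

Rows = list[dict[str, Any]]
Check = Callable[[dict[str, Rows]], int]  # returns the number of violating rows


def orphan_orders(tables: dict[str, Rows]) -> int:
    """Referential integrity: every order must reference a known customer."""
    known = {c["customer_id"] for c in tables["customers"]}
    return sum(1 for o in tables["orders"] if o["customer_id"] not in known)


def inconsistent_totals(tables: dict[str, Rows]) -> int:
    """Consistency: an order's total must equal quantity * unit_price (to the cent)."""
    return sum(1 for o in tables["orders"]
               if round(o["quantity"] * o["unit_price"], 2) != round(o["total"], 2))


def run_gate(tables: dict[str, Rows], checks: dict[str, Check],
             hard: set[str]) -> tuple[bool, dict[str, int]]:
    """Run every check; promote only if all hard checks report zero violations.

    Soft-check counts are still returned so they can drive alerts.
    """
    violations = {name: check(tables) for name, check in checks.items()}
    promote = all(violations[name] == 0 for name in hard)
    return promote, violations
```

Separating hard from soft checks lets a team alert on soft violations without blocking promotion, while a single hard failure keeps the batch in quarantine.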

This domain-organized Silver serves multiple purposes:

  • It feeds MDM systems (such as Informatica MDM, Profisee, and Reltio) that master data by domain, performing match, merge, and survivorship processing on clean, domain-aligned data.
  • It enables domain teams to own their Silver layer, creating clear accountability for data quality within each domain.
  • It simplifies the Silver-to-Gold transformation by making Gold about enrichment, aggregation, and cross-domain integration rather than domain organization.
  • It supports parallel domain delivery — each domain’s Silver can be developed and promoted independently.

Gold: KPIs, Metrics, Certified Data Products

Gold is where Silver domain data gets enriched, mastered, and transformed into consumable products. Gold serves two distinct purposes:

  • Domain Gold — owned by domain owners: Within each domain, Gold takes the MDM-enriched Silver data and builds KPIs, metrics, aggregations, and analytical views. These are the governed, certified data products that domain-specific business users consume. Each domain Gold product has a named domain owner who is a business leader, not a technical resource.
  • Cross-Domain Gold — owned by data product owners: Cross-domain data products integrate data across multiple domains to answer enterprise-level questions. These have a dedicated data product owner responsible for the integration logic, cross-domain semantics, and consumer SLAs.

The quality gate between Silver and Gold is business-rule oriented: does total revenue reconcile with the source system? Are all expected entities present after MDM processing? Do cross-domain relationships resolve correctly? Do the KPIs compute to expected ranges?
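
The Silver-to-Gold business-rule gate can be sketched as a function that returns a list of failures, where an empty list means the product may be promoted. The sketch covers three of the four questions: revenue reconciliation, entity presence after MDM, and KPI ranges. The `GoldGateInputs` fields, the 0.1% relative tolerance, and the KPI name are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class GoldGateInputs:
    """Hypothetical inputs gathered after MDM processing and Gold builds."""
    gold_revenue: float
    source_revenue: float
    expected_entities: set[str]
    mastered_entities: set[str]
    kpis: dict[str, float]
    kpi_ranges: dict[str, tuple[float, float]]


def gold_gate(inp: GoldGateInputs, rel_tol: float = 0.001) -> list[str]:
    """Business-rule gate between Silver and Gold; an empty list means promote."""
    failures: list[str] = []
    # Does total revenue reconcile with the source system (within tolerance)?
    if not math.isclose(inp.gold_revenue, inp.source_revenue, rel_tol=rel_tol):
        failures.append(f"revenue mismatch: gold={inp.gold_revenue} source={inp.source_revenue}")
    # Are all expected entities present after MDM processing?
    missing = inp.expected_entities - inp.mastered_entities
    if missing:
        failures.append(f"entities missing after MDM: {sorted(missing)}")
    # Do KPIs compute to expected ranges? A KPI that failed to compute also fails.
    for name, (low, high) in inp.kpi_ranges.items():
        value = inp.kpis.get(name)
        if value is None or not low <= value <= high:
            failures.append(f"KPI {name}={value} outside [{low}, {high}]")
    return failures
```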

DQ Certification Badges

This is an essential practice that most platforms miss. Every Gold data product should carry a DQ certification badge — a visible, queryable indicator of its quality status. The badge communicates to consumers: this data has passed all quality gates from Bronze through Silver through Gold, it meets the defined quality SLAs (completeness, validity, accuracy, timeliness, consistency), and it is certified for consumption and trusted decision-making.

Certification badges are not static. They are recalculated on every pipeline run. A product can lose its certification if quality degrades, and when it does, consumers need to be notified. This creates transparency: business users know they are consuming certified data, and when certification lapses, they know to wait or investigate. Trust is earned through visibility, not assumed through silence.
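
Recalculating a badge on every pipeline run, and notifying on status changes, might look like the sketch below. The quality dimensions mirror the SLA list above, but the threshold values, the `GoldProduct` shape, and the `notify` callback are hypothetical. In practice the badge would also need to be stored somewhere queryable, for example a status table next to the product.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Badge(Enum):
    CERTIFIED = "certified"
    LAPSED = "lapsed"


# Hypothetical SLA floors per quality dimension, as scores in [0, 1].
DEFAULT_SLA = {"completeness": 0.99, "validity": 0.98, "accuracy": 0.97,
               "timeliness": 1.0, "consistency": 0.98}


@dataclass
class GoldProduct:
    name: str
    owner: str                   # named business owner
    consumers: list[str]
    sla: dict[str, float] = field(default_factory=lambda: dict(DEFAULT_SLA))
    badge: Badge = Badge.LAPSED  # not certified until a run passes


def recertify(product: GoldProduct, scores: dict[str, float], upstream_green: bool,
              notify: Callable[[str, str], None]) -> Badge:
    """Recompute the badge after a pipeline run; notify owner and consumers on change.

    `upstream_green` stands for "all Bronze and Silver gates passed for this run".
    A dimension missing from `scores` counts as failing.
    """
    meets_sla = all(scores.get(dim, 0.0) >= floor for dim, floor in product.sla.items())
    new_badge = Badge.CERTIFIED if upstream_green and meets_sla else Badge.LAPSED
    if new_badge is not product.badge:
        message = f"{product.name}: {product.badge.value} -> {new_badge.value}"
        for recipient in [product.owner, *product.consumers]:
            notify(recipient, message)
    product.badge = new_badge
    return new_badge
```

Notifying only on transitions keeps the signal meaningful. Consumers hear about a lapse once, not on every failing run.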

Beyond Gold: The Semantic and Ontology Layer

Gold is not the end. Business users and executives do not want to write SQL against Gold tables. They want to ask questions in natural language, explore relationships across domains, and get answers without understanding the underlying schema.

We built an ontology layer on top of Gold that creates a virtual knowledge graph over the medallion architecture. Business entities from Gold become nodes in the knowledge graph; relationships between domains become edges. Natural Language Query (NLQ) capability lets business users ask questions like “Identify all critical components sourced from a ‘Supplier’ in ‘Country’, and map their downstream dependencies to determine which specific engine programs and high-priority customer orders at the ‘Plant’ will experience a line-down risk or fulfillment delay due to this supply disruption” without knowing which tables or joins are involved.
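
The traversal behind such a question can be sketched with an in-memory graph. This is not a graph database or an NLQ engine. It assumes edges of the form `(source, relation, target)`, with hypothetical node names like `Supplier:S1` and relation types like `SUPPLIES` and `USED_IN`. For brevity, it also omits the high-priority filter. An NLQ layer would translate the user's question into a traversal along these lines.

```python
from collections import defaultdict, deque

Edge = tuple[str, str, str]  # (source node, relation, target node)
Index = dict[str, list[tuple[str, str]]]


def build_index(edges: list[Edge]) -> Index:
    """Adjacency list: node -> [(relation, neighbour), ...]."""
    index: Index = defaultdict(list)
    for src, rel, dst in edges:
        index[src].append((rel, dst))
    return index


def reachable(index: Index, start: str, relations: set[str]) -> set[str]:
    """Breadth-first traversal that follows only the given relation types."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for rel, nxt in index.get(node, []):
            if rel in relations and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


def line_down_exposure(edges: list[Edge], country: str, plant: str) -> dict[str, list[str]]:
    """Components, programs, and orders at `plant` exposed to suppliers in `country`."""
    index = build_index(edges)
    suppliers = [s for s, rel, d in edges if rel == "LOCATED_IN" and d == country]
    exposed: set[str] = set()
    for supplier in suppliers:
        exposed |= reachable(index, supplier, {"SUPPLIES", "USED_IN", "FULFILLS"})
    orders_at_plant = {s for s, rel, d in edges if rel == "BUILT_AT" and d == plant}
    return {
        "components": sorted(n for n in exposed if n.startswith("Component:")),
        "programs": sorted(n for n in exposed if n.startswith("Program:")),
        "orders": sorted(n for n in exposed if n in orders_at_plant),
    }
```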

This semantic layer is what transforms a data platform from a technology asset into an actionable, operational intelligence fabric. It is also where AI agents add tremendous value: agents can traverse the knowledge graph, answer questions, and surface insights that structured queries would miss.

Production AI Risks — And How Medallion Mitigates Them

  • Data Drift and Silent Schema Breaks: Left-shifted DQ at Bronze catches schema and distribution changes at the earliest point, quarantining data and raising alerts before Silver promotion.
  • Inconsistent Semantics Across Domains: Domain-organized Silver converges definitions (units, codes, hierarchies) within each domain; MDM processing between Silver and Gold resolves cross-system entity conflicts.
  • Low-Trust Features: DQ certification badges on Gold products ensure that feature tables serving AI models are explicitly certified. If certification lapses, the pipeline quarantines the product rather than silently serving stale or degraded data.
  • Un-auditable Decisions: Left-shifted DQ at Bronze creates an audit trail from the point of entry. Promotion artifacts and gate results enable reproducibility of the exact data state that fed any model run.
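
One way to make promotion artifacts reproducible is to record a content fingerprint alongside the gate results for every promotion. The sketch below is an assumption, not the article's implementation. `fingerprint` hashes each row with SHA-256 and combines the sorted row hashes, so the result does not depend on row order. `promotion_manifest` is a hypothetical record that a model run could reference.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any


def fingerprint(rows: list[dict[str, Any]]) -> str:
    """Order-independent SHA-256 fingerprint of a dataset's contents."""
    row_hashes = sorted(
        hashlib.sha256(json.dumps(row, sort_keys=True, default=str).encode()).hexdigest()
        for row in rows
    )
    return hashlib.sha256("".join(row_hashes).encode()).hexdigest()


def promotion_manifest(table: str, layer: str, rows: list[dict[str, Any]],
                       gate_results: dict[str, bool], run_id: str) -> dict[str, Any]:
    """Record what was promoted, under which gate results, for later reproduction."""
    return {
        "table": table,
        "layer": layer,
        "run_id": run_id,
        "row_count": len(rows),
        "content_sha256": fingerprint(rows),
        "gate_results": dict(sorted(gate_results.items())),
        "promoted": all(gate_results.values()),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

Storing `content_sha256` with a model's training metadata lets an auditor check later whether the promoted data has changed since the model consumed it.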

Five Medallion Patterns I See in the Wild

  1. Classic BI Medallion: Batch ingestion to Bronze, domain-organized Silver, business marts and KPIs in Gold. Great for governed reporting and executive dashboards. The most common starting pattern.
  2. Streaming/IoT Medallion: Streaming ingestion into Bronze with immediate DQ checks, windowed cleansing in domain-organized Silver, and near-real-time Gold aggregates feeding operational AI and alerting.
  3. MDM-Infused Medallion: Domain-organized Silver feeds MDM for match/merge/survivorship processing. Golden entities are resolved at the Silver-to-Gold boundary, then productized in Gold with KPIs and cross-domain integration.
  4. ML Feature Medallion: Feature engineering in domain Silver, DQ-certified feature tables in Gold, online features synced to a low-latency store with the same contracts and certification badges.
  5. M&A Harmonization Medallion: When enterprises merge or acquire, each entity brings its own systems, schemas, hierarchies, and definitions for the same business domains. Bronze preserves each entity’s raw data separately with provenance tagging and per-entity DQ checks, which is critical because quality baselines differ across entities. Silver harmonizes similar sources from different entities into a canonical domain model: resolving semantic drift, aligning hierarchies, standardizing business keys, and converging competing definitions through governed decisions. Gold delivers unified KPIs and metrics across the combined enterprise with DQ certification badges. This pattern is essential for any organization navigating M&A that needs a unified data view without losing entity-level lineage and quality visibility.
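
A small piece of that Silver harmonization can be sketched as follows: mapping entity-specific codes to a canonical model while keeping provenance. The crosswalk entries, the entity names (`acquirer`, `target`), and the field names are hypothetical. The point is that the mapping is an explicit, governed table, and unmapped codes are set aside for a decision rather than guessed. Business keys are namespaced by entity so they stay distinct until MDM decides which records refer to the same thing.

```python
from typing import Any

# Hypothetical governed crosswalk: (source entity, local unit code) -> canonical code.
UNIT_CROSSWALK: dict[tuple[str, str], str] = {
    ("acquirer", "EA"): "EACH",
    ("acquirer", "KG"): "KILOGRAM",
    ("target", "PCS"): "EACH",
    ("target", "KGM"): "KILOGRAM",
}


def harmonize(rows: list[dict[str, Any]], entity: str,
              crosswalk: dict[tuple[str, str], str]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Map one entity's rows onto the canonical model, keeping provenance.

    Codes missing from the crosswalk are returned separately, not guessed,
    so a governed decision can extend the mapping.
    """
    canonical: list[dict[str, Any]] = []
    unmapped: list[dict[str, Any]] = []
    for row in rows:
        code = crosswalk.get((entity, row["uom"]))
        if code is None:
            unmapped.append({**row, "source_entity": entity})
            continue
        canonical.append({
            "material_id": f"{entity}:{row['material']}",  # entity-namespaced business key
            "uom": code,
            "source_entity": entity,  # provenance tag carried from Bronze
            "source_uom": row["uom"],
        })
    return canonical, unmapped
```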

Practical Lessons at Scale

  • Left-shift DQ as far as possible: Quality checks at Bronze catch issues hours or days earlier than Silver-only checks. Every layer should have its own quality gate, but Bronze is the first and most important line of defense.
  • Domain-organize Silver for MDM readiness: When MDM systems master data by domain, Silver must be domain-organized. This aligns the data architecture with the domain delivery model and simplifies MDM integration.
  • Business ownership of Gold is non-negotiable: Domain Gold products need domain owners. Cross-domain Gold products need data product owners. Both must be business leaders, not engineers.
  • DQ certification badges build trust: A visible, queryable certification status on every Gold product tells consumers whether they can trust the data. Certification that degrades triggers notifications — transparency builds confidence.
  • Naming conventions matter at scale: Use hierarchical names, and enforce them with strict linting in CI/CD to keep them consistent across domains.
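
A CI/CD lint for hierarchical names might look like this. The `<layer>.<domain>.<entity>` pattern, the snake_case rule, and the domain list are assumptions chosen for illustration. A CI job would fail the build whenever `lint_table_names` returns a non-empty list.

```python
import re

# Hypothetical convention: <layer>.<domain>.<entity>, lowercase snake_case segments.
NAME_PATTERN = re.compile(r"^(bronze|silver|gold)\.[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$")


def lint_table_names(names: list[str], domains: set[str]) -> list[str]:
    """Return one error message per name that breaks the naming convention."""
    errors: list[str] = []
    for name in names:
        if not NAME_PATTERN.match(name):
            errors.append(f"{name}: expected <layer>.<domain>.<entity> in snake_case")
            continue
        # The pattern guarantees exactly three dot-separated segments.
        domain = name.split(".")[1]
        if domain not in domains:
            errors.append(f"{name}: unknown domain '{domain}'")
    return errors
```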

Signals That You Are Winning

  • Domain products ship in weeks, not quarters — each with a named owner, published SLA, and DQ certification badge.
  • Schema changes at Bronze are caught by left-shifted DQ before they propagate to Silver.
  • MDM processing between Silver and Gold produces trusted golden entities across domains.
  • Business users browse certified Gold products in a data marketplace and self-serve.
  • AI model teams consume only DQ-certified feature tables with full lineage and freshness SLAs.

Four Moves You Can Make Tomorrow

  1. Left-shift your quality checks to Bronze: If DQ only happens at Silver or Gold, you are carrying bad data too far into the pipeline. Add completeness, freshness, and schema-conformance checks at Bronze immediately. Start with your highest-value domain.
  2. Reorganize Silver by domain: If your Silver layer is organized by source system, you are deferring domain alignment too late. Reorganize Silver by domain with minimal transformations, ready for MDM processing. This aligns your architecture with domain-driven delivery.
  3. Implement DQ certification badges on Gold: Every Gold data product should carry a visible certification status. Consumers should know whether the product they are consuming has passed all gates. Certification that lapses should trigger alerts to both the owner and consumers.
  4. Invest in the semantic layer: If your data consumers are writing SQL against Gold tables, you are leaving value on the table. Evaluate ontology and knowledge graph options that let business users ask questions in natural language. This is where the next order-of-magnitude value unlock lives.

Looking Ahead

Medallion architecture is not a relic of the last decade: it is how we make AI production-grade. The organizations that get this right today, with left-shifted DQ at Bronze, domain-organized Silver feeding MDM, DQ-certified Gold products, and rich semantics through ontologies and knowledge graphs, will be the ones best positioned to adopt agentic AI at the platform layer. Ship domain products fast, promote only on green gates, certify before consumption, and keep ownership clear. That is how models earn trust and stay in production.

In my next article, I will go deeper into data quality — not as a checkbox, but as a continuous program that I turned around from failing to thriving, including how AI agents accelerated the transformation.
