Regulatory rules as versioned data, not if-statements

I build Collabios, an EU influencer marketing platform run out of Estonia (legal entity Lustrace Digital OÜ). A big chunk of the product is unglamorous: making sure a sponsored post is labeled the way the law in that creator's country actually requires. That sounds like a checkbox. It is not. It turned into one of the more interesting modeling problems I've worked on, because the "spec" is written by thirteen different regulators who never coordinated with each other and who keep editing it.

This post is about the design pattern I landed on: treat each jurisdiction's rules as versioned data with an applicability function, not as branches in code. If you've ever watched a switch statement on country grow a new arm every quarter until nobody can reason about it, this is for you.

TL;DR: Hard-coding regulatory logic as if-statements rots fast because the rules change on the regulator's schedule, not yours. Model each rule as a data record with an appliesWhen predicate and an effective date, keep old versions instead of mutating them, and make the engine a pure function of (facts, ruleset, asOfDate). Code review of a legal change becomes a data diff.

The naive version, and why it broke

The first cut looked like every "handle multiple countries" first cut.

// illustrative, NOT our actual source
function disclosureLabel(country) {
  if (country === "FR") return "Publicité";
  if (country === "DE") return "Werbung";
  if (country === "IT") return "Pubblicità";
  // ...and so on
}

Two things killed this almost immediately.

First, "the rule" is not one value. Take Germany. The disclosure obligation sits in §5a of the UWG (the Unfair Competition Act). What made it a real modeling problem is the case law on top of the statute: in I ZR 90/20 (the Cathy Hummels case) the German Federal Court of Justice established that a platform's own built-in paid-partnership tag is not automatically sufficient disclosure. So "the German rule" is really: a required wording, plus a constraint that the platform's native tag alone doesn't satisfy it, plus the civil-enforcement mechanism behind it. You cannot return that as a string.

Second, the branches don't share a shape. France isn't only about the disclosure word. Under Loi 2023-451 (9 June 2023), and specifically the implementing Décret 2025-1137 (28 November 2025), a written contract becomes mandatory once a brand-creator partnership crosses €1,000, effective 1 January 2026. That's a totally different kind of obligation than "use this word in your caption" — it's a precondition on the deal, not the post. Italy adds yet another axis: the AGCom Codice di Condotta (Delibera 197/25/CONS) only treats a creator as an "influencer rilevante" once they pass 500,000 followers or 1,000,000 average monthly views, and then it wants "Pubblicità" inside the first three hashtags specifically.

So now my if tree has branches that are about words, branches that are about thresholds, branches that are about contract preconditions, and branches that are about where in the post the label has to appear. Whenever a regulator moved, I was editing control flow and praying my tests covered the country I'd touched.

The reframe: a rule is a record, not a branch

The unlock was to stop thinking of jurisdictions as code paths and start thinking of every individual obligation as a row of data with the same envelope.

// illustrative shape, NOT our actual schema
{
  "id": "it-pubblicita-first-three-hashtags",
  "jurisdiction": "IT",
  "authority": "AGCom — Delibera 197/25/CONS",
  "obligation": "disclosure_placement",
  "requiredWording": ["Pubblicità", "#adv"],
  "constraint": "must_appear_in_first_three_hashtags",
  "appliesWhen": {
    "anyOf": [
      { "fact": "followers", "op": ">=", "value": 500000 },
      { "fact": "avgMonthlyViews", "op": ">=", "value": 1000000 }
    ]
  },
  "effectiveFrom": "2025-08-01",
  "supersededBy": null
}

A few things fall out of this immediately.

Applicability becomes a predicate, not a branch. appliesWhen is a small boolean expression evaluated against a facts object (the creator's follower count, the deal value, whether the brand is established in the EU, whether the content targets minors). The engine doesn't know anything about Italy. It knows how to evaluate predicates. Italy is just data.

Heterogeneous obligations stop fighting each other. A French "written-contract-above-€1,000" rule and an Italian "word-in-first-three-hashtags" rule are the same record shape with different obligation types. The consumer of the engine switches on obligation, and that switch is small and stable, because the set of obligation kinds changes far more slowly than the set of specific rules.

The authority is part of the data. Every record carries the regulator and the instrument it came from. That's not decoration — it's how the output stays auditable and how I can show a user why a requirement fired. It also means the data doubles as the citation list.

Versioning: never mutate a rule, supersede it

This is the part I'd push hardest if you're building something similar.

Regulators don't patch in place and neither should you. When Spain reshaped its regime with Real Decreto 444/2024 (in force 2 May 2024) — introducing the "users of special relevance" category with "Publicidad" as the disclosure word under CNMC oversight — the correct move was not to edit the old Spanish rule. It was to write a new record with effectiveFrom: "2024-05-02" and set the old one's supersededBy to the new id.

Why bother keeping the dead rule?

You get to answer "was this campaign compliant in March 2024?" Compliance is a point-in-time question. If you mutate rules in place, you've thrown away your ability to reason about the past, and the past is exactly what a dispute is about.
Diffs become reviewable. A regulatory change lands as a data diff — a new record, a flipped supersededBy pointer — instead of a behavioral change buried in a control-flow edit. I can read the PR the way I'd read a changelog. So can a lawyer who can't read the code.
You stop being scared of effective dates. A rule that's been published but isn't in force yet is just a record whose effectiveFrom is in the future. The engine filters it out for "today" and includes it for "as of launch day." The UK's DMCC Act 2024 — which gives the CMA direct enforcement powers from 2 June 2026 — lived in the dataset as a future-dated record long before that date mattered, sitting quietly next to the existing ASA / CAP Code Section 2 rules.

The engine then becomes a pure function:

// illustrative, NOT our actual source
function evaluate(facts, ruleset, asOfDate) {
  return ruleset
    .filter((r) => r.effectiveFrom <= asOfDate)
    .filter((r) => r.supersededBy === null || supersededAfter(r, asOfDate))
    .filter((r) => matches(r.appliesWhen, facts))
    .map((r) => toRequirement(r));
}

Pure function, three inputs, deterministic output. That property is worth a lot. It means the whole thing is trivially testable: a test case is just a facts object plus an expected list of requirement ids. When I add a jurisdiction, I add data and a fixture, not a branch and a prayer.

What this buys you in practice

The concrete payoff showed up in the influencer disclosure generator we ship. A creator (or a brand auditing a campaign) gives the engine a few facts — country, platform, follower count, deal value — and it returns the obligations that actually apply, each tagged with the regulator it came from. The UI is thin. The intelligence is in the dataset, and the dataset is reviewable by people who are not engineers.

A couple of patterns I'd carry to any rules-heavy domain — tax, KYC, content moderation, accessibility:

Separate "what is true about the case" (facts) from "what the law says" (rules) from "when" (the as-of date). Three inputs, one pure function. The moment those three get tangled, you're back to if-statements.
Model the predicate, not the answer. appliesWhen is more durable than a precomputed label, because regulators change who a rule applies to as often as they change what it requires (Italy's follower threshold, Spain's special-relevance category).
Treat the citation as a first-class field. If you can't say which authority and instrument a requirement came from, you can't defend the output, and honestly you probably don't understand the rule well enough to encode it.

None of this is exotic. It's event-sourcing intuition (append, don't mutate) plus a tiny predicate evaluator, applied to a domain where the "events" are published by governments. But framing the law as versioned data instead of branching logic is the difference between a system you can hand to a non-engineer to audit and a switch statement that everyone is afraid to touch.

If you're modeling anything where the spec is written by an outside body that revises it on its own schedule — and that's more domains than you'd think — I'd start here.

— Ghassen Daoud, founder, Collabios

When the Specification Is Written by Regulators: Modeling Rules as Versioned Data

The naive version, and why it broke

The reframe: a rule is a record, not a branch

Versioning: never mutate a rule, supersede it

What this buys you in practice

Comments

Command Palette

The naive version, and why it broke

The reframe: a rule is a record, not a branch

Versioning: never mutate a rule, supersede it

What this buys you in practice

Comments