SaaS Subscription Analytics
- dbt
- PostgreSQL
- Docker Compose
A dimensional modeling project that builds a star schema for SaaS subscription billing data. Customers change plans, pause and resume, upgrade and downgrade, all tracked as SCD Type 2 slowly changing dimensions using dbt snapshots, with incremental fact tables and custom generic tests. Includes a generated dbt docs site for exploring the DAG and column-level documentation.
flowchart LR
Seed[Seed CSV Files] --> Raw[(Raw Tables)]
Raw --> Snap[dbt snapshot<br/>SCD Type 2]
Raw --> Stg[Staging Models]
Snap --> Dims[Dimension Tables]
Stg --> Dims
Stg --> Facts[Fact Tables<br/>incremental]
Dims --> Facts
Dims --> Marts[Mart Views]
Facts --> Marts
Marts --> Docs[dbt Docs Site<br/>localhost:8081]
subgraph PostgreSQL
Raw
Snap
Stg
Dims
Facts
Marts
end
Why dimensional modeling
Most dbt projects you see in portfolios run dbt init, create a staging model and a single mart, and call it done. That’s fine for learning the tool, but it doesn’t demonstrate the skill that actually matters in analytics engineering interviews: modeling data that changes over time.
In a SaaS business, a customer’s plan isn’t static. They sign up on a free tier, upgrade to pro after a trial, downgrade when budgets tighten, maybe churn and come back later. If you only store the current plan, you can’t answer “what plan were they on when they made that purchase?” or “how many customers downgraded last quarter?” -questions that every SaaS company cares about.
Dimensional modeling with SCD Type 2 solves this. Every time a customer’s plan changes, the old record gets an valid_to timestamp and a new record opens. The fact table joins to the dimension using a date range, so you always get the plan that was active at the time of the event. This is the foundation of every serious analytics warehouse.
How it works
The project models a subscription billing system for a fictional SaaS product with four plan tiers: free, starter, pro, and enterprise.
Source data. Seed CSV files provide the initial state: customers with signup dates and current plans, subscription change events (upgrades, downgrades, cancellations, reactivations), and invoices tied to billing periods. A Python script generates realistic subscription lifecycle patterns -customers don’t jump from free to enterprise, they follow plausible upgrade paths with realistic timing.
Snapshots. dbt snapshots run against the customers table using check strategy on the current_plan column. Each time the plan value changes, dbt closes the old record (dbt_valid_to gets a timestamp) and opens a new one. This builds the SCD Type 2 history automatically -no manual valid_from / valid_to logic needed.
Staging. Standard dbt staging models: rename columns to consistent conventions, cast types, deduplicate where needed. One staging model per source table.
Dimensions. dim_customers joins the snapshot history to build a complete customer dimension with valid_from and valid_to date ranges. dim_plans is a static reference table with plan names, prices, and seat limits. dim_dates is a date spine generated by a dbt macro.
Facts. fct_invoices is an incremental model -on each run, it only processes invoices with an invoice_date after the last run’s max. It joins to dim_plans using the invoice amount to pick up the plan tier for each billing event. fct_subscription_events tracks every plan change as a discrete event with the old plan, new plan, and change type.
Marts. mart_monthly_recurring_revenue calculates MRR by month and plan, accounting for upgrades, downgrades, and churn. mart_cohort_retention groups customers by signup month and tracks how many are still active in each subsequent month.
Data model
erDiagram
dim_customers ||--o{ fct_invoices : "billed to"
dim_customers ||--o{ fct_subscription_events : "changes plan"
dim_plans ||--o{ fct_invoices : "plan at time"
dim_plans ||--o{ fct_subscription_events : "old/new plan"
dim_dates ||--o{ fct_invoices : "invoice date"
dim_customers {
text customer_key PK
int customer_id
text name
text email
text current_plan
date valid_from
date valid_to
boolean is_current
}
dim_plans {
int plan_id PK
text plan_name
numeric monthly_price
int max_seats
}
dim_dates {
date date_day PK
int year
int month
int day_of_month
text month_name
boolean is_month_start
date month_start
}
fct_invoices {
int invoice_id PK
int customer_id FK
int plan_id FK
date invoice_date FK
numeric amount
text status
}
fct_subscription_events {
int event_id PK
int customer_id FK
int old_plan_id FK
int new_plan_id FK
text change_type
timestamp changed_at
}
Custom generic tests
Beyond the standard not_null and unique tests, the project includes custom generic tests that enforce business logic:
- valid_date_range -ensures
valid_fromis always beforevalid_toon SCD Type 2 records, and that exactly one record per entity has a nullvalid_to(the current record) - no_overlapping_ranges -checks that no customer has two active dimension records for the same date range, which would cause fan-out in fact table joins
- referential_integrity_by_date -verifies that every fact record can join to exactly one dimension record using the date range, catching both orphans and duplicates
- accepted_transitions -validates that plan changes follow allowed paths (e.g., free → starter is allowed, free → enterprise is not)
Running it
docker compose up -d
This starts PostgreSQL and a dbt container. The dbt container runs seeds, snapshots, models, and tests on startup, then serves the dbt docs site.
Check everything ran cleanly:
docker compose logs dbt
You should see dbt seed, dbt snapshot, dbt run, and dbt test all complete with no errors.
Browse the dbt docs site at localhost:8081 to explore the DAG, column descriptions, and test coverage.
Connect to the database to query the models directly:
docker compose exec postgres psql -U dbt -d analytics
-- MRR by plan over time
SELECT * FROM marts.mart_monthly_recurring_revenue ORDER BY month_start, plan_name;
-- Cohort retention
SELECT * FROM marts.mart_cohort_retention ORDER BY signup_cohort, months_since_signup;
-- SCD Type 2 history for a single customer
SELECT customer_id, current_plan, valid_from, valid_to, is_current
FROM dims.dim_customers
WHERE customer_id = 1
ORDER BY valid_from;
To simulate plan changes and see snapshots update:
docker compose exec dbt bash -c "python3 /app/scripts/simulate_changes.py && dbt snapshot && dbt run && dbt test"
This inserts new subscription events into the source, reruns snapshots (which close old records and open new ones), rebuilds the downstream models, and validates everything.
Shut it down:
docker compose down -v
Design decisions
SaaS billing as the domain. Subscription data is the canonical use case for SCD Type 2 -plans change on known dates, you need historical accuracy for revenue reporting, and the business questions are intuitive. It’s also a domain that every interviewer understands without needing context.
dbt snapshots over manual SCD logic. Writing your own valid_from / valid_to merge logic is error-prone and verbose. dbt snapshots handle this declaratively -you point them at a source table, tell them which columns to watch, and they maintain the history automatically. This is how production dbt projects handle SCDs.
Check strategy over timestamp strategy. dbt snapshots support two change-detection strategies: timestamp (compare an updated_at column) and check (compare the actual column values). Check strategy is more robust for this use case because it catches changes even if the source system doesn’t reliably update its timestamp. The tradeoff is slightly more compute per snapshot run, which is irrelevant at this scale.
Incremental fact tables. fct_invoices uses dbt’s incremental materialization with an invoice_date filter. On the first run, it processes everything. On subsequent runs, it only processes new rows. This demonstrates the pattern used in production to avoid full-table rebuilds on large datasets.
Custom generic tests over one-off tests. Generic tests are reusable across models -valid_date_range works on any SCD Type 2 table, not just dim_customers. This is how mature dbt projects scale their testing: build a library of generic tests that encode business rules, then apply them declaratively in YAML.