Batch & Orchestration
Weather ELT Pipeline
Scheduled ELT pipeline pulling daily weather data for 10 UK cities, with dbt transformations and Airflow orchestration.
- Python
- PostgreSQL
- dbt
- Airflow
- Docker Compose
Streaming & Real-Time
Ecommerce Clickstream Streaming Pipeline
Real-time event streaming with sub-5-second end-to-end latency, from a simulated clickstream to a live Grafana dashboard.
- Python
- Kafka
- ClickHouse
- Grafana
- Docker Compose
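One way to read the sub-5-second claim is as a budget on the gap between when an event is produced and when it becomes visible downstream. The sketch below is a minimal illustration of that check in plain Python; the function names, the sample timestamps, and the idea of measuring at "produced" and "visible" points are assumptions, not details of this project.

```python
from datetime import datetime, timedelta, timezone


def end_to_end_latency_seconds(produced_at, visible_at):
    """Seconds between event creation and the moment it is queryable."""
    return (visible_at - produced_at).total_seconds()


def share_within_budget(latencies, budget_seconds=5.0):
    """Fraction of events whose latency is strictly under the budget."""
    if not latencies:
        return 0.0
    return sum(1 for value in latencies if value < budget_seconds) / len(latencies)


# Hypothetical sample: (produced_at, visible_at) pairs for three events.
start = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
samples = [
    (start, start + timedelta(seconds=1.2)),
    (start, start + timedelta(seconds=3.8)),
    (start, start + timedelta(seconds=6.5)),  # misses the 5-second budget
]

latencies = [end_to_end_latency_seconds(p, v) for p, v in samples]
share = share_within_budget(latencies)
```

In a real deployment the two timestamps would come from the producer and from the dashboard's data source, and clock skew between them would need to be accounted for.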
Change Data Capture Pipeline
CDC pipeline capturing row-level changes from PostgreSQL via Debezium and applying them to an analytics replica in real time.
- PostgreSQL
- Debezium
- Kafka
- Kafka Connect
- Python
- Docker Compose
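Applying row-level changes to a replica can be sketched as follows. Debezium change events carry an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read) together with `before` and `after` row images. The sketch assumes events arrive wrapped in a `payload` field, that the table has a primary key column named `id`, and that the replica is an in-memory dict; all three are illustrative assumptions rather than details of this project.

```python
def apply_change(replica, event, key="id"):
    """Apply one Debezium-style change event to a dict keyed by primary key."""
    payload = event["payload"]
    op = payload["op"]
    if op in ("c", "r", "u"):
        # Inserts, snapshot reads, and updates all become an upsert.
        row = payload["after"]
        replica[row[key]] = row
    elif op == "d":
        # Deletes only carry the previous row image.
        replica.pop(payload["before"][key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")
    return replica


# Hypothetical stream of events for a two-column table.
events = [
    {"payload": {"op": "c", "before": None, "after": {"id": 1, "name": "ann"}}},
    {"payload": {"op": "c", "before": None, "after": {"id": 2, "name": "bob"}}},
    {"payload": {"op": "u", "before": {"id": 2, "name": "bob"},
                 "after": {"id": 2, "name": "rob"}}},
    {"payload": {"op": "d", "before": {"id": 1, "name": "ann"}, "after": None}},
]

replica = {}
for event in events:
    apply_change(replica, event)
```

Treating creates and updates as the same upsert keeps the apply step idempotent, which matters because Kafka consumers may see an event more than once.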
Lakehouse
NYC Taxi Data Lakehouse
Medallion architecture (bronze/silver/gold) on local object storage with Delta Lake, including schema evolution and time travel.
- PySpark
- Delta Lake
- MinIO
- Jupyter
- Docker Compose
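The bronze/silver/gold layering can be illustrated without Spark. The sketch below uses plain Python lists in place of Delta tables: bronze holds raw records as ingested, silver holds typed and validated records, and gold holds an aggregate ready for reporting. The field names (`pickup_date`, `fare`) and the cleaning rules are hypothetical and are not taken from this project.

```python
from collections import defaultdict

# Bronze: raw records exactly as ingested, strings and bad values included.
bronze = [
    {"pickup_date": "2024-01-01", "fare": "12.50"},
    {"pickup_date": "2024-01-01", "fare": "-3.00"},  # invalid: negative fare
    {"pickup_date": "2024-01-02", "fare": "8.00"},
    {"pickup_date": "2024-01-02", "fare": None},     # invalid: missing fare
]


def to_silver(rows):
    """Cast types and drop rows that fail basic validity rules."""
    silver = []
    for row in rows:
        if row["fare"] is None:
            continue
        fare = float(row["fare"])
        if fare < 0:
            continue
        silver.append({"pickup_date": row["pickup_date"], "fare": fare})
    return silver


def to_gold(rows):
    """Aggregate cleaned rows into total fare per pickup date."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["pickup_date"]] += row["fare"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping bronze untouched means the silver and gold layers can be rebuilt from it whenever the cleaning rules change.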
Data Quality
Data Quality and Pipeline Observability
Pipeline ingesting real UK food hygiene data, with automated Great Expectations quality gates and Prometheus/Grafana observability.
- Python
- PostgreSQL
- Great Expectations
- Prometheus
- Grafana
- Docker Compose
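A quality gate, in general terms, validates a batch and blocks the downstream load when too many rows fail. The sketch below shows the idea in plain Python and deliberately does not use the Great Expectations API. The `rating` field, the assumption that valid ratings are integers from 0 to 5, and the `max_failure_rate` threshold are all illustrative choices.

```python
def check_ratings(rows, allowed=frozenset(range(6))):
    """Return rows whose rating is missing or outside the allowed set."""
    return [row for row in rows if row.get("rating") not in allowed]


def quality_gate(rows, max_failure_rate=0.0):
    """Summarise failures and decide whether the batch may proceed."""
    failures = check_ratings(rows)
    rate = len(failures) / len(rows) if rows else 0.0
    return {
        "passed": rate <= max_failure_rate,
        "failure_rate": rate,
        "failures": failures,
    }


# Hypothetical batch with one out-of-range rating.
batch = [
    {"business": "A", "rating": 5},
    {"business": "B", "rating": 3},
    {"business": "C", "rating": 7},
    {"business": "D", "rating": 0},
]

strict = quality_gate(batch)
lenient = quality_gate(batch, max_failure_rate=0.5)
```

The `failure_rate` value is the kind of number that could be exported as a metric for Prometheus to scrape, so that a dashboard shows data quality over time as well as pass/fail.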
DevOps & Infrastructure
CI/CD and Infrastructure-as-Code
The batch ELT pipeline wrapped in Terraform, GitHub Actions CI, pre-commit hooks, and a single-command dev environment.
- Terraform
- GitHub Actions
- Docker
- Make
- Python
- dbt