Projects

Data engineering portfolio projects, each built from scratch with full documentation.

Batch & Orchestration

Weather ELT Pipeline

Scheduled ELT pipeline pulling daily weather data for 10 UK cities, with dbt transformations and Airflow orchestration.

  • Python
  • PostgreSQL
  • dbt
  • Airflow
  • Docker Compose

Streaming & Real-Time

Ecommerce Clickstream Streaming Pipeline

Real-time event streaming with sub-5-second end-to-end latency, from simulated clickstream to live Grafana dashboard.

  • Python
  • Kafka
  • ClickHouse
  • Grafana
  • Docker Compose

Change Data Capture Pipeline

CDC pipeline capturing row-level changes from PostgreSQL via Debezium and applying them to an analytics replica in real time.

  • PostgreSQL
  • Debezium
  • Kafka
  • Kafka Connect
  • Python
  • Docker Compose
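
As a rough illustration of what "applying row-level changes" means, here is a minimal sketch rather than the project's actual code. It assumes events have already been consumed from Kafka and unwrapped to the Debezium envelope fields `op`, `before`, and `after`, that the primary key column is called `id`, and that `before` carries the key on deletes. The `apply_change` helper, the in-memory `replica` dict (standing in for the analytics replica), and the sample rows are all hypothetical.

```python
from typing import Any


def apply_change(replica: dict[Any, dict], event: dict, key: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory replica.

    Debezium op codes: "c" = insert, "u" = update, "d" = delete,
    "r" = row read during the initial snapshot.
    """
    op = event["op"]
    if op in ("c", "r", "u"):
        # Inserts, snapshot reads, and updates all become an upsert of `after`.
        row = event["after"]
        replica[row[key]] = row
    elif op == "d":
        # Deletes carry the old row in `before`; `after` is None.
        row = event["before"]
        replica.pop(row[key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


# Hypothetical sample events, in the order they would arrive on the topic.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

replica: dict[Any, dict] = {}
for event in events:
    apply_change(replica, event)

print(replica)  # {1: {'id': 1, 'status': 'paid'}}
```

Treating inserts and updates as the same upsert keeps the apply step idempotent, which helps when a consumer restarts and replays events it has already seen.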

Lakehouse

NYC Taxi Data Lakehouse

Medallion architecture (bronze/silver/gold) on local object storage, using Delta Lake for schema evolution and time travel.

  • PySpark
  • Delta Lake
  • MinIO
  • Jupyter
  • Docker Compose

Data Quality

Data Quality and Pipeline Observability

Pipeline ingesting real UK food hygiene data, with automated Great Expectations quality gates and Prometheus/Grafana observability.

  • Python
  • PostgreSQL
  • Great Expectations
  • Prometheus
  • Grafana
  • Docker Compose

DevOps & Infrastructure

CI/CD and Infrastructure-as-Code

The Weather ELT Pipeline wrapped in Terraform, GitHub Actions CI, pre-commit hooks, and a single-command dev environment.

  • Terraform
  • GitHub Actions
  • Docker
  • Make
  • Python
  • dbt