Data Engineer

Michael Barley

I build real-time pipelines, data lakehouses, and scalable infrastructure that processes billions of events. Python, SQL, Kafka, Spark, and a lot of AWS.

6+ Years Experience
700+ Retailers Served
Billions of Events Processed
6 Portfolio Projects

Featured Projects

Batch & Orchestration

Weather ELT Pipeline

Scheduled ELT pipeline pulling daily weather data for 10 UK cities, with dbt transformations and Airflow orchestration.

  • Python
  • PostgreSQL
  • dbt
  • Airflow
  • Docker Compose

Streaming & Real-Time

Ecommerce Clickstream Streaming Pipeline

Real-time event streaming with sub-5-second end-to-end latency, from simulated clickstream to live Grafana dashboard.

  • Python
  • Kafka
  • ClickHouse
  • Grafana
  • Docker Compose

Lakehouse

NYC Taxi Data Lakehouse

Medallion architecture (bronze/silver/gold) on local object storage with Delta Lake, full schema evolution and time travel.

  • PySpark
  • Delta Lake
  • MinIO
  • Jupyter
  • Docker Compose

Streaming & Real-Time

Change Data Capture Pipeline

CDC pipeline capturing row-level changes from PostgreSQL via Debezium and applying them to an analytics replica in real time.

  • PostgreSQL
  • Debezium
  • Kafka
  • Kafka Connect
  • Python
  • Docker Compose

What I Work With

Languages

  • Python
  • SQL
  • TypeScript

Streaming

  • Kafka
  • Debezium
  • ClickHouse

Data

  • Spark
  • Delta Lake
  • dbt
  • PostgreSQL

Infrastructure

  • AWS
  • Terraform
  • Docker
  • Kubernetes

Orchestration

  • Airflow
  • GitHub Actions

Observability

  • Prometheus
  • Grafana
  • Great Expectations
