Snowflake

End-to-End Airbnb Analytics Engineering

CSV -> S3 -> Snowflake -> dbt Bronze/Silver/Gold models with analytics-ready outputs.

Snowflake, dbt, AWS S3, Python, SQL

Overview

An end-to-end data engineering pipeline for Airbnb listings, hosts, and bookings using Snowflake and dbt. The project implements medallion modeling, incremental processing, and historical tracking for analytics.

Problem

  • Airbnb source data arrives as separate raw files that are not analytics-ready.
  • Business users need consistent, reliable metrics, plus a record of how hosts, listings, and bookings change over time.

Solution

  • Ingest source CSVs into staging, then model Bronze/Silver/Gold layers in dbt on Snowflake.
  • Use incremental models to process only new or changed records instead of rebuilding full tables on every run.
  • Implement dbt snapshots (SCD Type 2) for bookings, hosts, and listings to preserve history.
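
The incremental idea above can be illustrated outside dbt. The following is a minimal Python sketch, not the project's actual model code: it keeps a high-water mark on an assumed `updated_at` column and upserts only rows newer than that mark. The function name, the `id` key, and the sample rows are hypothetical. dbt incremental models typically express the same filter in SQL inside an `is_incremental()` block.

```python
from datetime import datetime


def incremental_merge(target, source_rows, key="id", updated_col="updated_at"):
    """Upsert only the source rows that are newer than anything already loaded.

    `target` is a dict keyed by the record id (a stand-in for the target table).
    Column names are assumptions made for this sketch.
    """
    # High-water mark: the newest timestamp already present in the target.
    watermark = max((row[updated_col] for row in target.values()), default=None)
    processed = 0
    for row in source_rows:
        if watermark is None or row[updated_col] > watermark:
            target[row[key]] = row  # insert a new key or overwrite a changed row
            processed += 1
    return processed


# Hypothetical sample data: two runs over a growing source.
target = {}
batch_1 = [
    {"id": 1, "price": 100, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "price": 80, "updated_at": datetime(2024, 1, 2)},
]
first_run = incremental_merge(target, batch_1)

batch_2 = batch_1 + [
    {"id": 2, "price": 90, "updated_at": datetime(2024, 1, 3)},  # changed row
    {"id": 3, "price": 150, "updated_at": datetime(2024, 1, 4)},  # new row
]
second_run = incremental_merge(target, batch_2)
```

On the second run only the changed and the new row are processed; the two rows loaded earlier fall at or below the watermark and are skipped.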

Architecture

  1. Source CSV data -> AWS S3 -> Snowflake staging tables
  2. dbt Bronze models that land the raw data in structured tables
  3. dbt Silver models for cleaning, standardization, and enrichment
  4. dbt Gold models (`fact`, `obt`) for analytics and BI consumption
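
The layer responsibilities listed above can be sketched in a few lines of Python. This is a simplified stand-in for the dbt models, under stated assumptions: the CSV content, the column names, the rule of dropping rows without a price, and the per-city aggregate are all hypothetical and chosen only to show what each layer is responsible for.

```python
import csv
import io

# Hypothetical raw CSV standing in for a staged listings file.
RAW_LISTINGS_CSV = """listing_id,host_id,price_per_night,city
1,10, 120.50 ,paris
2,10,,Paris
3,11,80,LONDON
"""


def bronze(raw_csv):
    """Bronze: load rows as they arrive; every value is still a string."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def silver(bronze_rows):
    """Silver: cast types, standardize text, drop rows with no price."""
    cleaned = []
    for row in bronze_rows:
        price = row["price_per_night"].strip()
        if not price:
            continue  # assumed cleaning rule for this sketch
        cleaned.append({
            "listing_id": int(row["listing_id"]),
            "host_id": int(row["host_id"]),
            "price_per_night": float(price),
            "city": row["city"].strip().title(),
        })
    return cleaned


def gold(silver_rows):
    """Gold: aggregate to one analytics-ready row per city."""
    totals = {}
    for row in silver_rows:
        stats = totals.setdefault(row["city"], {"listings": 0, "price_sum": 0.0})
        stats["listings"] += 1
        stats["price_sum"] += row["price_per_night"]
    return {
        city: {"listings": s["listings"], "avg_price": s["price_sum"] / s["listings"]}
        for city, s in totals.items()
    }


bronze_rows = bronze(RAW_LISTINGS_CSV)
silver_rows = silver(bronze_rows)
gold_rows = gold(silver_rows)
```

Keeping each layer as a separate function mirrors the layer boundaries: Bronze never changes values, Silver never aggregates, and Gold reads only from Silver.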

Metrics

  • Produced analytics-ready Gold datasets (`fact` and `obt`) for downstream reporting.
  • Reduced rebuild overhead via incremental model execution in Bronze/Silver layers.
  • Improved trust in the data through dbt tests, source checks, and lineage visibility.

Highlights

  • Medallion architecture with clear layer boundaries and ownership.
  • SCD Type 2 snapshots for historical point-in-time analysis.
  • Reusable macros and Jinja templating to keep transformations maintainable.
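
To make the SCD Type 2 highlight concrete, here is a minimal Python sketch of how versioned rows support point-in-time lookups. It is an illustration, not the project's snapshot configuration: the `host_id` and `is_superhost` columns, the function names, and the sample dates are hypothetical, and deleted records are not handled. dbt snapshots record the same validity window in their `dbt_valid_from` and `dbt_valid_to` columns.

```python
from datetime import date


def apply_snapshot(history, current_rows, as_of, key="host_id"):
    """Close versions that changed and append the new ones (SCD Type 2)."""
    # Open versions are the rows whose validity window has no end yet.
    open_versions = {v[key]: v for v in history if v["valid_to"] is None}
    for row in current_rows:
        existing = open_versions.get(row[key])
        if existing is not None:
            tracked = {
                col: val for col, val in existing.items()
                if col not in ("valid_from", "valid_to")
            }
            if tracked == row:
                continue  # nothing changed, keep the open version
            existing["valid_to"] = as_of  # close the old version
        history.append({**row, "valid_from": as_of, "valid_to": None})
    return history


def as_of_lookup(history, key_value, when, key="host_id"):
    """Return the version that was valid on the date `when`, or None."""
    for v in history:
        starts_before = v["valid_from"] <= when
        still_valid = v["valid_to"] is None or when < v["valid_to"]
        if v[key] == key_value and starts_before and still_valid:
            return v
    return None


# Hypothetical host record that changes once across three snapshot runs.
history = []
apply_snapshot(history, [{"host_id": 10, "is_superhost": False}], date(2024, 1, 1))
apply_snapshot(history, [{"host_id": 10, "is_superhost": True}], date(2024, 3, 1))
apply_snapshot(history, [{"host_id": 10, "is_superhost": True}], date(2024, 4, 1))

february = as_of_lookup(history, 10, date(2024, 2, 15))
april = as_of_lookup(history, 10, date(2024, 4, 15))
```

The third run adds no row because nothing changed, so the history holds two versions, and the February lookup returns the earlier one.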