Project preview

In this course, you'll focus on asset-aware orchestrators and how they make data pipelines easier to manage. You'll use Dagster, an open-source orchestrator, to build a sample data pipeline.

Using data from NYC OpenData, you’ll build a data pipeline that:

1. Extracts the data, stored in Parquet files, from NYC OpenData
2. Loads it into a DuckDB database
3. Transforms and prepares it for analysis
4. Creates a visualization using the transformed data

If you get stuck or want to jump ahead, check out the finished project on GitHub.