Project preview

This course focuses on asset-aware orchestrators and how they make data pipelines easier to manage. You’ll use Dagster, an open-source orchestrator, to build a sample data pipeline.

Using data from NYC OpenData, you’ll build a data pipeline that:

  • Extracts the data, stored in Parquet files, from NYC OpenData
  • Loads it into a DuckDB database
  • Transforms and prepares it for analysis
  • Creates a visualization using the transformed data
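
To make the shape of that pipeline concrete, here is a minimal sketch of the four steps as a dependency graph, using only Python's standard library. It is not Dagster code: the asset names (`raw_parquet_files`, `duckdb_tables`, `transformed_tables`, `visualization`) and the `materialization_order` helper are hypothetical, chosen for illustration. The idea it shows is that each step declares what it depends on, and the run order is derived from those declarations.

```python
from graphlib import TopologicalSorter

# Hypothetical asset names for the four pipeline steps (not from the course).
# Each key maps to the set of upstream assets it depends on.
ASSET_DEPS = {
    "raw_parquet_files": set(),               # extract Parquet files from NYC OpenData
    "duckdb_tables": {"raw_parquet_files"},   # load the files into DuckDB
    "transformed_tables": {"duckdb_tables"},  # transform and prepare for analysis
    "visualization": {"transformed_tables"},  # build a chart from the transformed data
}


def materialization_order(deps):
    """Return asset names in an order that respects upstream dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = materialization_order(ASSET_DEPS)
print(order)
```

Because the order is computed from the declared dependencies rather than written out by hand, adding a new step only requires stating what it depends on. An asset-aware orchestrator works from the same kind of information, though its actual API will look different from this toy example.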

If you get stuck or want to jump ahead, check out the finished project on GitHub.