Creating a schedule with a date-based partition
In the previous lesson, you created the `trip_update_job` job, which updates most of your assets. The job was put on a schedule that runs on the fifth day of every month at midnight.
Now that the relevant assets are partitioned, the schedule can be changed to fetch only the latest month's data instead of refreshing each asset in its entirety. Limiting each run to new data is a best practice: it avoids recomputing months that have already been processed, which saves compute time.
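As a rough illustration of the savings, the sketch below models monthly partitions in plain Python rather than with Dagster's API. It assumes partition keys are the first day of each month in `YYYY-MM-DD` form (Dagster's default format for monthly partitions, but check your own `monthly_partition` definition). The start date `2023-01-01` and run date `2023-04-05` are made up. A run on the fifth of a month only needs the most recent complete month, while a full refresh reprocesses every key.

```python
from datetime import date


def monthly_partition_keys(start: date, run_date: date) -> list[str]:
    """Return one key per complete month from `start` up to the month before `run_date`.

    Keys are the first day of each month in YYYY-MM-DD form (an assumed format).
    """
    keys = []
    year, month = start.year, start.month
    # Stop before the month containing run_date, since that month is still in progress.
    while (year, month) < (run_date.year, run_date.month):
        keys.append(date(year, month, 1).isoformat())
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return keys


all_keys = monthly_partition_keys(date(2023, 1, 1), date(2023, 4, 5))
latest_key = all_keys[-1]
print(all_keys)    # ['2023-01-01', '2023-02-01', '2023-03-01']
print(latest_key)  # 2023-03-01
```

With three months of history, a partitioned run processes one key instead of three. The gap widens every month as history accumulates.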
Currently, `trip_update_job` in `jobs/__init__.py` should look like this:
```python
from dagster import AssetSelection, define_asset_job

trip_update_job = define_asset_job(
    name="trip_update_job",
    selection=AssetSelection.all() - AssetSelection.assets(["trips_by_week"]),
)
```
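The `selection` argument behaves like set difference: start from every asset and remove `trips_by_week`. Here is a plain-Python analogy that uses hypothetical asset names in place of real asset keys:

```python
# Hypothetical asset names, for illustration only; substitute the assets in your project.
all_assets = {
    "taxi_trips_file",
    "taxi_zones_file",
    "taxi_trips",
    "taxi_zones",
    "manhattan_stats",
    "manhattan_map",
    "trips_by_week",
}

# Mirrors AssetSelection.all() - AssetSelection.assets(["trips_by_week"]):
# take every asset, then subtract the excluded one.
selected = all_assets - {"trips_by_week"}

print(sorted(selected))
```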
To add partitions to the job, make the following changes:
1. Import `monthly_partition` from `partitions`:

   ```python
   from ..partitions import monthly_partition
   ```
2. In the job, add a `partitions_def` parameter set to `monthly_partition`:

   ```python
   partitions_def=monthly_partition,
   ```
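Once the job has a `partitions_def`, each run targets a single monthly partition key. The assets' code is what turns that key into a specific slice of data. Inside a partitioned Dagster asset, the key is usually read from `context.partition_key`. The helper functions and the `taxi_trips_<month>.parquet` naming pattern below are hypothetical, plain-Python stand-ins for that logic:

```python
from datetime import date


def month_from_partition_key(partition_key: str) -> str:
    """Turn a first-of-month key such as '2023-03-01' into '2023-03'."""
    # Parse the key as a date so a malformed key raises ValueError
    # instead of silently producing a bad month string.
    return date.fromisoformat(partition_key).strftime("%Y-%m")


def source_file_for(partition_key: str) -> str:
    """Build a per-month file name; the naming pattern is a hypothetical example."""
    return f"taxi_trips_{month_from_partition_key(partition_key)}.parquet"


print(source_file_for("2023-03-01"))  # taxi_trips_2023-03.parquet
```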
The job should now look like this:
```python
from dagster import AssetSelection, define_asset_job

from ..partitions import monthly_partition

trip_update_job = define_asset_job(
    name="trip_update_job",
    partitions_def=monthly_partition,  # partitions added here
    selection=AssetSelection.all() - AssetSelection.assets(["trips_by_week"]),
)
```
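By default, a partitioned job is launched once per partition key. A schedule that targets the latest month therefore produces one run, while backfilling several months produces one run per month. The `PlannedRun` class and `plan_runs` function below are hypothetical stand-ins for illustration, not Dagster APIs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannedRun:
    """Hypothetical stand-in for a launched run; not a Dagster class."""

    job_name: str
    partition_key: str


def plan_runs(job_name: str, partition_keys: list[str]) -> list[PlannedRun]:
    # Model one run per partition key, matching the default one-run-per-partition behavior.
    return [PlannedRun(job_name, key) for key in partition_keys]


scheduled = plan_runs("trip_update_job", ["2023-03-01"])
backfill = plan_runs("trip_update_job", ["2023-01-01", "2023-02-01", "2023-03-01"])
print(len(scheduled), len(backfill))  # 1 3
```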