This tutorial demonstrates using `dagster-airlift` to observe DAGs from multiple Airflow instances, and to federate execution between them using Dagster as a centralized control plane.

Using `dagster-airlift`, we can:

- Observe DAGs and their execution history across multiple Airflow instances
- Federate execution between those instances, triggering downstream DAGs from Dagster when upstream DAGs have new data

All of this can be done with no changes to Airflow code.
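To make the observation half concrete, here is a minimal sketch of pointing `dagster-airlift` at a single Airflow instance. The webserver URL, credentials, and instance name are placeholders, and the helper names reflect the `dagster_airlift.core` API as I understand it, so check them against the version you have installed.

```python
# Minimal sketch: observe one Airflow instance from Dagster with dagster-airlift.
# The URL, credentials, and instance name are placeholders.
from dagster_airlift.core import (
    AirflowBasicAuthBackend,
    AirflowInstance,
    build_defs_from_airflow_instance,
)

warehouse_airflow_instance = AirflowInstance(
    # Connection details for the Airflow webserver's REST API (placeholders).
    auth_backend=AirflowBasicAuthBackend(
        webserver_url="http://localhost:8081",
        username="admin",
        password="admin",
    ),
    name="warehouse",
)

# Builds Dagster Definitions that represent the instance's DAGs and poll
# Airflow for completed runs, without modifying any Airflow code.
defs = build_defs_from_airflow_instance(airflow_instance=warehouse_airflow_instance)
```

Loading these `Definitions` into Dagster gives you a view of that instance's DAGs and their run history; the tutorial repeats this setup for each Airflow instance being observed.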
This tutorial will take you through an imaginary data platform team that has the following scenario:

- An Airflow instance `warehouse`, run by another team, that is responsible for loading data into a data warehouse.
- An Airflow instance `metrics`, run by the data platform team, that deploys all the metrics constructed by data scientists on top of the data warehouse.

Two DAGs have been causing the team a lot of pain lately: `warehouse.load_customers` and `metrics.customer_metrics`. The `warehouse.load_customers` DAG is responsible for loading customer data into the data warehouse, and the `metrics.customer_metrics` DAG is responsible for computing metrics on top of that customer data. There is a cross-instance dependency between these two DAGs, but it is neither observable nor controllable. Ideally, the data platform team would only rebuild the `metrics.customer_metrics` DAG when the `warehouse.load_customers` DAG has new data.

In this guide, we'll use `dagster-airlift` to observe the `warehouse` and `metrics` Airflow instances, and set up federated execution controlled by Dagster that triggers the `metrics.customer_metrics` DAG only when the `warehouse.load_customers` DAG has new data. This process won't require any changes to the Airflow code.
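As a preview of where the tutorial lands, the sketch below wires the two instances together: each DAG is represented as an asset spec, the cross-instance dependency is declared in Dagster, and materializing the `customer_metrics` asset triggers the underlying Airflow DAG. The specific helpers used (`load_airflow_dag_asset_specs`, `trigger_dag`, `wait_for_run_completion`, `get_run_state`, `replace_attributes`) reflect my reading of recent `dagster-airlift` and Dagster releases and should be verified against the API docs; URLs and credentials are placeholders.

```python
# Hypothetical end-state sketch: observe both instances and let Dagster decide
# when to run customer_metrics. Names, URLs, and credentials are placeholders.
from dagster import Definitions, MaterializeResult, multi_asset
from dagster_airlift.core import (
    AirflowBasicAuthBackend,
    AirflowInstance,
    load_airflow_dag_asset_specs,
)

warehouse_instance = AirflowInstance(
    auth_backend=AirflowBasicAuthBackend(
        webserver_url="http://localhost:8081", username="admin", password="admin"
    ),
    name="warehouse",
)
metrics_instance = AirflowInstance(
    auth_backend=AirflowBasicAuthBackend(
        webserver_url="http://localhost:8082", username="admin", password="admin"
    ),
    name="metrics",
)

# Represent each DAG as an asset spec so the cross-instance dependency shows up
# in Dagster's asset graph.
load_customers_asset = next(
    iter(
        load_airflow_dag_asset_specs(
            airflow_instance=warehouse_instance,
            dag_selector_fn=lambda dag: dag.dag_id == "load_customers",
        )
    )
)
customer_metrics_asset = next(
    iter(
        load_airflow_dag_asset_specs(
            airflow_instance=metrics_instance,
            dag_selector_fn=lambda dag: dag.dag_id == "customer_metrics",
        )
    )
).replace_attributes(deps=[load_customers_asset.key])  # declare the dependency


# Make customer_metrics executable from Dagster: materializing it triggers the
# Airflow DAG in the metrics instance and waits for the run to finish.
@multi_asset(specs=[customer_metrics_asset])
def run_customer_metrics() -> MaterializeResult:
    run_id = metrics_instance.trigger_dag("customer_metrics")
    metrics_instance.wait_for_run_completion("customer_metrics", run_id)
    if metrics_instance.get_run_state("customer_metrics", run_id) == "success":
        return MaterializeResult(asset_key=customer_metrics_asset.key)
    raise Exception("customer_metrics DAG run failed")


defs = Definitions(assets=[load_customers_asset, run_customer_metrics])
```

The remaining pieces, observing `load_customers` run completions and kicking off `run_customer_metrics` only when new data lands, are what the rest of the tutorial builds up.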