In this tutorial, we'll help you make the switch from Airflow to Dagster by reviewing an Airflow DAG and showing how the same functionality can be achieved as a Dagster job. Let's start with a basic Airflow DAG:
from datetime import datetime, timedelta
from textwrap import dedent

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    "tutorial",
    default_args={"retries": 1},
    description="A simple tutorial DAG",
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False,
    tags=["example"],
) as dag:
    t1 = BashOperator(
        task_id="print_date",
        bash_command="date",
    )

    t2 = BashOperator(
        task_id="sleep",
        bash_command="sleep 5",
        retries=3,
    )

    templated_command = dedent(
        """
        {% for i in range(5) %}
            echo "{{ ds }}"
            echo "{{ macros.ds_add(ds, 7) }}"
        {% endfor %}
        """
    )

    t3 = BashOperator(
        task_id="templated",
        bash_command=templated_command,
    )

    t1 >> [t2, t3]
To rewrite this DAG in Dagster, we'll break it down into three parts:
1. Define the computations: the ops - in Airflow, the operators
2. Define the graph: the job - in Airflow, the DAG
3. Define the schedule - in Airflow, the schedule (how simple!)
A Dagster job is made up of a graph of ops. This should feel familiar if you've used the Airflow TaskFlow API: with ops, the focus is on writing a graph whose nodes are Python functions and whose edges are the data dependencies between them.
In Dagster, the smallest unit of computation is an op, which directly corresponds to an operator in Airflow. Here, we map the operators of our example Airflow DAG, t1, t2, and t3, to their respective Dagster ops.
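Here's a rough sketch of what those ops could look like (exact signatures can vary with Dagster version; the datetime parameter on sleep is just one way to reproduce the t1 >> t2 ordering, and Dagster's Nothing type can express that ordering without passing data):

import time
from datetime import datetime, timedelta

from dagster import OpExecutionContext, RetryPolicy, op


@op
def print_date(context: OpExecutionContext) -> datetime:
    # Equivalent of t1: log and return the current date.
    ds = datetime.now()
    context.log.info(f"Execution date: {ds}")
    return ds


@op(retry_policy=RetryPolicy(max_retries=3))
def sleep(start: datetime):
    # Equivalent of t2: retries=3 becomes a RetryPolicy. The unused
    # `start` input only enforces that this op runs after print_date.
    time.sleep(5)


@op
def templated(context: OpExecutionContext, ds: datetime):
    # Equivalent of t3: the templated bash loop becomes plain Python.
    for _ in range(5):
        context.log.info(f"ds: {ds}")
        context.log.info(f"ds + 7 days: {ds + timedelta(days=7)}")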
In Dagster, the computations defined in ops are composed in jobs, which define the sequence and dependency structure of the computations you want to execute. A job directly corresponds to a DAG in Airflow. Here, we compose the ops print_date, sleep, and templated to match the dependency structure defined by the Airflow tasks t1, t2, and t3.
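Assuming the op definitions sketched above, the job could be composed along these lines, with print_date's output fanning out to both downstream ops to reproduce t1 >> [t2, t3]:

from dagster import job


@job
def tutorial_job():
    # Invoking ops inside a @job builds the dependency graph:
    # print_date feeds both sleep and templated, like t1 >> [t2, t3].
    ds = print_date()
    sleep(ds)
    templated(ds)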
In Dagster, you can define schedules for jobs, which determine the cadence at which a job is executed. Below, we define a schedule that runs tutorial_job daily.
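Here's a minimal sketch using Dagster's ScheduleDefinition; "0 0 * * *" is a standard cron expression for once a day at midnight, and the names tutorial_job_schedule and defs are just illustrative:

from dagster import Definitions, ScheduleDefinition

# Run tutorial_job once a day, mirroring schedule_interval=timedelta(days=1).
tutorial_job_schedule = ScheduleDefinition(
    job=tutorial_job,
    cron_schedule="0 0 * * *",
)

# Register the job and its schedule so Dagster tooling can load them.
defs = Definitions(jobs=[tutorial_job], schedules=[tutorial_job_schedule])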
While Airflow and Dagster have some significant differences, there are many concepts that overlap. Use this cheatsheet to understand how Airflow concepts map to Dagster.
Dagster uses normal Python functions instead of framework-specific operator classes. For off-the-shelf functionality with third-party tools, Dagster provides integration libraries.