In this guide, we’ll show you how to use Dagster Pipes with Dagster’s built-in PipesSubprocessClient resource to run a local subprocess with a given command and environment. You can then send information such as structured metadata and logs from the subprocess back to Dagster, where it will be visible in the Dagster UI.
To learn about using Dagster Pipes with other entities in the Dagster system, see the Reference section.
This guide focuses on using the out-of-the-box PipesSubprocessClient resource. To customize the subprocess invocation further, use the open_dagster_pipes approach instead.
To use Dagster Pipes to run a subprocess, you’ll need to have Dagster (dagster) and the Dagster UI (dagster-webserver) installed. Refer to the Installation guide for more info.
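As a quick way to confirm the prerequisites, the following minimal sketch uses only the Python standard library to report which packages are installed in the current environment. The helper name installed_version is hypothetical, and pandas is included in the list only because the example script in this guide imports it.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package, or None if it is missing.

    Hypothetical helper for illustration; not part of Dagster.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# dagster and dagster-webserver are the prerequisites named in this guide;
# pandas is listed because the example script imports it.
for package in ("dagster", "dagster-webserver", "pandas"):
    found = installed_version(package)
    print(f"{package}: {found or 'not installed'}")
```

If any package is reported as not installed, install it with your usual package manager before continuing.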
You’ll also need an existing Python script. This guide uses the script below for demonstration. It will be invoked by the Dagster asset that you’ll create later in this tutorial.
Create a file named external_code.py and paste the following into it:
import pandas as pd


def main():
    orders_df = pd.DataFrame({"order_id": [1, 2], "item_id": [432, 878]})
    total_orders = len(orders_df)
    print(f"processing total {total_orders} orders")


if __name__ == "__main__":
    main()
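Before wiring the script into Dagster, it can help to confirm that it runs on its own as a subprocess with a given command and environment. The sketch below does this with the standard library only. It is a sanity check, not a description of how PipesSubprocessClient works internally, and the helper name run_script and its extra_env parameter are hypothetical.

```python
import os
import subprocess
import sys
from pathlib import Path


def run_script(path, extra_env=None):
    """Run a Python script in a child process and return its stdout.

    Hypothetical helper for illustration; not part of Dagster.
    """
    # Start from the parent environment and layer any extra variables on top.
    env = {**os.environ, **(extra_env or {})}
    result = subprocess.run(
        [sys.executable, str(path)],  # the command: current interpreter + script
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


if __name__ == "__main__":
    script = Path("external_code.py")
    if script.exists():
        print(run_script(script), end="")
    else:
        print(f"{script} not found in the current directory")
```

Run from the directory that contains external_code.py, with pandas installed, this should print the script's single line of output: processing total 2 orders.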