
Apache Airflow

We use Apache Airflow to orchestrate scheduled tasks in our data warehouse. Airflow manages our DAGs (Directed Acyclic Graphs), which run dbt models, data pipelines, and other recurring jobs.
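For orientation, here is a minimal sketch of what one of these DAGs looks like. It assumes Airflow 2.4+ (the `schedule` argument); the `dag_id`, schedule, and dbt command are illustrative, not taken from our repository.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical daily DAG that runs our dbt models.
with DAG(
    dag_id="example_dbt_daily",          # illustrative name
    schedule="0 6 * * *",                # daily at 06:00 UTC
    start_date=datetime(2024, 1, 1),
    catchup=False,
    tags=["dbt"],
) as dag:
    run_models = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --profiles-dir /opt/dbt",  # illustrative path
    )
```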

[TODO] Write base guidelines for Airflow DAGs @Michelle V

* Airflow should be used for scheduling only: a DAG shouldn't execute the Python code that does the job itself, but rather trigger a cloud function (see the sketch below)
* memory requests and limits
* base DAG class (includes https://github.com/majority-dev/dt-airflow-dags/blob/master/lib/utils.py): do not run if the previous run is still running
* organization
* variables
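Until those guidelines are written, the sketch below shows the pattern the TODO describes: the DAG only schedules and triggers a cloud function over HTTP, and `max_active_runs=1` keeps a new run from starting while the previous one is still running. All names here (dag_id, connection id, endpoint) are hypothetical, and it does not use the base DAG class from lib/utils.py.

```python
import json
from datetime import datetime

from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator

with DAG(
    dag_id="example_trigger_cloud_function",  # hypothetical name
    schedule="@hourly",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    max_active_runs=1,  # never start a run while the previous one is still running
) as dag:
    # The DAG only triggers; the actual work happens in the cloud function.
    trigger = SimpleHttpOperator(
        task_id="trigger_function",
        http_conn_id="cloud_function_http",   # hypothetical Airflow connection
        endpoint="my-function",               # hypothetical function endpoint
        method="POST",
        data=json.dumps({"source": "airflow"}),
        headers={"Content-Type": "application/json"},
    )
```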

Environments

We maintain two Airflow environments:

Stage Environment

Use this environment to:

- Test new DAGs before production (see the import-check sketch below this list)
- Validate changes to existing workflows
- Debug issues without affecting production
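A lightweight way to catch broken DAGs before they even reach the Stage scheduler is a DAG import check, a common Airflow testing pattern (not necessarily part of our test suite today; the `dags/` path is an assumption):

```python
from airflow.models import DagBag


def test_dags_import_cleanly():
    # Load every DAG file; syntax errors and broken imports show up here
    # without executing any tasks.
    dag_bag = DagBag(dag_folder="dags/", include_examples=False)
    assert dag_bag.import_errors == {}, f"DAG import errors: {dag_bag.import_errors}"
```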

Production Environment

Production Changes

Always test in Stage before merging to master. Production DAGs run on real data and can impact downstream systems.


Working with DAGs

Repository Structure