Apache Airflow
We use Apache Airflow to orchestrate scheduled tasks in our data warehouse. Airflow manages our DAGs (Directed Acyclic Graphs) that run dbt models, data pipelines, and other scheduled jobs.
[TODO] Write base guidelines for Airflow DAGs (@Michelle V):
- Airflow should be used for scheduling only: a DAG shouldn't execute the Python code that does the work itself, but rather trigger a cloud function (see the sketch below)
- Memory requests and limits
- Base DAG class (includes https://github.com/majority-dev/dt-airflow-dags/blob/master/lib/utils.py): do not run if the previous run is still running
- Organization
- Variables
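Until those guidelines are written, here is a minimal sketch of the intended pattern: the DAG only schedules and triggers the work, the job itself runs in a cloud function, and max_active_runs=1 keeps a new run from starting while the previous one is still going. The DAG id, connection id, and endpoint below are hypothetical, and the base DAG class in dt-airflow-dags may already enforce the no-overlap rule for you.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator

with DAG(
    dag_id="trigger_example_job",      # hypothetical name
    schedule_interval="0 6 * * *",     # daily at 06:00 UTC
    start_date=datetime(2024, 1, 1),
    catchup=False,
    max_active_runs=1,                 # never two concurrent runs of this DAG
    default_args={
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    # Airflow only schedules; the heavy lifting happens in the cloud function.
    trigger_job = SimpleHttpOperator(
        task_id="trigger_cloud_function",
        http_conn_id="cloud_function_api",  # assumed Airflow connection
        endpoint="run-example-job",         # hypothetical endpoint
        method="POST",
    )
```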
Environments
We maintain two Airflow environments:
Stage Environment
- URL: https://stage-airflow.minority.com/home
- Purpose: Testing and development
- Git Branch: `stage` from dt-airflow-dags
- Cluster: `stage-bankV2-use2-aks` (Azure AKS)
Use this environment to:
- Test new DAGs before production
- Validate changes to existing workflows
- Debug issues without affecting production
Production Environment
- URL: https://airflow.minority.com/home
- Purpose: Production workloads
- Git Branch: `master` from dt-airflow-dags
- Cluster: `prod-bankV2-use2-aks` (Azure AKS)
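Both clusters run on Kubernetes (Azure AKS), so the memory requests and limits flagged in the TODO above can be attached per task via executor_config, assuming the KubernetesExecutor is in use. The sizes here are placeholders, not agreed defaults:

```python
from kubernetes.client import models as k8s

# Placeholder sizes; pass as executor_config=... on an operator to
# override the resources of its worker pod (KubernetesExecutor only).
pod_resources = {
    "pod_override": k8s.V1Pod(
        spec=k8s.V1PodSpec(
            containers=[
                k8s.V1Container(
                    name="base",  # must be "base" to patch the task container
                    resources=k8s.V1ResourceRequirements(
                        requests={"memory": "512Mi", "cpu": "250m"},
                        limits={"memory": "1Gi"},
                    ),
                )
            ]
        )
    )
}
```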
Production Changes
Always test in Stage before merging to master. Production DAGs run on real data and can impact downstream systems.