# File Ingestion

All SFTP-based file ingestion runs through the unified `dt-file-ingestion` image, orchestrated by Airflow.
## Providers
### Checkout

| | |
|---|---|
| Auth | SSH key (`checkout-ssh-key` Airflow variable, raw PEM) |
| DAG | `checkout_sftp_download` |
| Schedule | `30 */4 * * *` (every 4 hours at :30) |
| GCS bucket | `checkout-sftp-{env}` |
| Processor | raw |
Reports:

| Report | SFTP path | Pattern |
|---|---|---|
| `checkout_financial_actions` | `/majority-usa-llc/reports-majority-usa-llc/financial-actions/payout-id` | `*.csv` |
| `checkout_financial_actions_fees` | `/majority-usa-llc/reports-majority-usa-llc/financial-actions/date-range` | `*.csv` |
| `checkout_payouts` | `/majority-usa-llc/reports-majority-usa-llc/payouts` | `*.csv` |
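Each report above boils down to an SFTP directory plus a filename glob. A minimal sketch of how such a configuration could be modeled and applied to a directory listing (the `ReportConfig` shape and function names are hypothetical, not the actual `dt-file-ingestion` schema):

```python
from dataclasses import dataclass
from fnmatch import fnmatch

@dataclass(frozen=True)
class ReportConfig:
    # Hypothetical config shape; the real dt-file-ingestion schema may differ.
    name: str
    sftp_path: str
    pattern: str

CHECKOUT_PAYOUTS = ReportConfig(
    "checkout_payouts",
    "/majority-usa-llc/reports-majority-usa-llc/payouts",
    "*.csv",
)

def matching_files(config: ReportConfig, listing: list[str]) -> list[str]:
    """Return the files in an SFTP directory listing that match the report's glob."""
    return [f for f in listing if fnmatch(f, config.pattern)]
```

For example, `matching_files(CHECKOUT_PAYOUTS, ["payout_2024.csv", "readme.txt"])` keeps only the CSV.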
### InComm

| | |
|---|---|
| Auth | Password (`incomm-sftp-password-key` Airflow variable, username: `majority_prod`) |
| DAG | `incomm_sftp_download` |
| Schedule | `30 9 * * *` (daily at 09:30 UTC) |
| GCS bucket | `incomm-sftp-{env}` |
| Processor | raw |
Reports:

| Report | SFTP path | Pattern | Notes |
|---|---|---|---|
| `incomm_cashtie_billing_report` | `/reports` | `2*/*.csv` | Recursive listing (date-named subdirectories) |
| `incomm_swipe_report` | `/reports/spil_reports` | `*.csv` | UTF-8 BOM and trailing blank lines stripped |
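The `incomm_swipe_report` normalization noted above (strip a UTF-8 BOM, drop trailing blank lines) can be sketched as a pure function. This is an illustration of the idea, not the actual processor code:

```python
def normalize_swipe_report(raw: bytes) -> str:
    """Strip a UTF-8 BOM and trailing blank lines, as done for incomm_swipe_report."""
    # utf-8-sig removes the BOM if present and is a no-op otherwise.
    text = raw.decode("utf-8-sig")
    lines = text.splitlines()
    while lines and not lines[-1].strip():
        lines.pop()
    return "\n".join(lines) + "\n"
```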
The DAG includes a downstream dbt step: `dbt build --select tag:incomm --target airflow_federated`.
### Lithic

| | |
|---|---|
| Auth | SSH key (`lithic-sftp-private-key` Airflow variable) |
| DAG | `lithic_sftp_download` |
| Schedule | `0 13 * * *` (daily at 13:00 UTC) |
| GCS bucket | `lithic-sftp-{env}` |
| Processor | raw |
Reports:

| Report | SFTP path | Pattern |
|---|---|---|
| `settlement_detail` | `/lithic_reports` | `*_settlement_detail.csv` |
| `cards` | `/lithic_reports` | `*_cards.csv` |
| `daily_network_settlement_summary` | `/lithic_reports` | `*_daily_network_settlement_summary.csv` |
| `accounts` | `/lithic_reports` | `*_accounts.csv` |
| `card_transactions` | `/lithic_reports` | `*_card_transactions.csv` |
| `network_reports` | `/network-reports` | `*.txt` |
Network reports run sequentially after all Lithic report downloads complete.
### CFSB

| | |
|---|---|
| Auth | SSH key (`cfsb-sftp-pkey` Airflow variable) + PGP decryption (`cfsb-majority-pgp-private-key`) |
| DAG | `cfsb_sftp_download` |
| Schedule | `15 9 * * *` (daily at 09:15 UTC) |
| GCS bucket | `cfsb-sftp-{env}` |
| Processor | pgp |
Reports:

| Report | SFTP path | Pattern |
|---|---|---|
| `cfsb_transactions_reconciliation` | `cfsb_transactions_reconciliation/` | `TXNDDA_MAJORITY_*.csv.pgp` |
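The pgp processor decrypts each `*.csv.pgp` file before it lands in GCS. A hedged sketch of that step using the `gpg` CLI (`decrypted_name` and `decrypt_file` are hypothetical names; the real processor's implementation and key handling are not shown here):

```python
import subprocess
from pathlib import Path

def decrypted_name(name: str) -> str:
    """e.g. a TXNDDA_MAJORITY_*.csv.pgp file decrypts to the matching .csv name."""
    return name.removesuffix(".pgp")

def decrypt_file(src: Path, dst_dir: Path) -> Path:
    """Decrypt with gpg; assumes the CFSB private key is already in the keyring."""
    dst = dst_dir / decrypted_name(src.name)
    subprocess.run(
        ["gpg", "--batch", "--yes", "--decrypt", "--output", str(dst), str(src)],
        check=True,
    )
    return dst
```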
### WebBank

| | |
|---|---|
| Auth | SSH key (`webbank-ssh-privatekey`) + GPG decryption (`webbank-pgp-privatekey`) |
| DAG | `webbank_sftp_download` |
| Schedule | `0 14 * * *` (daily at 14:00 UTC) |
| GCS bucket | `webbank-sftp-{env}` |
| Processor | bai2 |
### Majority Ledger

| | |
|---|---|
| Source | Azure Blob Storage (not SFTP) |
| DAG | `majority_ledger_blob_download` |
| Schedule | `0 * * * *` (hourly) |
| GCS bucket | `majority-ledger-blob-{env}` |
| Processor | raw |
Downloads `LedgerTransactionsHistory/*.csv` from Azure Blob Storage (`prodmajorityreporting` / `stagemajorityreporting` storage accounts, `reports` container).
### ATM All Points (deprecated)

> **Deprecated:** This pipeline was migrated but is unused -- its dbt references were removed roughly 18 months ago. It still runs weekly.
| | |
|---|---|
| Auth | Password (`atm-ftp-username`, `atm-ftp-password` Airflow variables) |
| DAG | `atm_all_points_sftp_download` |
| Schedule | `0 19 * * 0` (Sundays at 19:00 UTC) |
| GCS bucket | `atm-all-points-sftp-{env}` |
| Processor | raw |
Reports:

| Report | SFTP path | Pattern |
|---|---|---|
| `allpoint_geo_tid_all` | `/` | `allpoint_geo_tid_all.csv` |
The dev environment cannot connect to this SFTP server -- the pipeline is only testable in prod.
### AWS (legacy)

> **Stale but not yet deleted:** The following S3 buckets, previously used for file ingestion, are now stale. Active ingestion targets GCS.
| Provider | S3 bucket(s) | Status |
|---|---|---|
| Checkout | `checkout-majority-{env}`, `psp-funding-reconciliation-majority-{env}` | To be deleted |
| InComm | `incomm-report-majority-{env}` | To be deleted |
| ATM All Points | `atm-allpoints-majority-{env}` | To be deleted |
## How it works

```mermaid
flowchart LR
    SFTP["SFTP Server"] -->|dt-file-ingestion| GCS["GCS Bucket"]
    GCS -->|PubSub notification| CF["dt-gcp-bq-ingestion\ncloud function"]
    CF --> BQ["BigQuery"]
```
- Airflow triggers a `KubernetesPodOperatorWithCredentials` running the `dt-file-ingestion` image
- The image connects to the provider's SFTP server and downloads files matching configured patterns
- Files are uploaded to a provider-specific GCS bucket
- A PubSub notification (`gcs-file-upload-topic`) triggers the `dt-gcp-bq-ingestion-cloud-function`
- The cloud function loads the file into BigQuery, adding metadata columns: `ingested_at`, `file_name`, `bucket_name`
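The last step above can be sketched as a pure transformation: the cloud function parses the CSV payload and appends the three metadata columns before the rows go to BigQuery. This sketch shows only the row shaping (the function name is hypothetical, and the actual BigQuery client calls are omitted):

```python
import csv
import io
from datetime import datetime, timezone

def rows_with_metadata(csv_text: str, bucket_name: str, file_name: str) -> list[dict]:
    """Parse a CSV payload and append the metadata columns added at load time."""
    ingested_at = datetime.now(timezone.utc).isoformat()
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        row.update(
            ingested_at=ingested_at,
            file_name=file_name,
            bucket_name=bucket_name,
        )
        rows.append(row)
    return rows
```

In the real pipeline these rows would be written via a BigQuery load job; only the column names here come from the list above.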