Intercom API

The dt-intercom-api pipeline extracts data from the Intercom API and uploads it as JSON files to Google Cloud Storage (GCS). Runs are scheduled via Airflow.

Streams

Stream         Endpoint               Method  Incremental  Notes
admins         /admins                GET     No           Full load each run
teams          /teams                 GET     No           Full load each run
tags           /tags                  GET     No           Full load each run
contacts       /contacts/search       POST    Yes          Cursor-based pagination
conversations  /conversations/search  POST    Yes          Also fetches conversation_parts
calls          /calls                 GET     Yes          Page-based pagination; also fetches call_transcriptions
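
As an illustration of the cursor-based pagination used by the contacts stream, here is a minimal sketch. It is not the pipeline's actual code: iter_cursor_pages and the fetch_page callable are hypothetical names, and the response shape (records under "data", next cursor under pages.next.starting_after) is an assumption that should be checked against the Intercom API reference.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_cursor_pages(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
) -> Iterator[Dict[str, Any]]:
    """Yield records page by page until the API stops returning a cursor.

    fetch_page is a hypothetical callable that takes the current cursor
    (None for the first page) and returns one decoded JSON response.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        # Assumed response shape: records live under "data".
        yield from page.get("data", [])
        # Assumed cursor location: pages.next.starting_after.
        next_page = (page.get("pages") or {}).get("next") or {}
        cursor = next_page.get("starting_after")
        if not cursor:
            break
```

Passing the fetch function in as an argument keeps the loop independent of the HTTP client, so it can be exercised with canned responses.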

GCS Output

Files are stored with date partitioning:

gs://intercom-api-{env}/{stream_name}/year=YYYY/month=MM/day=DD/{stream_name}_{timestamp}.json
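
The partitioned path above could be assembled roughly as follows. This is a sketch under assumptions: build_object_path is a hypothetical helper, the example env value "dev" is made up, and the compact UTC timestamp format is a guess, since the exact format of {timestamp} is not documented here.

```python
from datetime import datetime, timezone


def build_object_path(env: str, stream_name: str, run_time: datetime) -> str:
    """Build the date-partitioned GCS path for one output file."""
    # Assumption: the {timestamp} suffix is a compact UTC timestamp.
    timestamp = run_time.strftime("%Y%m%dT%H%M%SZ")
    return (
        f"gs://intercom-api-{env}/{stream_name}/"
        f"year={run_time:%Y}/month={run_time:%m}/day={run_time:%d}/"
        f"{stream_name}_{timestamp}.json"
    )


if __name__ == "__main__":
    run_time = datetime(2024, 3, 5, 12, 30, 0, tzinfo=timezone.utc)
    print(build_object_path("dev", "calls", run_time))
```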

State files for incremental streams:

gs://intercom-api-{env}/_intercom_state/{stream_name}_state.json
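
The contents of the state files are not described here. As a hedged sketch, assume each file holds a single high-water mark, last_updated_at, that is advanced after a successful run. The helper names and the state key below are hypothetical.

```python
from typing import Any, Dict, List


def state_object_path(env: str, stream_name: str) -> str:
    """Return the GCS path of the state file for an incremental stream."""
    return f"gs://intercom-api-{env}/_intercom_state/{stream_name}_state.json"


def advance_state(
    state: Dict[str, Any], records: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return the state with the high-water mark moved forward, never back.

    Assumption: the state stores the largest updated_at seen so far under
    the hypothetical key "last_updated_at".
    """
    latest = max((r["updated_at"] for r in records), default=None)
    previous = state.get("last_updated_at")
    if latest is None or (previous is not None and latest <= previous):
        return state
    return {**state, "last_updated_at": latest}
```

Never moving the mark backwards means a run that returns no records, or only older ones, leaves the stored state unchanged.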

Call Transcriptions

Transcripts are fetched via GET /calls/{id}/transcript for calls with a non-null transcription_url. Each record contains:

  • call_id: the call ID
  • call_updated_at: the call's updated_at timestamp
  • transcript: array of utterances, each with start_time, end_time, speaker, and content
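
A record with the fields listed above might be assembled like this. It is only a sketch: build_transcription_record is a hypothetical helper, and it assumes the call object exposes id and updated_at and that the transcript is already decoded into a list of utterance dictionaries.

```python
from typing import Any, Dict, List

UTTERANCE_FIELDS = ("start_time", "end_time", "speaker", "content")


def build_transcription_record(
    call: Dict[str, Any], transcript: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Combine a call and its utterances into one transcription record."""
    return {
        "call_id": call["id"],
        "call_updated_at": call["updated_at"],
        # Keep only the documented utterance fields; missing ones become None.
        "transcript": [
            {field: utterance.get(field) for field in UTTERANCE_FIELDS}
            for utterance in transcript
        ],
    }
```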

Running the pipeline with --stream calls fetches both calls and their transcriptions in a single pagination pass and uploads them to separate GCS paths (calls/ and call_transcriptions/).
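
The single-pass behaviour can be sketched as one loop that routes records into two output buckets. The function and callable names are hypothetical, and the stop condition (an empty page ends the pass) is an assumption about how page-based pagination terminates.

```python
from typing import Any, Callable, Dict, List


def run_calls_stream(
    fetch_calls_page: Callable[[int], List[Dict[str, Any]]],
    fetch_transcript: Callable[[str], List[Dict[str, Any]]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Collect calls and their transcriptions in a single pagination pass."""
    outputs: Dict[str, List[Dict[str, Any]]] = {
        "calls": [],
        "call_transcriptions": [],
    }
    page = 1
    while True:
        calls = fetch_calls_page(page)
        if not calls:  # assumption: an empty page marks the end
            break
        for call in calls:
            outputs["calls"].append(call)
            # Only calls with a non-null transcription_url have a transcript.
            if call.get("transcription_url") is not None:
                outputs["call_transcriptions"].append(
                    {
                        "call_id": call["id"],
                        "call_updated_at": call["updated_at"],
                        "transcript": fetch_transcript(call["id"]),
                    }
                )
        page += 1
    return outputs
```

Each key of the returned dictionary corresponds to one GCS path prefix, so the two record sets can be uploaded separately after the pass.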