
Intercom API

The dt-intercom-api pipeline extracts data from the Intercom API and uploads it as JSON files to Google Cloud Storage (GCS). Runs are scheduled via Airflow.

Direction of flow

This page covers data flowing from Intercom into GCS. For the reverse direction — pushing user attributes and events to Intercom — see the Intercom data product.

Streams

| Stream | Endpoint | Method | Incremental | Notes |
|---|---|---|---|---|
| admins | /admins | GET | No | Full load each run |
| teams | /teams | GET | No | Full load each run |
| tags | /tags | GET | No | Full load each run |
| contacts | /contacts/search | POST | Yes | Cursor-based pagination |
| conversations | /conversations/search | POST | Yes | Also fetches conversation_parts |
| calls | /calls | GET | Yes | Page-based pagination; also fetches call_transcriptions |
| articles | /articles | GET | Yes | Page-based pagination |
| collections | /help_center/collections | GET | No | Full snapshot each run; the endpoint is not sorted by updated_at, so incremental sync is unsafe |
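The incremental contacts and conversations streams use Intercom's search endpoints, which page with a cursor: each response carries pages.next.starting_after, and that value is sent back as pagination.starting_after on the next request. The sketch below shows that loop under stated assumptions: the stream's bookmark is a Unix timestamp applied as an updated_at > since filter, and results are requested oldest first. Both are assumptions, not the pipeline's confirmed query. The transport function `search` and the helper names are hypothetical; the HTTP call is injected so the loop runs without network access.

```python
from typing import Any, Callable, Iterator, Optional

# Hypothetical transport: takes a JSON request body, returns the parsed JSON response.
SearchFn = Callable[[dict], dict]


def build_search_body(since: int, starting_after: Optional[str] = None, per_page: int = 50) -> dict[str, Any]:
    """Request body for records with updated_at > since (Unix seconds), oldest first."""
    pagination: dict[str, Any] = {"per_page": per_page}  # assumed page size
    if starting_after is not None:
        pagination["starting_after"] = starting_after
    return {
        "query": {"field": "updated_at", "operator": ">", "value": since},
        "sort": {"field": "updated_at", "order": "ascending"},
        "pagination": pagination,
    }


def iter_search_results(search: SearchFn, since: int, items_key: str = "data") -> Iterator[dict[str, Any]]:
    """Yield every matching record, following pages.next.starting_after."""
    cursor: Optional[str] = None
    while True:
        response = search(build_search_body(since, cursor))
        yield from response.get(items_key, [])
        next_page = (response.get("pages") or {}).get("next") or {}
        cursor = next_page.get("starting_after")
        if not cursor:
            break  # no cursor means this was the last page
```

`items_key` is a parameter because, in the Intercom API reference, conversation search returns its records under a conversations key rather than data; check this against the API version the pipeline pins.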

GCS Output

Files are stored with date partitioning:

gs://intercom-api-{env}/{stream_name}/year=YYYY/month=MM/day=DD/{stream_name}_{timestamp}.json
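The partitioned path can be built from the run time alone. A sketch, assuming the partition date and {timestamp} both come from the run's UTC start time and that {timestamp} is formatted as YYYYMMDDTHHMMSSZ; neither detail is stated above, and gcs_object_uri is a hypothetical name.

```python
from datetime import datetime, timezone


def gcs_object_uri(env: str, stream_name: str, run_at: datetime) -> str:
    """Build the date-partitioned URI for one output file (assumed UTC partitioning)."""
    if run_at.tzinfo is None:
        raise ValueError("run_at must be timezone-aware")
    run_at = run_at.astimezone(timezone.utc)
    timestamp = run_at.strftime("%Y%m%dT%H%M%SZ")  # assumed timestamp format
    return (
        f"gs://intercom-api-{env}/{stream_name}/"
        f"year={run_at:%Y}/month={run_at:%m}/day={run_at:%d}/"
        f"{stream_name}_{timestamp}.json"
    )
```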

State files for incremental streams:

gs://intercom-api-{env}/_intercom_state/{stream_name}_state.json
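The layout above does not say what a state file contains. Assuming it holds a single updated_at bookmark in Unix seconds (a hypothetical format), advancing it after a run could look like this. Taking the maximum means a run that sees no new records leaves the bookmark where it was, and an out-of-order older record never moves it backwards.

```python
from typing import Any, Iterable, Optional


def state_object_uri(env: str, stream_name: str) -> str:
    """Location of a stream's state file, following the layout above."""
    return f"gs://intercom-api-{env}/_intercom_state/{stream_name}_state.json"


def next_state(previous: Optional[dict[str, Any]], records: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Advance a hypothetical {"updated_at": <unix seconds>} bookmark to the newest record seen."""
    bookmark = int((previous or {}).get("updated_at", 0))
    for record in records:
        bookmark = max(bookmark, int(record.get("updated_at", 0)))
    return {"updated_at": bookmark}
```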

Call Transcriptions

Transcripts are fetched via GET /calls/{id}/transcript, only for calls with a non-null transcription_url. Each call_transcriptions record contains:

  • call_id — the call ID
  • call_updated_at — the call's updated_at timestamp
  • transcript — array of utterances (start_time, end_time, speaker, content)
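A minimal sketch of the filter and the record shape described above. It assumes the transcript endpoint's response has already been reduced to the list of utterances, which are passed through unchanged, and it treats a missing transcription_url the same as null. Both helper names are hypothetical.

```python
from typing import Any


def needs_transcript(call: dict[str, Any]) -> bool:
    """Only calls with a non-null transcription_url get a transcript request."""
    return call.get("transcription_url") is not None


def transcription_record(call: dict[str, Any], utterances: list[dict[str, Any]]) -> dict[str, Any]:
    """Shape one call_transcriptions record; utterances are copied as-is."""
    return {
        "call_id": call["id"],
        "call_updated_at": call["updated_at"],
        "transcript": list(utterances),
    }
```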

Running with --stream calls fetches both calls and their transcriptions in a single pagination pass and uploads each to its own GCS path (calls/ and call_transcriptions/).
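The single-pass behaviour can be sketched as one loop that routes every call to a calls/ batch and, when the call has a transcription_url, a transcription record to a call_transcriptions/ batch. Here pages and fetch_transcript are hypothetical stand-ins for the paginated GET /calls reader and the GET /calls/{id}/transcript request; the upload of each batch is omitted.

```python
from typing import Any, Callable, Iterable

Call = dict[str, Any]
# Hypothetical: call id -> list of utterances from GET /calls/{id}/transcript.
FetchTranscript = Callable[[str], list[dict[str, Any]]]


def split_calls_and_transcripts(
    pages: Iterable[list[Call]], fetch_transcript: FetchTranscript
) -> dict[str, list[dict[str, Any]]]:
    """Walk the calls pages once, routing records to two GCS prefixes."""
    batches: dict[str, list[dict[str, Any]]] = {"calls/": [], "call_transcriptions/": []}
    for page in pages:
        for call in page:
            batches["calls/"].append(call)
            if call.get("transcription_url") is None:
                continue  # no transcript to fetch for this call
            batches["call_transcriptions/"].append(
                {
                    "call_id": call["id"],
                    "call_updated_at": call["updated_at"],
                    "transcript": fetch_transcript(call["id"]),
                }
            )
    return batches
```

Reading each page once keeps the transcript requests in step with the calls pagination, so no second walk over GET /calls is needed.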

Articles and Collections

articles is an incremental stream over GET /articles. It has the same shape as calls (page-based pagination, results ordered by updated_at descending) but uses overlap_pages=3 instead of the default of 40 that calls uses, because the article catalogue is small.
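The exact role of overlap_pages is not spelled out here. One plausible reading, sketched below as an assumption rather than the pipeline's actual rule: pages are read newest first, records newer than the bookmark are kept, and pagination stops only after overlap_pages consecutive pages contribute nothing new. The extra pages are cheap and catch records that land slightly out of order. fetch_page and fetch_incremental_desc are hypothetical names.

```python
from typing import Any, Callable

# Hypothetical page reader: 1-based page number -> records, newest first; [] past the end.
FetchPage = Callable[[int], list[dict[str, Any]]]


def fetch_incremental_desc(fetch_page: FetchPage, bookmark: int, overlap_pages: int = 40) -> list[dict[str, Any]]:
    """Keep records newer than `bookmark`, stopping after `overlap_pages` stale pages in a row."""
    newer: list[dict[str, Any]] = []
    stale_pages = 0
    page_number = 1
    while True:
        records = fetch_page(page_number)
        if not records:
            break  # ran off the end of the list
        fresh = [r for r in records if r["updated_at"] > bookmark]
        newer.extend(fresh)
        if fresh:
            stale_pages = 0  # something new on this page: reset the overlap window
        else:
            stale_pages += 1
            if stale_pages >= overlap_pages:
                break  # enough consecutive pages with nothing newer than the bookmark
        page_number += 1
    return newer
```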

collections uses a stateless full-snapshot helper (_fetch_all_paginated_list_endpoint) because GET /help_center/collections is not sorted by updated_at — the maximum updated_at can appear on any page, which breaks the standard skip-ahead logic. Each run fetches all collections, dedupes by id, and writes a single GCS file. There is no state file for collections.
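A minimal sketch of the stateless approach, in the spirit of _fetch_all_paginated_list_endpoint but not its actual code: every page is read, records are keyed by id so duplicates collapse, and no bookmark is read or written. fetch_page is a hypothetical page reader, and the last-seen-wins dedupe policy is an assumption.

```python
from typing import Any, Callable

# Hypothetical page reader: 1-based page number -> records; [] past the end.
FetchPage = Callable[[int], list[dict[str, Any]]]


def fetch_full_snapshot(fetch_page: FetchPage) -> list[dict[str, Any]]:
    """Read every page of an unsorted list endpoint and dedupe by id."""
    by_id: dict[str, dict[str, Any]] = {}
    page_number = 1
    while True:
        records = fetch_page(page_number)
        if not records:
            break  # end of the list
        for record in records:
            # A later copy replaces an earlier one but keeps its first-seen position.
            by_id[record["id"]] = record
        page_number += 1
    return list(by_id.values())
```

Because nothing stops pagination early, it does not matter which page holds the newest updated_at; the cost is a full read on every run, which is acceptable for a small endpoint.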