Intercom API

The dt-intercom-api pipeline extracts data from the Intercom API and uploads it as JSON files to GCS. It is scheduled via Airflow.

Direction of flow

This page covers data flowing from Intercom into GCS. For the reverse direction — pushing user attributes and events to Intercom — see the Intercom data product.

Streams

Stream	Endpoint	Method	Incremental	Notes
`admins`	`/admins`	GET	No	Full load each run
`teams`	`/teams`	GET	No	Full load each run
`tags`	`/tags`	GET	No	Full load each run
`contacts`	`/contacts/search`	POST	Yes	Cursor-based pagination
`conversations`	`/conversations/search`	POST	Yes	Also fetches `conversation_parts`
`calls`	`/calls`	GET	Yes	Page-based pagination, also fetches `call_transcriptions`
`articles`	`/articles`	GET	Yes	Page-based pagination
`collections`	`/help_center/collections`	GET	No	Full snapshot each run — endpoint is not sorted by `updated_at`, so incremental sync is unsafe
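
As an illustration of the cursor-based pagination used by the contacts stream, the sketch below follows the cursor that each search response returns. It is not the pipeline's actual code: the `post` callable, the `updated_since` filter, and the `per_page` default are assumptions made for the example, and the request and response shapes follow the public Intercom search API as commonly documented.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical transport: takes (path, json_body) and returns the parsed JSON response.
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def search_contacts(post: PostFn, updated_since: int, per_page: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield contacts updated after `updated_since` (Unix seconds), following cursors."""
    cursor: Optional[str] = None
    while True:
        pagination: Dict[str, Any] = {"per_page": per_page}
        if cursor:
            pagination["starting_after"] = cursor
        body = {
            "query": {"field": "updated_at", "operator": ">", "value": updated_since},
            "pagination": pagination,
        }
        response = post("/contacts/search", body)
        yield from response.get("data", [])
        # Assumed response shape: pages.next.starting_after holds the next cursor.
        next_page = (response.get("pages") or {}).get("next") or {}
        cursor = next_page.get("starting_after")
        if not cursor:
            break
```

Passing the transport in as a function keeps the pagination loop testable without network access; a real run would wrap an authenticated HTTP client.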

GCS Output

Files are stored with date partitioning:

gs://intercom-api-{env}/{stream_name}/year=YYYY/month=MM/day=DD/{stream_name}_{timestamp}.json
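
The partitioned layout above can be sketched as a small path builder. The function name and the compact UTC format used for `{timestamp}` are assumptions for illustration; the pipeline's real timestamp format is not stated here.

```python
from datetime import datetime, timezone


def build_object_uri(env: str, stream_name: str, run_time: datetime) -> str:
    """Build a date-partitioned GCS URI following the layout above."""
    # Assumption: {timestamp} is a compact UTC timestamp; the real format may differ.
    timestamp = run_time.strftime("%Y%m%dT%H%M%SZ")
    return (
        f"gs://intercom-api-{env}/{stream_name}/"
        f"year={run_time:%Y}/month={run_time:%m}/day={run_time:%d}/"
        f"{stream_name}_{timestamp}.json"
    )


# Example with a hypothetical "dev" environment.
example = build_object_uri("dev", "contacts", datetime(2024, 3, 5, 7, 8, 9, tzinfo=timezone.utc))
```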

State files for incremental streams:

gs://intercom-api-{env}/_intercom_state/{stream_name}_state.json
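
A minimal sketch of how a stream might locate and read its state follows. The state file's schema is not documented on this page, so the `last_updated_at` key and both helper names are hypothetical; only the path layout comes from the line above.

```python
import json
from typing import Optional


def state_uri(env: str, stream_name: str) -> str:
    """Return the state file location for an incremental stream."""
    return f"gs://intercom-api-{env}/_intercom_state/{stream_name}_state.json"


def parse_bookmark(raw: Optional[str]) -> int:
    """Return the stored bookmark, or 0 when no state exists yet (first run)."""
    if not raw:
        return 0
    # Hypothetical schema: {"last_updated_at": <Unix seconds>}.
    return int(json.loads(raw).get("last_updated_at", 0))
```

Defaulting to 0 on a missing file means a first run behaves like a full load, which is one common convention for incremental streams.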

Call Transcriptions

Transcripts are fetched via GET /calls/{id}/transcript for calls with a non-null transcription_url. Each record contains:

call_id — the call ID
call_updated_at — the call's updated_at timestamp
transcript — array of utterances (start_time, end_time, speaker, content)

Running the pipeline with --stream calls fetches both calls and their transcriptions in a single pagination pass and uploads them to separate GCS paths (calls/ and call_transcriptions/).
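
The single-pass behaviour described above can be sketched as follows. This is an assumed shape, not the pipeline's implementation: `fetch_transcript` is a hypothetical callable standing in for GET /calls/{id}/transcript, and the function name is invented for the example. The record fields match the list above.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical fetcher: takes a call ID and returns the list of utterances.
TranscriptFn = Callable[[str], List[Dict[str, Any]]]


def split_calls_and_transcripts(
    calls: Iterable[Dict[str, Any]], fetch_transcript: TranscriptFn
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Collect call records and transcription records in one pass over `calls`."""
    call_records: List[Dict[str, Any]] = []
    transcript_records: List[Dict[str, Any]] = []
    for call in calls:
        call_records.append(call)
        # Only calls with a non-null transcription_url have a transcript to fetch.
        if call.get("transcription_url") is None:
            continue
        transcript_records.append(
            {
                "call_id": call["id"],
                "call_updated_at": call["updated_at"],
                "transcript": fetch_transcript(call["id"]),
            }
        )
    return call_records, transcript_records
```

The two returned lists would then be uploaded under calls/ and call_transcriptions/ respectively.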

Articles and Collections

articles is an incremental stream over GET /articles. Like calls, it uses page-based pagination over results sorted by updated_at in descending order, but with overlap_pages=3 (calls uses the default of 40) because the article catalogue is small.
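
The exact semantics of overlap_pages are not spelled out on this page. One plausible reading, sketched below, is that the fetch keeps paging through the descending results and stops only after overlap_pages consecutive pages contain nothing newer than the stored bookmark. The function name, the `get_page` callable, and this stopping rule are all assumptions for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical page fetcher: takes a 1-based page number and returns that
# page's records, newest first; an empty list means there are no more pages.
GetPageFn = Callable[[int], List[Dict[str, Any]]]


def fetch_incremental(get_page: GetPageFn, bookmark: int, overlap_pages: int = 3) -> List[Dict[str, Any]]:
    """Collect records newer than `bookmark` from a DESC-sorted, paged endpoint."""
    records: List[Dict[str, Any]] = []
    stale_pages = 0  # consecutive pages with nothing newer than the bookmark
    page = 1
    while True:
        items = get_page(page)
        if not items:
            break
        fresh = [r for r in items if r["updated_at"] > bookmark]
        records.extend(fresh)
        if fresh:
            stale_pages = 0
        else:
            stale_pages += 1
            if stale_pages >= overlap_pages:
                break
        page += 1
    return records
```

Under this reading, a small overlap is enough for a small catalogue, while a larger default gives more tolerance for records that shift between pages during a run.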

collections uses a stateless full-snapshot helper (_fetch_all_paginated_list_endpoint) because GET /help_center/collections is not sorted by updated_at. The maximum updated_at can appear on any page, which breaks the standard skip-ahead logic. Each run fetches all collections, dedupes by id, and writes a single GCS file. There is no state file for collections.
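
The full-snapshot approach can be sketched as below. This is a simplified stand-in, not the body of _fetch_all_paginated_list_endpoint: the function name and the `get_page` callable are hypothetical, and the choice to let a later duplicate overwrite an earlier one is an assumption.

```python
from typing import Any, Callable, Dict, List

# Hypothetical page fetcher: takes a 1-based page number and returns that
# page's records; an empty list means there are no more pages.
GetPageFn = Callable[[int], List[Dict[str, Any]]]


def fetch_full_snapshot(get_page: GetPageFn) -> List[Dict[str, Any]]:
    """Read every page and return one record per id, with no state involved."""
    by_id: Dict[Any, Dict[str, Any]] = {}
    page = 1
    while True:
        items = get_page(page)
        if not items:
            break
        for item in items:
            # Assumption: if an id appears twice, the later occurrence wins.
            by_id[item["id"]] = item
        page += 1
    return list(by_id.values())
```

Because every page is read on every run, the result does not depend on sort order, which is why this shape suits an endpoint that is not sorted by updated_at.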