Event Data Processing

Event data is read from 3 tables:

  1. Original: backend_events
  2. Exceptions: backend_events_exceptions
  3. Exclusions: backend_events_exclusions

This setup lets us keep both the original events and any changed versions of them. Backfilled events are stored in backend_events_exceptions and are applied when the STAGE table is built (via the create_stage_table macro).
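Conceptually, the stage assembly works like a layered merge. The sketch below is a hypothetical illustration of that logic, not the actual create_stage_table macro; the table and column names are taken from the backfill example in this document, and the column list is trimmed for brevity.

```sql
-- Hypothetical sketch of the stage logic (NOT the real create_stage_table macro):
-- 1. take updated versions of events from the exceptions table,
-- 2. fall back to the original events where no update exists,
-- 3. drop anything listed in the exclusions table.
with combined as (
    select event_id, event_created_at, event_name, event_data
    from `dw-prod-gwiiag`.s3.backend_events_exceptions

    union all

    select o.event_id, o.event_created_at, o.event_name, o.event_data
    from `dw-prod-gwiiag`.kinesis.backend_events o
    where not exists (
        select 1
        from `dw-prod-gwiiag`.s3.backend_events_exceptions e
        where e.event_id = o.event_id
          and e.is_event_update
    )
)
select c.*
from combined c
where not exists (
    select 1
    from `dw-prod-gwiiag`.s3.backend_events_exclusions x
    where x.event_id = c.event_id
);
```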

Data Backfilling Process

From time to time we need to add or adjust data we've already received. Most backfills are done on event_data.

Backfill Process

  1. Identify events

Identify the events that need backfilling by their:

  • event_id
  • event_created_at

  2. Insert into the exceptions table

Insert the events into backend_events_exceptions with the needed information.

You can source data from:

  • the dw-prod-gwiiag.kinesis.backend_events table
  • a temporary table you create

Example - Update Event Data:

-- Backfill remittance status
insert into `dw-prod-gwiiag`.s3.backend_events_exceptions(
    event_created_at,
    event_id,
    event_name,
    user_id,
    event_external_reference_id,
    event_advisor_id,
    event_data,
    event_ingested_at,
    exception_added_timestamp,
    exception_description,
    is_event_update
)
select
    event_created_at,
    event_id,
    event_name,
    user_id,
    event_external_reference_id,
    event_advisor_id,
    json_set(event_data, '$.Status', 'Failed', create_if_missing => true) as event_data,
    event_ingested_at,
    current_timestamp as exception_added_timestamp,
    'Remittance was never sent updates' as exception_description,
    true as is_event_update
from `dw-prod-gwiiag`.kinesis.backend_events
where event_created_at = '2024-02-03 15:38:21.224000'
  and event_id = '77655190-b2eb-41bd-a7cd-c30834a0a360';
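After the insert, it is worth sanity-checking that the exception row landed and carries the expected update flag. A minimal verification query using the same identifiers as the example above:

```sql
-- Confirm the exception row exists and is flagged as an update
select event_id, event_data, exception_description, is_event_update
from `dw-prod-gwiiag`.s3.backend_events_exceptions
where event_id = '77655190-b2eb-41bd-a7cd-c30834a0a360'
  and event_created_at = '2024-02-03 15:38:21.224000';
```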

Exclude Duplicate Events

If duplicate events can be identified by their event_id, you can use the macro below to exclude them. It finds the duplicated events for the given event name (looking back 14 days from the current date) and adds them to the exclusions and exceptions tables.

dbt run-operation remove_duplicates_by_event_id --args '{"stage_table_name": "card_was_shipped_event"}'
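The duplicate lookup the macro performs can be sketched roughly as follows. This is a hypothetical query, not the macro's actual SQL; the event name and the BigQuery-style timestamp function are assumptions based on the stage table name and the backticked project identifiers used elsewhere in this document.

```sql
-- Find event_ids that appear more than once for a given event name
-- within the macro's 14-day lookback window (hypothetical sketch).
select event_id, count(*) as occurrences
from `dw-prod-gwiiag`.kinesis.backend_events
where event_name = 'card_was_shipped'  -- assumed event name for the card_was_shipped_event stage table
  and event_created_at >= timestamp_sub(current_timestamp(), interval 14 day)
group by event_id
having count(*) > 1;
```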

If there is a duplicate event identified by a business id (e.g. transaction_id) but the event_id differs, you can use the macro below.

dbt run-operation add_duplicates_to_exclusions --args '{"stage_table_name": "card_was_shipped_event", "event_id": "b62f0b23-c66d-473d-b822-ddb79b1753ed"}'

Refreshing the models that use the backfilled event

All of the above actions require you to rerun the stage models. All stage models are incremental and use the current_date_offset variable.

dbt run -m model_name --vars '{current_date_offset: 14}'

If you need to refresh more data, you can use the DAG dbt_manual_refresh_historical_models.