Event Data Processing
Event data is read from 3 tables:
- Original: backend_events
- Exceptions: backend_events_exceptions
- Exclusions: backend_events_exclusions
This setup lets us keep both the original events and the changed ones. Backfill events are stored in backend_events_exceptions and processed when the STAGE table is created (via the create_stage_table macro).
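The create_stage_table macro is not reproduced here, so the query below is only a rough sketch of how the three tables might be combined. It assumes that the exclusions table lives in the s3 dataset next to the exceptions table, that it has an event_id column, and that an exception row should win over an original row with the same event_id. None of these assumptions is confirmed by the macro itself.

```sql
-- Hypothetical sketch; the real create_stage_table macro may differ.
-- Assumptions: the exclusions table is in the s3 dataset and has an event_id
-- column; an exception row replaces the original row with the same event_id.
with kept_originals as (
    -- Original events that have not been excluded.
    select
        o.event_id,
        o.event_created_at,
        o.event_name,
        o.event_data,
        false as is_exception
    from `dw-prod-gwiiag`.kinesis.backend_events as o
    left join `dw-prod-gwiiag`.s3.backend_events_exclusions as x
        on x.event_id = o.event_id
    where x.event_id is null
),
combined as (
    select * from kept_originals
    union all
    -- Backfilled or corrected events.
    select
        event_id,
        event_created_at,
        event_name,
        event_data,
        true as is_exception
    from `dw-prod-gwiiag`.s3.backend_events_exceptions
)
select *
from combined
where true  -- harmless filter; QUALIFY has historically needed an accompanying WHERE
qualify row_number() over (
    partition by event_id
    order by is_exception desc  -- prefer the exception row when both exist
) = 1;
```

Applying exclusions only to the original rows matches the duplicate-handling flow, where an event is added to both the exclusions and exceptions tables: the original copies are dropped and the single exception row is kept.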
Data Backfilling Process
From time to time we need to add or adjust data we've already received. Most backfills are done on event_data.
Backfill Process
- Identify Events
Identify the events that need backfilling by their event_id and event_created_at.
- Insert into Exceptions table
Insert events into backend_events_exceptions with the needed information.
You can source data from:
- dw-prod-gwiiag.kinesis.backend_events table
- A temporary table you create
Example - Update Event Data:
-- Backfill remittance status
insert into `dw-prod-gwiiag`.s3.backend_events_exceptions(
event_created_at,
event_id,
event_name,
user_id,
event_external_reference_id,
event_advisor_id,
event_data,
event_ingested_at,
exception_added_timestamp,
exception_description,
is_event_update
)
select
event_created_at,
event_id,
event_name,
user_id,
event_external_reference_id,
event_advisor_id,
json_set(event_data, '$.Status', 'Failed', create_if_missing => true) as event_data,
event_ingested_at,
current_timestamp as exception_added_timestamp,
'Remittance was never sent updates' as exception_description,
true as is_event_update
from `dw-prod-gwiiag`.kinesis.backend_events
where event_created_at = '2024-02-03 15:38:21.224000'
and event_id = '77655190-b2eb-41bd-a7cd-c30834a0a360';
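After the insert, it can help to confirm that the row landed in the exceptions table with the expected value. The query below is a minimal check that reuses the event_id from the example above; it assumes event_data is stored as JSON (or a JSON string) so that json_value can read the Status field.

```sql
-- Check that the backfilled event is present and carries the new status.
select
    event_id,
    event_created_at,
    json_value(event_data, '$.Status') as status,
    exception_added_timestamp,
    exception_description,
    is_event_update
from `dw-prod-gwiiag`.s3.backend_events_exceptions
where event_id = '77655190-b2eb-41bd-a7cd-c30834a0a360'
order by exception_added_timestamp desc;
```

If the query returns more than one row, the same event was inserted into the exceptions table more than once and the extra rows probably need cleaning up before the stage models are rerun.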
Exclude Duplicate Events
If the duplicate events are identified by event_id, you can use the macro below to exclude them.
It finds duplicated events for the given event name (looking back 14 days from the current date). Each duplicated event is added to the exclusions and exceptions tables.
dbt run-operation remove_duplicates_by_event_id --args '{"stage_table_name": "card_was_shipped_event"}'
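Before running the macro, you may want to see which events it is likely to pick up. The query below is a manual check, not the macro's actual logic: it lists event_id values that appear more than once in the last 14 days. The event name 'CardWasShipped' is a hypothetical placeholder; substitute the real event_name behind the stage table.

```sql
-- List event_ids that occur more than once in the last 14 days.
-- 'CardWasShipped' is a placeholder event name, not taken from the source tables.
select
    event_id,
    count(*) as copies,
    min(event_created_at) as first_seen,
    max(event_created_at) as last_seen
from `dw-prod-gwiiag`.kinesis.backend_events
where event_name = 'CardWasShipped'
  and date(event_created_at) >= date_sub(current_date(), interval 14 day)
group by event_id
having count(*) > 1
order by copies desc;
```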
If events are duplicated by a business_id (e.g. transaction_id) but have different event_id values, use the macro below instead.
dbt run-operation add_duplicates_to_exclusions --args '{"stage_table_name": "card_was_shipped_event", "event_id": "b62f0b23-c66d-473d-b822-ddb79b1753ed"}'
Refreshing the models that use the backfilled event
All of the actions above require you to rerun the stage models. All stage models are incremental and use the current_date_offset variable.