
BigQuery

BigQuery is our data warehouse hosted on Google Cloud Platform.

Finding Raw Events

All backend events are sent through Kinesis to one table in our BigQuery data warehouse:

For Prod: dw-prod-gwiiag.kinesis.backend_events

For Stage: dw-dev-s5dlij.kinesis.backend_events_stage

For Dev: dw-dev-s5dlij.kinesis.backend_events

Best Practices

  • Use CTEs to filter data before joining with other tables. This improves performance and readability (see the example query after this list).
  • List the columns (avoid using *) to improve readability and performance.
  • Always use a date filter. Extra points if you use the partition column to filter the data ⭐️. You can find it in the table details.
  • If the table is clustered, filter on the cluster column and make it the first filter. You can also find it in the table details.
  • All names should be in snake_case.
  • If you need any help please feel free to ask us! 💬
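
As a rough illustration of these guidelines, a query against the prod events table might look like the sketch below. The date literal is only a placeholder and the aggregation is just an example:

-- filter in a CTE on the partition column (event_ingested_at) and the cluster column (event_name)
with recent_events as (
  select
    event_ingested_at,
    event_name
  from `dw-prod-gwiiag.kinesis.backend_events`
  where event_ingested_at >= '2024-11-14'
    and event_name = 'EventName'
)
-- list the columns explicitly instead of using *
select
  event_name,
  count(*) as event_count
from recent_events
group by event_name;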

Querying backend_events table

select *
from `dw-prod-gwiiag.kinesis.backend_events`
where event_ingested_at >= 'yyyy-mm-dd'
  and event_name = 'EventName'
limit 100;

Warning

  • Always filter by event_ingested_at first - The table is partitioned by ingestion day for performance
  • Use event_name as secondary filter - The table is clustered by this field for faster queries
  • Add additional conditions after these two filters as needed
  • If you need to check data in the event, try querying the dbt_stage.event_name_as_snake_case table.
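
For example, a minimal check against a staging model might look like this (the model name below is hypothetical; use your event's name in snake_case):

select *
from dbt_stage.subscription_started
limit 100;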

Importing ad-hoc data

Sometimes we receive data that needs to be imported into BigQuery for ad-hoc analysis or backfill purposes. Before importing, place the file in the majority-bulk-upload bucket in the dw-prod project, following the naming convention <date_of_export>_filename. It can then be imported into the dw-prod-gwiiag.bulk_upload dataset. You can use the UI to import the data; for the CLI, use the instructions below. The table name should be the same as the filename (snake_case).
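
As a sketch of the upload and load steps with the Cloud SDK (the filename here is purely illustrative, and a CSV export is assumed):

# stage the export in the bulk-upload bucket
gsutil cp 20241114_refunds_export.csv gs://majority-bulk-upload/20241114_refunds_export.csv

# load it into the bulk_upload dataset, naming the table after the file
bq load \
  --autodetect \
  --source_format=CSV \
  dw-prod-gwiiag:bulk_upload.20241114_refunds_export \
  gs://majority-bulk-upload/20241114_refunds_export.csv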


JSON Import

For JSON, BigQuery only supports the newline-delimited JSON (ndJSON) format. See Loading JSON data from Cloud Storage.

Convert JSON List to ndJSON

If you have a JSON list file, convert it to ndJSON:

jq -c '.[]' json_list.json > result.ndjson
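
For instance, given a hypothetical json_list.json containing a top-level array, jq emits one compact object per line:

# json_list.json (input)
[{"user_id": 1, "event": "signup"}, {"user_id": 2, "event": "login"}]

# result.ndjson (output)
{"user_id":1,"event":"signup"}
{"user_id":2,"event":"login"}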

Import with CLI:

bq load \
  --autodetect \
  --null_marker='' \
  --source_format=NEWLINE_DELIMITED_JSON \
  dw-prod-gwiiag.bulk_upload.20241114_table_name \
  gs://majority-bulk-upload/20241114_table_name.json
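
Once the load finishes, it is worth sanity-checking the new table. For example, assuming the table above:

bq show --format=prettyjson dw-prod-gwiiag:bulk_upload.20241114_table_name

bq query --nouse_legacy_sql \
  'select count(*) as row_count from `dw-prod-gwiiag.bulk_upload.20241114_table_name`'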

Time Travel

BigQuery's time travel feature allows you to query historical data.

BigQuery retains historical data for 7 days by default. Check your table's time travel window before attempting recovery.

select *
from dataset_name.table_name
for system_time as of timestamp_sub(current_timestamp(), interval 1 hour);
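
If you need to materialize that historical state, for example to restore rows that were deleted, one approach (assuming the table still exists and the timestamp falls within the time travel window) is to write the snapshot into a new table:

create table dataset_name.table_name_restored as
select *
from dataset_name.table_name
for system_time as of timestamp_sub(current_timestamp(), interval 1 hour);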

Common Use Cases

  • Recover accidentally deleted data
  • Compare current vs. historical states
  • Audit data changes
  • Restore tables to a previous state