Skip to main content

GA4 BigQuery SQL: How to Unnest event_params Correctly

 If you have recently connected Google Analytics 4 (GA4) to BigQuery, you have likely attempted a simple query to extract a page URL and hit a wall.

You probably tried writing standard SQL like this:

SELECT 
  event_name, 
  event_params.page_location -- This fails
FROM 
  `project.analytics_12345.events_20240101`
LIMIT 10

And BigQuery immediately responded with this error:

Error: Cannot access field page_location on a value with type ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64...>>>

This is the most common hurdle for engineers transitioning from Universal Analytics or standard relational databases to the GA4 schema. The data is not missing; it is packed inside a nested structure known as a REPEATED RECORD.

To query this data efficiently, you must master the UNNEST operator. This guide covers the architectural reason behind this schema, the precise syntax to extract data, and how to optimize these queries for cost and performance.

The Architecture: Why GA4 Data is Nested

To write effective SQL, you must understand the storage model. In a traditional relational database (like PostgreSQL or MySQL), every event parameter (page title, session ID, button text) would be its own column.

However, GA4 is schema-less by design. Users can define custom dimensions and metrics at any time. If BigQuery used a dedicated column for every possible custom parameter, the table schema would break effectively immediately due to column limits and sparse data inefficiencies.

Instead, Google uses the Entity-Attribute-Value (EAV) model nested within a single column called event_params.

The event_params Schema

In BigQuery, the data type is ARRAY<STRUCT>. Visually, a single row in your database looks like this JSON representation:

"event_name": "page_view",
"event_params": [
  { "key": "page_location", "value": { "string_value": "/home", "int_value": null } },
  { "key": "ga_session_id", "value": { "string_value": null, "int_value": 1699999999 } },
  { "key": "page_title",    "value": { "string_value": "Home Page", "int_value": null } }
]

Because event_params is an array (a list), you cannot select page_location directly. You must tell SQL to search inside the list for the key page_location and extract the corresponding value.

Solution 1: Scalar Subqueries (Best for Selecting Columns)

The cleanest way to extract specific parameters as columns is using Scalar Subqueries. This method allows you to transform the nested array into a flat column inline.

This is the preferred modern syntax for Analytics Engineers building flattening models (e.g., in dbt).

SELECT
  event_date,
  event_name,
  -- Extracting a String Parameter
  (SELECT value.string_value 
   FROM UNNEST(event_params) 
   WHERE key = 'page_location') AS page_location,

  -- Extracting a String Parameter
  (SELECT value.string_value 
   FROM UNNEST(event_params) 
   WHERE key = 'page_title') AS page_title,

  -- Extracting an Integer Parameter (Note: int_value)
  (SELECT value.int_value 
   FROM UNNEST(event_params) 
   WHERE key = 'ga_session_id') AS session_id

FROM
  `YOUR_PROJECT.analytics_123456789.events_20240101`
WHERE
  event_name = 'page_view'
LIMIT 100;

Why this works

  1. UNNEST(event_params): Temporarily explodes the array into a virtual table for that specific row.
  2. WHERE key = '...': Filters that virtual table to find the specific parameter you want.
  3. value.string_value: Selects the actual data. Note that you must target the correct type field (explained in the Data Types section below).

Solution 2: CROSS JOIN UNNEST (Best for Filtering)

While scalar subqueries are great for selecting columns, they are verbose if you need to filter rows based on a parameter.

If you want to query "All events where page_location equals /checkout", using CROSS JOIN UNNEST is often more performant and readable.

SELECT
  event_name,
  event_timestamp,
  params.value.string_value AS page_location
FROM
  `YOUR_PROJECT.analytics_123456789.events_20240101`,
  UNNEST(event_params) AS params
WHERE
  params.key = 'page_location'
  AND params.value.string_value = 'https://www.example.com/checkout'

The Comma Syntax

In BigQuery Standard SQL, the comma , between the table name and UNNEST is shorthand for CROSS JOIN. This operation creates a row for every combination of the parent row and the array elements. If an event has 5 parameters, this join temporarily creates 5 rows, allowing you to filter against them directly.

Handling Polymorphic Data Types

One of the most frequent errors occurs when developers try to select value.string_value for a number, or vice versa. The GA4 value struct contains specific fields for different data types.

You must query the correct field based on the data type sent by GA4:

  1. string_value: URLs, Titles, Campaign Names, IDs stored as strings.
  2. int_valuega_session_idengagement_time_msec, counts.
  3. float_value: Currency values, calculated metrics.
  4. double_value: Similar to float, often used for high-precision revenue.

The Coalesce Safety Net

If you are unsure whether a custom parameter (e.g., order_id) was sent as a string or a number, use COALESCE to check multiple fields. CAST the number to a string to unify the column type.

(SELECT COALESCE(value.string_value, CAST(value.int_value AS STRING))
 FROM UNNEST(event_params) 
 WHERE key = 'order_id') AS order_id

Advanced Optimization: Creating a UDF

If you are a Principal Engineer setting up the environment for a team of analysts, do not make them write UNNEST logic every time. Create a Persistent User-Defined Function (UDF).

This encapsulates the logic and ensures consistency across your project.

CREATE OR REPLACE FUNCTION `YOUR_PROJECT.your_dataset.get_param`(
  params ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64, float_value FLOAT64, double_value FLOAT64>>>, 
  param_key STRING, 
  type STRING
) AS (
  (SELECT 
     CASE 
       WHEN type = 'string' THEN value.string_value
       WHEN type = 'int' THEN CAST(value.int_value AS STRING)
       WHEN type = 'double' THEN CAST(value.double_value AS STRING)
       ELSE NULL 
     END
   FROM UNNEST(params) 
   WHERE key = param_key)
);

Usage:

SELECT 
  event_name,
  `YOUR_PROJECT.your_dataset.get_param`(event_params, 'page_location', 'string') AS page_url
FROM 
  `YOUR_PROJECT.analytics_123456789.events_2024*`

Summary and Best Practices

  1. Don't Panic: Nested data is a feature, not a bug. It allows GA4 to be flexible without altering database schemas.
  2. Use Scalar Subqueries: For transforming the raw table into a flat table (e.g., SELECT (SELECT ... FROM UNNEST...)).
  3. Watch Your Types: Ensure you are selecting int_value for IDs/Metrics and string_value for text.
  4. Mind the Partition: Always query strictly necessary date partitions (using _TABLE_SUFFIX or partitioned tables) before unnesting large datasets to keep BigQuery costs low.

By mastering UNNEST, you unlock the full power of raw GA4 data, allowing for attribution modeling and user journey analysis that the standard GA4 interface cannot provide.