If you have recently connected Google Analytics 4 (GA4) to BigQuery, you have likely attempted a simple query to extract a page URL and hit a wall.
You probably tried writing standard SQL like this:
SELECT
event_name,
event_params.page_location -- This fails
FROM
`project.analytics_12345.events_20240101`
LIMIT 10
And BigQuery immediately responded with this error:
Error: Cannot access field page_location on a value with type ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64...>>>
This is the most common hurdle for engineers transitioning from Universal Analytics or standard relational databases to the GA4 schema. The data is not missing; it is packed inside a nested structure known as a REPEATED RECORD.
To query this data efficiently, you must master the UNNEST operator. This guide covers the architectural reason behind this schema, the precise syntax to extract data, and how to optimize these queries for cost and performance.
The Architecture: Why GA4 Data is Nested
To write effective SQL, you must understand the storage model. In a traditional relational database (like PostgreSQL or MySQL), every event parameter (page title, session ID, button text) would be its own column.
However, GA4 is schema-less by design. Users can define custom dimensions and metrics at any time. If BigQuery used a dedicated column for every possible custom parameter, the table schema would break effectively immediately due to column limits and sparse data inefficiencies.
Instead, Google uses the Entity-Attribute-Value (EAV) model nested within a single column called event_params.
The event_params Schema
In BigQuery, the data type is ARRAY<STRUCT>. Visually, a single row in your database looks like this JSON representation:
"event_name": "page_view",
"event_params": [
{ "key": "page_location", "value": { "string_value": "/home", "int_value": null } },
{ "key": "ga_session_id", "value": { "string_value": null, "int_value": 1699999999 } },
{ "key": "page_title", "value": { "string_value": "Home Page", "int_value": null } }
]
Because event_params is an array (a list), you cannot select page_location directly. You must tell SQL to search inside the list for the key page_location and extract the corresponding value.
Solution 1: Scalar Subqueries (Best for Selecting Columns)
The cleanest way to extract specific parameters as columns is using Scalar Subqueries. This method allows you to transform the nested array into a flat column inline.
This is the preferred modern syntax for Analytics Engineers building flattening models (e.g., in dbt).
SELECT
event_date,
event_name,
-- Extracting a String Parameter
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_location') AS page_location,
-- Extracting a String Parameter
(SELECT value.string_value
FROM UNNEST(event_params)
WHERE key = 'page_title') AS page_title,
-- Extracting an Integer Parameter (Note: int_value)
(SELECT value.int_value
FROM UNNEST(event_params)
WHERE key = 'ga_session_id') AS session_id
FROM
`YOUR_PROJECT.analytics_123456789.events_20240101`
WHERE
event_name = 'page_view'
LIMIT 100;
Why this works
UNNEST(event_params): Temporarily explodes the array into a virtual table for that specific row.WHERE key = '...': Filters that virtual table to find the specific parameter you want.value.string_value: Selects the actual data. Note that you must target the correct type field (explained in the Data Types section below).
Solution 2: CROSS JOIN UNNEST (Best for Filtering)
While scalar subqueries are great for selecting columns, they are verbose if you need to filter rows based on a parameter.
If you want to query "All events where page_location equals /checkout", using CROSS JOIN UNNEST is often more performant and readable.
SELECT
event_name,
event_timestamp,
params.value.string_value AS page_location
FROM
`YOUR_PROJECT.analytics_123456789.events_20240101`,
UNNEST(event_params) AS params
WHERE
params.key = 'page_location'
AND params.value.string_value = 'https://www.example.com/checkout'
The Comma Syntax
In BigQuery Standard SQL, the comma , between the table name and UNNEST is shorthand for CROSS JOIN. This operation creates a row for every combination of the parent row and the array elements. If an event has 5 parameters, this join temporarily creates 5 rows, allowing you to filter against them directly.
Handling Polymorphic Data Types
One of the most frequent errors occurs when developers try to select value.string_value for a number, or vice versa. The GA4 value struct contains specific fields for different data types.
You must query the correct field based on the data type sent by GA4:
string_value: URLs, Titles, Campaign Names, IDs stored as strings.int_value:ga_session_id,engagement_time_msec, counts.float_value: Currency values, calculated metrics.double_value: Similar to float, often used for high-precision revenue.
The Coalesce Safety Net
If you are unsure whether a custom parameter (e.g., order_id) was sent as a string or a number, use COALESCE to check multiple fields. CAST the number to a string to unify the column type.
(SELECT COALESCE(value.string_value, CAST(value.int_value AS STRING))
FROM UNNEST(event_params)
WHERE key = 'order_id') AS order_id
Advanced Optimization: Creating a UDF
If you are a Principal Engineer setting up the environment for a team of analysts, do not make them write UNNEST logic every time. Create a Persistent User-Defined Function (UDF).
This encapsulates the logic and ensures consistency across your project.
CREATE OR REPLACE FUNCTION `YOUR_PROJECT.your_dataset.get_param`(
params ARRAY<STRUCT<key STRING, value STRUCT<string_value STRING, int_value INT64, float_value FLOAT64, double_value FLOAT64>>>,
param_key STRING,
type STRING
) AS (
(SELECT
CASE
WHEN type = 'string' THEN value.string_value
WHEN type = 'int' THEN CAST(value.int_value AS STRING)
WHEN type = 'double' THEN CAST(value.double_value AS STRING)
ELSE NULL
END
FROM UNNEST(params)
WHERE key = param_key)
);
Usage:
SELECT
event_name,
`YOUR_PROJECT.your_dataset.get_param`(event_params, 'page_location', 'string') AS page_url
FROM
`YOUR_PROJECT.analytics_123456789.events_2024*`
Summary and Best Practices
- Don't Panic: Nested data is a feature, not a bug. It allows GA4 to be flexible without altering database schemas.
- Use Scalar Subqueries: For transforming the raw table into a flat table (e.g.,
SELECT (SELECT ... FROM UNNEST...)). - Watch Your Types: Ensure you are selecting
int_valuefor IDs/Metrics andstring_valuefor text. - Mind the Partition: Always query strictly necessary date partitions (using
_TABLE_SUFFIXor partitioned tables) before unnesting large datasets to keep BigQuery costs low.
By mastering UNNEST, you unlock the full power of raw GA4 data, allowing for attribution modeling and user journey analysis that the standard GA4 interface cannot provide.