
Matter on ESP32: Troubleshooting 'Commissioning Failed' with Apple Home

If you are developing a Matter-over-Wi-Fi device on the ESP32, you have likely run into the Apple Home "Commissioning Failed" error. Android devices commission instantly. The chip-tool on Linux works perfectly. Yet when you scan the QR code with an iPhone, the Home app spins on "Connecting..." for 30 seconds before unceremoniously dropping the connection.

Apple provides zero logs to the user. However, the issue almost always boils down to two factors specific to the ESP32 implementation: BLE advertising intervals that violate Apple's strict accessory design guidelines, and memory exhaustion during the PASE (Passcode-Authenticated Session Establishment) handshake.

Here is the root cause analysis and the production-grade fix to stabilize your Matter commissioning.

The Root Cause: Timing and Fragmentation

Matter commissioning (specifically PASE) begins over Bluetooth Low Energy (BLE). The Commissioner (the iPhone) must discover, connect, and perform a heavy cryptographic handshake with the Commissionee (ESP32) before handing off Wi-Fi credentials.

1. The BLE Interval Mismatch

Apple's CoreBluetooth stack is aggressive. If your BLE advertising interval is too "lazy" (to save power), the iPhone’s scanning window often misses the advertisement packet entirely, or the connection negotiation times out before the handshake begins. The default ESP-IDF Bluetooth settings prioritize balanced power consumption, which is often too slow for the strict timeouts in HomeKit/Matter on iOS.
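To make the interval discussion concrete: BLE controllers take advertising intervals in units of 0.625 ms, not milliseconds, so a "20 ms" target is the value 32 at the API level. The sketch below converts between the two and checks a configured range against a target window. The helper names are hypothetical, and the 20 ms to 60 ms window in the usage note is an illustrative assumption, not a figure taken from Apple documentation.

```cpp
#include <cassert>
#include <cstdint>

// BLE advertising intervals are expressed in units of 0.625 ms (625 us).
// Working in microseconds keeps the arithmetic in integers.
constexpr uint16_t ms_to_adv_units(uint32_t interval_ms) {
    return static_cast<uint16_t>((interval_ms * 1000u) / 625u);
}

constexpr uint32_t adv_units_to_us(uint16_t units) {
    return static_cast<uint32_t>(units) * 625u;
}

// Hypothetical helper: true when the configured [min, max] interval
// (in 0.625 ms units) lies inside the target window given in milliseconds.
constexpr bool adv_interval_within(uint16_t min_units, uint16_t max_units,
                                   uint32_t window_min_ms, uint32_t window_max_ms) {
    return min_units <= max_units &&
           adv_units_to_us(min_units) >= window_min_ms * 1000u &&
           adv_units_to_us(max_units) <= window_max_ms * 1000u;
}

static_assert(ms_to_adv_units(20) == 32, "20 ms equals 32 units");
static_assert(ms_to_adv_units(100) == 160, "100 ms equals 160 units");
```

For example, adv_interval_within(32, 96, 20, 60) is true (20 ms to 60 ms), while a power-saving range of 160 to 240 units (100 ms to 150 ms) fails the same check. In connectedhomeip-based builds the fast advertising range is normally set by compile-time macros in the same units (CHIP_DEVICE_CONFIG_BLE_FAST_ADVERTISING_INTERVAL_MIN and its _MAX counterpart); treat those names as something to verify against the SDK version you build.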

2. The LwIP/mbedTLS Bottleneck

Once BLE creates the link, Matter establishes a secure session using SPAKE2+. This requires significant heap allocation for mbedTLS context and large packet buffers (pbufs) in the LwIP stack.

On a standard ESP32 (WROOM/WROVER), if your partition table and memory configs aren't tuned, the intense burst of memory usage during the crypto exchange can end in a failed heap allocation or a nearly exhausted task stack. The ESP32 often doesn't visibly crash; it simply drops the BLE connection. Apple Home interprets this drop as a generic failure.
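One way to tell exhaustion apart from fragmentation is to compare the total free heap with the largest contiguous free block just before the handshake. The sketch below is a host-testable classifier under that assumption; HeapSnapshot and classify_heap are hypothetical names. On the target, the two fields would be filled from heap_caps_get_free_size(MALLOC_CAP_8BIT) and heap_caps_get_largest_free_block(MALLOC_CAP_8BIT) in esp_heap_caps.h.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of the heap state, taken right before PASE starts.
struct HeapSnapshot {
    size_t free_bytes;     // total free heap
    size_t largest_block;  // largest single contiguous free block
};

enum class HeapVerdict { Ok, Fragmented, Exhausted };

// Classify whether one allocation of `needed_bytes` can succeed.
// Exhausted:  not enough free memory in total.
// Fragmented: enough in total, but no single block is large enough.
inline HeapVerdict classify_heap(const HeapSnapshot &s, size_t needed_bytes) {
    if (s.free_bytes < needed_bytes) {
        return HeapVerdict::Exhausted;
    }
    if (s.largest_block < needed_bytes) {
        return HeapVerdict::Fragmented;
    }
    return HeapVerdict::Ok;
}
```

Logging the verdict from the commissioning-started callback, with needed_bytes set to a rough estimate of the session context size, shows whether a dropped connection coincides with a heap that is merely fragmented or one that is actually empty.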

The Fix: Tuning the Stack

We will solve this by enforcing strict BLE parameters and optimizing the memory map for cryptographic operations.

Part 1: The Partition Table (partitions.csv)

Do not use the default partition table. Matter needs a generously sized Non-Volatile Storage (NVS) partition for fabrics, ACLs, and other persistent state, along with OTA slots for firmware updates. If NVS is too small, key storage fails.

Create a file named partitions.csv in your project root:

# Name,   Type, SubType, Offset,  Size, Flags
nvs,      data, nvs,     ,        0x15000,
otadata,  data, ota,     ,        0x2000,
phy_init, data, phy,     ,        0x1000,
factory,  app,  factory, ,        1M,
ota_0,    app,  ota_0,   ,        1M,
ota_1,    app,  ota_1,   ,        1M,
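Because the Offset column is left blank, the build tool places each partition automatically. The sketch below reproduces that placement to check that the table fits the flash chip. It assumes the default layout rules (partition table at 0x8000 occupying 0x1000, app partitions aligned to 0x10000, data partitions aligned to 0x1000) and a 4 MB flash part; the type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One row of partitions.csv, reduced to what the layout needs.
struct Partition {
    const char *name;
    bool is_app;     // app partitions need 64 KB alignment
    uint32_t size;   // size in bytes
};

constexpr uint32_t align_up(uint32_t value, uint32_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Returns the first unused flash offset after placing every partition in order.
// first_offset assumes the partition table sits at 0x8000 and occupies 0x1000.
inline uint32_t layout_end(const std::vector<Partition> &table,
                           uint32_t first_offset = 0x9000) {
    uint32_t offset = first_offset;
    for (const Partition &p : table) {
        offset = align_up(offset, p.is_app ? 0x10000u : 0x1000u);
        offset += p.size;
    }
    return offset;
}

// The table from this article.
inline std::vector<Partition> article_table() {
    return {
        {"nvs",      false, 0x15000},
        {"otadata",  false, 0x2000},
        {"phy_init", false, 0x1000},
        {"factory",  true,  0x100000},
        {"ota_0",    true,  0x100000},
        {"ota_1",    true,  0x100000},
    };
}
```

Under these assumptions the layout ends at 0x330000, which leaves 0xD0000 bytes (832 KB) free on a 4 MB part. It is also worth comparing the 1M app slots against your actual firmware size, since a Matter image with BLE and Wi-Fi enabled can approach or exceed that limit.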

Part 2: SDK Configuration (sdkconfig.defaults)

This is the critical step. The goal is to keep the BLE host responsive enough to sustain fast (roughly 20 ms) advertising during commissioning, and to offload crypto from the software stack to the ESP32 hardware accelerator.

Append the following to your sdkconfig.defaults:

# --- MATTER MANDATORY ---
CONFIG_ESP_MATTER_ENABLE_MATTER_SERVER=y

# --- BLUETOOTH OPTIMIZATION FOR APPLE HOME ---
# BLE-only controller mode: no memory is reserved for Classic BT.
# Symbol spelling differs between chip families; confirm the exact names
# for your target in menuconfig.
CONFIG_BT_CTRL_MODE_BLE_ONLY=y
CONFIG_BT_CTRL_HCI_MODE_VHCI=y
# Select the BLE 4.2 feature set and cap concurrent connections.
# These are host-level options and do not change ACL buffer sizes.
CONFIG_BT_BLE_42_FEATURES_SUPPORTED=y
CONFIG_BT_BLE_50_FEATURES_SUPPORTED=n
CONFIG_BT_ACL_CONNECTIONS=3
# Apple expects tight advertising intervals (approx 20ms-30ms min) during commissioning.
# These two settings do not set the interval; they pin the NimBLE host task to
# core 1 and enlarge its stack so BLE events are serviced without delay.
CONFIG_BT_NIMBLE_PINNED_TO_CORE_1=y
CONFIG_BT_NIMBLE_TASK_STACK_SIZE=5120

# --- MBEDTLS / CRYPTO ACCELERATION ---
# If this is off, the handshake takes too long and iOS times out
CONFIG_MBEDTLS_HARDWARE_MPI=y
CONFIG_MBEDTLS_HARDWARE_AES=y
CONFIG_MBEDTLS_HARDWARE_SHA=y
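# Asymmetric TLS fragment buffers: the switch is a bool, and the sizes are
# set through the separate in/out options
CONFIG_MBEDTLS_ASYMMETRIC_CONTENT_LEN=y
CONFIG_MBEDTLS_SSL_IN_CONTENT_LEN=3072
CONFIG_MBEDTLS_SSL_OUT_CONTENT_LEN=3072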

# --- LWIP MEMORY TUNING ---
# Keep lwIP RX/TX functions out of IRAM to leave more IRAM for the application
CONFIG_LWIP_IRAM_OPTIMIZATION=n
# Deeper receive mailboxes to absorb IPv6 multicast bursts during discovery
CONFIG_LWIP_TCPIP_RECVMBOX_SIZE=32
CONFIG_LWIP_UDP_RECVMBOX_SIZE=32
# Socket headroom for mDNS and the Matter endpoints after Wi-Fi connection
CONFIG_LWIP_MAX_SOCKETS=16
CONFIG_LWIP_SO_RCVBUF=y

Part 3: The C++ Implementation

In your main.cpp, we must explicitly handle the Matter stack events. When Apple Home fails, it often triggers a kCommissioningWindowClosed or kFailSafeTimerExpired event. We will hook into the AppDelegate to ensure we aren't blocking the main loop during these critical callbacks.

Here is a modern C++ implementation using the esp_matter SDK (based on Connected Home over IP).

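#include <esp_log.h>
#include <esp_err.h>
#include <nvs_flash.h>          // nvs_flash_init(), nvs_flash_erase()
#include <soc/rtc.h>            // rtc_clk_cpu_freq_* helpers
#include <esp_matter.h>
#include <esp_matter_console.h>
#include <app/server/Server.h>
// Declares PrintOnboardingCodes(); newer connectedhomeip releases move this
// header to <setup_payload/OnboardingCodesUtil.h>
#include <app/server/OnboardingCodesUtil.h>
#include <platform/CHIPDeviceLayer.h>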

#define TAG "MATTER_MAIN"

using namespace esp_matter;
using namespace chip;
using namespace chip::app;
using namespace chip::DeviceLayer;

// Custom AppDelegate to intercept Commissioning Lifecycle Events
class AppCallbacks : public AppDelegate {
public:
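    void OnCommissioningSessionStarted() override {
        ESP_LOGI(TAG, "Commissioning session started. BLE connection active.");
        // Optimization: Increase CPU frequency to max during crypto handshake
        // to prevent timeout. This is a low-level call from soc/rtc.h; production
        // code would normally hold an esp_pm lock instead.
        rtc_cpu_freq_config_t cfg;
        if (rtc_clk_cpu_freq_mhz_to_config(240, &cfg)) {
            rtc_clk_cpu_freq_set_config_fast(&cfg);
        }
    }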

    void OnCommissioningSessionStopped() override {
        ESP_LOGI(TAG, "Commissioning session stopped.");
    }

    void OnCommissioningWindowOpened() override {
        ESP_LOGI(TAG, "Commissioning window is now OPEN.");
    }

    void OnCommissioningWindowClosed() override {
        ESP_LOGW(TAG, "Commissioning window CLOSED. If this was unexpected, check BLE intervals.");
    }
};

static AppCallbacks sCallbacks;

// Event handler to debug network handoff issues
static void app_event_handler(const ChipDeviceEvent *event, intptr_t arg) {
    switch (event->Type) {
    case DeviceEventType::kInternetConnectivityChange:
        if (event->InternetConnectivityChange.IPv4 == kConnectivity_Established) {
            ESP_LOGI(TAG, "IPv4 Connected");
        }
        if (event->InternetConnectivityChange.IPv6 == kConnectivity_Established) {
            ESP_LOGI(TAG, "IPv6 Connected - Critical for Matter Discovery");
        }
        break;
        
    case DeviceEventType::kInterfaceIpAddressChanged:
        ESP_LOGI(TAG, "Interface IP Address Changed");
        break;
        
    case DeviceEventType::kCommissioningComplete: {
        ESP_LOGI(TAG, "Commissioning Complete! Device is now part of the Fabric.");
        // Reset CPU frequency to save power if needed
        rtc_cpu_freq_config_t cfg;
        if (rtc_clk_cpu_freq_mhz_to_config(160, &cfg)) {
            rtc_clk_cpu_freq_set_config_fast(&cfg);
        }
        break;
    }

    default:
        break;
    }
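}

// Minimal stand-ins for the callbacks that app_main() passes to node::create().
// The signatures follow the esp-matter light example; check them against your
// SDK version and replace the bodies with your application's handlers.
static esp_err_t app_attribute_update_cb(esp_matter::attribute::callback_type_t type,
                                         uint16_t endpoint_id, uint32_t cluster_id,
                                         uint32_t attribute_id, esp_matter_attr_val_t *val,
                                         void *priv_data) {
    return ESP_OK;
}

static esp_err_t app_identification_cb(esp_matter::identification::callback_type_t type,
                                       uint16_t endpoint_id, uint8_t effect_id,
                                       uint8_t effect_variant, void *priv_data) {
    return ESP_OK;
}

extern "C" void app_main() {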
    esp_err_t err = ESP_OK;

    // 1. Initialize NVS (Crucial for storing certificates)
    err = nvs_flash_init();
    if (err == ESP_ERR_NVS_NO_FREE_PAGES || err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
        ESP_ERROR_CHECK(nvs_flash_erase());
        err = nvs_flash_init();
    }
    ESP_ERROR_CHECK(err);

    // 2. Configure Matter Node
    node::config_t node_config;
    // node::create() returns a raw node_t pointer
    node_t *node = node::create(&node_config, app_attribute_update_cb, app_identification_cb);
    if (!node) {
        ESP_LOGE(TAG, "Failed to create Matter node");
        abort();
    }

    // 3. Add a standard endpoint (e.g., OnOff Light)
    endpoint::on_off_light::config_t light_config;
    light_config.on_off.on_off = false;
    light_config.on_off.lighting.start_up_on_off = nullptr;
    endpoint_t *endpoint = endpoint::on_off_light::create(node, &light_config, ENDPOINT_FLAG_NONE, nullptr);
    if (!endpoint) {
        ESP_LOGE(TAG, "Failed to create light endpoint");
        abort();
    }

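    // 4. Start Matter & Register Device Callbacks
    // Passing app_event_handler here registers it for platform events, which
    // lets us debug the connectivity steps.
    err = esp_matter::start(app_event_handler);
    ESP_ERROR_CHECK(err);

    // This allows us to see exactly when the PASE handshake begins/ends.
    // In recent connectedhomeip versions the delegate is held by the
    // CommissioningWindowManager and is reset during server init, so it is
    // registered after start() returns.
    Server::GetInstance().GetCommissioningWindowManager().SetAppDelegate(&sCallbacks);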

    // 5. Explicitly log the QR Code Payload
    // This ensures we aren't relying on a stale QR code from a previous flash
    PrintOnboardingCodes(chip::RendezvousInformationFlags(chip::RendezvousInformationFlag::kBLE));
}

Why This Works

Hardware Acceleration

By setting CONFIG_MBEDTLS_HARDWARE_MPI=y, we route the big-number arithmetic behind the SPAKE2+ exchange and the later certificate validation through the ESP32's hardware crypto accelerator. Without this, the software implementation (on the Xtensa core) is noticeably slower. Apple's strict timeouts (approx 10-15 seconds for the entire handshake) will trigger a disconnect if the device lags here.
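To check whether the handshake actually fits that window, it helps to time it on the device. The sketch below is a small stopwatch that could be started in OnCommissioningSessionStarted and stopped in OnCommissioningSessionStopped. HandshakeTimer is a hypothetical name, and the budget you pass in is your own estimate (the 10-15 second figure above is approximate, not a documented Apple value).

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stopwatch for measuring how long a commissioning session takes.
class HandshakeTimer {
public:
    using Clock = std::chrono::steady_clock;

    explicit HandshakeTimer(std::chrono::milliseconds budget) : budget_(budget) {}

    // Time points are injectable so the class can be tested off-device.
    void start(Clock::time_point now = Clock::now()) {
        start_ = now;
        running_ = true;
    }

    // Returns the elapsed time since start(), or zero if the timer was not running.
    std::chrono::milliseconds stop(Clock::time_point now = Clock::now()) {
        if (!running_) {
            return std::chrono::milliseconds(0);
        }
        running_ = false;
        return std::chrono::duration_cast<std::chrono::milliseconds>(now - start_);
    }

    bool over_budget(std::chrono::milliseconds elapsed) const {
        return elapsed > budget_;
    }

private:
    std::chrono::milliseconds budget_;
    Clock::time_point start_{};
    bool running_ = false;
};
```

Comparing the measured duration with hardware acceleration on and off gives a direct view of how much margin the configuration buys, which is more reliable than inferring it from whether the Home app happens to succeed.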

Memory Optimization

CONFIG_BT_CTRL_MODE_BLE_ONLY keeps Classic BT structures out of RAM, so when the Matter stack requests 20KB+ of heap for the secure session context, a contiguous block is more likely to be available. The partitions.csv change addresses a separate resource: it gives NVS enough flash to persist fabric data once the session succeeds.

CPU Clock Management

Notice the rtc_clk_cpu_freq_set_config_fast call in OnCommissioningSessionStarted. We ensure the CPU runs at 240MHz during the handshake. While ESP32 power management usually handles this, forcing the frequency high during the BLE connection guarantees that no tick-interrupt latency causes a packet miss during the critical key exchange.

Conclusion

When Matter commissioning fails on Apple Home but works elsewhere, it is rarely a logic bug in your code. It is almost always a resource contention issue. By tightening the BLE advertising parameters and offloading cryptography to hardware, you satisfy Apple's strict QoS requirements and ensure the ESP32 can survive the memory pressure of the PASE handshake.