Unlocking EEG Model Inner Workings with Sparse Autoencoders

Published on May 15, 2026

EEG foundation models have set new performance benchmarks on clinical tasks, but their internal mechanics remain poorly understood. Clinicians often voice concern about the black-box nature of these models, which limits their trust in automated predictions. Conventional approaches struggle to show what these systems actually represent and how they reach their conclusions.

Recent research applies TopK Sparse Autoencoders (SAEs) to the embeddings of three distinct EEG architectures: SleepFM, REVE, and LaBraM. The goal is to extract interpretable features from these embeddings in a clinical setting. The study also introduces a new metric for measuring steering selectivity, which helps delineate the operational states of these models.
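For readers unfamiliar with TopK SAEs, the sketch below shows the general technique: an embedding is projected into a wider dictionary, only the k largest activations per sample are kept, and the sparse code is decoded back into the embedding space. This is a generic NumPy illustration, not the study's implementation. The dimensions, random weights, and the function name `topk_sae_forward` are hypothetical placeholders, and training (minimising the reconstruction error) is omitted.

```python
import numpy as np

def topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k):
    """Generic TopK SAE forward pass (illustrative, not the study's code).

    x: (n, d) model embeddings; W_enc: (d, m); W_dec: (m, d), with m > d.
    Returns sparse codes (n, m) with at most k non-zeros per row,
    and reconstructions (n, d).
    """
    pre = np.maximum((x - b_dec) @ W_enc + b_enc, 0.0)  # ReLU pre-activations
    idx = np.argpartition(pre, -k, axis=1)[:, -k:]       # k largest per row
    rows = np.arange(pre.shape[0])[:, None]
    codes = np.zeros_like(pre)
    codes[rows, idx] = pre[rows, idx]                    # zero out everything else
    recon = codes @ W_dec + b_dec                        # decode back to embedding space
    return codes, recon

# Hypothetical sizes: 16-dim embeddings, 64-feature dictionary, k = 4.
rng = np.random.default_rng(0)
d, m, k = 16, 64, 4
W_enc = rng.normal(scale=0.1, size=(d, m))
b_enc = np.zeros(m)
W_dec = rng.normal(scale=0.1, size=(m, d))
b_dec = np.zeros(d)

x = rng.normal(size=(8, d))
codes, recon = topk_sae_forward(x, W_enc, b_enc, W_dec, b_dec, k)
mse = np.mean((recon - x) ** 2)  # the quantity a training loop would minimise
print((codes != 0).sum(axis=1), mse)
```

Fixing k directly, rather than tuning an L1 penalty, is what distinguishes TopK SAEs: the sparsity level is guaranteed by construction, and each surviving dictionary entry is a candidate interpretable feature.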

The findings expose critical weaknesses in EEG models, notably “wrecking-ball” interventions: steering edits that can significantly degrade performance. Age and pathology also turn out to be entangled, which makes it difficult to control key variables such as age and diagnosis independently. These insights call for a reevaluation of how model features interact and what that means clinically.
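To make the idea of selective versus “wrecking-ball” steering concrete, the toy sketch below steers an embedding along one SAE decoder direction and checks how much two linear read-outs move. The study's actual selectivity metric is not described here, so `selectivity` is a hypothetical stand-in: the share of total probe shift that lands on the intended target. The probes (labelled "pathology" and "age") and the decoder matrix are invented for illustration only.

```python
import numpy as np

def steer(x, W_dec, feature, alpha):
    """Add alpha times the unit-normalised decoder direction of one feature."""
    direction = W_dec[feature] / np.linalg.norm(W_dec[feature])
    return x + alpha * direction

def selectivity(x, x_steered, target_probe, off_target_probes):
    """Hypothetical ratio in [0, 1]: 1 means only the target read-out moved."""
    d_target = abs(np.mean(x_steered @ target_probe - x @ target_probe))
    d_off = sum(abs(np.mean(x_steered @ p - x @ p)) for p in off_target_probes)
    return d_target / (d_target + d_off + 1e-12)

# Toy 4-dim embedding space with two hypothetical linear probes.
pathology_probe = np.array([1.0, 0.0, 0.0, 0.0])
age_probe = np.array([0.0, 1.0, 0.0, 0.0])
W_dec = np.array([
    [1.0, 0.0, 0.0, 0.0],  # feature 0: aligned with the pathology probe only
    [1.0, 1.0, 1.0, 1.0],  # feature 1: a broad direction that also moves "age"
])

x = np.random.default_rng(0).normal(size=(10, 4))
s_focused = selectivity(x, steer(x, W_dec, 0, 2.0), pathology_probe, [age_probe])
s_broad = selectivity(x, steer(x, W_dec, 1, 2.0), pathology_probe, [age_probe])
print(s_focused, s_broad)
```

In this toy setup, feature 0 scores close to 1 while feature 1 scores about 0.5 because it shifts the "age" read-out as much as the "pathology" one, which is the kind of entanglement the study reports between age and diagnosis.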

A spectral decoder further aids interpretation by linking feature manipulations to physiological signatures. This shows how the model’s internal manipulations correspond to recognizable clinical phenomena, paving the way for better integration of AI tools in healthcare settings.
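Spectral signatures in EEG are commonly summarised as relative power in canonical frequency bands. The sketch below computes that summary for a single channel with NumPy's FFT. It does not reproduce the study's spectral decoder. It assumes, purely for illustration, that the decoder's outputs are read in terms of conventional bands, whose exact edges vary between sources. The synthetic signal (a 10 Hz alpha rhythm plus a weaker 2 Hz delta component) is made up.

```python
import numpy as np

# Conventional approximate EEG band edges in Hz (edges vary between sources).
BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 8.0),
         "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def relative_band_power(signal, fs):
    """Fraction of 0.5-30 Hz power falling in each band, via a plain FFT."""
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    total = power[(freqs >= 0.5) & (freqs < 30.0)].sum()
    return {name: power[(freqs >= lo) & (freqs < hi)].sum() / total
            for name, (lo, hi) in BANDS.items()}

fs = 256                     # sampling rate in Hz (hypothetical)
t = np.arange(4 * fs) / fs   # 4 s of samples
sig = np.sin(2 * np.pi * 10 * t) + 0.3 * np.sin(2 * np.pi * 2 * t)
rel = relative_band_power(sig, fs)
print({name: round(v, 3) for name, v in rel.items()})
```

Here alpha should dominate, and delta should hold roughly 0.09 / 1.09 of the power. Comparing such band profiles before and after steering a feature is one simple way to ask whether an intervention moved a recognizable physiological signature.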
