in Dyslexia: Anomaly Patterns in Auditory Processing

The search for a dyslexia diagnosis based exclusively on objective methods is currently a challenging task. Usually, this disorder is analyzed by means of behavioral tests that are prone to errors due to their subjective nature; e.g., the subject's mood while doing the test can affect the results. Understanding the brain processes involved is key to providing a correct analysis and avoiding these types of problems. In this task, biomarkers such as electroencephalograms can help to obtain an objective measurement of brain behavior that can be used to perform several analyses and ultimately make a diagnosis, keeping human interaction to a minimum. In this work, we used electroencephalograms recorded from children with and without dyslexia while a sound stimulus was played. We aim to detect whether there are significant differences in adaptation when the same stimulus is applied at different times. Our results show that, following this process, a machine learning pipeline can be built with AUC values up to 0.73.


Introduction
Developmental Dyslexia (DD) is a learning disorder with an estimated prevalence of 7% [17]. It is characterized by slow and inaccurate word recognition and by poor spelling and decoding abilities, despite individuals having adequate intelligence and sensory abilities. Diagnosis is usually made by means of behavioral tests. These tests, although performed by specialists, are not free from human error. For example, a child's mood can affect the results, and the analysis of the test is subjective to some degree. In addition, this type of testing can only be applied to children who already have reading and writing skills, which limits the impact of a possible intervention. This is why approaches using biomarkers are studied: they offer an objective way to obtain information about the processes underlying the disorder and, at the same time, provide a method to diagnose children at pre-reading age, opening the opportunity for early intervention. Several biomarkers, such as those obtained from electroencephalograms (EEG) and magnetoencephalograms (MEG), among others, have been used in the literature to further study the disorder and its causes. Additionally, it is common to derive connectivity metrics from these markers to determine how brain areas collaborate. A review of some of these metrics can be found in [10]. In this work, we use the spectral phase lag index (PLI), which is described in more detail in the methods section.
As mentioned above, biomarkers such as EEG and MEG are widely used in the search for answers about neurological disorders and brain behavior. Examples are the works [1,2,8,9,13], where brain signals are used to explore diseases such as Alzheimer's disease, Parkinsonian syndromes or epileptic disorders. Closer to this work, where auditory stimuli are used to trigger a reaction in the brain in the study of DD, Molinaro et al. [14] found that atypical neural entrainment at different rates may arise in affected subjects. Other works suggest that DD originates from atypical synchronization in the right hemisphere [4,6]. Typically, these works compose a set of features from the entire EEG and compare the differences between groups, without taking into account the possible adaptation of the brain over time.
In this work, we hypothesize that an adaptation to the stimulus must occur and that it differs between the control group and the dyslexic group, thus providing a way to differentiate them. This adaptation has to be reflected by a change in brain behavior that can be analyzed through EEG signals. The metrics used here typically measure the degree of synchronization between different EEG channels and bands. In this work we take a different approach: we split the EEG into several segments and compute the same metrics between segments representing different time slots.
The rest of the paper is organized as follows. The next section explains the materials and methods used in this work, including the data acquisition; it also presents the synchrony metric and details the classification pipeline. The following section shows the results obtained by applying this methodology, and the last sections discuss them.

Data
The EEG data were recorded by the Leeduca group at the University of Málaga. The signals were acquired with a Brainvision actiCHamp Plus 32-channel amplifier at a sampling rate of 500 Hz. The electrode montage, following the standardized 10-20 system, is shown in Fig. 1. EEGs were obtained while several auditory stimuli were presented to the subject. These stimuli consisted of white noise, amplitude modulated at 4.8 Hz, 16 Hz and 20 Hz. They were presented in 2.5-min sessions each, in the following order: 4.8 Hz-16 Hz-20 Hz-20 Hz-16 Hz-4.8 Hz, for a total of 15 min. The stimuli were determined by expert linguistic psychologists who studied the main frequency components present in voice, corresponding to syllables and phonemes.

Table 1 shows the subjects recorded in this experiment. They were extracted from a cohort (N = 700) of children from different primary schools of Andalucía (Spain). Comorbidities with other neurodevelopmental disorders such as Language Impairment (LI), Speech Sound Disorder (SSD), Attention Deficit Hyperactivity Disorder (ADHD), Autism, and other auditory or visual sensory deficit disorders were taken into account in the screening process, along with information about other relevant conditions that can affect reading achievement, such as immigration or bilingualism [3] (Table 1).

The raw EEGs were preprocessed to remove eye-blinking artifacts as well as impedance variations due to movement. Ocular artifacts were removed by source separation using Independent Component Analysis (ICA) [11]. Finally, all channels were referenced to the Cz channel and bandpass filtered by means of a finite impulse response (FIR) filter to extract 5 different brain waves: Delta, 1.5-4 Hz; Theta, 4-8 Hz; Alpha, 8-13 Hz; Beta, 13-30 Hz; and Gamma, 30-80 Hz.
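The band decomposition described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the filter length (`numtaps`), the default Hamming window of `scipy.signal.firwin`, and the zero-phase application with `filtfilt` are assumptions. Only the sampling rate and the band edges come from the text.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

FS = 500.0  # sampling rate in Hz, as stated in the text

# Band edges in Hz, as listed in the text
BANDS = {
    "delta": (1.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 80.0),
}


def bandpass_fir(x, low, high, fs=FS, numtaps=1001):
    """Band-pass filter along the last axis with a linear-phase FIR filter.

    numtaps=1001 is an assumed value; filtfilt (forward-backward filtering)
    is also an assumption, chosen here to avoid phase distortion.
    """
    taps = firwin(numtaps, [low, high], pass_zero=False, fs=fs)
    return filtfilt(taps, [1.0], x, axis=-1)


def split_into_bands(eeg, fs=FS):
    """Return a dict mapping band name -> filtered copy of `eeg`.

    `eeg` has shape (n_channels, n_samples).
    """
    return {name: bandpass_fir(eeg, lo, hi, fs) for name, (lo, hi) in BANDS.items()}


# Usage on a synthetic single-channel 10 Hz tone (falls in the Alpha band)
t = np.arange(0, 10, 1 / FS)
eeg = np.sin(2 * np.pi * 10 * t)[np.newaxis, :]
bands = split_into_bands(eeg)
```

Note that `filtfilt` needs the signal to be longer than roughly three times the filter length, which holds comfortably for the 2.5-min sessions described here.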

Connectivity Metric
As stated in the introduction, several metrics can be used to assess connectivity. These metrics represent the synchronization strength between two signals. When the two signals come from distinct brain areas, the metric can be seen as a measurement of cooperation between those areas, which offers a way of studying brain behavior. In this work, however, we use them to compare the same signal at different times.
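Phase Lag Index. The Phase Lag Index (PLI) is defined as follows:

$$\mathrm{PLI}_{xy} = \left| \frac{1}{n} \sum_{t=1}^{n} \mathrm{sgn}\left( \mathrm{imag}\left( S^{t}_{xy} \right) \right) \right|$$

where $n$ is the number of time points, $\mathrm{imag}(S^{t}_{xy})$ is the imaginary part of the cross-spectral density at time $t$ and $\mathrm{sgn}$ is the sign function (−1 for negative values, +1 for positive values and 0 for zero values). The cross-spectral density can be computed as $S^{t}_{xy} = \mathrm{abs}(x)\,\mathrm{abs}(y)\,e^{i(\theta_x - \theta_y)}$, where $\theta_x$ and $\theta_y$ are the instantaneous phases of the two signals. This metric mitigates the effects of volume conduction, that is, spurious connectivity caused by two different electrodes recording the same source [18]. Such connections have phase lags of zero or π, for which the imaginary part of the cross-spectral density vanishes.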
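As an illustration of the definition, the PLI can be computed directly from two phase series. The sketch below assumes non-zero amplitudes, so that the sign of the imaginary part of the cross-spectral density equals the sign of sin(θx − θy); the function name `phase_lag_index` is hypothetical and not taken from the paper.

```python
import numpy as np


def phase_lag_index(theta_x, theta_y):
    """PLI between two instantaneous-phase series of equal length.

    imag(S_xy) = abs(x) * abs(y) * sin(theta_x - theta_y); with non-zero
    amplitudes only the sign of the sine term matters.
    """
    theta_x = np.asarray(theta_x, dtype=float)
    theta_y = np.asarray(theta_y, dtype=float)
    signs = np.sign(np.sin(theta_x - theta_y))
    return float(np.abs(np.mean(signs)))


t = np.linspace(0.0, 1.0, 1000, endpoint=False)
theta = 2 * np.pi * 10 * t

# A constant, non-zero phase lag gives the same sign at every time point
print(phase_lag_index(theta, theta - np.pi / 4))  # 1.0

# A zero phase lag (as produced by volume conduction) contributes nothing
print(phase_lag_index(theta, theta))  # 0.0
```

The second call shows why the metric discounts zero-lag coupling: every term of the sum is sgn(0) = 0.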
Hilbert Transform. Connectivity metrics usually rely on the instantaneous amplitude or phase, which can be computed from the analytic version of the raw (time-domain) signal. This is a complex version of the signal obtained by means of the Hilbert Transform (HT). Once it is computed, the instantaneous amplitude and phase can be obtained. The instantaneous frequency can also be obtained by differentiating the instantaneous phase. The Hilbert Transform is defined as follows:

$$\mathcal{H}[x(t)] = \frac{1}{\pi}\,\mathrm{P.V.}\int_{-\infty}^{\infty} \frac{x(\tau)}{t - \tau}\,d\tau$$

and the analytic signal $z_i(t)$ for a signal $x(t)$ can be obtained as

$$z_i(t) = x(t) + i\,\mathcal{H}[x(t)]$$

From $z_i(t)$, it is straightforward to compute the instantaneous amplitude as

$$a(t) = \left| z_i(t) \right| = \sqrt{x(t)^2 + \mathcal{H}[x(t)]^2}$$

and the instantaneous, unwrapped phase is

$$\theta(t) = \arctan\left( \frac{\mathcal{H}[x(t)]}{x(t)} \right)$$
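The quantities above can be obtained numerically with `scipy.signal.hilbert`, which returns the analytic signal. The sketch below also shows one possible way of applying the PLI between time segments of the same signal, as the text describes. Computing the phase on the whole signal before splitting, using equal-length segments, and the function names are all assumptions made for illustration.

```python
import numpy as np
from scipy.signal import hilbert


def instantaneous_features(x, fs):
    """Instantaneous amplitude, unwrapped phase and frequency (Hz) of x."""
    z = hilbert(x)                       # analytic signal x + i*H[x]
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    frequency = np.diff(phase) * fs / (2 * np.pi)
    return amplitude, phase, frequency


def pli_first_vs_rest(x, n_segments):
    """PLI between the first time segment of x and each later segment.

    Returns n_segments - 1 values. Trailing samples that do not fill a
    whole segment are dropped (an assumption of this sketch).
    """
    phase = np.angle(hilbert(x))
    seg_len = len(x) // n_segments
    segments = phase[: seg_len * n_segments].reshape(n_segments, seg_len)
    lag = segments[0] - segments[1:]     # shape (n_segments - 1, seg_len)
    return np.abs(np.mean(np.sign(np.sin(lag)), axis=1))


fs = 500.0
t = np.arange(1000) / fs
amplitude, phase, frequency = instantaneous_features(np.cos(2 * np.pi * 10 * t), fs)

rng = np.random.default_rng(0)
values = pli_first_vs_rest(rng.normal(size=5000), n_segments=10)
print(values.shape)  # (9,)
```

For the pure 10 Hz tone, the recovered amplitude stays at 1 and the instantaneous frequency at 10 Hz, which is a quick sanity check of the definitions.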

Classification Pipeline
After processing the EEGs as shown in Sect. 2.1, the next steps in the pipeline, summarized in Fig. 2, are as follows:
- First, we split the signals and compute the metrics for each subject. For each subject, we have two sets of 40 Hz EEG records: 40 Hz UP (obtained while the auditory stimuli were applied in ascending frequency order, up to 40 Hz) and 40 Hz DOWN (obtained while the auditory stimuli were applied in descending frequency order, down to 4.8 Hz). Each set is composed of 32 EEG channels, filtered to obtain 5 different bands. We split each signal into 10 segments and compute the metrics described in Sect. 2.2 between the first segment and each of the remaining ones, pairwise, thus obtaining 9 metric values for every channel and band.
- The following steps are done inside a cross-validation loop:
• In this step, we select the best channels and bands separately for the dyslexic and control groups. We do this with an anomaly detection approach using a one-class support vector machine (OCSVM). The goal is to detect which channels present significant differences between the two applications of the stimulus, 40 Hz [UP-DOWN].
For each channel and band, we train an OCSVM with the UP data and test it with the DOWN data. If there are differences, the DOWN data should be detected almost entirely as outliers. To ensure that those anomalies are not occurring by chance, a permutation test is performed. The criterion is based on the following equation:

$$p\text{-value} = \frac{\mathrm{PERMS} + 1}{N + 1}$$

where $N$ is the number of permutations run (1000) and $\mathrm{PERMS}$ is the number of those permutations whose classification score is equal to or higher than the ground classification score. The threshold below which we consider a channel significant is 0.05.
• The last stage is the classification with two different classifiers: KNeighborsClassifier and Support Vector Machine (SVM). We take the most significant channels from the previous step and use them to create a mask that is applied to the dataset. The significant channels are multiplied by 1 and the non-significant ones by 0. In order to avoid biasing the data, both the dyslexic mask and the control mask are applied to each subject regardless of the group to which he or she belongs, although this duplicates the data.
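The channel selection step can be sketched as below for a single channel and band. The text does not state how the permutations are drawn, how the score is defined, or which OCSVM hyperparameters are used. This sketch therefore assumes that the UP/DOWN labels of the pooled samples are shuffled, that the score is the fraction of DOWN samples flagged as outliers, and that an RBF kernel with `nu=0.1` is used. The synthetic data and function names are hypothetical.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def outlier_score(up, down, nu=0.1):
    """Train on UP, test on DOWN; return the fraction of DOWN flagged as outliers."""
    model = OneClassSVM(kernel="rbf", gamma="scale", nu=nu).fit(up)
    return float(np.mean(model.predict(down) == -1))


def permutation_p_value(up, down, n_permutations=1000, seed=0):
    """p = (PERMS + 1) / (N + 1), with UP/DOWN labels shuffled (assumed scheme)."""
    rng = np.random.default_rng(seed)
    observed = outlier_score(up, down)
    pooled = np.vstack([up, down])
    n_up = len(up)
    perms = 0
    for _ in range(n_permutations):
        idx = rng.permutation(len(pooled))
        score = outlier_score(pooled[idx[:n_up]], pooled[idx[n_up:]])
        perms += int(score >= observed)
    return observed, (perms + 1) / (n_permutations + 1)


# Synthetic example: rows are subjects, columns are the 9 PLI values
rng = np.random.default_rng(1)
up = rng.normal(0.0, 1.0, size=(30, 9))
down = rng.normal(3.0, 1.0, size=(30, 9))   # clearly shifted on purpose
observed, p = permutation_p_value(up, down, n_permutations=200)
significant = p < 0.05                       # threshold stated in the text
```

Repeating this for every channel and band, separately for each group, would give the set of significant entries from which the mask is built.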

Results
In this section, the results of applying the methods of the last section are presented. We carried out experiments exploring all the possible combinations of bands and classifiers for the 40 Hz stimulus. The classification strategy follows a cross-validation pattern with 5 folds. This applies to both the mask stage and the classification stage (Fig. 2). Table 2 shows the results of these combinations. The best overall values are obtained in the Alpha band. Although both classifiers show similar AUC results, KNN discriminates better between the two groups, thus yielding better sensitivity values. This is better seen in Fig. 3.
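The evaluation scheme described here (masks built inside each of the 5 folds, both masks applied to every subject, AUC computed on the held-out fold) can be sketched as follows. The mask builder `placeholder_masks` is a hypothetical stand-in for the OCSVM-based selection, the data are synthetic, and the classifiers use scikit-learn defaults. None of these choices are taken from the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def placeholder_masks(X_train, y_train, threshold=0.3):
    """Hypothetical stand-in for the OCSVM selection: one 0/1 mask per group."""
    overall = X_train.mean(axis=0)
    return [
        (np.abs(X_train[y_train == g].mean(axis=0) - overall) > threshold).astype(float)
        for g in (0, 1)
    ]


def apply_masks(X, masks):
    # Both masks are applied to every subject, so each feature appears twice.
    return np.hstack([X * m for m in masks])


def continuous_scores(clf, X):
    # SVC exposes decision_function; KNeighborsClassifier only predict_proba.
    if hasattr(clf, "decision_function"):
        return clf.decision_function(X)
    return clf.predict_proba(X)[:, 1]


def cross_validated_auc(X, y, make_classifier, n_splits=5, seed=0):
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    aucs = []
    for train, test in cv.split(X, y):
        masks = placeholder_masks(X[train], y[train])   # training fold only
        clf = make_classifier().fit(apply_masks(X[train], masks), y[train])
        scores = continuous_scores(clf, apply_masks(X[test], masks))
        aucs.append(roc_auc_score(y[test], scores))
    return float(np.mean(aucs))


# Synthetic data: 48 subjects, 20 features, 5 of them informative
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 24)
X = rng.normal(size=(48, 20))
X[y == 1, :5] += 1.5
for name, factory in [("KNN", KNeighborsClassifier), ("SVM", SVC)]:
    print(name, round(cross_validated_auc(X, y, factory), 2))
```

Building the masks from the training fold only keeps the held-out subjects out of the feature selection, which is the reason for placing both stages inside the same loop.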

Discussion
The proposed method seeks to find differences in how the brain adapts over time when exposed to certain stimuli. To this end, a metric usually used to determine the synchronization of different brain regions is applied between different time instants instead. First, we obtain the most discriminative channels and bands through an anomaly detection approach. Then, a mask is applied to the entire dataset to highlight these channels and bands before proceeding with the classification stage. The results show the best values for the Alpha band using a KNN classifier. Although the majority of works on dyslexia are based on exploratory analysis, it is worth noting the efforts made in the search for an automatic diagnosis method using biomarkers. In Table 3, our method is compared with previous ones in this context. These works vary in the biomarkers used, from structural imaging [19] and MEG [5] to EEG [7,12,15,16]. The best values are often found when the features are extracted from MEG, but this requires a 252-channel acquisition system, and MEG data is usually harder to obtain. Works using EEG, such as [16], rely on interactive tasks like writing and typing, which limits the age of diagnosis in the same way as behavioural tests. The use of auditory stimuli overcomes this limitation and avoids possible bias introduced by the task. Such stimuli are applied in [12,15]. However, unlike our work, all of them search for differences or synchrony between several brain regions at the same time instant.

Conclusions
In this work, we present a method to detect differences in how dyslexic and non-dyslexic children adapt to auditory stimuli at different frequencies (4.8 Hz, 16 Hz, 20 Hz) while an EEG is recorded. Each stimulus frequency was presented to the children twice. We then measure the synchrony through time during both applications of the stimulus and look for differences in adaptation. To this end, PLI is used as the connectivity metric and a two-step pipeline is built to automate the classification process. The first step is an anomaly detection step that identifies which channels and bands present the most significant differences, and the second is a classification step. We found that there are differences in the Alpha band that allow a classifier to distinguish between the control and dyslexic groups with sensitivity values up to 0.82, specificity up to 0.76 and AUC of 0.76. These differences imply that dyslexic children adapt differently when a certain stimulus is applied to them and then applied again.
As future work, a more intensive exploratory analysis of the rest of the stimuli is planned, as well as testing metrics other than PLI.