Database.jl

This module implements tools to facilitate the work with EEG databases, in particular, BCI databases in NY format — see the BCI Databases Overview.

To learn how to use BCI databases, see Tutorial ML 1.

Methods

Function                    Description
Eegle.Database.infoDB       immutable structure holding the information summarizing an EEG database
Eegle.Database.loadNYdb     return a list of .npz files in a directory (this is considered a database)
Eegle.Database.infoNYdb     print, save and return metadata about a database
Eegle.Database.selectDB     select databases and sessions based on inclusion criteria
Eegle.Database.weightsDB    get weights for each session of a database for statistical analysis

📖

Eegle.Database.infoDB (Type)
struct infoDB
    dbName              :: String
    condition           :: String
    paradigm            :: String
    files               :: Vector{String}
    nSessions           :: Vector{Int}
    nTrials             :: Dict{String, Vector{Int}}
    nSubjects           :: Int
    nSensors            :: Int
    sensors             :: Vector{String}
    sensorType          :: String
    nClasses            :: Int
    cLabels             :: Vector{String}
    sr                  :: Int
    wl                  :: Int
    offset              :: Int
    filter              :: String
    doi                 :: String
    hardware            :: String
    software            :: String
    reference           :: String
    ground              :: String
    place               :: String
    investigators       :: String
    repository          :: String
    description         :: String
    timestamp           :: Int
    formatVersion       :: String
end

Immutable structure holding the summary information and metadata of an EEG database (DB) in NY format.

It is created by functions infoNYdb and selectDB.

Fields

  • .files: list of .npz files, each corresponding to a session in the database. The length of .files is equal to the total number of sessions
  • .nSessions: vector holding the number of sessions per subject
  • .nTrials: a dictionary mapping each class label to a vector containing the number of trials per session for that class. For example, nTrials["left_hand"] returns a vector with the number of trials for "left_hand" across all sessions.
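As a minimal illustration of how the .nTrials field is organized, the sketch below uses a plain Dict in place of the field of an actual infoDB structure. The class labels and trial counts are made up for the example and are not taken from any real database.

```julia
# Hypothetical stand-in for the `.nTrials` field of an infoDB structure:
# one vector per class label, one entry per session (made-up numbers).
nTrials = Dict("left_hand"  => [40, 38, 42],
               "right_hand" => [40, 41, 39])

# Number of "left_hand" trials in each of the three sessions
leftPerSession = nTrials["left_hand"]

# Total number of trials per class across all sessions
totalPerClass = Dict(label => sum(counts) for (label, counts) in nTrials)

# Smallest per-session trial count over all classes
minPerSession = minimum(minimum(counts) for counts in values(nTrials))
```

With a real database, the same expressions would be applied to the .nTrials field of the structure returned by infoNYdb or selectDB.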

The following fields are assumed constant across all sessions of the database. This is checked by Eegle when a database is read.

  • .dbName: name or identifier of the database
  • .condition: experimental condition under which the DB has been recorded
  • .paradigm: for BCI data, this may be :P300, :ERP or :MI — see BCI paradigm
  • .nSubjects: total number of subjects composing the DB — see subject
  • .nSensors: number of sensors composing the recordings (e.g., EEG electrodes)
  • .sensors: list of sensor labels (e.g., [Fz, Cz, ..., Oz])
  • .sensorType: type of sensors (wet, dry, Ag/AgCl, ...)
  • .nClasses: number of classes for which labels are available
  • .cLabels: list of class labels
  • .sr: sampling rate of the recordings (in samples per second)
  • .wl: for BCI, this is the duration of trials (in samples)
  • .offset: shift to be applied to markers in order to determine the trial onset (in samples)
  • .filter: temporal filter that has been applied to the data
  • .hardware: equipment used to obtain the recordings (typically, the EEG amplifier)
  • .software: software used to obtain the recordings
  • .reference: label of the reference electrode for EEG differential amplifiers
  • .ground: label of the electrical ground electrode
  • .doi: digital object identifier (DOI) of the database
  • .place: place where the recordings have been obtained
  • .investigators: investigator(s) that have obtained the recordings
  • .repository: public repository where the DB has been made accessible
  • .description: general description of the DB
  • .timestamp: date of the publication of the DB
  • .formatVersion: version of the NY format in which the recordings have been stored.

Eegle.Database.loadNYdb (Function)
    function loadNYdb(dbDir::AbstractString, isin::String="")

Return a list of the complete paths of all .npz files found in a directory given as argument dbDir. For each NPZ file, there must be a corresponding YAML metadata file with the same name and extension .yml, otherwise the file is not included in the list.

If a string is provided as kwarg isin, only the files whose name contains the string will be included.
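The selection rule described above can be sketched in plain Julia. The helper below, listNYfiles, is hypothetical and is not part of Eegle; it only mimics the documented behavior (NPZ files with a companion .yml file, optionally filtered by a substring of the file name) and takes isin as a keyword argument, following the text above.

```julia
# Hypothetical helper (not part of Eegle) mimicking the behavior described above:
# list the .npz files in `dbDir` that have a companion .yml file with the same name,
# optionally keeping only those whose file name contains `isin`.
function listNYfiles(dbDir::AbstractString; isin::AbstractString = "")
    files = String[]
    for path in readdir(dbDir; join = true)         # complete paths, sorted by name
        isfile(path) || continue                    # skip sub-directories
        base, ext = splitext(path)
        ext == ".npz" || continue                   # keep NPZ files only
        isfile(base * ".yml") || continue           # require the YAML metadata file
        occursin(isin, basename(path)) || continue  # "" matches every file name
        push!(files, path)
    end
    return files
end
```

Calling listNYfiles(dbDir) on a folder would return the complete paths of the sessions that have both files, and listNYfiles(dbDir; isin = "subj3") only those whose name contains "subj3".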

See Also

infoNYdb, FileSystem.getFilesInDir


Eegle.Database.infoNYdb (Function)
    function infoNYdb(dbDir)

Create an infoDB structure and show it in Julia's REPL.

The only argument (dbDir) is the directory holding all files of a database — see NY format.

This function carries out sanity checks on the database and prints warnings if any check fails.

Examples

db = infoNYdb(dbDir)

Eegle.Database.selectDB (Function)
function selectDB(rootDir   :: String,
                  paradigm  :: Symbol;
                  classes   :: Union{Vector{String}, Nothing} =
                      paradigm == :P300 ? ["target", "nontarget"] : nothing,
                  minTrials :: Union{Int, Nothing} = nothing,
                  summarize :: Bool = true)

Select BCI databases pertaining to the given BCI paradigm. Optionally, each session of the selected databases is scrutinized to meet the provided inclusion criteria.

Return the selected databases as a list of infoDB structures, wherein, if inclusion criteria are provided, the infoDB.files field lists the included sessions only.

Arguments

  • rootDir: the directory on the local computer where the search starts. Any folder in this directory is a candidate database.
  • paradigm: the BCI paradigm to be used. Supported paradigms at this time are: :P300, :ERP or :MI.
Tip

If a folder with the same name as the paradigm (for example: "MI") is found in rootDir, the search starts therein and not in rootDir.

Optional Keyword Arguments

  • classes: the labels of the classes the databases must include:
    • for the P300 paradigm the default classes are ["target", "nontarget"], as in the FII corpus.
    • for the MI and ERP paradigms there is no inclusion criterion based on class labels by default.
Tip

In the FII corpus, available MI class labels are: "left_hand", "right_hand", "feet", "rest", "both_hands", and "tongue".

  • minTrials: the minimum number of trials that every class must have in a session for the session to be included.
  • summarize: if true (default) a summary table of the selected databases is printed in the REPL.
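To make the classes and minTrials criteria concrete, here is a minimal sketch of how such a session filter could work. The helper eligibleSessions and the trial counts are hypothetical; the sketch assumes that the threshold applies to the requested classes only and does not reproduce the actual Eegle implementation.

```julia
# Hypothetical helper (not part of Eegle): indices of the sessions in which every
# requested class has at least `minTrials` trials. `nTrials` is organized like the
# `.nTrials` field of infoDB (class label => number of trials per session).
function eligibleSessions(nTrials::Dict{String, Vector{Int}},
                          classes::Vector{String},
                          minTrials::Int)
    # a database lacking one of the requested classes yields no session
    all(c -> haskey(nTrials, c), classes) || return Int[]
    nSess = length(nTrials[first(classes)])
    return [s for s in 1:nSess if all(nTrials[c][s] >= minTrials for c in classes)]
end

# Made-up trial counts for a database with three sessions
nTrials = Dict("rest"   => [60, 45, 80],
               "feet"   => [55, 70, 48],
               "tongue" => [10, 10, 10])

eligible = eligibleSessions(nTrials, ["rest", "feet"], 50)   # only session 1 qualifies
```

In this toy case, sessions 2 and 3 are rejected because "rest" has 45 trials in session 2 and "feet" has 48 trials in session 3, both below the threshold of 50.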

Examples

selectedDB = selectDB(.../directory_to_start_searching/, :P300)

selectedDB = selectDB(.../directory_to_start_searching/, :MI;
                      classes = ["left_hand", "right_hand"])

selectedDB = selectDB(.../directory_to_start_searching/, :MI;
                      classes = ["rest", "both_hands", "feet"],
                      minTrials = 50,
                      summarize = false)

Eegle.Database.weightsDB (Function)
    function weightsDB(files)

Given a database provided by argument files as a list of .npz files, compute a weight for each session to be used in statistical analysis when merging the classification performance or any other relevant index across databases.

The goal of the weighting is to balance the contribution of different databases and of the different subjects therein, considering both the number of unique subjects in each database and the fact that the number of sessions for each subject may be different.

The weight assigned to each session is inversely proportional to the square root of the number of unique subjects in the database and to the square root of the number of sessions available for the same subject.

Let $s_m$ be one of the $S_m$ sessions of subject $m$. The weight $w_{m,s_m}$ for session $s_m$ is given by:

\[ w_{m,s_m} = \frac{\sqrt{M} \cdot \sqrt{S_m}}{N}\]

where $M$ is the number of unique subjects in the database and $N$ is the total number of sessions (i.e., length(files)).

When all subjects of a database have the same number of sessions $S$ (so that $N = M \cdot S$), this weighting makes the weights of the database sum to

\[\sqrt{M} \cdot \sqrt{S}\]

For example,

  • if the database has $M = 100$ subjects and each has 1 session ($N = 100$), the total weight of the database is $\sum_{n=1}^{100} \frac{\sqrt{100} \cdot \sqrt{1}}{100} = 10$
  • if each of the 100 subjects has 4 sessions ($N = 400$), the total weight of the database is $\sum_{n=1}^{400} \frac{\sqrt{100} \cdot \sqrt{4}}{400} = 20$.
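The formula and the two examples above can be checked numerically with a short sketch. The function sessionWeights below is hypothetical: it applies the formula as written in this section to a vector of subject indices (one entry per session) and is not the Eegle implementation, which works on the list of files.

```julia
# Sketch of the weighting formula as written above; not the Eegle implementation.
# `subjects[n]` is the (hypothetical) index of the subject recorded in session n.
function sessionWeights(subjects::Vector{Int})
    N = length(subjects)                                # total number of sessions
    ids = unique(subjects)
    M = length(ids)                                     # number of unique subjects
    S = Dict(m => count(==(m), subjects) for m in ids)  # sessions per subject, S_m
    weights = [sqrt(M) * sqrt(S[m]) / N for m in subjects]
    schedule = hcat(subjects, [S[m] for m in subjects]) # N × 2 integer matrix
    return weights, schedule
end

# 100 subjects with one session each, then 100 subjects with four sessions each
w1, sched1 = sessionWeights(collect(1:100))
w4, sched4 = sessionWeights(repeat(1:100, inner = 4))

sum(w1), sum(w4)   # database totals, approximately 10 and 20
```

Note that when every subject has the same number of sessions $S$, we have $N = M \cdot S$ and each weight reduces to $1 / (\sqrt{M} \cdot \sqrt{S})$, that is, 0.1 in the first case and 0.05 in the second.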

This is a compromise between two extreme strategies commonly used when merging indices across databases, which are both inadequate:

  • Uniform per-session weights (i.e., all sessions contribute equally), which favors larger databases or those with many sessions
  • Uniform per-database weights (i.e., all databases contribute equally), which overemphasizes small databases.

Once the weights have been obtained for several databases, they can be globally normalized in any desired way.

Return

  • weights: a vector of length $N$, containing the weight for each session in files
  • schedule: an $N × 2$ matrix of integers where:
    • the first column contains the index of the subject to which the session belongs
    • the second column contains the number of sessions for that subject.

Examples

w, schedule = weightsDB(files)
