Readers

DataAxesFormats.Readers Module

The DafReader interface specifies a high-level API for reading Daf data. This API is implemented here, on top of the low-level FormatReader API. The high-level API provides thread safety so the low-level API can (mostly) ignore this issue.

Each data set is given a name to use in error messages etc. You can explicitly set this name when creating a Daf object. Otherwise, when opening an existing data set, if it contains a scalar "name" property, it is used. Otherwise some reasonable default is used. In all cases, object names are passed through unique_name to avoid ambiguity.

Data properties are identified by a unique name given the axes they are based on. That is, there is a separate namespace for scalar properties, vector properties for each specific axis, and matrix properties for each unordered pair of axes.

For matrices, we keep careful track of their MatrixLayouts. Returned matrices are always in column-major layout, using relayout! if necessary. As this is an expensive operation, we cache the result in memory. Similarly, we cache the results of applying a query to the data. The cache can be cleared to reduce memory usage, if necessary.
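The caching idea can be illustrated with a minimal sketch in plain Julia. It is not the package's implementation: the names STORED, CACHE and fetch_matrix are hypothetical, and Base's permutedims stands in for relayout! as the expensive step that materializes the flipped layout.

```julia
# Hypothetical store: matrices keyed by (rows_axis, columns_axis, name).
const STORED = Dict(("gene", "cell", "UMIs") => [1 2 3; 4 5 6])  # 2 genes x 3 cells

# Hypothetical cache of matrices that had to be converted to the requested layout.
const CACHE = Dict{Tuple{String, String, String}, Matrix{Int}}()

function fetch_matrix(rows_axis::String, columns_axis::String, name::String)
    key = (rows_axis, columns_axis, name)
    # The requested layout is stored directly: return it as-is.
    haskey(STORED, key) && return STORED[key]
    # Only the flipped layout is stored: convert once, then reuse the cached result.
    return get!(CACHE, key) do
        permutedims(STORED[(columns_axis, rows_axis, name)])
    end
end

flipped = fetch_matrix("cell", "gene", "UMIs")  # Converted and cached on first use.
again = fetch_matrix("cell", "gene", "UMIs")    # Served from the cache.
empty!(CACHE)                                   # Clearing the cache frees the converted copies.
```

The second call returns the very same array object as the first, which is the point of caching: the conversion cost is paid once per matrix, at the price of holding the converted copy in memory until the cache is cleared.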

The data API is the high-level API intended to be used from outside the package, and is therefore re-exported from the top-level Daf namespace. It provides additional functionality on top of the low-level FormatReader implementation, accepting more general data types and automatically dealing with relayout! when needed. In particular, it enforces single-writer multiple-readers for each data set, so the format code can ignore multi-threading and still be thread-safe.

Note

In the APIs below, when getting a value, specifying a default of undef means that it is an error for the value not to exist. In contrast, specifying a default of nothing means it is OK for the value not to exist, returning nothing. Specifying an actual value for default means it is OK for the value not to exist, returning the default instead. This is in the spirit of, but not identical to, undef being used as a flag for array construction saying "there is no initializer". If you feel this is an abuse of the undef value, take some comfort in that it is the default value of default, so you almost never have to write it explicitly in your code.
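The three-way convention can be sketched in plain Julia as follows. This only models the convention; SCALARS and lookup_scalar are hypothetical names, not part of the package.

```julia
# Hypothetical stand-in for the scalar properties of a data set.
const SCALARS = Dict{String, Any}("version" => 1)

# Sketch of the `default` convention: `undef` means "must exist",
# `nothing` means "may be missing", anything else is a fallback value.
function lookup_scalar(name::AbstractString; default = undef)
    if haskey(SCALARS, name)
        return SCALARS[name]
    elseif default === undef
        error("missing scalar: $(name)")
    else
        return default  # Either `nothing` or an actual fallback value.
    end
end

lookup_scalar("version")                      # The stored value.
lookup_scalar("missing"; default = nothing)   # Returns `nothing`.
lookup_scalar("missing"; default = 0)         # Returns the fallback value.
# lookup_scalar("missing")                    # Would throw, since `default` is `undef`.
```

Because undef is the default value of the keyword, the strict "must exist" behavior is what callers get when they write nothing extra.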

DataAxesFormats.Readers.description Function
description(daf::DafReader[; deep::Bool = false, cache::Bool = false])::AbstractString

Return a (multi-line) description of the contents of daf. This tries to hit a sweet spot between usefulness and terseness. If cache, also describes the content of the cache. If deep, also describes any data set nested inside this one (if any).

Scalar properties

DataAxesFormats.Readers.scalars_set Function
scalars_set(daf::DafReader)::AbstractSet{<:AbstractString}

The names of the scalar properties in daf.

Note

There's no immutable set type in Julia for us to return. If you do modify the result set, bad things will happen.

DataAxesFormats.Readers.get_scalar Function
get_scalar(
    daf::DafReader,
    name::AbstractString;
    [default::Union{StorageScalar, Nothing, UndefInitializer} = undef]
)::Maybe{StorageScalar}

Get the value of a scalar property with some name in daf.

If default is undef (the default), this first verifies the name scalar property exists in daf. Otherwise default will be returned if the property does not exist.

Readers axes

DataAxesFormats.Readers.axes_set Function
axes_set(daf::DafReader)::AbstractSet{<:AbstractString}

The names of the axes of daf.

Note

There's no immutable set type in Julia for us to return. If you do modify the result set, bad things will happen.

DataAxesFormats.Readers.axis_vector Function
axis_vector(
    daf::DafReader,
    axis::AbstractString;
    [default::Union{Nothing, UndefInitializer} = undef]
)::Maybe{AbstractVector{<:AbstractString}}

The array of unique names of the entries of some axis of daf. This is similar to doing get_vector for the special name property, except that it returns a simple vector (array) of strings instead of a NamedVector.

If default is undef (the default), this verifies the axis exists in daf. Otherwise, the default is nothing, which is returned if the axis does not exist.

DataAxesFormats.Readers.axis_dict Function
axis_dict(daf::DafReader, axis::AbstractString)::AbstractDict{<:AbstractString, <:Integer}

Return a dictionary converting axis entry names to their integer index.

DataAxesFormats.Readers.axis_indices Function
axis_indices(daf::DafReader, axis::AbstractString, entries::AbstractVector{<:AbstractString})::AbstractVector{<:Integer}

Return a vector of the indices of the entries in the axis.
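The relationship between entry names, the name-to-index dictionary and a vector of indices can be sketched in plain Julia. The entry names below are made up, and the sketch assumes the usual 1-based Julia indices; it does not call the package.

```julia
# Hypothetical axis entry names, in axis order.
entries = ["A1", "B2", "C3"]

# Map each entry name to its position along the axis.
index_of_entry = Dict(name => index for (index, name) in enumerate(entries))

# Convert a list of entry names to indices, preserving the order of the request.
wanted = ["C3", "A1"]
indices = [index_of_entry[name] for name in wanted]
```

Building the dictionary once and reusing it keeps each lookup cheap, instead of searching the vector of names for every requested entry.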

DataAxesFormats.Readers.axis_length Function
axis_length(daf::DafReader, axis::AbstractString)::Int64

The number of entries along the axis in daf.

This first verifies the axis exists in daf.

Vector properties

DataAxesFormats.Readers.has_vector Function
has_vector(daf::DafReader, axis::AbstractString, name::AbstractString)::Bool

Check whether a vector property with some name exists for the axis in daf. This is always true for the special name property.

This first verifies the axis exists in daf.

DataAxesFormats.Readers.vectors_set Function
vectors_set(daf::DafReader, axis::AbstractString)::AbstractSet{<:AbstractString}

The names of the vector properties for the axis in daf, not including the special name property.

This first verifies the axis exists in daf.

Note

There's no immutable set type in Julia for us to return. If you do modify the result set, bad things will happen.

DataAxesFormats.Readers.get_vector Function
get_vector(
    daf::DafReader,
    axis::AbstractString,
    name::AbstractString;
    [default::Union{StorageScalar, StorageVector, Nothing, UndefInitializer} = undef]
)::Maybe{NamedVector}

Get the vector property with some name for some axis in daf. The names of the result are the names of the vector entries (same as returned by axis_vector). The special property name returns an array whose values are also the (read-only) names of the entries of the axis.

This first verifies the axis exists in daf. If default is undef (the default), this first verifies the name vector exists in daf. Otherwise, if default is nothing, it will be returned. If it is a StorageVector, it has to be of the same size as the axis, and is returned. If it is a StorageScalar, a new Vector is created of the correct size containing the default, and is returned.
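How a non-undef default turns into a result for a missing vector can be sketched in plain Julia. The function resolve_vector_default is a hypothetical helper for illustration only; it ignores the NamedVector wrapping and the storage type restrictions of the real API.

```julia
# Sketch: compute the value returned for a missing vector property,
# given the `default` and the number of entries of the axis.
function resolve_vector_default(default, axis_length::Integer)
    if default === nothing
        return nothing
    elseif default isa AbstractVector
        # A vector default must match the axis size.
        length(default) == axis_length ||
            error("default vector length $(length(default)) differs from axis length $(axis_length)")
        return default
    else
        # A scalar default is expanded to a full vector.
        return fill(default, axis_length)
    end
end

resolve_vector_default(nothing, 3)          # `nothing`.
resolve_vector_default([1, 2, 3], 3)        # The given vector.
resolve_vector_default(0.5, 3)              # A new vector holding 0.5 three times.
```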

Matrix properties

DataAxesFormats.Readers.has_matrix Function
has_matrix(
    daf::DafReader,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString;
    [relayout::Bool = true]
)::Bool

Check whether a matrix property with some name exists for the rows_axis and the columns_axis in daf. Since this is Julia, this means a column-major matrix. A daf may contain two copies of the same data, in which case it would report the matrix under both axis orders.

If relayout (the default), this will also check whether the data exists in the other layout (that is, with flipped axes).

This first verifies the rows_axis and columns_axis exist in daf.

DataAxesFormats.Readers.matrices_set Function
matrices_set(
    daf::DafReader,
    rows_axis::AbstractString,
    columns_axis::AbstractString;
    [relayout::Bool = true]
)::AbstractSet{<:AbstractString}

The names of the matrix properties for the rows_axis and columns_axis in daf.

If relayout (the default), then this will include the names of matrices that exist in the other layout (that is, with flipped axes).

This first verifies the rows_axis and columns_axis exist in daf.

Note

There's no immutable set type in Julia for us to return. If you do modify the result set, bad things will happen.

DataAxesFormats.Readers.get_matrix Function
get_matrix(
    daf::DafReader,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString;
    [default::Union{StorageReal, StorageMatrix, Nothing, UndefInitializer} = undef,
    relayout::Bool = true]
)::Maybe{NamedMatrix}

Get the column-major matrix property with some name for some rows_axis and columns_axis in daf. The names of the result axes are the names of the relevant axes entries (same as returned by axis_vector).

If relayout (the default) and the matrix is only stored in the other memory layout (that is, with flipped axes), then relayout! is called automatically to compute the result. If daf isa DafWriter, the result is stored for future use; otherwise, it is just cached as MemoryData. This may lock up very large amounts of memory; you can call empty_cache! to release it.

This first verifies the rows_axis and columns_axis exist in daf. If default is undef (the default), this first verifies the name matrix exists in daf. Otherwise, if default is nothing, it is returned. If default is a StorageMatrix, it has to be of the same size as the rows_axis and columns_axis, and is returned. Otherwise, a new Matrix is created of the correct size containing the default, and is returned.

Utilities

DataAxesFormats.Readers.axis_version_counter Function
axis_version_counter(daf::DafReader, axis::AbstractString)::UInt32

Return the version number of the axis. This is incremented every time delete_axis! is called. It is used by interfaces to other programming languages to minimize copying data.

Note

This is purely in-memory per-instance, and not a global persistent version counter. That is, the version counter starts at zero even if opening a persistent disk daf data set.

DataAxesFormats.Readers.vector_version_counter Function
vector_version_counter(daf::DafReader, axis::AbstractString, name::AbstractString)::UInt32

Return the version number of the vector. This is incremented every time set_vector!, empty_dense_vector! or empty_sparse_vector! is called. It is used by interfaces to other programming languages to minimize copying data.

Note

This is purely in-memory per-instance, and not a global persistent version counter. That is, the version counter starts at zero even if opening a persistent disk daf data set.

DataAxesFormats.Readers.matrix_version_counter Function
matrix_version_counter(
    daf::DafReader,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString
)::UInt32

Return the version number of the matrix. The order of the axes does not matter. This is incremented every time set_matrix!, empty_dense_matrix! or empty_sparse_matrix! is called. It is used by interfaces to other programming languages to minimize copying data.

Note

This is purely in-memory per-instance, and not a global persistent version counter. That is, the version counter starts at zero even if opening a persistent disk daf data set.
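One plausible way a wrapper could use such counters to avoid redundant copies is sketched below in plain Julia. This is an assumption about usage, not a description of how any existing wrapper works; Mirror, refresh! and fetch_data are hypothetical names, and current_version stands in for the value a version counter function would return.

```julia
# Hypothetical mirror of a vector, as a wrapper in another language might keep.
mutable struct Mirror
    version::UInt32
    data::Vector{Float64}
end

# Copy the data again only if the version counter changed since the last copy.
function refresh!(mirror::Mirror, current_version::UInt32, fetch::Function)
    if mirror.version != current_version
        mirror.data = fetch()
        mirror.version = current_version
    end
    return mirror.data
end

# Count how many times the (expensive) copy is actually performed.
fetch_count = Ref(0)
fetch_data() = (fetch_count[] += 1; [1.0, 2.0])

mirror = Mirror(UInt32(0), fetch_data())    # Initial copy at version 0.
refresh!(mirror, UInt32(0), fetch_data)     # Same version: no new copy.
refresh!(mirror, UInt32(1), fetch_data)     # Version changed: copy again.
```

Since the counters are per-instance and start at zero, such a mirror is only meaningful for the lifetime of one opened daf object.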
