Views

DataAxesFormats.Views Module

Create a different view of Daf data using queries. This is a flexible mechanism that serves a variety of use cases. The simplest is to view a subset of the data as a Daf data set. A variant also renames the data properties to match the requirements of some computation. This makes it simpler to write generic tools (using fixed, generic property names) and apply them to arbitrary data (with arbitrary specific property names).

DataAxesFormats.Views.viewer Function
viewer(
    daf::DafReader;
    [name::Maybe{AbstractString} = nothing,
    axes::Maybe{ViewAxes} = nothing,
    data::Maybe{ViewData} = nothing]
)::DafReadOnly

Wrap daf data with a read-only DafView. The exposed view is defined by a set of queries applied to the original data. These queries are evaluated only when data is actually accessed. Therefore, creating a view is a relatively cheap operation.

If the name is not specified, the result name will be based on the name of daf, with a .view suffix.

Queries are listed separately for axes and data.

Note

As an optimization, calling viewer with all-empty (default) arguments returns a simple DafReadOnlyWrapper, that is, it is equivalent to calling read_only. Additionally, saying data = VIEW_ALL_DATA will expose all the data using any of the exposed axes; you can write data = [VIEW_ALL_DATA..., key => nothing] to hide specific data based on its key.
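
The following is a minimal sketch of creating a view that both subsets and renames data. It assumes the package is installed and that MemoryDaf, add_axis!, set_vector!, axis_length and has_vector (none of which are described on this page) are available from DataAxesFormats; the toy data and all property names are hypothetical.

```julia
using DataAxesFormats

# Hypothetical toy data: three batches with an `age` property.
daf = MemoryDaf(; name = "example")
add_axis!(daf, "batch", ["B1", "B2", "B3"])
set_vector!(daf, "batch", "age", [1, 2, 3])

# Expose only the batches whose age is greater than 1, and expose their
# `age` property under the (hypothetical) generic name `years`.
# The vector value is the suffix of a vector query on the exposed axis.
old_batches = viewer(
    daf;
    name = "old_batches",
    axes = ["batch" => q"/ batch & age > 1"],
    data = [("batch", "years") => q": age"],
)

# Queries are evaluated lazily, when the view is actually accessed.
n_old = axis_length(old_batches, "batch")
has_years = has_vector(old_batches, "batch", "years")
has_age = has_vector(old_batches, "batch", "age")
```

Since only ("batch", "years") is listed in data, the original age name is not expected to be visible through the view.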

DataAxesFormats.Views.ViewAxis Type

Specify an axis to expose from a view.

This is specified as a vector of pairs (similar to initializing a Dict). The order of the pairs matters (last one wins). We also allow specifying tuples instead of pairs to make it easy to invoke the API from other languages such as Python which do not have the concept of a Pair.

If the key is "*", then it is replaced by all the names of the axes of the wrapped daf data. Otherwise, the key is just the name of an axis.

If the value is nothing, then the axis will not be exposed by the view. If the value is "=", then the axis will be exposed with the same entries as in the original daf data. Otherwise the value is any valid query that returns a vector of (unique!) strings to serve as the vector entries.

That is, specifying "*" (or ALL_AXES) will expose all the original daf data axes from the view. Following this by saying "type" => nothing will hide the type axis from the view. Saying "batch" => q"/ batch & age > 1" will expose the batch axis, but only including the batches whose age property is greater than 1.
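
As a sketch of how these rules combine, the example below exposes all axes, then hides one and restricts another. It assumes MemoryDaf, add_axis!, set_vector!, has_axis and axis_length are available from DataAxesFormats (they are not described on this page); the data is hypothetical.

```julia
using DataAxesFormats

# Hypothetical toy data with three axes.
daf = MemoryDaf(; name = "example")
add_axis!(daf, "batch", ["B1", "B2", "B3"])
add_axis!(daf, "type", ["T1", "T2"])
add_axis!(daf, "gene", ["G1", "G2"])
set_vector!(daf, "batch", "age", [1, 2, 3])

# Later pairs override earlier ones, so the wildcard comes first.
axes_view = viewer(
    daf;
    axes = [
        "*" => "=",                          # expose every axis as-is
        "type" => nothing,                   # ...except `type`, which is hidden
        "batch" => q"/ batch & age > 1",     # ...and `batch`, which is restricted
    ],
)

gene_exposed = has_axis(axes_view, "gene")
type_exposed = has_axis(axes_view, "type")
n_batches = axis_length(axes_view, "batch")
```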

DataAxesFormats.Views.ViewAxes Type

Specify all the axes to expose from a view. We would have liked to specify this as AbstractVector{<:ViewAxis} but Julia in its infinite wisdom considers ["a", "b" => "c"] to be a Vector{Any}, which would require literals to be annotated with the type.
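
The element-type behavior mentioned above can be checked in plain Julia, without the package. This only illustrates how array literals are typed; it is not part of the package API.

```julia
# Mixing a plain string with a pair gives no useful common element type.
mixed = ["a", "b" => "c"]
mixed_type = typeof(mixed)

# A literal containing only string-to-string pairs is typed precisely.
pairs_only = ["a" => "=", "b" => "c"]
pairs_type = typeof(pairs_only)
```

Here mixed_type is Vector{Any}, while pairs_type is Vector{Pair{String, String}}.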

DataAxesFormats.Views.ViewDatum Type

Specify a single datum to expose from a view. This is specified as a vector of pairs (similar to initializing a Dict). The order of the pairs matters (last one wins). We also allow specifying tuples instead of pairs to make it easy to invoke the API from other languages such as Python which do not have the concept of a Pair.

Scalars are specified similarly to ViewAxes, except that the query should return a scalar instead of a vector. That is, saying "*" (or ALL_SCALARS) will expose all the original daf data scalars from the view. Following this by saying "version" => nothing will hide the version from the view. Adding "total_umis" => q"/ cell / gene : UMIs %> Sum %> Sum" will expose a total_umis scalar containing the total sum of all UMIs of all genes in all cells, etc.
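
A minimal sketch of the scalar rules, assuming MemoryDaf, set_scalar! and has_scalar are available from DataAxesFormats (they are not described on this page). The scalar names and values are hypothetical.

```julia
using DataAxesFormats

# Hypothetical toy data with two scalars and no axes.
daf = MemoryDaf(; name = "example")
set_scalar!(daf, "version", "1.0")
set_scalar!(daf, "organism", "human")

# Expose all scalars, then hide `version` (the last matching pair wins).
scalars_view = viewer(
    daf;
    axes = ["*" => "="],
    data = ["*" => "=", "version" => nothing],
)

organism_exposed = has_scalar(scalars_view, "organism")
version_exposed = has_scalar(scalars_view, "version")
```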

Vectors are specified similarly to scalars, but require a key specifying both an axis and a property name. The axis must be exposed by the view (based on the axes parameter). If the axis is "*", it is replaced by all the exposed axis names specified by the axes parameter. Similarly, if the property name is "*" (e.g., ("gene", "*")), then it is replaced by all the vector properties of the exposed axis in the base data. Therefore, specifying ("*", "*") (or ALL_VECTORS) exposes all vector properties of all the (exposed) axes.

The value for vectors must be the suffix of a vector query based on the appropriate axis; a value of "=" is again used to expose the property as-is.

For example, specifying axes = ["cell" => q"/ cell & type = TCell"], and then data = [("cell", "total_noisy_UMIs") => q"/ gene & noisy : UMIs %> Sum"] will expose total_noisy_UMIs as a per-cell vector property, using the query / gene & noisy / cell & type = TCell : UMIs %> Sum, which will compute the sum of the UMIs of all the noisy genes for each cell (whose type is TCell).

Matrices require a key specifying both axes and a property name. The axes must both be exposed by the view (based on the axes parameter). Again, if either or both of the axes are "*", they are replaced by all the exposed axes (based on the axes parameter), and likewise if the name is "*", it is replaced by all the matrix properties of the axes. The value for matrices can again be "=" to expose the property as is, or the suffix of a matrix query. Therefore, specifying ("*", "*", "*") (or ALL_MATRICES) exposes all matrix properties of all the (exposed) axes.

That is, assuming the gene and cell axes were exposed by the axes parameter, then specifying that ("cell", "gene", "log_UMIs") => q": UMIs % Log base 2 eps" will expose the matrix log_UMIs for each cell and gene.

The order of the axes does not matter, so data = [("gene", "cell", "UMIs") => "="] has the same effect as data = [("cell", "gene", "UMIs") => "="].
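
A short sketch of exposing a matrix through a view, with the key's axes given in the opposite order from the one used to store it. It assumes MemoryDaf, add_axis!, set_matrix! and has_matrix are available from DataAxesFormats (they are not described on this page); the data is hypothetical.

```julia
using DataAxesFormats

# Hypothetical toy data: a 2 cells x 3 genes UMIs matrix.
daf = MemoryDaf(; name = "example")
add_axis!(daf, "cell", ["C1", "C2"])
add_axis!(daf, "gene", ["G1", "G2", "G3"])
set_matrix!(daf, "cell", "gene", "UMIs", [1 2 3; 4 5 6])

# The key lists the axes as (gene, cell) although the data was stored
# as (cell, gene); the order of the axes in the key does not matter.
matrix_view = viewer(
    daf;
    axes = ["cell" => "=", "gene" => "="],
    data = [("gene", "cell", "UMIs") => "="],
)

umis_exposed = has_matrix(matrix_view, "cell", "gene", "UMIs")
```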

3D Tensors require a key specifying the main axis, followed by two axes, and a property name. All the axes must be exposed by the view (based on the axes parameter). In this case, none of the axes may be "*". The value can only be "=" to expose all the matrix properties of the tensor as they are, or nothing to hide all of them; that is, views can expose or hide existing (possibly masked) 3D tensors, but can't be used to create new ones.

That is, assuming the gene, cell and batch axes were exposed by the axes parameter, then specifying that ("batch", "cell", "gene", "is_measured") => "=" will expose the set of per-cell-per-gene matrices batch1_is_measured, batch2_is_measured, etc.

DataAxesFormats.Views.ViewData Type

Specify all the data to expose from a view. We would have liked to specify this as AbstractVector{<:ViewDatum} but Julia in its infinite wisdom considers ["a", "b" => "c"] to be a Vector{Any}, which would require literals to be annotated with the type.

Note

TensorKeys are interpreted after interpreting all MatrixKeys, so they will override them even if they appear earlier in the list of keys. For clarity it is best to list them at the very end of the list.

DataAxesFormats.Views.VIEW_ALL_DATA Constant

A vector to use in the data parameter of viewer to specify that the view exposes all the data of the exposed axes. This is the default, so the only reason to do this is to say VIEW_ALL_DATA... followed by some modifications.
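
A sketch of the "expose everything, then modify" pattern. It assumes VIEW_ALL_DATA is exported by DataAxesFormats, and that MemoryDaf, add_axis!, set_vector! and has_vector are available (they are not described on this page); the property names are hypothetical.

```julia
using DataAxesFormats

# Hypothetical toy data: two per-batch vector properties.
daf = MemoryDaf(; name = "example")
add_axis!(daf, "batch", ["B1", "B2"])
set_vector!(daf, "batch", "age", [1, 2])
set_vector!(daf, "batch", "donor", ["D1", "D2"])

# Splat the defaults, then hide a single vector property.
# The modification comes last, so it overrides the wildcard entries.
most_data = viewer(
    daf;
    axes = ["*" => "="],
    data = [VIEW_ALL_DATA..., ("batch", "donor") => nothing],
)

age_exposed = has_vector(most_data, "batch", "age")
donor_exposed = has_vector(most_data, "batch", "donor")
```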
