Views
DataAxesFormats.Views
—
Module
Create a different view of
Daf
data using queries. This is a very flexible mechanism which can be used for a variety of use cases. A simple way of using this is to view a subset of the data as a
Daf
data set. A variant of this also renames the data properties to adapt them to the requirements of some computation. This makes it simpler to create such tools (using fixed, generic property names) and apply them to arbitrary data (with arbitrary specific property names).
DataAxesFormats.Views.DafView
—
Type
struct DafView(daf::DafReader) <: DafReader
A read-only wrapper for any
DafReader
data, which exposes an arbitrary view of it as another
DafReadOnly
. This isn't typically created manually; instead call
viewer
.
DataAxesFormats.Views.viewer
—
Function
viewer(
daf::DafReader;
[name::Maybe{AbstractString} = nothing,
axes::Maybe{ViewAxes} = nothing,
data::Maybe{ViewData} = nothing]
)::DafReadOnly
Wrap
daf
data with a read-only
DafView
. The exposed view is defined by a set of queries applied to the original data. These queries are evaluated only when data is actually accessed. Therefore, creating a view is a relatively cheap operation.
If the
name
is not specified, the result name will be based on the name of
daf
, with a
.view
suffix.
Queries are listed separately for axes and data.
As an optimization, calling
viewer
with all-empty (default) arguments returns a simple
DafReadOnlyWrapper
, that is, it is equivalent to calling
read_only
. Additionally, saying
data = VIEW_ALL_DATA
will expose all the data using any of the exposed axes; you can write
data = [VIEW_ALL_DATA..., key => nothing]
to hide specific data based on its
key
.
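As a rough illustration of the above, the following sketch builds a tiny in-memory data set and wraps it in a view that hides one axis and one scalar. The axis names, property names and the `cells_only` name are made up for the example. It assumes that `MemoryDaf`, `add_axis!`, `set_scalar!`, `set_vector!`, `has_axis`, `has_scalar` and `has_vector` from the rest of the `DataAxesFormats` package behave as documented there.

```julia
using DataAxesFormats

# A small base data set (all names and values here are illustrative only).
daf = MemoryDaf(; name = "example")
add_axis!(daf, "cell", ["c1", "c2", "c3"])
add_axis!(daf, "gene", ["g1", "g2"])
set_scalar!(daf, "version", "1.0")
set_vector!(daf, "cell", "age", [1, 2, 3])

# Expose every axis except "gene", and all the data except the "version" scalar.
# Later entries override earlier ones, so the `nothing` entries win.
cells_view = viewer(
    daf;
    name = "cells_only",
    axes = ["*" => "=", "gene" => nothing],
    data = [VIEW_ALL_DATA..., "version" => nothing],
)

println(has_axis(cells_view, "cell"))
println(has_axis(cells_view, "gene"))
println(has_vector(cells_view, "cell", "age"))
println(has_scalar(cells_view, "version"))
```

Since the queries are only evaluated on access, constructing `cells_view` should not copy any of the base data.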
DataAxesFormats.Views.ViewAxis
—
Type
Specify an axis to expose from a view.
This is specified as a vector of pairs (similar to initializing a
Dict
). The order of the pairs matters (last one wins). We also allow specifying tuples instead of pairs to make it easy to invoke the API from other languages such as Python which do not have the concept of a
Pair
.
If the key is
"*"
, then it is replaced by all the names of the axes of the wrapped
daf
data. Otherwise, the key is just the name of an axis.
If the value is
nothing
, then the axis will
not
be exposed by the view. If the value is
"="
, then the axis will be exposed with the same entries as in the original
daf
data. Otherwise the value is any valid query that returns a vector of (unique!) strings to serve as the axis entries.
That is, specifying
"*"
(or
ALL_AXES
) will expose all the original
daf
data axes from the view. Following this by saying
"type" => nothing
will hide the
type
axis from the view. Saying
"batch" => q"/ batch & age > 1"
will expose the
batch
axis, but only including the batches whose
age
property is greater than 1.
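The resolution rules above (wildcard expansion, last one wins, `nothing` hides, pairs or tuples) can be modeled in a few lines of plain Julia. This is only a sketch of the described semantics, not the package's implementation: `resolve_axes` is a hypothetical helper, and the query is kept as a plain string so the block does not depend on the `q"..."` macro.

```julia
# Hypothetical helper, NOT part of DataAxesFormats: models how an `axes`
# specification is resolved against the axis names of the wrapped data.
function resolve_axes(base_axes::AbstractVector{<:AbstractString}, spec::AbstractVector)
    resolved = Dict{String, Any}()
    for (key, value) in spec  # destructuring works for both Pairs and 2-tuples
        # The "*" key stands for all the axes of the wrapped data.
        names = key == "*" ? base_axes : [key]
        for name in names
            resolved[name] = value  # a later entry overrides an earlier one
        end
    end
    # Axes whose final value is `nothing` are not exposed.
    return Dict{String, Any}(name => value for (name, value) in resolved if value !== nothing)
end

base_axes = ["cell", "gene", "type", "batch"]
spec = ["*" => "=", "type" => nothing, ("batch", "/ batch & age > 1")]
exposed = resolve_axes(base_axes, spec)

println(sort(collect(keys(exposed))))
```

Here `type` is hidden, `cell` and `gene` keep their original entries, and `batch` is restricted by the query because its entry comes after the wildcard.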
DataAxesFormats.Views.ViewAxes
—
Type
Specify all the axes to expose from a view. We would have liked to specify this as
AbstractVector{<:ViewAxis}
but Julia in its infinite wisdom considers
["a", "b" => "c"]
to be a
Vector{Any}
, which would require literals to be annotated with the type.
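The typing issue mentioned above can be checked directly in plain Julia, without the package. The sketch below shows that a literal mixing a string and a pair is inferred as `Vector{Any}`, and what the type annotation that the API avoids requiring would look like.

```julia
# A literal mixing a String and a Pair has no useful common element type.
mixed = ["a", "b" => "c"]
println(typeof(mixed))

# It therefore does not match a parametric vector type over strings and pairs...
println(mixed isa AbstractVector{<:Union{AbstractString, Pair}})

# ...unless the literal is explicitly annotated with its element type.
annotated = Union{String, Pair{String, String}}["a", "b" => "c"]
println(annotated isa AbstractVector{<:Union{AbstractString, Pair}})
```

Accepting a plain `AbstractVector` lets callers write the unannotated literal, at the cost of checking the element types at run time.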
DataAxesFormats.Views.ViewDatum
—
Type
Specify a single datum to expose from a view. This is specified as a vector of pairs (similar to initializing a
Dict
). The order of the pairs matters (last one wins). We also allow specifying tuples instead of pairs to make it easy to invoke the API from other languages such as Python which do not have the concept of a
Pair
.
Scalars
are specified similarly to
ViewAxes
, except that the query should return a scalar instead of a vector. That is, saying
"*"
(or
ALL_SCALARS
) will expose all the original
daf
data scalars from the view. Following this by saying
"version" => nothing
will hide the
version
from the view. Adding
"total_umis" => q"/ cell / gene : UMIs %> Sum %> Sum"
will expose a
total_umis
scalar containing the total sum of all UMIs of all genes in all cells, etc.
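To make the arithmetic of the `total_umis` example concrete, here is the equivalent computation in plain Julia on a made-up matrix. It only illustrates the two successive sum reductions (matrix to vector, then vector to scalar); it does not use the query engine, and the matrix layout is an assumption.

```julia
# Made-up UMIs matrix: 2 cells (rows) by 3 genes (columns).
UMIs = [1 2 0; 0 3 4]

# First reduction: collapse one axis, leaving a vector.
per_gene = vec(sum(UMIs; dims = 1))

# Second reduction: collapse the vector into a single scalar.
total_umis = sum(per_gene)

println(per_gene)
println(total_umis)
```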
Vectors
are specified similarly to scalars, but require a key specifying both an axis and a property name. The axis must be exposed by the view (based on the
axes
parameter). If the axis is
"*"
, it is replaced by all the exposed axis names specified by the
axes
parameter. Similarly, if the property name is
"*"
(e.g.,
("gene", "*")
), then it is replaced by all the vector properties of the exposed axis in the base data. Therefore specifying
("*", "*")
(or
ALL_VECTORS
) will expose all vector properties of all the (exposed) axes.
The value for vectors must be the suffix of a vector query based on the appropriate axis; a value of
"="
is again used to expose the property as-is.
For example, specifying
axes = ["cell" => q"/ cell & type = TCell"]
, and then
data = [("cell", "total_noisy_UMIs") => q"/ gene & noisy : UMIs %> Sum"]
will expose
total_noisy_UMIs
as a per-
cell
vector property, using the query
/ gene & noisy / cell & type = TCell : UMIs %> Sum
, which will compute the sum of the
UMIs
of all the noisy genes for each cell (whose
type
is
TCell
).
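The combined query in this example amounts to masking both axes and then summing over the genes. A plain Julia sketch of that computation on made-up data follows; the matrix layout (cells in rows, genes in columns) and all values are assumptions for illustration, and no Daf query is actually evaluated.

```julia
# Made-up data: 3 cells (rows) by 3 genes (columns).
UMIs = [1 2 3; 4 5 6; 7 8 9]
cell_type = ["TCell", "BCell", "TCell"]  # per-cell `type` property
gene_noisy = [true, false, true]         # per-gene `noisy` mask

# Keep only the TCell cells (the view's `cell` axis) and the noisy genes.
selected = UMIs[cell_type .== "TCell", gene_noisy]

# Sum over the genes, leaving one value per exposed cell.
total_noisy_UMIs = vec(sum(selected; dims = 2))

println(total_noisy_UMIs)
```

The result has one entry per cell exposed by the view (two here), not one per cell of the base data.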
Matrices
require a key specifying both axes and a property name. The axes must both be exposed by the view (based on the
axes
parameter). Again, if either or both of the axes are
"*"
, they are replaced by all the exposed axes (based on the
axes
parameter), and likewise if the name is
"*"
, it is replaced by all the matrix properties of the axes. The value for matrices can again be
"="
to expose the property as is, or the suffix of a matrix query. Therefore specifying
("*", "*", "*")
(or
ALL_MATRICES
) will expose all matrix properties of all the (exposed) axes.
That is, assuming the
gene
and
cell
axes were exposed by the
axes
parameter, then specifying that
("cell", "gene", "log_UMIs") => q": UMIs % Log base 2 eps"
will expose the matrix
log_UMIs
for each cell and gene.
The order of the axes does not matter, so
data = [("gene", "cell", "UMIs") => "="]
has the same effect as
data = [("cell", "gene", "UMIs") => "="]
.
3D Tensors
require a key specifying the main axis, followed by two axes, and a property name. All the axes must be exposed by the view (based on the
axes
parameter). In this case, none of the axes may be
"*"
. The value can only be
"="
to expose all the matrix properties of the tensor as they are or
nothing
to hide all of them; that is, views can expose or hide existing (possibly masked) 3D tensors, but can't be used to create new ones.
That is, assuming the
gene
,
cell
and
batch
axes were exposed by the
axes
parameter, then specifying that
("batch", "cell", "gene", "is_measured") => "="
will expose the set of per-cell-per-gene matrices
batch1_is_measured
,
batch2_is_measured
, etc.
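The matrix names in this example appear to follow a simple pattern: one matrix per entry of the main axis, named by joining the entry and the property name with an underscore. The sketch below spells this out in plain Julia; the batch entry names are made up, and the naming rule is inferred from the example above rather than taken from the package's implementation.

```julia
# Made-up entries of the main (`batch`) axis.
batch_entries = ["batch1", "batch2", "batch3"]
property = "is_measured"

# One per-cell-per-gene matrix name for each entry of the main axis.
matrix_names = ["$(batch)_$(property)" for batch in batch_entries]

println(matrix_names)
```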
DataAxesFormats.Views.ViewData
—
Type
Specify all the data to expose from a view. We would have liked to specify this as
AbstractVector{<:ViewDatum}
but Julia in its infinite wisdom considers
["a", "b" => "c"]
to be a
Vector{Any}
, which would require literals to be annotated with the type.
DataAxesFormats.Views.ALL_SCALARS
—
Constant
A key to use in the
data
parameter of
viewer
to specify all the base data scalars.
DataAxesFormats.Views.VIEW_ALL_SCALARS
—
Constant
A pair to use in the
data
parameter of
viewer
to specify all the base data scalars.
DataAxesFormats.Views.ALL_AXES
—
Constant
A key to use in the
axes
parameter of
viewer
to specify all the base data axes.
DataAxesFormats.Views.VIEW_ALL_AXES
—
Constant
A pair to use in the
axes
parameter of
viewer
to specify all the base data axes.
DataAxesFormats.Views.ALL_VECTORS
—
Constant
A key to use in the
data
parameter of
viewer
to specify all the vectors of the exposed axes.
DataAxesFormats.Views.VIEW_ALL_VECTORS
—
Constant
A pair to use in the
data
parameter of
viewer
to specify all the vectors of the exposed axes.
DataAxesFormats.Views.ALL_MATRICES
—
Constant
A key to use in the
data
parameter of
viewer
to specify all the matrices of the exposed axes.
DataAxesFormats.Views.VIEW_ALL_MATRICES
—
Constant
A pair to use in the
data
parameter of
viewer
to specify all the matrices of the exposed axes.
DataAxesFormats.Views.VIEW_ALL_DATA
—
Constant
A vector to use in the
data
parameter of
viewer
to specify that the view exposes all the data of the exposed axes. This is the default, so the only reason to do this is to say
VIEW_ALL_DATA...
followed by some modifications.
Index
- DataAxesFormats.Views
- DataAxesFormats.Views.ALL_AXES
- DataAxesFormats.Views.ALL_MATRICES
- DataAxesFormats.Views.ALL_SCALARS
- DataAxesFormats.Views.ALL_VECTORS
- DataAxesFormats.Views.VIEW_ALL_AXES
- DataAxesFormats.Views.VIEW_ALL_DATA
- DataAxesFormats.Views.VIEW_ALL_MATRICES
- DataAxesFormats.Views.VIEW_ALL_SCALARS
- DataAxesFormats.Views.VIEW_ALL_VECTORS
- DataAxesFormats.Views.DafView
- DataAxesFormats.Views.ViewAxes
- DataAxesFormats.Views.ViewAxis
- DataAxesFormats.Views.ViewData
- DataAxesFormats.Views.ViewDatum
- DataAxesFormats.Views.viewer