Queries
DataAxesFormats.Queries
—
Module
Extract data from a
DafReader
.
Construction
DataAxesFormats.Queries.Query
—
Type
Query(
query::QueryString,
operand_only::Maybe{Type{QueryOperation}} = nothing,
) <: QueryOperation
A query is a description of a (sub-)process for extracting some data from a
DafReader
. A full query is a sequence of
QueryOperation
, that when applied one at a time on some
DafReader
, result in a scalar, vector or matrix result.
To apply a query, invoke
get_query
on some
DafReader
data (you can also use the shorthand
daf[query]
instead of
get_query(daf, query)
). By default, query operations will cache their results in memory as
QueryData
, to speed up repeated queries. This may lock up large amounts of memory; you can call
empty_cache!
to release it.
Queries can be constructed in two ways. In code, a query can be built by chaining query operations (e.g., the expression
Axis("gene") |> Lookup("is_marker")
looks up the
is_marker
vector property of the
gene
axis).
Alternatively, a query can be written as a string, which needs to be parsed into a
Query
object (e.g., the above can be written as
Query("/gene:is_marker")
). See the
QUERY_OPERATORS
for a table of supported operators. Spaces (and comments) around the operators are optional; see
tokenize
for details. You can also convert a
Query
to a
string
(or
print
it, etc.) to see its representation. This is used for
error
messages and as a key when caching query results.
Since query strings use
\
as an escape character, it is easier to use
raw
string literals for queries (e.g.,
Query(raw"cell = ATCG\:B1 : age")
vs.
Query("cell = ATCG\\:B1 : age")
). To make this even easier we provide the
q
macro (e.g.,
q"cell = ATCG\:B1 : batch"
) which works similarly to Julia's standard
r
macro for literal
Regex
strings.
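The escaping difference can be checked in plain Julia without loading the package. The following is a minimal sketch that only compares the two literal forms of the query text from the example above; it does not parse the query.

```julia
# A regular literal must double the backslash to keep one in the query text.
escaped = "cell = ATCG\\:B1 : age"

# A raw literal keeps the backslash exactly as typed.
raw_form = raw"cell = ATCG\:B1 : age"

# Both literals spell the same query text, containing a single backslash.
println(escaped == raw_form)        # prints "true"
println(count(==('\\'), raw_form))  # prints "1"
```

Writing `"cell = ATCG\:B1 : age"` as a regular literal is rejected by Julia as an invalid escape sequence, which is why the raw form (or the `q` macro) is the more convenient spelling.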
If the provided query string contains only an operand, and
operand_only
is specified, it is used as the operator (i.e.,
Query("metacell")
is an error, but
Query("metacell", Axis)
is the same as
Axis("metacell")
). This is useful when providing suffix queries (e.g., for
get_frame
).
Being able to represent queries as strings allows for reading them from configuration files and letting the user input them in an application UI (e.g., allowing the user to specify the X, Y and/or colors of a scatter plot using queries). At the same time, being able to incrementally build queries using code allows for convenient reuse (e.g., reusing axis sub-queries in
Daf
views), without having to go through the string representation.
Daf
provides a comprehensive set of
QueryOperation
s that can be used to construct queries. The
QUERY_OPERATORS
listed below provide the basic functionality (e.g., specifying an
Axis
or a property
Lookup
). In addition,
Daf
provides computation operations (
EltwiseOperation
and
ReductionOperation
), allowing for additional operations to be provided by external packages.
Obviously not all possible combinations of operations make sense (e.g.,
Lookup("is_marker") |> Axis("cell")
will not work). For the full list of valid combinations, see
NAMES_QUERY
,
SCALAR_QUERY
,
VECTOR_QUERY
and
MATRIX_QUERY
below.
This started as a very simple query language (which it still is, for the simple cases) but became complex to allow for useful but complicated scenarios. In particular, the approach here of using a concatenative language (similar to
ggplot
) makes simple things simpler, but becomes somewhat unnatural and restrictive for some of the more advanced operations. However, using an RPN or a LISP notation to better support such cases would have ended up with a much less nice syntax for the simple cases.
Hopefully we have covered sufficient ground so that we won't need to add further operations. Also, in most cases, you can write code that accesses the vector/matrix data and performs whatever computation you want instead of writing a complex query; however, this isn't an option when defining views or adapters, which rely on the query mechanism for specifying the data.
DataAxesFormats.Queries.@q_str
—
Macro
q"..."
Shorthand for parsing a literal string as a
Query
. This is equivalent to
Query
(raw"...")
, that is, a
\
can be placed in the string without escaping it (except for before a
"
). This is very convenient for literal queries (e.g.,
q"/ cell = ATCG\:B1 : batch"
==
Query(raw"/ cell = ATCG\:B1 : batch")
==
Query("/ cell = ATCG\\:B1 : batch")
==
Axis("cell") |> IsEqual("ATCG:B1") |> Lookup("batch")
).
DataAxesFormats.Queries.QueryString
—
Type
Most operations that take a query allow passing a string to be parsed into a query, or an actual
Query
object. This type is used as a convenient notation for such query parameters.
Functions
DataAxesFormats.Queries.get_query
—
Function
get_query(
daf::DafReader,
query::QueryString;
[cache::Bool = true]
)::Union{StorageScalar, NamedVector, NamedMatrix}
Apply the full
query
to the
Daf
data and return the result. By default, this will cache results, so repeated queries will be accelerated. This may consume a large amount of memory. You can disable it by specifying
cache = false
, or release the cached data using
empty_cache!
.
As a shorthand syntax you can also invoke this using
getindex
, that is, using the
[]
operator (e.g.,
daf[q"/ cell"]
is equivalent to
get_query(daf, q"/ cell")
).
DataAxesFormats.Queries.get_frame
—
Function
get_frame(
daf::DafReader,
axis::QueryString,
[columns::Maybe{FrameColumns} = nothing;
cache::Bool = true]
)::DataFrame
Return a
DataFrame
containing multiple vectors of the same
axis
.
The
axis
can be either just the name of an axis (e.g.,
"cell"
), or a query for the axis (e.g.,
q"/ cell"
), possibly using a mask (e.g.,
q"/ cell & age > 1"
). The result of the query must be a vector of unique axis entry names.
If
columns
is not specified, the data frame will contain all the vector properties of the axis, in alphabetical order (since
DataFrame
has no concept of named rows, the 1st column will contain the name of the axis entry).
By default, this will cache results of all queries. This may consume a large amount of memory. You can disable it by specifying
cache = false
, or release the cached data using
empty_cache!
.
DataAxesFormats.Queries.FrameColumn
—
Type
Specify a column for
get_frame
for some axis. The most generic form is a pair
"column_name" => query
. Two shorthands apply: the pair
"column_name" => "="
is a shorthand for the pair
"column_name" => ": column_name"
, and so is the shorthand
"column_name"
(simple string).
We also allow specifying tuples instead of pairs to make it easy to invoke the API from other languages such as Python which do not have the concept of a
Pair
.
The query is combined with the axis query as follows (using
full_vector_query
):
- If the query contains `GroupBy`, then the query must repeat any mask specified for the axis query. That is, if the axis query is `metacell & type = B`, then the column query must be `/ cell & metacell => type = B @ metacell : age %> Mean`. Sorry for the inconvenience. TODO: Automatically inject the mask into `GroupBy` column queries.
- Otherwise, if the query starts with a (single) axis, then it should only contain a reduction; the axis query is automatically injected following it. That is, if the axis query is `gene & is_marker`, then the full query for the column query `/ metacell : fraction %> Mean` will be `/ metacell / gene : fraction %> Mean` (the mean gene expression in all metacells). We can't just concatenate the axis query and the column query here because Julia, in its infinite wisdom, uses column-major matrices, like R and MATLAB; so reduction eliminates the rows instead of the columns of the matrix.
- Otherwise (the typical case), we simply concatenate the axis query and the column query. That is, if the axis query is `cell & batch = B1` and the column query is `: age`, then the full query will be `cell & batch = B1 : age`. This is the simplest and most common case.
In all cases the (full) query must return a value for each entry of the axis.
DataAxesFormats.Queries.FrameColumns
—
Type
Specify all the columns to collect for a frame. We would have liked to specify this as
AbstractVector{<:FrameColumn}
but Julia in its infinite wisdom considers
["a", "b" => "c"]
to be a
Vector{Any}
, which would require literals to be annotated with the type.
DataAxesFormats.Queries.full_vector_query
—
Function
full_vector_query(
axis_query::Query,
vector_query::QueryString,
vector_name::Maybe{AbstractString} = nothing,
)::Query
Given a query for an axis, and some suffix query for a vector property, combine them into a full query for the vector values for the axis. This is used by
FrameColumn
for
get_frame
and also for queries of vector data in views.
DataAxesFormats.Queries.query_result_dimensions
—
Function
query_result_dimensions(query::QueryString)::Int
Return the number of dimensions (-1 - names, 0 - scalar, 1 - vector, 2 - matrix) of the results of a
query
. This also verifies the query is syntactically valid, though it may still fail if applied to specific data due to invalid data values or types.
DataAxesFormats.Queries.query_requires_relayout
—
Function
query_requires_relayout(daf::DafReader, query::QueryString)::Bool
Whether computing the
query
for the
daf
data requires
relayout!
of some matrix. This also verifies the query is syntactically valid and that the query can be computed, though it may still fail if applied to specific data due to invalid values or types.
DataAxesFormats.Queries.is_axis_query
—
Function
is_axis_query(query::QueryString)::Bool
Returns whether the
query
specifies a (possibly masked) axis. This also verifies the query is syntactically valid, though it may still fail if applied to specific data due to invalid data values or types.
Syntax
DataAxesFormats.Queries.QUERY_OPERATORS
—
Constant
Operators used to represent a
Query
as a string.
| Operator | Implementation | Description |
|---|---|---|
| `/` | Axis | Specify a vector or matrix axis (e.g., `/ cell : batch` or `/ cell / gene : UMIs`). |
| `?` | Names | 1. Names of scalars or axes (`? axes`, `? scalars`). |
| | | 2. Names of vectors of axis (e.g., `/ cell ?`). |
| | | 3. Names of matrices of axes (e.g., `/ cell / gene ?`). |
| `:` | Lookup | Lookup a property (e.g., `: version`, `/ cell : batch` or `/ cell / gene : UMIs`). |
| `=>` | Fetch | Fetch a property from another axis (e.g., `/ cell : batch => age`). |
| `;` | MaskSlice | Slice a matrix mask (e.g., `/ cell & UMIs ; gene = FOX1 > 0`). |
| `;=` | SquareMaskColumn | Slice a square matrix mask column (e.g., `/ cell & outgoing ;= ATCG > 0`). |
| `,=` | SquareMaskRow | Slice a square matrix mask row (e.g., `/ cell & outgoing ,= ATCG > 0`). |
| `!` | AsAxis | 1. Specify axis name when fetching a property (e.g., `/ cell : manual ! type => color`). |
| | | 2. Force all axis values when counting (e.g., `/ cell : batch ! * manual ! type`). |
| | | 3. Force all axis values when grouping (e.g., `/ cell : age @ batch ! %> Mean`). |
| `??` | IfNot | 1. Mask excluding false-ish values (e.g., `/ cell : batch ?? => age`). |
| | | 2. Default for false-ish lookup values (e.g., `/ cell : type ?? Outlier`). |
| | | 3. Default for false-ish fetched values (e.g., `/ cell : batch ?? 1 => age`). |
| `││` | IfMissing | 1. Value for missing lookup properties (e.g., `/ gene : is_marker ││ false`). |
| | | 2. Value for missing fetched properties (e.g., `/ cell : type ││ …`). |
| | | 3. Value for empty reduced vectors (e.g., `/ cell : type = LMPP => age %> Max ││ …`). |
| `%` | EltwiseOperation | Apply an element-wise operation (e.g., `/ cell / gene : UMIs % Log base 2 eps 1`). |
| `%>` | ReductionOperation | Apply a reduction operation (e.g., `/ cell / gene : UMIs %> Sum`). |
| `*` | CountBy | Compute counts matrix (e.g., `/ cell : age * type`). |
| `@` | GroupBy | 1. Aggregate vector entries by a group (e.g., `/ cell : age @ type %> Mean`). |
| | | 2. Aggregate matrix row entries by a group (e.g., `/ cell / gene : UMIs @ type %> Max`). |
| `&` | And | Restrict axis entries (e.g., `/ gene & is_marker`). |
| `&!` | AndNot | Restrict axis entries (e.g., `/ gene &! is_marker`). |
| `│` | Or | Expand axis entries (e.g., `/ gene & is_marker │ is_noisy`). |
| `│!` | OrNot | Expand axis entries (e.g., `/ gene & is_marker │! is_noisy`). |
| `^` | Xor | Flip axis entries (e.g., `/ gene & is_marker ^ is_noisy`). |
| `^!` | XorNot | Flip axis entries (e.g., `/ gene & is_marker ^! is_noisy`). |
| `=` | IsEqual | 1. Select an entry from an axis (e.g., `/ cell / gene = FOX1 : UMIs`). |
| | | 2. Compare equal (e.g., `/ cell & age = 1`). |
| `!=` | IsNotEqual | Compare not equal (e.g., `/ cell & age != 1`). |
| `<` | IsLess | Compare less than (e.g., `/ cell & age < 1`). |
| `<=` | IsLessEqual | Compare less or equal (e.g., `/ cell & age <= 1`). |
| `>` | IsGreater | Compare greater than (e.g., `/ cell & age > 1`). |
| `>=` | IsGreaterEqual | Compare greater or equal (e.g., `/ cell & age >= 1`). |
| `~` | IsMatch | Compare match (e.g., `/ gene & name ~ RP\[SL\]`). |
| `!~` | IsNotMatch | Compare not match (e.g., `/ gene & name !~ RP\[SL\]`). |
Due to Julia's Documenter limitations, the ASCII
|
character (
|
) is replaced by the Unicode
│
character (
│
) in the above table. Sigh.
DataAxesFormats.Queries.NAMES_QUERY
—
Constant
NAMES_QUERY
:= (
Names
scalars
|
Names
axes
|
Axis
Names
|
Axis
Axis
Names
)
A query returning a set of names:
- Looking up the set of names of the scalar properties (`? scalars`).
- Looking up the set of names of the axes (`? axes`).
- Looking up the set of names of the vector properties of an axis (e.g., `/ cell ?`).
- Looking up the set of names of the matrix properties of a pair of axes (e.g., `/ cell / gene ?`).
DataAxesFormats.Queries.SCALAR_QUERY
—
Constant
SCALAR_QUERY
:= (
LOOKUP_PROPERTY
|
VECTOR_ENTRY
|
MATRIX_ENTRY
|
REDUCE_VECTOR
)
EltwiseOperation
*
A query returning a scalar can be one of:
- Looking up the value of a scalar property (e.g., `: version` will return the value of the version scalar property).
- Picking a single entry of a vector property (e.g., `/ gene = FOX1 : is_marker` will return whether the gene named FOX1 is a marker gene).
- Picking a single entry of a matrix property (e.g., `/ gene = FOX1 / cell = ATCG : UMIs` will return the number of UMIs of the FOX1 gene of the ATCG cell).
- Reducing some vector into a single value (e.g., `/ donor : age %> Mean` will compute the mean age of all the donors).
Either way, this can be followed by a series of
EltwiseOperation
to modify the scalar result (e.g.,
/ donor : age %> Mean % Log base 2 % Abs
will compute the absolute value of the log base 2 of the mean age of all the donors).
DataAxesFormats.Queries.LOOKUP_PROPERTY
—
Constant
LOOKUP_PROPERTY
:=
Lookup
IfMissing
?
Lookup the value of a scalar or matrix property. This is used on its own to access a scalar property (e.g.,
: version
) or combined with two axes to access a matrix property (e.g.,
/ cell / gene : UMIs
).
By default, it is an error if the property does not exist. However, if an
IfMissing
is provided, then this value is used instead (e.g.,
: version || Unknown
will return
Unknown
if there is no
version
scalar property, and
/ cell / gene : UMIs || 0
will return an all-zero matrix if there is no
UMIs
matrix property).
Accessing a
VECTOR_PROPERTY
allows for more complex operations.
DataAxesFormats.Queries.VECTOR_ENTRY
—
Constant
VECTOR_ENTRY
:=
Axis
IsEqual
VECTOR_LOOKUP
Lookup the scalar value of some entry of a vector property of some axis (e.g.,
/ gene = FOX1 : is_marker
will return whether the FOX1 gene is a marker gene).
DataAxesFormats.Queries.MATRIX_ENTRY
—
Constant
MATRIX_ENTRY
:=
Axis
IsEqual
Axis
IsEqual
LOOKUP_PROPERTY
Lookup the scalar value of the named entry of a matrix property (e.g.,
/ gene = FOX1 / cell = ATCG : UMIs
will return the number of UMIs of the FOX1 gene of the ATCG cell).
DataAxesFormats.Queries.REDUCE_VECTOR
—
Constant
REDUCE_VECTOR
:=
VECTOR_QUERY
ReductionOperation
IfMissing
?
Perform an arbitrary vector query, and reduce the result into a single scalar value (e.g.,
/ donor : age %> Mean
will compute the mean of the ages of the donors).
By default, it is an error if the vector query results in an empty vector. However, if an
IfMissing
suffix is provided, then this value is used instead (e.g.,
/ cell & type = LMPP : age %> Mean || 0
will return zero if there are no cells whose type is LMPP).
DataAxesFormats.Queries.VECTOR_QUERY
—
Constant
VECTOR_QUERY
:= (
VECTOR_PROPERTY
|
MATRIX_ROW
|
MATRIX_COLUMN
|
REDUCE_MATRIX
)
POST_PROCESS
*
A query returning a vector can be one of:
- Looking up the value of a vector property (e.g., `/ gene : is_marker` will return a mask of the marker genes).
- Picking a single row or column of a matrix property (e.g., `/ gene = FOX1 / cell : UMIs` will return a vector of the UMIs of the FOX1 gene of all the cells).
- Reducing each column of some matrix into a scalar, resulting in a vector (e.g., `/ gene / cell : UMIs %> Sum` will compute the sum of the UMIs of all the genes in each cell).
Either way, this can be followed by further processing of the vector (e.g.,
/ gene / cell : UMIs % Log base 2 eps 1
will compute the log base 2 of one plus the UMIs of each gene in each cell).
DataAxesFormats.Queries.VECTOR_PROPERTY
—
Constant
VECTOR_PROPERTY
:=
Axis
AXIS_MASK
*
VECTOR_LOOKUP
VECTOR_FETCH
*
Lookup the values of some vector property (e.g.,
/ gene : is_marker
will return a mask of the marker genes). This can be restricted to a subset of the vector using masks (e.g.,
/ gene & is_marker : is_noisy
will return a mask of the noisy genes out of the marker genes), and/or fetch the property value from indirect axes (e.g.,
/ cell : batch => donor => age
will return the age of the donor of the batch of each cell).
DataAxesFormats.Queries.VECTOR_LOOKUP
—
Constant
VECTOR_LOOKUP
:=
Lookup
IfMissing
? (
IfNot
|
AsAxis
)?
A
Lookup
of a vector property (e.g.,
/ cell : type
will return the type of each cell).
By default, it is an error if the property does not exist. However, if an
IfMissing
is provided, then this value is used instead (e.g.,
/ cell : type || Unknown
will return a vector of
Unknown
types if there is no
type
property for the
cell
axis).
If the
IfNot
suffix is provided, it controls how to modify "false-ish" (empty string, zero numeric value, or false Boolean value) entries (e.g.,
/ cell : type ?
will return a vector of the type of each cell that has a non-empty type, while
/ cell : type ? Outlier
will return a vector of the type of each cell, where cells with an empty type are given the type
Outlier
).
Only when the vector property is used for
CountBy
or for
GroupBy
, providing the
AsAxis
suffix indicates that the property is associated with an axis (similar to an indirect axis in
Fetch
), and the set of groups is forced to be the values of that axis; in this case, empty string values are always ignored (e.g.,
/ cell : age @ type ! %> Mean || 0
will return a vector of the mean age of the cells of each type, with a value of zero for types which have no cells, and ignoring cells which have an empty type; similarly,
/ cell : batch => donor ! * type !
will return a matrix whose rows are donors and columns are types, counting the number of cells of each type that were sampled from each donor, ignoring cells which have an empty type or whose batch has an empty donor).
DataAxesFormats.Queries.MATRIX_ROW
—
Constant
DataAxesFormats.Queries.MATRIX_COLUMN
—
Constant
DataAxesFormats.Queries.REDUCE_MATRIX
—
Constant
REDUCE_MATRIX
:=
MATRIX_QUERY
ReductionOperation
Perform an arbitrary matrix query, and reduce the result into a vector by converting each column into a single value, eliminating the rows axis (e.g.,
/ gene / cell : UMIs %> Sum
will evaluate to a vector of the total UMIs of each cell).
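The "one value per column" behaviour can be pictured with a plain Julia matrix. This is only an illustrative sketch on a hypothetical `umis` matrix (genes as rows, cells as columns); it mimics the reduction with `sum` rather than calling the package.

```julia
# Hypothetical UMIs matrix: rows are genes, columns are cells.
umis = [1 2 3;
        4 5 6]

# Summing along dimension 1 collapses the rows (genes),
# keeping one value per column (cell).
umis_per_cell = vec(sum(umis; dims = 1))

println(umis_per_cell)  # prints "[5, 7, 9]"
```

Because Julia stores matrices in column-major order, each column is contiguous in memory, so reducing each column is the natural direction.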
DataAxesFormats.Queries.MATRIX_QUERY
—
Constant
MATRIX_QUERY
:= (
MATRIX_LOOKUP
|
COUNTS_MATRIX
)
POST_PROCESS
*
A query returning a matrix can be one of:
- Looking up the value of a matrix property (e.g., `/ gene / cell : UMIs` will return the matrix of UMIs for each gene and cell).
- Counting the number of times each combination of two vector properties occurs in the data (e.g., `/ cell : batch => donor => age * type` will return a matrix whose rows are ages and columns are types, where each entry contains the number of cells which have the specific type and age).
Either way, this can be followed by a series of
EltwiseOperation
to modify the results (e.g.,
/ gene / cell : UMIs % Log base 2 eps 1
will compute the log base 2 of 1 plus the UMIs of each gene in each cell).
DataAxesFormats.Queries.MATRIX_LOOKUP
—
Constant
MATRIX_LOOKUP
:=
Axis
AXIS_MASK
*
Axis
AXIS_MASK
*
Lookup
Lookup the values of some matrix property (e.g.,
/ gene / cell : UMIs
will return the matrix of UMIs of each gene in each cell). This can be restricted to a subset of the matrix using masks (e.g.,
/ gene & is_marker / cell & type = LMPP : UMIs
will return a matrix of the UMIs of each marker gene in cells whose type is LMPP).
DataAxesFormats.Queries.COUNTS_MATRIX
—
Constant
COUNTS_MATRIX
:=
VECTOR_QUERY
CountBy
VECTOR_FETCH
*
Compute a matrix of counts of each combination of values given two vectors (e.g.,
/ cell : batch => donor => age * batch => donor => sex
will return a matrix whose rows are ages and columns are sexes, where each entry contains the number of cells which have the specific age and sex).
DataAxesFormats.Queries.POST_PROCESS
—
Constant
POST_PROCESS
:=
EltwiseOperation
|
GROUP_BY
A vector or a matrix result may be processed by one of:
- Applying an `EltwiseOperation` to each value (e.g., `/ donor : age % Log base 2` will compute the log base 2 of the ages of all donors, and `/ gene / cell : UMIs % Log base 2 eps 1` will compute the log base 2 of 1 plus the UMIs count of each gene in each cell).
- Reducing each group of vector entries or matrix rows into a single value (e.g., `/ cell : batch => donor => age @ type %> Mean` will compute a vector of the mean age of the cells of each type, and `/ cell / gene : UMIs @ type %> Mean` will compute a matrix of the mean UMIs of each gene for the cells of each type).
DataAxesFormats.Queries.GROUP_BY
—
Constant
GROUP_BY
:=
GroupBy
VECTOR_FETCH
*
ReductionOperation
IfMissing
The entries of a vector or the rows of a matrix result may be grouped, where all the values that have the same group value are reduced to a single value using a
ReductionOperation
(e.g.,
/ cell : batch => donor => age @ type %> Mean
will compute the mean age of all the cells of each type, and
/ cell / gene : UMIs @ type %> Mean
will compute a matrix of the mean UMIs of each gene for the cells of each type).
If the group property is suffixed by
AsAxis
, then the result will have a value for each entry of the axis (e.g.,
/ cell : age @ type ! %> Mean
will compute the mean age of the cells of each type). In this case, some groups may have no values at all, which, by default, is an error. Providing an
IfMissing
suffix will use the specified value for such empty groups instead (e.g.,
/ cell : age @ type ! %> Mean || 0
will compute the mean age for the cells of each type, with a zero value for types for which there are no cells).
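The grouping semantics described above can be sketched in plain Julia. The helper `group_mean` and its keywords `group_names` and `if_missing` are hypothetical names used only for this illustration; they stand in for the `AsAxis` and `IfMissing` suffixes and are not part of the package.

```julia
# Reduce `values` by `groups`, producing one mean per entry of `group_names`.
# Passing `group_names` mimics the `AsAxis` suffix (all axis entries are used);
# passing `if_missing` mimics the `IfMissing` suffix (value for empty groups).
function group_mean(values, groups; group_names = sort(unique(groups)), if_missing = nothing)
    return map(group_names) do name
        members = values[groups .== name]
        if isempty(members)
            if_missing === nothing && error("no values for group: $(name)")
            return float(if_missing)
        end
        return sum(members) / length(members)
    end
end

ages = [10.0, 30.0, 20.0, 40.0]
types = ["B", "B", "T", "T"]

# Only the groups that actually occur in the data.
println(group_mean(ages, types))  # prints "[20.0, 30.0]"

# Forcing all entries of a hypothetical type axis, with a default for empty groups.
println(group_mean(ages, types; group_names = ["B", "NK", "T"], if_missing = 0))  # prints "[20.0, 0.0, 30.0]"
```

Calling `group_mean(ages, types; group_names = ["B", "NK", "T"])` without `if_missing` raises an error, which mirrors the default behaviour for empty groups.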
DataAxesFormats.Queries.AXIS_MASK
—
Constant
AXIS_MASK
:=
MASK_OPERATION
(
VECTOR_FETCH
)* (
MASK_SLICE
)? (
ComparisonOperation
)?
Restrict the set of entries of an axis to lookup results for (e.g.,
/ gene & is_marker
). If the mask is based on a non-
Bool
property, it is converted to a Boolean by comparing with the empty string or a zero value (depending on its data type); alternatively, you can explicitly compare it with a value (e.g.,
/ cell & batch => donor => age > 1
).
DataAxesFormats.Queries.MASK_OPERATION
—
Constant
MASK_OPERATION
:=
And
|
AndNot
|
Or
|
OrNot
|
Xor
|
XorNot
A query operation for restricting the set of entries of an
Axis
. The mask operations are applied to the current mask, so if several operations are applied, they are applied in order from left to right (e.g.,
/ gene & is_marker | is_noisy &! is_lateral
will first restrict the set of genes to marker genes, then expand it to include noisy genes as well, then remove all the lateral genes; this would be different from
/ gene & is_marker &! is_lateral | is_noisy
, which will include all noisy genes even if they are lateral).
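The left-to-right evaluation can be mimicked with plain Boolean vectors. This is a sketch on made-up data for four genes, assuming the mask starts out including every entry of the axis; it does not call the package.

```julia
# Hypothetical Boolean properties of four genes.
is_marker  = [true,  true,  false, false]
is_noisy   = [false, true,  true,  true]
is_lateral = [false, false, false, true]

# Assumption: the initial mask includes every entry of the axis.
everything = trues(length(is_marker))

# `/ gene & is_marker | is_noisy &! is_lateral`
first_mask = ((everything .& is_marker) .| is_noisy) .& .!is_lateral

# `/ gene & is_marker &! is_lateral | is_noisy`
second_mask = ((everything .& is_marker) .& .!is_lateral) .| is_noisy

println(findall(first_mask))   # prints "[1, 2, 3]"
println(findall(second_mask))  # prints "[1, 2, 3, 4]"
```

The fourth gene is both noisy and lateral, so it is dropped only when the lateral genes are removed last.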
DataAxesFormats.Queries.MASK_SLICE
—
Constant
MASK_SLICE
:=
MaskSlice
IsEqual
|
SquareMaskColumn
|
SquareMaskRow
Allow using a row or a column of a matrix as a mask. If the matrix uses a different axis, then use
MaskSlice
to specify the axis followed by
IsEqual
to specify the slice to use (e.g.,
/ cell & UMIs ; gene = FOX1 > 0
). If the matrix is square use
SquareMaskColumn
or
SquareMaskRow
to slice a column or a row of the matrix (e.g.,
/ cell & outgoing ;= ATCG
or
/ cell & outgoing ,= ATCG
).
DataAxesFormats.Queries.VECTOR_FETCH
—
Constant
VECTOR_FETCH
:=
AsAxis
?
Fetch
IfMissing
? (
IfNot
|
AsAxis
)?
Fetch the value of a property of an indirect axis. That is, there is a common pattern where one axis (e.g., cell) has a property (e.g., type) which has the same name as an axis, and whose values are (string) entry names of that axis. In this case, we often want to lookup a property of the other axis (e.g.,
/ cell : type => color
will evaluate to a vector of the color of the type of each cell). Sometimes one walks a chain of such properties (e.g.,
/ cell : batch => donor => age
).
Sometimes it is needed to store several alternate properties that refer to the same indirect axis. In this case, the name of the property can begin with the axis name, followed by
.
and a suffix (e.g.,
/ cell : type.manual => color
will fetch the color of the manual type of each cell, still using the type axis).
If the property does not follow this convention, it is possible to manually specify the name of the axis using an
AsAxis
prefix (e.g.,
/ cell : manual ! type => color
will assume the value of the
manual
property is a vector of names of entries of the
type
axis).
As usual, if the property does not exist, this is an error, unless an
IfMissing
suffix is provided (e.g.,
/ cell : type || red => color
will assign all cells the color
red
if the
type
property does not exist).
If the value of the property is the empty string for some vector entries, by default this is again an error (as the empty string is not one of the values of the indirect axis). If an
IfNot
suffix is provided, such entries can be removed from the result (e.g.,
/ cell : type ? => color
will return a vector of the colors of the cells which have a non-empty type), or can be given a specific value (e.g.,
/ cell : type ? red => color
will return a vector of a color for each cell, giving the
red
color to cells with an empty type).
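The three behaviours for empty values can be pictured with a dictionary lookup in plain Julia. The helper names below (`fetch_strict`, `fetch_skipping_empty`, `fetch_with_default`) are hypothetical and serve only to illustrate the semantics described above.

```julia
# Hypothetical `color` property of the `type` axis, and `type` property of three cells.
type_color = Dict("B" => "blue", "T" => "red")
cell_type = ["B", "", "T"]

# Plain fetch: an empty type is not an entry of the type axis, so this throws.
fetch_strict(values, table) = [table[value] for value in values]

# `IfNot` without a value: entries with an empty type are dropped from the result.
fetch_skipping_empty(values, table) = [table[value] for value in values if value != ""]

# `IfNot` with a value: entries with an empty type get the given final value.
fetch_with_default(values, table, default) =
    [value == "" ? default : table[value] for value in values]

println(fetch_skipping_empty(cell_type, type_color))        # prints ["blue", "red"]
println(fetch_with_default(cell_type, type_color, "grey"))  # prints ["blue", "grey", "red"]
```

Calling `fetch_strict(cell_type, type_color)` raises a `KeyError` for the empty string, matching the default error described above.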
When using
IfMissing
and/or
IfNot
, the default value provided always stands in for the final value (e.g.,
/ cell : batch || -1 ? -2 => donor || -3 ? -4 => age || -5 ? -6
will compute a vector of age per cell; if there's no
batch
property, all cells will get the age
-1
). If there is such a property, then cells with an empty batch will get the age
-2
. For cells with a non-empty batch, if there's no
donor
property, they will get the value
-3
. If there is such a property, cells with an empty donor will get the value
-4
. Finally, for cells with a batch and donor, if there is no
age
property, they will be given an age of
-5
. Otherwise, if their age is zero, it will be changed to
-6
.
DataAxesFormats.Queries.guess_typed_value
—
Function
guess_typed_value(value::AbstractString)::StorageScalar
Given a string value, guess the typed value it represents:
- `true` and `false` are assumed to be `Bool`.
- Integers are assumed to be `Int64`.
- Floating point numbers are assumed to be `Float64`, as are `e` and `pi`.
- Anything else is assumed to be a string.
This doesn't have to be 100% accurate; it is intended to allow omitting the data type in most cases when specifying an
IfMissing
value. If it guesses wrong, just specify an explicit type (e.g.,
: version || 1.0 String
).
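The guessing rules listed above can be approximated in a few lines of Base Julia. This is a sketch under a hypothetical name, `sketch_guess_typed_value`, and is not the package's implementation; in particular, the exact set of strings that `tryparse` accepts as numbers is an assumption of this sketch.

```julia
# Guess a typed value from a string, following the rules listed above.
function sketch_guess_typed_value(value::AbstractString)
    # Booleans are recognized by their literal names.
    value == "true" && return true
    value == "false" && return false

    # Integers are tried before floating point numbers.
    as_int = tryparse(Int64, value)
    as_int !== nothing && return as_int

    # The named constants are treated as Float64 values.
    value == "e" && return Float64(ℯ)
    value == "pi" && return Float64(pi)

    as_float = tryparse(Float64, value)
    as_float !== nothing && return as_float

    # Anything else stays a string.
    return String(value)
end

println(sketch_guess_typed_value("42"))    # prints "42"
println(sketch_guess_typed_value("1.5"))   # prints "1.5"
println(sketch_guess_typed_value("ATCG"))  # prints "ATCG"
```

Under these rules `"1.0"` is guessed to be a `Float64`, which is the kind of case where an explicit type is needed to keep the value a string.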
Query Operators
DataAxesFormats.Queries.QuerySequence
—
Type
struct QuerySequence{N} <: Query where {N<:Integer}
A sequence of
N
QueryOperation
s.
Data Operators
DataAxesFormats.Queries.AsAxis
—
Type
AsAxis([axis::AbstractString = nothing]) <: QueryOperation
There are three cases where we may want to take a vector property and consider each value to be the name of an entry of some axis:
Fetch
,
CountBy
and
GroupBy
. In a string
Query
, this is indicated by the
!
operator, optionally followed by the name of the axis to use.
When using
Fetch
, we always lookup in some axis, so
AsAxis
is implied (e.g.,
/ cell : type => color
is identical to
/ cell : type ! => color
). In contrast, when using
CountBy
and
GroupBy
, one has to explicitly specify
AsAxis
to force using all the entries of the axis for the counting or grouping (e.g.,
/ cell : age @ type %> Mean
will return a vector of the mean age of every type that has cells associated with it, while
/ cell : age @ type ! %> Mean
will return a vector of the mean age of each and every value of the type axis; similarly,
/ cell : type * age
will generate a counts matrix whose rows are types that have cells associated with them, while
/ cell : type ! * age
will generate a counts matrix whose rows are exactly the entries of the type axis).
Since the set of values is fixed by the axis matching the vector property, it is possible that, when using this for
GroupBy
, some groups would have no values, causing an error. This can be avoided by providing an
IfMissing
suffix to the reduction (e.g.,
/ cell : age @ type ! %> Mean
will fail if some type has no cells associated with it, while
/ cell : age @ type ! %> Mean || 0
will give such types a zero mean age).
Typically, the name of the base property is identical to the name of the axis. In this case, there is no need to specify the name of the axis (as in the examples above). Sometimes it is useful to be able to store several vector properties which all map to the same axis. To support this, we support a naming convention where the property name begins with the axis name followed by a
.suffix
(e.g., both
/ cell : type => color
and
/ cell : type.manual => color
will look up the
color
of the
type
of some property of the
cell
axis - either "the"
type
of each
cell
, or the alternate
type.manual
of each cell).
If the property name does not follow the above conventions, then it is possible to explicitly specify the name of the axis (e.g.,
/ cell : manual ! type => color
will consider each value of the
manual
property as the name of an entry of the
type
axis and look up the matching
color
property value of this axis).
DataAxesFormats.Queries.Axis
—
Type
Axis(axis::AbstractString) <: QueryOperation
A query operation for specifying a result axis. In a string
Query
, this is specified using the
/
operator followed by the axis name.
This needs to be specified at least once for a vector query (e.g.,
/ cell : batch
), and twice for a matrix (e.g.,
/ cell / gene : UMIs
). Axes can be filtered using Boolean masks using
And
,
AndNot
,
Or
,
OrNot
,
Xor
and
XorNot
(e.g.,
/ gene & is_marker : is_noisy
). Alternatively, a single entry can be selected from the axis using
IsEqual
(e.g.,
/ gene = FOX1 : is_noisy
,
/ cell / gene = FOX1 : UMIs
,
/ cell = C1 / gene = FOX1 : UMIs
). Finally, a matrix can be reduced into a vector, and a vector to a scalar, using
ReductionOperation
(e.g.,
/ gene / cell : UMIs %> Sum %> Mean
).
This,
Names
and
Lookup
are the only
QueryOperation
s that also work as a complete
Query
.
DataAxesFormats.Queries.CountBy
—
Type
CountBy(property::AbstractString) <: QueryOperation
A query operation that generates a matrix of counts of combinations of pairs of values for the same entries of an axis. That is, it follows fetching some vector property, and is followed by fetching a second vector property of the same axis. The result is a matrix whose rows are the values of the 1st property and the columns are the values of the 2nd property, and the values are the number of times the combination of values appears. In a string
Query
, this is specified using the
*
operator, followed by the property name to look up (e.g.,
/ cell : type * batch
will generate a matrix whose rows correspond to cell types, whose columns correspond to cell batches, and whose values are the number of cells of each combination of batch and type).
By default, the rows and/or columns only contain actually seen values and are ordered alphabetically. However, it is common that one or both of the properties correspond to an axis. In this case, you can use an
AsAxis
suffix to force the rows and/or columns of the matrix to be exactly the entries of the specific axis (e.g.,
/ cell : type ! * batch
will generate a matrix whose rows are exactly the entries of the
type
axis, even if there is a type without any cells). This is especially useful when both properties are axes, as the result can be stored as a matrix property (e.g.,
/ cell : type ! * batch !
will generate a matrix whose rows are the entries of the type axis, and whose columns are the entries of the batch axis, so it can be given to
set_matrix!(daf, "type", "batch", ...)
).
The raw counts matrix can be post-processed like any other matrix (using
ReductionOperation
or an
EltwiseOperation
). This allows computing useful aggregate properties (e.g.,
/ cell : type * batch % Fractions
will generate a matrix whose columns correspond to batches and whose rows are the fraction of the cells from each type within each batch).
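The counting described above can be sketched in plain Julia. This is only an illustration of the semantics, not the package's implementation; the data values are made up, and the column-wise normalization is an assumption based on the description of the Fractions example above.

```julia
# Hypothetical per-cell annotations (not taken from any real data set).
cell_type  = ["T", "B", "T", "T", "B"]
cell_batch = ["b1", "b1", "b2", "b1", "b2"]

# By default, rows and columns hold only the values actually seen, sorted.
row_names = sort(unique(cell_type))
col_names = sort(unique(cell_batch))

# Count how many cells have each (type, batch) combination.
counts = zeros(Int, length(row_names), length(col_names))
for (t, b) in zip(cell_type, cell_batch)
    row = findfirst(==(t), row_names)
    col = findfirst(==(b), col_names)
    counts[row, col] += 1
end

# Assumed reading of `% Fractions`: normalize each column (batch) to sum to 1.
fractions = counts ./ sum(counts; dims = 1)
```

With an AsAxis suffix, row_names would instead be the full list of entries of the type axis, so a type without any cells would still get an (all-zero) row.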
DataAxesFormats.Queries.Fetch
—
Type
Fetch(property::AbstractString) <: QueryOperation
A query operation for fetching the value of a property from another axis, based on a vector property whose values are entry names of the axis. In a string
Query
, this is specified using the
=>
operator, followed by the name to look up.
That is, if you query for the values of a vector property (e.g.,
batch
for each
cell
), and the name of this property is identical to some axis name, then we assume each value is the name of an entry of this axis. We use this to fetch the value of some other property (e.g.,
age
) of that axis (e.g.,
/ cell : batch => age
).
It is useful to be able to store several vector properties which all map to the same axis. To support this, we support a naming convention where the property name begins with the axis name followed by a
.suffix
(e.g., both
/ cell : type => color
and
/ cell : type.manual => color
will look up the
color
of the
type
of some property of the
cell
axis - either "the"
type
of each
cell
, or the alternate
type.manual
of each cell).
Fetching can be chained (e.g.,
/ cell : batch => donor => age
will fetch the
age
of the
donor
of the
batch
of each
cell
).
If the property does not exist, this is an error, unless this is followed by
IfMissing
(e.g.,
/ cell : type => color || red
). If the property contains an empty value, this is also an error, unless it is followed by an
IfNot
(e.g.,
/ cell : type ? => color
will compute a vector of the colors of the type of the cells that have a non-empty type, and
/ cell : batch ? 0 => donor => age
will assign a zero age for cells which have an empty batch).
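As a rough illustration of chained fetching and of the two IfNot forms, the sketch below mimics the semantics with plain Julia vectors. The helper names (fetch_one, age_of_batch) and all data values are hypothetical; this is not how the package implements Fetch.

```julia
# Hypothetical data; none of these values come from the documentation.
batch_names = ["b1", "b2"]
batch_donor = ["d1", "d2"]            # donor of each batch
donor_names = ["d1", "d2"]
donor_age   = [30, 45]                # age of each donor
cell_batch  = ["b1", "b2", "b1", ""]  # the last cell has an empty batch

# One fetch step (hypothetical helper): treat `value` as the name of an entry
# of the axis and return that entry's value of `property`.
fetch_one(value, axis_names, property) = property[findfirst(==(value), axis_names)]

# Chained fetch, as in `batch => donor => age`.
age_of_batch(batch) =
    fetch_one(fetch_one(batch, batch_names, batch_donor), donor_names, donor_age)

# As in `/ cell : batch ? => donor => age`: cells with an empty batch are dropped.
ages = [age_of_batch(batch) for batch in cell_batch if !isempty(batch)]

# As in `/ cell : batch ? 0 => donor => age`: an empty batch yields the final value 0.
ages_or_zero = [isempty(batch) ? 0 : age_of_batch(batch) for batch in cell_batch]
```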
DataAxesFormats.Queries.GroupBy
—
Type
GroupBy(property::AbstractString) <: QueryOperation
A query operation that uses a (following)
ReductionOperation
to aggregate the values of each group of values. Will fetch the specified
property
(possibly followed by additional
Fetch
operations) and use the resulting vector for the name of the group of each value.
If applied to a vector, the result is a vector with one entry per group (e.g.,
/ cell : age @ type %> Mean
will generate a vector with an entry per cell type and whose values are the mean age of the cells of each type). If applied to a matrix, the result is a matrix with one row per group (e.g.,
/ cell / gene : UMIs @ type %> Max
will generate a matrix with one row per type and one column per gene, whose values are the maximal UMIs count of the gene in the cells of each type).
By default, the result uses only group values we actually observe, in sorted order. However, if the operation is followed by an
AsAxis
suffix, then the fetched property must correspond to an existing axis (similar to when using
Fetch
), and the result will use the entries of the axis, even if we do not observe them in the data (and will ignore vector entries with an empty value). In this case, the reduction operation will fail if there are no values for some group, unless it is followed by an
IfMissing
suffix (e.g.,
/ cell : age @ type ! %> Mean
will generate a vector whose entries are all the entries of the
type
axis, and will ignore cells with an empty type; this will fail if there are types which are not associated with any cell. In contrast,
/ cell : age @ type ! %> Mean || 0
will succeed, assigning a value of zero for types which have no cells associated with them).
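The difference between grouping by observed values and grouping by the entries of an axis can be sketched as follows. This is a minimal plain-Julia illustration under made-up data; the helper names mean_of and mean_by_axis are hypothetical and not part of the package.

```julia
# Hypothetical data: four cells, and a type axis with an unused entry ("NK").
cell_age  = [10.0, 20.0, 30.0, 40.0]
cell_type = ["T", "B", "T", "B"]
type_axis = ["B", "NK", "T"]

mean_of(values) = sum(values) / length(values)

# As in `/ cell : age @ type %> Mean`: one entry per observed group, sorted.
observed = sort(unique(cell_type))
mean_by_observed = [mean_of(cell_age[cell_type .== group]) for group in observed]

# As in `/ cell : age @ type ! %> Mean || 0`: one entry per axis entry,
# using `default` for groups that have no cells.
function mean_by_axis(default)
    return map(type_axis) do group
        members = cell_age[cell_type .== group]
        isempty(members) ? default : mean_of(members)
    end
end

mean_with_default = mean_by_axis(0.0)
```

Without a default, the "NK" group has nothing to reduce, which corresponds to the failure case described above.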
DataAxesFormats.Queries.IfMissing
—
Type
IfMissing(value::StorageScalar; dtype::Maybe{Type} = nothing) <: QueryOperation
A query operation providing a value to use if the data is missing some property. In a string
Query
, this is specified using the
||
operator, followed by the value to use, and optionally followed by the data type of the value (e.g.,
: score || 1 Float32
).
If the data type is not specified, and the
value
isa
AbstractString
, then the data type is deduced using
guess_typed_value
of the
value
.
DataAxesFormats.Queries.IfNot
—
Type
IfNot(value::Maybe{StorageScalar} = nothing) <: QueryOperation
A query operation providing a value to use for "false-ish" values in a vector (empty strings, zero numeric values, or false Boolean values). In a string
Query
, this is indicated using the
??
operator, optionally followed by a value to use.
If the value is
nothing
(the default), then these entries are dropped (masked out) of the result (e.g.,
/ cell : type ?
behaves the same as
/ cell & type : type
, that is, returns the type of the cells which have a non-empty type). Otherwise, this value is used instead of the "false-ish" value (e.g.,
/ cell : type ? Outlier
will return a vector of the type of each cell, with the value
Outlier
for cells with an empty type). When fetching properties, this is the final value (e.g.,
/ cell : type ? red => color
will return a vector of the color of the type of each cell, with a
red
color for the cells with an empty type).
If the
value
isa
AbstractString
, then it is automatically converted to the data type of the elements of the results vector.
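The two behaviors (dropping versus replacing "false-ish" entries) can be sketched in plain Julia. The is_falsish helper is a hypothetical name introduced only for this illustration, and the data is made up.

```julia
# "False-ish" values as described above: empty strings, zeros, and false.
is_falsish(value::AbstractString) = isempty(value)
is_falsish(value::Number) = iszero(value)   # Bool is a Number, so false is covered

# Hypothetical per-cell type annotation; two cells have an empty type.
cell_type = ["T", "", "B", ""]

# As in `/ cell : type ?`: entries with a false-ish value are dropped.
dropped = [t for t in cell_type if !is_falsish(t)]

# As in `/ cell : type ? Outlier`: false-ish entries are replaced by the value.
replaced = [is_falsish(t) ? "Outlier" : t for t in cell_type]
```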
DataAxesFormats.Queries.Lookup
—
Type
Lookup(property::AbstractString) <: Query
A query operation for looking up the value of a property with some name. In a string
Query
, this is specified using the
:
operator, followed by the property name to look up.
- If the query state is empty, this looks up the value of a scalar property (e.g., : version).
- If the query state contains a single axis, this looks up the value of a vector property (e.g., / cell : batch).
- If the query state contains two axes, this looks up the value of a matrix property (e.g., / cell / gene : UMIs).
If the property does not exist, this is an error, unless this is followed by
IfMissing
(e.g.,
: version || 1.0
).
If any of the axes has a single entry selected using
IsEqual
, this will reduce the dimension of the result (e.g.,
/ cell / gene = FOX1 : UMIs
is a vector, and both
/ cell = C1 / gene = FOX1 : UMIs
and
/ gene = FOX1 : is_marker
are scalars).
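The way selecting a single entry reduces the dimension of the result can be pictured with ordinary array indexing. The sketch below uses made-up data and assumes, for illustration only, that rows are cells and columns are genes.

```julia
# Hypothetical UMIs matrix: rows are cells, columns are genes.
cell_names = ["C1", "C2"]
gene_names = ["FOX1", "ACTB", "SOX2"]
umis = [3 0 1;
        0 5 2]

gene = findfirst(==("FOX1"), gene_names)
cell = findfirst(==("C1"), cell_names)

# As in `/ cell / gene = FOX1 : UMIs`: one axis is pinned, leaving a vector.
fox1_per_cell = umis[:, gene]

# As in `/ cell = C1 / gene = FOX1 : UMIs`: both axes are pinned, leaving a scalar.
c1_fox1 = umis[cell, gene]
```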
This,
Names
and
Axis
are the only
QueryOperation
s that also work as a complete
Query
.
DataAxesFormats.Queries.Names
—
Type
Names(kind::Maybe{AbstractString} = nothing) <: Query
A query operation for looking up a set of names. In a string
Query
, this is specified using the
?
operator, optionally followed by the kind of objects to name.
- If the query state is empty, a kind must be specified, one of scalars or axes, and the result is the set of their names (? scalars, ? axes).
- If the query state contains a single axis (without any masks), the kind must not be specified, and the result is the set of names of vector properties of the axis (e.g., / cell ?).
- If the query state contains two axes (without any masks), the kind must not be specified, and the result is the set of names of matrix properties of the axes (e.g., / cell / gene ?).
This,
Lookup
and
Axis
are the only
QueryOperation
s that also work as a complete
Query
.
Comparison Operators
DataAxesFormats.Queries.ComparisonOperation
—
Type
ComparisonOperation
:= (
IsLess
|
IsLessEqual
|
IsEqual
|
IsNotEqual
|
IsGreater
|
IsGreaterEqual
|
IsMatch
|
IsNotMatch
)
A query operation computing a mask by comparing the values of a vector with some constant (e.g.,
/ cell & age > 0
). In addition, the
IsEqual
operation can be used to slice an entry from a vector (e.g.,
/ gene = FOX1 : is_marker
) or a matrix (e.g.,
/ cell / gene = FOX1 : UMIs
,
/ cell = ATCG / gene = FOX1 : UMIs
).
DataAxesFormats.Queries.IsEqual
—
Type
IsEqual(value::StorageScalar) <: QueryOperation
Equality is used for two purposes:
- As a comparison operator, similar to
- As a comparison operator, similar to IsLess except that it uses = instead of < for the comparison.
- To select a single entry from a vector. This allows a query to select a single scalar from a vector (e.g., / gene = FOX1 : is_marker) or from a matrix (e.g., / cell = ATCG / gene = FOX1 : UMIs); or to slice a single vector from a matrix (e.g., / cell = ATCG / gene : UMIs or / cell / gene = FOX1 : UMIs).
DataAxesFormats.Queries.IsGreater
—
Type
IsGreater(value::StorageScalar) <: QueryOperation
Similar to
IsLess
except that it uses
>
instead of
<
for the comparison.
DataAxesFormats.Queries.IsGreaterEqual
—
Type
IsGreaterEqual(value::StorageScalar) <: QueryOperation
Similar to
IsLess
except that it uses
>=
instead of
<
for the comparison.
DataAxesFormats.Queries.IsLess
—
Type
IsLess(value::StorageScalar) <: QueryOperation
A query operation for converting a vector value to a Boolean mask by comparing it to some value. In a string
Query
, this is specified using the
<
operator, followed by the value to compare with.
A string value is automatically converted into the same type as the vector values (e.g.,
/ cell & probability < 0.5
will restrict the result vector only to cells whose probability is less than half).
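A comparison mask of this kind can be sketched in plain Julia. The data is made up, and parsing the constant with Base's parse is just one simple way to illustrate "converting a string value to the type of the vector values"; the package may do this differently.

```julia
# Hypothetical per-cell property.
cell_names  = ["C1", "C2", "C3"]
probability = [0.2, 0.7, 0.4]

# In a query string the constant is text, so it is converted to the element
# type of the vector before comparing.
threshold = parse(eltype(probability), "0.5")

# As in `/ cell & probability < 0.5`: a Boolean mask over the cell axis.
mask = probability .< threshold
selected = cell_names[mask]
```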
DataAxesFormats.Queries.IsLessEqual
—
Type
IsLessEqual(value::StorageScalar) <: QueryOperation
Similar to
IsLess
except that it uses
<=
instead of
<
for the comparison.
DataAxesFormats.Queries.IsMatch
—
Type
IsMatch(value::Union{AbstractString, Regex}) <: QueryOperation
Similar to
IsLess
except that the compared values must be strings, and the mask is of the values that match the given regular expression.
DataAxesFormats.Queries.IsNotEqual
—
Type
IsNotEqual(value::StorageScalar) <: QueryOperation
Similar to
IsLess
except that it uses
!=
instead of
<
for the comparison.
DataAxesFormats.Queries.IsNotMatch
—
Type
IsNotMatch(value::Union{AbstractString, Regex}) <: QueryOperation
Similar to
IsMatch
except that it looks for entries that do not match the pattern.
Mask Operators
DataAxesFormats.Queries.And
—
Type
And(property::AbstractString) <: QueryOperation
A query operation for restricting the set of entries of an
Axis
. In a string
Query
, this is specified using the
&
operator, followed by the name of an axis property to look up to compute the mask.
The mask may be just the fetched property (e.g.,
/ gene & is_marker
will restrict the result vector to only marker genes). If the value of the property is not Boolean, it is automatically compared to
0
or the empty string, depending on its type (e.g.,
/ cell & type
will restrict the result vector to only cells which were given a non-empty-string type annotation). It is also possible to fetch properties from other axes, and use an explicit
ComparisonOperation
to compute the Boolean mask (e.g.,
/ cell & batch => age > 1
will restrict the result vector to cells whose batch has an age larger than 1).
DataAxesFormats.Queries.AndNot
—
Type
DataAxesFormats.Queries.Or
—
Type
Or(property::AbstractString) <: QueryOperation
A query operation for expanding the set of entries of an
Axis
. In a string
Query
, this is specified using the
|
operator, followed by the name of an axis property to look up to compute the mask.
This works similarly to
And
, except that it adds to the mask (e.g.,
/ gene & is_marker | is_noisy
will restrict the result vector to either marker or noisy genes).
DataAxesFormats.Queries.OrNot
—
Type
DataAxesFormats.Queries.Xor
—
Type
Xor(property::AbstractString) <: QueryOperation
A query operation for flipping the set of entries of an
Axis
. In a string
Query
, this is specified using the
^
operator, followed by the name of an axis property to look up to compute the mask.
This works similarly to
Or
, except that it flips entries in the mask (e.g.,
/ gene & is_marker ^ is_noisy
will restrict the result vector to either marker or noisy genes, but not both).
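How And, Or and Xor successively update the mask of an axis can be sketched with element-wise Boolean operations. This is a plain-Julia illustration with made-up data, not the package's implementation.

```julia
# Hypothetical Boolean properties of the gene axis.
gene_names = ["FOX1", "ACTB", "GAPDH", "SOX2"]
is_marker  = [true, false, false, true]
is_noisy   = [true, true, false, false]

# Initially every entry of the axis is included.
all_genes = trues(length(gene_names))

# As in `/ gene & is_marker`: restrict the mask.
and_mask = all_genes .& is_marker

# As in `/ gene & is_marker | is_noisy`: add entries to the mask.
or_mask = and_mask .| is_noisy

# As in `/ gene & is_marker ^ is_noisy`: flip entries in the mask.
xor_mask = xor.(and_mask, is_noisy)
```

For example, gene_names[xor_mask] keeps the genes that are either marker or noisy, but not both.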
DataAxesFormats.Queries.XorNot
—
Type
DataAxesFormats.Queries.MaskSlice
—
Type
MaskSlice(axis_name::AbstractString) <: QueryOperation
A query operation for using a slice of a matrix as a mask, when the other axis of the matrix is different from the mask axis. This needs to be followed by the axis entry to slice. In a string
Query
, this is specified using the
;
operator, followed by the name of the axis for looking up the matrix, then followed by
=
and the value identifying the slice.
That is, suppose we have a UMIs matrix per cell per gene, and we'd like to select all the cells which have non-zero UMIs for the FOX1 gene. Then we can say
/ cell & UMIs ; gene = FOX1 > 0
(or just
/ cell & UMIs ; gene = FOX1
).
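The slice-as-mask idea can be pictured with ordinary matrix indexing. The sketch uses made-up data and assumes, for illustration only, a matrix whose rows are cells and whose columns are genes.

```julia
# Hypothetical UMIs matrix: rows are cells, columns are genes.
cell_names = ["C1", "C2", "C3"]
gene_names = ["FOX1", "ACTB"]
umis = [3 1;
        0 5;
        2 0]

# As in `/ cell & UMIs ; gene = FOX1 > 0`: slice the FOX1 column of the
# matrix and compare it to get a mask over the cell axis.
fox1 = findfirst(==("FOX1"), gene_names)
mask = umis[:, fox1] .> 0
selected = cell_names[mask]
```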
DataAxesFormats.Queries.SquareMaskColumn
—
Type
SquareMaskColumn(comparison_value::AbstractString) <: QueryOperation
Similar to
MaskSlice
but is used when the mask matrix is square and we'd like to use a column as a mask. This therefore only needs specifying the column to slice. In a string
Query
, this is specified using the
;=
operator followed by the value identifying the slice.
That is, suppose we have a KNN graph between cells as a cell-cell matrix where each column contains the weights of the outgoing edges from each cell to the rest. To select all the cells reachable from a particular one, we can say
/ cell & outgoing ;= ATCG > 0
(or just
/ cell & outgoing ;= ATCG
). If we also want to include the source cell we'd need to say
/ cell & name = ATCG | outgoing ;= ATCG
, etc.
DataAxesFormats.Queries.SquareMaskRow
—
Type
SquareMaskRow(comparison_value::AbstractString) <: QueryOperation
Similar to
SquareMaskColumn
but is used when the mask matrix is square and we'd like to use a row as a mask. This therefore only needs specifying the row to slice. In a string
Query
, this is specified using the
,=
operator followed by the value identifying the slice.
That is, suppose we have a KNN graph as above and we'd like to select all cells that can reach a particular one. Then we can say
/ cell & outgoing ,= ATCG > 0
(or just
/ cell & outgoing ,= ATCG
). If we also want to include the source cell we'd need to say
/ cell & name = ATCG | outgoing ,= ATCG
, etc.
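The difference between using a column and using a row of a square matrix as a mask can be sketched as follows. The graph is made up; following the description above, column j of the matrix holds the weights of the edges going out of cell j.

```julia
# Hypothetical graph over three cells: A -> B, B -> C, C -> A.
# outgoing[i, j] is the weight of the edge from cell j to cell i.
cell_names = ["A", "B", "C"]
outgoing = [0.0 0.0 0.3;
            0.5 0.0 0.0;
            0.0 0.8 0.0]

a = findfirst(==("A"), cell_names)

# Column of A (as in `/ cell & outgoing ;= A > 0`): cells reachable from A.
reachable_from_a = cell_names[outgoing[:, a] .> 0]

# Row of A (as in `/ cell & outgoing ,= A > 0`): cells that can reach A.
can_reach_a = cell_names[outgoing[a, :] .> 0]

# Also including A itself (as in `/ cell & name = A | outgoing ;= A`).
with_self = cell_names[(cell_names .== "A") .| (outgoing[:, a] .> 0)]
```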
Index
- DataAxesFormats.Queries
- DataAxesFormats.Queries.AXIS_MASK
- DataAxesFormats.Queries.COUNTS_MATRIX
- DataAxesFormats.Queries.GROUP_BY
- DataAxesFormats.Queries.LOOKUP_PROPERTY
- DataAxesFormats.Queries.MASK_OPERATION
- DataAxesFormats.Queries.MASK_SLICE
- DataAxesFormats.Queries.MATRIX_COLUMN
- DataAxesFormats.Queries.MATRIX_ENTRY
- DataAxesFormats.Queries.MATRIX_LOOKUP
- DataAxesFormats.Queries.MATRIX_QUERY
- DataAxesFormats.Queries.MATRIX_ROW
- DataAxesFormats.Queries.NAMES_QUERY
- DataAxesFormats.Queries.POST_PROCESS
- DataAxesFormats.Queries.QUERY_OPERATORS
- DataAxesFormats.Queries.REDUCE_MATRIX
- DataAxesFormats.Queries.REDUCE_VECTOR
- DataAxesFormats.Queries.SCALAR_QUERY
- DataAxesFormats.Queries.VECTOR_ENTRY
- DataAxesFormats.Queries.VECTOR_FETCH
- DataAxesFormats.Queries.VECTOR_LOOKUP
- DataAxesFormats.Queries.VECTOR_PROPERTY
- DataAxesFormats.Queries.VECTOR_QUERY
- DataAxesFormats.Queries.And
- DataAxesFormats.Queries.AndNot
- DataAxesFormats.Queries.AsAxis
- DataAxesFormats.Queries.Axis
- DataAxesFormats.Queries.ComparisonOperation
- DataAxesFormats.Queries.CountBy
- DataAxesFormats.Queries.Fetch
- DataAxesFormats.Queries.FrameColumn
- DataAxesFormats.Queries.FrameColumns
- DataAxesFormats.Queries.GroupBy
- DataAxesFormats.Queries.IfMissing
- DataAxesFormats.Queries.IfNot
- DataAxesFormats.Queries.IsEqual
- DataAxesFormats.Queries.IsGreater
- DataAxesFormats.Queries.IsGreaterEqual
- DataAxesFormats.Queries.IsLess
- DataAxesFormats.Queries.IsLessEqual
- DataAxesFormats.Queries.IsMatch
- DataAxesFormats.Queries.IsNotEqual
- DataAxesFormats.Queries.IsNotMatch
- DataAxesFormats.Queries.Lookup
- DataAxesFormats.Queries.MaskSlice
- DataAxesFormats.Queries.Names
- DataAxesFormats.Queries.Or
- DataAxesFormats.Queries.OrNot
- DataAxesFormats.Queries.Query
- DataAxesFormats.Queries.QuerySequence
- DataAxesFormats.Queries.QueryString
- DataAxesFormats.Queries.SquareMaskColumn
- DataAxesFormats.Queries.SquareMaskRow
- DataAxesFormats.Queries.Xor
- DataAxesFormats.Queries.XorNot
- DataAxesFormats.Queries.full_vector_query
- DataAxesFormats.Queries.get_frame
- DataAxesFormats.Queries.get_query
- DataAxesFormats.Queries.guess_typed_value
- DataAxesFormats.Queries.is_axis_query
- DataAxesFormats.Queries.query_requires_relayout
- DataAxesFormats.Queries.query_result_dimensions
- DataAxesFormats.Queries.@q_str