Queries

Construction

DataAxesFormats.Queries.Query Type
Query(
    query::QueryString,
    operand_only::Maybe{Type{QueryOperation}} = nothing,
) <: QueryOperation

A query is a description of a (sub-)process for extracting some data from a DafReader . A full query is a sequence of QueryOperation s that, when applied one at a time to some DafReader , results in a scalar, vector or matrix.

To apply a query to some DafReader data, invoke get_query (you can also use the shorthand daf[query] instead of get_query(daf, query) ). By default, query operations will cache their results in memory as QueryData , to speed up repeated queries. This may lock up large amounts of memory; you can call empty_cache! to release it.

Queries can be constructed in two ways. In code, a query can be built by chaining query operations (e.g., the expression Axis("gene") |> Lookup("is_marker") looks up the is_marker vector property of the gene axis).

Alternatively, a query can be parsed from a string into a Query object (e.g., the above can be written as Query("/gene:is_marker") ). See QUERY_OPERATORS for a table of supported operators. Spaces (and comments) around the operators are optional; see tokenize for details. You can also convert a Query to a string (or print it, etc.) to see its representation. This is used for error messages and as a key when caching query results.

Since query strings use \ as an escape character, it is easier to use raw string literals for queries (e.g., Query(raw"cell = ATCG\:B1 : age") vs. Query("cell = ATCG\\:B1 : age") ). To make this even easier we provide the q macro (e.g., q"cell = ATCG\:B1 : batch" ) which works similarly to Julia's standard r macro for literal Regex strings.
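The escaping rules above can be checked in plain Julia, without loading Daf. This is only a sketch of the string-literal behavior; it does not parse anything as a query, and the variable names are made up for illustration.

```julia
# The raw literal keeps the backslash as written; the regular literal
# needs it doubled to denote the same single character.
raw_form = raw"cell = ATCG\:B1 : age"
escaped_form = "cell = ATCG\\:B1 : age"

println(raw_form == escaped_form)       # true
println(count(==('\\'), raw_form))      # 1

# Julia's standard r"..." literal receives its text the same raw way,
# which is the behavior the q"..." macro is described as mirroring.
regex_pattern = r"ATCG\:B1".pattern
println(regex_pattern == raw"ATCG\:B1") # true
```

Writing "ATCG\:B1" as a regular string literal would not work, since \: is not a valid Julia escape sequence.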

If the provided query string contains only an operand, and operand_only is specified, it is used as the operator (i.e., Query("metacell") is an error, but Query("metacell", Axis) is the same as Axis("metacell") ). This is useful when providing suffix queries (e.g., for get_frame ).

Being able to represent queries as strings allows for reading them from configuration files and letting the user input them in an application UI (e.g., allowing the user to specify the X, Y and/or colors of a scatter plot using queries). At the same time, being able to incrementally build queries using code allows for convenient reuse (e.g., reusing axis sub-queries in Daf views), without having to go through the string representation.

Daf provides a comprehensive set of QueryOperation s that can be used to construct queries. The QUERY_OPERATORS listed below provide the basic functionality (e.g., specifying an Axis or a property Lookup ). In addition, Daf provides computation operations ( EltwiseOperation and ReductionOperation ), allowing for additional operations to be provided by external packages.

Obviously not all possible combinations of operations make sense (e.g., Lookup("is_marker") |> Axis("cell") will not work). For the full list of valid combinations, see NAMES_QUERY , SCALAR_QUERY , VECTOR_QUERY and MATRIX_QUERY below.

Note

This started as a very simple query language (which it still is, for the simple cases) but became complex to allow for useful but complicated scenarios. In particular, the approach here of using a concatenative language (similar to ggplot ) makes simple things simpler, but becomes somewhat unnatural and restrictive for some of the more advanced operations. However, using an RPN or a LISP notation to better support such cases would have ended up with a much less nice syntax for the simple cases.

Hopefully we have covered sufficient ground so that we won't need to add further operations. Also, in most cases, you can write code that accesses the vector/matrix data and performs whatever computation you want instead of writing a complex query; however, this isn't an option when defining views or adapters, which rely on the query mechanism for specifying the data.

DataAxesFormats.Queries.@q_str Macro
q"..."

Shorthand for parsing a literal string as a Query . This is equivalent to Query(raw"...") , that is, a \ can be placed in the string without escaping it (except for before a " ). This is very convenient for literal queries (e.g., q"/ cell = ATCG\:B1 : batch" == Query(raw"/ cell = ATCG\:B1 : batch") == Query("/ cell = ATCG\\:B1 : batch") == Axis("cell") |> IsEqual("ATCG:B1") |> Lookup("batch") ).

DataAxesFormats.Queries.QueryString Type

Most operations that take a query allow passing a string to be parsed into a query, or an actual Query object. This type is used as a convenient notation for such query parameters.

Functions

DataAxesFormats.Queries.get_query Function
get_query(
    daf::DafReader,
    query::QueryString;
    [cache::Bool = true]
)::Union{StorageScalar, NamedVector, NamedMatrix}

Apply the full query to the Daf data and return the result. By default, this will cache results, so repeated queries will be accelerated. This may consume a large amount of memory. You can disable it by specifying cache = false , or release the cached data using empty_cache! .

As a shorthand syntax you can also invoke this using getindex , that is, using the [] operator (e.g., daf[q"/ cell"] is equivalent to get_query(daf, q"/ cell") ).

DataAxesFormats.Queries.get_frame Function
get_frame(
    daf::DafReader,
    axis::QueryString,
    [columns::Maybe{FrameColumns} = nothing;
    cache::Bool = true]
)::DataFrame

Return a DataFrame containing multiple vectors of the same axis .

The axis can be either just the name of an axis (e.g., "cell" ), or a query for the axis (e.g., q"/ cell" ), possibly using a mask (e.g., q"/ cell & age > 1" ). The result of the query must be a vector of unique axis entry names.

If columns is not specified, the data frame will contain all the vector properties of the axis, in alphabetical order (since DataFrame has no concept of named rows, the 1st column will contain the name of the axis entry).

By default, this will cache results of all queries. This may consume a large amount of memory. You can disable it by specifying cache = false , or release the cached data using empty_cache! .

DataAxesFormats.Queries.FrameColumn Type

Specify a column for get_frame for some axis. The most generic form is a pair "column_name" => query . Two shorthands apply: the pair "column_name" => "=" is a shorthand for the pair "column_name" => ": column_name" , and so is the shorthand "column_name" (simple string).

We also allow specifying tuples instead of pairs to make it easy to invoke the API from other languages such as Python which do not have the concept of a Pair .
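The shorthands above can be modeled in plain Julia. The function normalize_column below is hypothetical (it is not part of Daf, whose actual handling may differ); it only illustrates how the three accepted forms reduce to a single "column_name" => query pair.

```julia
# Hypothetical helper: reduce any accepted column form to `name => query`.
function normalize_column(column)
    if column isa AbstractString
        # A plain string is shorthand for `name => "="`.
        name, query = column, "="
    else
        # `first`/`last` work for both a Pair and a 2-tuple.
        name, query = first(column), last(column)
    end
    # `"="` is shorthand for looking up the property of the same name.
    if query == "="
        query = ": " * name
    end
    return name => query
end

println(normalize_column("age"))
println(normalize_column("age" => "="))
println(normalize_column(("donor_age", ": batch => donor => age")))
```

The first two calls yield the same pair, and the tuple form is treated exactly like the equivalent Pair .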

The query is combined with the axis query as follows (using full_vector_query ):

  • If the query contains GroupBy , then the query must repeat any mask specified for the axis query. That is, if the axis query is metacell & type = B , then the column query must be / cell & metacell => type = B @ metacell : age %> Mean . Sorry for the inconvenience. TODO: Automatically inject the mask into GroupBy column queries.
  • Otherwise, if the query starts with a (single) axis, then it should only contain a reduction; the axis query is automatically injected following it. That is, if the axis query is gene & is_marker , then the full query for the column query / metacell : fraction %> Mean will be / metacell / gene : fraction %> Mean (the mean gene expression in all metacells). We can't just concatenate the axis query and the column query here because Julia, in its infinite wisdom, uses column-major matrices, like R and MATLAB; so reduction eliminates the rows instead of the columns of the matrix.
  • Otherwise (the typical case), we simply concatenate the axis query and the column query. That is, if the axis query is cell & batch = B1 and the column query is : age , then the full query will be cell & batch = B1 : age . This is the simplest and most common case.

In all cases the (full) query must return a value for each entry of the axis.
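The remark about reductions eliminating rows can be illustrated with a plain Julia matrix. This is a sketch using the Statistics standard library rather than Daf itself; the data and variable names are invented.

```julia
using Statistics

# Rows are metacells, columns are genes (2 metacells, 3 genes).
fraction = [1.0 2.0 3.0;
            3.0 4.0 5.0]

# Reducing along the rows leaves one value per column, that is, per gene.
mean_per_gene = vec(mean(fraction; dims = 1))

println(mean_per_gene)          # [2.0, 3.0, 4.0]
println(length(mean_per_gene))  # 3
```

Because the surviving axis is the column axis, the frame's axis has to appear second in the full query, which is why the axis query is injected after the column query's own axis.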

DataAxesFormats.Queries.FrameColumns Type

Specify all the columns to collect for a frame. We would have liked to specify this as AbstractVector{<:FrameColumn} but Julia in its infinite wisdom considers ["a", "b" => "c"] to be a Vector{Any} , which would require literals to be annotated with the type.
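The element type inference mentioned above is easy to confirm in plain Julia. In this sketch, ColumnSpec is a hypothetical stand-in for a column type (it is not Daf's FrameColumn definition).

```julia
# A literal mixing a String and a Pair is inferred as Vector{Any}.
mixed = ["a", "b" => "c"]
println(typeof(mixed))  # Vector{Any}

# Hypothetical stand-in for a union of the accepted column forms.
ColumnSpec = Union{String, Pair{String, String}}

# Only an explicitly annotated literal gets the narrower element type.
typed = ColumnSpec["a", "b" => "c"]
println(typed isa AbstractVector{<:ColumnSpec})  # true
println(mixed isa AbstractVector{<:ColumnSpec})  # false
```

Accepting a looser vector type spares callers from writing the annotated form.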

DataAxesFormats.Queries.full_vector_query Function
full_vector_query(
    axis_query::Query,
    vector_query::QueryString,
    vector_name::Maybe{AbstractString} = nothing,
)::Query

Given a query for an axis, and some suffix query for a vector property, combine them into a full query for the vector values for the axis. This is used by FrameColumn for get_frame and also for queries of vector data in views.

DataAxesFormats.Queries.query_result_dimensions Function
query_result_dimensions(query::QueryString)::Int

Return the number of dimensions (-1 - names, 0 - scalar, 1 - vector, 2 - matrix) of the results of a query . This also verifies the query is syntactically valid, though it may still fail if applied to specific data due to invalid data values or types.

DataAxesFormats.Queries.query_requires_relayout Function
query_requires_relayout(daf::DafReader, query::QueryString)::Bool

Whether computing the query for the daf data requires relayout! of some matrix. This also verifies the query is syntactically valid and that the query can be computed, though it may still fail if applied to specific data due to invalid values or types.

DataAxesFormats.Queries.is_axis_query Function
is_axis_query(query::QueryString)::Bool

Returns whether the query specifies a (possibly masked) axis. This also verifies the query is syntactically valid, though it may still fail if applied to specific data due to invalid data values or types.

Syntax

DataAxesFormats.Queries.QUERY_OPERATORS Constant

Operators used to represent a Query as a string.

Operator Implementation Description
/ Axis Specify a vector or matrix axis (e.g., / cell : batch or / cell / gene : UMIs ).
? Names 1. Names of scalars or axes ( ? axes , ? scalars ).
2. Names of vectors of axis (e.g., / cell ? ).
3. Names of matrices of axes (e.g., / cell / gene ? ).
: Lookup Lookup a property (e.g., : version , / cell : batch or / cell / gene : UMIs ).
=> Fetch Fetch a property from another axis (e.g., / cell : batch => age ).
; MaskSlice Slice a matrix mask (e.g., / cell & UMIs ; gene = FOX1 > 0 ).
;= SquareMaskColumn Slice a square matrix mask column (e.g., / cell & outgoing ;= ATCG > 0 ).
,= SquareMaskRow Slice a square matrix mask row (e.g., / cell & outgoing ,= ATCG > 0 ).
! AsAxis 1. Specify axis name when fetching a property (e.g., / cell : manual ! type => color ).
2. Force all axis values when counting (e.g., / cell : batch ! * manual ! type ).
3. Force all axis values when grouping (e.g., / cell : age @ batch ! %> Mean ).
?? IfNot 1. Mask excluding false-ish values (e.g., / cell : batch ?? => age ).
2. Default for false-ish lookup values (e.g., / cell : type ?? Outlier ).
3. Default for false-ish fetched values (e.g., / cell : batch ?? 1 => age ).
││ IfMissing 1. Value for missing lookup properties (e.g., / gene : is_marker ││ false ).
2. Value for missing fetched properties (e.g., `/ cell : type
3. Value for empty reduced vectors (e.g., `/ cell : type = LMPP => age %> Max
% EltwiseOperation Apply an element-wise operation (e.g., / cell / gene : UMIs % Log base 2 eps 1 ).
%> ReductionOperation Apply a reduction operation (e.g., / cell / gene : UMIs %> Sum ).
* CountBy Compute counts matrix (e.g., / cell : age * type ).
@ GroupBy 1. Aggregate vector entries by a group (e.g., / cell : age @ type %> Mean ).
2. Aggregate matrix row entries by a group (e.g., / cell / gene : UMIs @ type %> Max ).
& And Restrict axis entries (e.g., / gene & is_marker ).
&! AndNot Restrict axis entries (e.g., / gene &! is_marker ).
│ Or Expand axis entries (e.g., / gene & is_marker │ is_noisy ).
│! OrNot Expand axis entries (e.g., / gene & is_marker │! is_noisy ).
^ Xor Flip axis entries (e.g., / gene & is_marker ^ is_noisy ).
^! XorNot Flip axis entries (e.g., / gene & is_marker ^! is_noisy ).
= IsEqual 1. Select an entry from an axis (e.g., / cell / gene = FOX1 : UMIs ).
2. Compare equal (e.g., / cell & age = 1 ).
!= IsNotEqual Compare not equal (e.g., / cell & age != 1 ).
< IsLess Compare less than (e.g., / cell & age < 1 ).
<= IsLessEqual Compare less or equal (e.g., / cell & age <= 1 ).
> IsGreater Compare greater than (e.g., / cell & age > 1 ).
>= IsGreaterEqual Compare greater or equal (e.g., / cell & age >= 1 ).
~ IsMatch Compare match (e.g., / gene & name ~ RP\[SL\] ).
!~ IsNotMatch Compare not match (e.g., / gene & name !~ RP\[SL\] ).
Note

Due to Julia's Documenter limitations, the ASCII | character ( &#124; ) is replaced by the Unicode character ( &#9474; ) in the above table. Sigh.

DataAxesFormats.Queries.NAMES_QUERY Constant

NAMES_QUERY := ( Names scalars | Names axes | Axis Names | Axis Axis Names )

A query returning a set of names:

  • Looking up the set of names of the scalar properties ( ? scalars ).
  • Looking up the set of names of the axes ( ? axes ).
  • Looking up the set of names of the vector properties of an axis (e.g., / cell ? ).
  • Looking up the set of names of the matrix properties of a pair of axes (e.g., / cell / gene ? ).

DataAxesFormats.Queries.SCALAR_QUERY Constant

SCALAR_QUERY := ( LOOKUP_PROPERTY | VECTOR_ENTRY | MATRIX_ENTRY | REDUCE_VECTOR ) EltwiseOperation *

A query returning a scalar can be one of:

  • Looking up the value of a scalar property (e.g., : version will return the value of the version scalar property).
  • Picking a single entry of a vector property (e.g., / gene = FOX1 : is_marker will return whether the gene named FOX1 is a marker gene).
  • Picking a single entry of a matrix property (e.g., / gene = FOX1 / cell = ATCG : UMIs will return the number of UMIs of the FOX1 gene of the ATCG cell).
  • Reducing some vector into a single value (e.g., / donor : age %> Mean will compute the mean age of all the donors).

Either way, this can be followed by a series of EltwiseOperation to modify the scalar result (e.g., / donor : age %> Mean % Log base 2 % Abs will compute the absolute value of the log base 2 of the mean age of all the donors).

DataAxesFormats.Queries.LOOKUP_PROPERTY Constant

LOOKUP_PROPERTY := Lookup IfMissing ?

Lookup the value of a scalar or matrix property. This is used on its own to access a scalar property (e.g., : version ) or combined with two axes to access a matrix property (e.g., / cell / gene : UMIs ).

By default, it is an error if the property does not exist. However, if an IfMissing is provided, then this value is used instead (e.g., : version || Unknown will return Unknown if there is no version scalar property, and / cell / gene : UMIs || 0 will return an all-zero matrix if there is no UMIs matrix property).

Accessing a VECTOR_PROPERTY allows for more complex operations.

DataAxesFormats.Queries.REDUCE_VECTOR Constant

REDUCE_VECTOR := VECTOR_QUERY ReductionOperation IfMissing ?

Perform an arbitrary vector query, and reduce the result into a single scalar value (e.g., / donor : age %> Mean will compute the mean of the ages of the donors).

By default, it is an error if the vector query results in an empty vector. However, if an IfMissing suffix is provided, then this value is used instead (e.g., / cell & type = LMPP : age %> Mean || 0 will return zero if there are no cells whose type is LMPP).

DataAxesFormats.Queries.VECTOR_QUERY Constant

VECTOR_QUERY := ( VECTOR_PROPERTY | MATRIX_ROW | MATRIX_COLUMN | REDUCE_MATRIX ) POST_PROCESS *

A query returning a vector can be one of:

  • Looking up the value of a vector property (e.g., / gene : is_marker will return a mask of the marker genes).
  • Picking a single row or column of a matrix property (e.g., / gene = FOX1 / cell : UMIs will return a vector of the UMIs of the FOX1 gene of all the cells).
  • Reducing each column of some matrix into a scalar, resulting in a vector (e.g., / gene / cell : UMIs %> Sum will compute the sum of the UMIs of all the genes in each cell).

Either way, this can be followed by further processing of the vector (e.g., / gene / cell : UMIs % Log base 2 eps 1 will compute the log base 2 of one plus the UMIs of each gene in each cell).

DataAxesFormats.Queries.VECTOR_PROPERTY Constant

VECTOR_PROPERTY := Axis AXIS_MASK * VECTOR_LOOKUP VECTOR_FETCH *

Lookup the values of some vector property (e.g., / gene : is_marker will return a mask of the marker genes). This can be restricted to a subset of the vector using masks (e.g., / gene & is_marker : is_noisy will return a mask of the noisy genes out of the marker genes), and/or fetch the property value from indirect axes (e.g., / cell : batch => donor => age will return the age of the donor of the batch of each cell).

DataAxesFormats.Queries.VECTOR_LOOKUP Constant

VECTOR_LOOKUP := Lookup IfMissing ? ( IfNot | AsAxis )?

A Lookup of a vector property (e.g., / cell : type will return the type of each cell).

By default, it is an error if the property does not exist. However, if an IfMissing is provided, then this value is used instead (e.g., / cell : type || Unknown will return a vector of Unknown types if there is no type property for the cell axis).

If the IfNot suffix is provided, it controls how to modify "false-ish" (empty string, zero numeric value, or false Boolean value) entries (e.g., / cell : type ?? will return a vector of the type of each cell that has a non-empty type, while / cell : type ?? Outlier will return a vector of the type of each cell, where cells with an empty type are given the type Outlier ).

Only when the vector property is used for CountBy or for GroupBy , providing the AsAxis suffix indicates that the property is associated with an axis (similar to an indirect axis in Fetch ), and the set of groups is forced to be the values of that axis; in this case, empty string values are always ignored (e.g., / cell : age @ type ! %> Mean || 0 will return a vector of the mean age of the cells of each type, with a value of zero for types which have no cells, and ignoring cells which have an empty type; similarly, / cell : batch => donor ! * type ! will return a matrix whose rows are donors and columns are types, counting the number of cells of each type that were sampled from each donor, ignoring cells which have an empty type or whose batch has an empty donor).

DataAxesFormats.Queries.MATRIX_QUERY Constant

MATRIX_QUERY := ( MATRIX_LOOKUP | COUNTS_MATRIX ) POST_PROCESS *

A query returning a matrix can be one of:

  • Looking up the value of a matrix property (e.g., / gene / cell : UMIs will return the matrix of UMIs for each gene and cell).
  • Counting the number of times each combination of two vector properties occurs in the data (e.g., / cell : batch => donor => age * type will return a matrix whose rows are ages and columns are types, where each entry contains the number of cells which have the specific type and age).

Either way, this can be followed by a series of EltwiseOperation to modify the results (e.g., / gene / cell : UMIs % Log base 2 eps 1 will compute the log base 2 of 1 plus the UMIs of each gene in each cell).

DataAxesFormats.Queries.MATRIX_LOOKUP Constant

MATRIX_LOOKUP := Axis AXIS_MASK * Axis AXIS_MASK * Lookup

Lookup the values of some matrix property (e.g., / gene / cell : UMIs will return the matrix of UMIs of each gene in each cell). This can be restricted to a subset of the matrix using masks (e.g., / gene & is_marker / cell & type = LMPP : UMIs will return a matrix of the UMIs of each marker gene in cells whose type is LMPP).

DataAxesFormats.Queries.COUNTS_MATRIX Constant

COUNTS_MATRIX := VECTOR_QUERY CountBy VECTOR_FETCH *

Compute a matrix of counts of each combination of values given two vectors (e.g., / cell : batch => donor => age * batch => donor => sex will return a matrix whose rows are ages and columns are sexes, where each entry contains the number of cells which have the specific age and sex).

DataAxesFormats.Queries.POST_PROCESS Constant

POST_PROCESS := EltwiseOperation | GROUP_BY

A vector or a matrix result may be processed by one of:

  • Applying an EltwiseOperation operation to each value (e.g., / donor : age % Log base 2 will compute the log base 2 of the ages of all donors, and / gene / cell : UMIs % Log base 2 eps 1 will compute the log base 2 of 1 plus the UMIs count of each gene in each cell).
  • Reducing each group of vector entries or matrix rows into a single value (e.g., / cell : batch => donor => age @ type %> Mean will compute a vector of the mean age of the cells of each type, and / cell / gene : UMIs @ type %> Mean will compute a matrix of the mean UMIs of each gene for the cells of each type).

DataAxesFormats.Queries.GROUP_BY Constant

GROUP_BY := GroupBy VECTOR_FETCH * ReductionOperation IfMissing ?

The entries of a vector or the rows of a matrix result may be grouped, where all the values that have the same group value are reduced to a single value using a ReductionOperation (e.g., / cell : batch => donor => age @ type %> Mean will compute the mean age of all the cells of each type, and / cell / gene : UMIs @ type %> Mean will compute a matrix of the mean UMIs of each gene for the cells of each type).

If the group property is suffixed by AsAxis , then the result will have a value for each entry of the axis (e.g., / cell : age @ type ! %> Mean will compute the mean age of the cells of each type). In this case, some groups may have no values at all, which by default, is an error. Providing an IfMissing suffix will use the specified value for such empty groups instead (e.g., / cell : age @ type ! %> Mean || 0 will compute the mean age for the cells of each type, with a zero value for types for which there are no cells).

DataAxesFormats.Queries.MASK_OPERATION Constant

MASK_OPERATION := And | AndNot | Or | OrNot | Xor | XorNot

A query operation for restricting the set of entries of an Axis . The mask operations are applied to the current mask, so if several operations are applied, they are applied in order from left to right (e.g., / gene & is_marker | is_noisy &! is_lateral will first restrict the set of genes to marker genes, then expand it to include noisy genes as well, then remove all the lateral genes; this would be different from / gene & is_marker &! is_lateral | is_noisy , which will include all noisy genes even if they are lateral).
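The left-to-right evaluation can be modeled with plain Boolean vectors. This is a sketch of the semantics only, with invented data for four genes; it does not use Daf.

```julia
# One entry per gene: g1 is a marker, g2 is noisy and lateral,
# g3 is a lateral marker, g4 has no flags.
is_marker  = [true,  false, true,  false]
is_noisy   = [false, true,  false, false]
is_lateral = [false, true,  true,  false]

# Models: / gene & is_marker | is_noisy &! is_lateral
noisy_then_not_lateral = (is_marker .| is_noisy) .& .!is_lateral

# Models: / gene & is_marker &! is_lateral | is_noisy
not_lateral_then_noisy = (is_marker .& .!is_lateral) .| is_noisy

println(noisy_then_not_lateral)  # Bool[1, 0, 0, 0]
println(not_lateral_then_noisy)  # Bool[1, 1, 0, 0]
```

The two results differ only for g2, the gene that is both noisy and lateral: it is removed when the lateral exclusion comes last, and kept when the noisy expansion comes last.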

DataAxesFormats.Queries.VECTOR_FETCH Constant

VECTOR_FETCH := AsAxis ? Fetch IfMissing ? ( IfNot | AsAxis )?

Fetch the value of a property of an indirect axis. That is, there is a common pattern where one axis (e.g., cell) has a property (e.g., type) which has the same name as an axis, and whose values are (string) entry names of that axis. In this case, we often want to lookup a property of the other axis (e.g., / cell : type => color will evaluate to a vector of the color of the type of each cell). Sometimes one walks a chain of such properties (e.g., / cell : batch => donor => age ).

Sometimes it is needed to store several alternate properties that refer to the same indirect axis. In this case, the name of the property can begin with the axis name, followed by . and a suffix (e.g., / cell : type.manual => color will fetch the color of the manual type of each cell, still using the type axis).

If the property does not follow this convention, it is possible to manually specify the name of the axis using an AsAxis prefix (e.g., / cell : manual ! type => color will assume the value of the manual property is a vector of names of entries of the type axis).

As usual, if the property does not exist, this is an error, unless an IfMissing suffix is provided (e.g., / cell : type || red => color will assign all cells the color red if the type property does not exist).

If the value of the property is the empty string for some vector entries, by default this is again an error (as the empty string is not one of the values of the indirect axis). If an IfNot suffix is provided, such entries can be removed from the result (e.g., / cell : type ?? => color will return a vector of the colors of the cells which have a non-empty type), or can be given a specific value (e.g., / cell : type ?? red => color will return a vector of a color for each cell, giving the red color to cells with an empty type).

When using IfMissing and/or IfNot , the default value provided is always that of the final value. For example, / cell : batch || -1 ?? -2 => donor || -3 ?? -4 => age || -5 ?? -6 will compute a vector of the age per cell. If there's no batch property, all cells will get the age -1 . If there is such a property, then cells with an empty batch will get the age -2 . For cells with a non-empty batch, if there's no donor property, they will get the value -3 . If there is such a property, cells with an empty donor will get the value -4 . Finally, for cells with a batch and donor, if there is no age property, they will be given an age of -5 . Otherwise, if their age is zero, it will be changed to -6 .

DataAxesFormats.Queries.guess_typed_value Function
guess_typed_value(value::AbstractString)::StorageScalar

Given a string value, guess the typed value it represents:

  • true and false are assumed to be Bool .
  • Integers are assumed to be Int64 .
  • Floating point numbers are assumed to be Float64 , as are e and pi .
  • Anything else is assumed to be a string.

This doesn't have to be 100% accurate; it is intended to allow omitting the data type in most cases when specifying an IfMissing value. If it guesses wrong, just specify an explicit type (e.g., : version || 1.0 String ).
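The guessing rules listed above can be sketched in plain Julia. The function guess_value below is a hypothetical re-implementation for illustration, not the package's guess_typed_value , and it may differ from it in edge cases.

```julia
# Hypothetical model of the documented guessing rules.
function guess_value(text::AbstractString)
    # Booleans first.
    text == "true" && return true
    text == "false" && return false
    # The named constants are treated as Float64.
    text == "e" && return Float64(ℯ)
    text == "pi" && return Float64(π)
    # Integers before floats, so that "42" stays an Int64.
    as_integer = tryparse(Int64, text)
    as_integer !== nothing && return as_integer
    as_float = tryparse(Float64, text)
    as_float !== nothing && return as_float
    # Anything else is kept as a string.
    return String(text)
end

for text in ["true", "42", "1.0", "pi", "Unknown"]
    value = guess_value(text)
    println(text, " -> ", typeof(value))
end
```

Under these rules 1.0 is guessed to be a Float64 , which is the kind of case where an explicit type is needed to get a string instead.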

Query Operators

Data Operators

DataAxesFormats.Queries.AsAxis Type
AsAxis([axis::AbstractString = nothing]) <: QueryOperation

There are three cases where we may want to take a vector property and consider each value to be the name of an entry of some axis: Fetch , CountBy and GroupBy . In a string Query , this is indicated by the ! operator, optionally followed by the name of the axis to use.

When using Fetch , we always lookup in some axis, so AsAxis is implied (e.g., / cell : type => color is identical to / cell : type ! => color ). In contrast, when using CountBy and GroupBy , one has to explicitly specify AsAxis to force using all the entries of the axis for the counting or grouping (e.g., / cell : age @ type %> Mean will return a vector of the mean age of every type that has cells associated with it, while / cell : age @ type ! %> Mean will return a vector of the mean age of each and every value of the type axis; similarly, / cell : type * age will generate a counts matrix whose rows are types that have cells associated with them, while / cell : type ! * age will generate a counts matrix whose rows are exactly the entries of the type axis).

Since the set of values is fixed by the axis matching the vector property, it is possible that, when using this for GroupBy , some groups would have no values, causing an error. This can be avoided by providing an IfMissing suffix to the reduction (e.g., / cell : age @ type ! %> Mean will fail if some type has no cells associated with it, while / cell : age @ type ! %> Mean || 0 will give such types a zero mean age).

Typically, the name of the base property is identical to the name of the axis. In this case, there is no need to specify the name of the axis (as in the examples above). Sometimes it is useful to be able to store several vector properties which all map to the same axis. To support this, we support a naming convention where the property name begins with the axis name followed by a .suffix (e.g., both / cell : type => color and / cell : type.manual => color will look up the color of the type of some property of the cell axis: either "the" type of each cell, or the alternate type.manual of each cell).

If the property name does not follow the above conventions, then it is possible to explicitly specify the name of the axis (e.g., / cell : manual ! type => color will consider each value of the manual property as the name of an entry of the type axis and look up the matching color property value of this axis).

DataAxesFormats.Queries.Axis Type
Axis(axis::AbstractString) <: QueryOperation

A query operation for specifying a result axis. In a string Query , this is specified using the / operator followed by the axis name.

This needs to be specified at least once for a vector query (e.g., / cell : batch ), and twice for a matrix (e.g., / cell / gene : UMIs ). Axes can be filtered using Boolean masks using And , AndNot , Or , OrNot , Xor and XorNot (e.g., / gene & is_marker : is_noisy ). Alternatively, a single entry can be selected from the axis using IsEqual (e.g., / gene = FOX1 : is_noisy , / cell / gene = FOX1 : UMIs , / cell = C1 / gene = FOX1 : UMIs ). Finally, a matrix can be reduced into a vector, and a vector to a scalar, using ReductionOperation (e.g., / gene / cell : UMIs %> Sum %> Mean ).

Note

This, Names and Lookup are the only QueryOperation s that also work as a complete Query .

DataAxesFormats.Queries.CountBy Type
CountBy(property::AbstractString) <: QueryOperation

A query operation that generates a matrix of counts of combinations of pairs of values for the same entries of an axis. That is, it follows fetching some vector property, and is followed by fetching a second vector property of the same axis. The result is a matrix whose rows are the values of the 1st property and the columns are the values of the 2nd property, and the values are the number of times the combination of values appears. In a string Query , this is specified using the * operator, followed by the property name to look up (e.g., / cell : type * batch will generate a matrix whose rows correspond to cell types, whose columns correspond to cell batches, and whose values are the number of cells of each combination of batch and type).

By default, the rows and/or columns only contain actually seen values and are ordered alphabetically. However, it is common that one or both of the properties correspond to an axis. In this case, you can use an AsAxis suffix to force the rows and/or columns of the matrix to be exactly the entries of the specific axis (e.g., / cell : type ! * batch will generate a matrix whose rows are exactly the entries of the type axis, even if there is a type without any cells). This is especially useful when both properties are axes, as the result can be stored as a matrix property (e.g., / cell : type ! * batch ! will generate a matrix whose rows are the entries of the type axis, and whose columns are the entries of the batch axis, so it can be given to set_matrix!(daf, "type", "batch", ...) ).

The raw counts matrix can be post-processed like any other matrix (using ReductionOperation or an EltwiseOperation ). This allows computing useful aggregate properties (e.g., / cell : type * batch % Fractions will generate a matrix whose columns correspond to batches and whose rows are the fraction of the cells from each type within each batch).
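The counting described in this section can be modeled in plain Julia. The function count_by and its keyword names are hypothetical; this sketch does not use Daf and does not model ignoring empty values. By default the rows and columns are the sorted observed values; passing explicit entries plays the role of the AsAxis suffix.

```julia
# Hypothetical model of a counts matrix over two same-length vectors.
function count_by(first_values, second_values;
                  row_entries = sort(unique(first_values)),
                  column_entries = sort(unique(second_values)))
    row_index = Dict(name => i for (i, name) in enumerate(row_entries))
    column_index = Dict(name => i for (i, name) in enumerate(column_entries))
    counts = zeros(Int, length(row_entries), length(column_entries))
    for (row_value, column_value) in zip(first_values, second_values)
        counts[row_index[row_value], column_index[column_value]] += 1
    end
    return counts
end

cell_type  = ["B",  "T",  "B",  "T",  "B"]
cell_batch = ["b1", "b1", "b2", "b2", "b2"]

# Rows are the observed types (B, T), columns the observed batches (b1, b2).
observed = count_by(cell_type, cell_batch)
println(observed)  # [1 2; 1 1]

# Forcing the rows to a full list of type entries keeps an all-zero NK row.
forced = count_by(cell_type, cell_batch; row_entries = ["B", "NK", "T"])
println(forced)    # [1 2; 0 0; 1 1]
```

With both sets of entries forced to axis entries, the result has a fixed shape regardless of which values were observed, which is what makes it suitable for storing as a matrix property.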

DataAxesFormats.Queries.Fetch Type
Fetch(property::AbstractString) <: QueryOperation

A query operation for fetching the value of a property from another axis, based on a vector property whose values are entry names of the axis. In a string Query , this is specified using the => operator, followed by the name to look up.

That is, if you query for the values of a vector property (e.g., batch for each cell ), and the name of this property is identical to some axis name, then we assume each value is the name of an entry of this axis. We use this to fetch the value of some other property (e.g., age ) of that axis (e.g., / cell : batch => age ).

It is useful to be able to store several vector properties which all map to the same axis. To support this, we support a naming convention where the property name begins with the axis name followed by a .suffix (e.g., both / cell : type => color and / cell : type.manual => color will look up the color of the type of some property of the cell axis: either "the" type of each cell, or the alternate type.manual of each cell).

Fetching can be chained (e.g., / cell : batch => donor => age will fetch the age of the donor of the batch of each cell ).

If the property does not exist, this is an error, unless this is followed by IfMissing (e.g., / cell : type => color || red ). If the property contains an empty value, this is also an error, unless it is followed by an IfNot (e.g., / cell : type ?? => color will compute a vector of the colors of the type of the cells that have a non-empty type, and / cell : batch ?? 0 => donor => age will assign a zero age for cells which have an empty batch).
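Conceptually, a chained fetch is a sequence of lookups where each value is used as the entry name on the next axis. The sketch below emulates / cell : batch => donor => age with plain Julia dictionaries; it does not use the package, and all names and values are made up.

```julia
# Plain-Julia emulation; none of the package's functions are called here.
cells = ["C1", "C2", "C3"]
batch_of_cell = Dict("C1" => "B1", "C2" => "B2", "C3" => "B1")  # cell  -> batch entry
donor_of_batch = Dict("B1" => "D1", "B2" => "D2")               # batch -> donor entry
age_of_donor = Dict("D1" => 30, "D2" => 45)                     # donor -> age

# Each step uses the previous value as the name of an entry of the next axis.
age_of_cell = [age_of_donor[donor_of_batch[batch_of_cell[cell]]] for cell in cells]
```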

DataAxesFormats.Queries.GroupBy Type
GroupBy(property::AbstractString) <: QueryOperation

A query operation that uses a (following) ReductionOperation to aggregate the values of each group of values. In a string Query , this is specified using the @ operator, followed by the name of the property to group by. It will fetch the specified property (possibly followed by additional Fetch operations) and use the resulting vector for the name of the group of each value.

If applied to a vector, the result is a vector with one entry per group (e.g., / cell : age @ type %> Mean will generate a vector with an entry per cell type and whose values are the mean age of the cells of each type). If applied to a matrix, the result is a matrix with one row per group (e.g., / cell / gene : UMIs @ type %> Max will generate a matrix with one row per type and one column per gene, whose values are the maximal UMIs count of the gene in the cells of each type).

By default, the result uses only group values we actually observe, in sorted order. However, if the operation is followed by an AsAxis suffix, then the fetched property must correspond to an existing axis (similar to when using Fetch ), and the result will use the entries of the axis, even if we do not observe them in the data (and will ignore vector entries with an empty value). In this case, the reduction operation will fail if there are no values for some group, unless it is followed by an IfMissing suffix (e.g., / cell : age @ type ! %> Mean will generate a vector whose entries are all the entries of the type axis, and will ignore cells with an empty type; this will fail if there are types which are not associated with any cell. In contrast, / cell : age @ type ! %> Mean || 0 will succeed, assigning a value of zero for types which have no cells associated with them).
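The difference between the default grouping and grouping by axis entries can be sketched in plain Julia (not using the package; the data and the extra NK type are made up). The first result mirrors / cell : age @ type %> Mean, the second mirrors the variant with an AsAxis suffix and an IfMissing value of zero.

```julia
using Statistics  # standard library, for `mean`

# Plain-Julia emulation; none of the package's functions are called here.
age_of_cell = [10.0, 20.0, 30.0, 40.0]
type_of_cell = ["T", "B", "T", "B"]
type_axis = ["B", "NK", "T"]   # hypothetical axis entries; no cell has type NK

# The values belonging to one group.
ages_of(group) = age_of_cell[type_of_cell .== group]

# Default: one entry per observed group value, in sorted order.
observed_types = sort(unique(type_of_cell))
mean_age_of_observed = [mean(ages_of(group)) for group in observed_types]

# Axis entries with a fallback: one entry per axis entry, zero for empty groups.
mean_age_of_axis = [isempty(ages_of(group)) ? 0.0 : mean(ages_of(group)) for group in type_axis]
```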

DataAxesFormats.Queries.IfMissing Type
IfMissing(value::StorageScalar; dtype::Maybe{Type} = nothing) <: QueryOperation

A query operation providing a value to use if the data is missing some property. In a string Query , this is specified using the || operator, followed by the value to use, and optionally followed by the data type of the value (e.g., : score || 1 Float32 ).

If the data type is not specified, and the value isa AbstractString , then the data type is deduced using guess_typed_value of the value .

DataAxesFormats.Queries.IfNot Type
IfNot(value::Maybe{StorageScalar} = nothing) <: QueryOperation

A query operation providing a value to use for "false-ish" values in a vector (empty strings, zero numeric values, or false Boolean values). In a string Query , this is indicated using the ?? operator, optionally followed by a value to use.

If the value is nothing (the default), then these entries are dropped (masked out) of the result (e.g., / cell : type ?? behaves the same as / cell & type : type , that is, returns the type of the cells which have a non-empty type). Otherwise, this value is used instead of the "false-ish" value (e.g., / cell : type ?? Outlier will return a vector of the type of each cell, with the value Outlier for cells with an empty type). When fetching properties, this is the final value (e.g., / cell : type ?? red => color will return a vector of the color of the type of each cell, with a red color for the cells with an empty type).

If the value isa AbstractString , then it is automatically converted to the data type of the elements of the results vector.
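The three behaviours described above (dropping, replacing, and supplying the final value of a fetch) can be sketched in plain Julia. This does not use the package, and the types and colors are made up.

```julia
# Plain-Julia emulation; none of the package's functions are called here.
type_of_cell = ["T", "", "B"]                        # the second cell has an empty type
color_of_type = Dict("T" => "blue", "B" => "green")

# IfNot without a value: the "false-ish" entries are dropped from the result.
types_of_typed_cells = filter(!isempty, type_of_cell)

# IfNot with the value Outlier: the "false-ish" entries are replaced.
types_or_outlier = [isempty(t) ? "Outlier" : t for t in type_of_cell]

# IfNot with the value red, followed by a fetch of the color:
# red is the final value and is not itself looked up in the type axis.
color_of_cell = [isempty(t) ? "red" : color_of_type[t] for t in type_of_cell]
```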

DataAxesFormats.Queries.Lookup Type
Lookup(property::AbstractString) <: Query

A query operation for looking up the value of a property with some name. In a string Query , this is specified using the : operator, followed by the property name to look up.

  • If the query state is empty, this looks up the value of a scalar property (e.g., : version ).
  • If the query state contains a single axis, this looks up the value of a vector property (e.g., / cell : batch ).
  • If the query state contains two axes, this looks up the value of a matrix property (e.g., / cell / gene : UMIs ).

If the property does not exist, this is an error, unless this is followed by IfMissing (e.g., : version || 1.0 ).

If any of the axes has a single entry selected using IsEqual , this will reduce the dimension of the result (e.g., / cell / gene = FOX1 : UMIs is a vector, and both / cell = C1 / gene = FOX1 : UMIs and / gene = FOX1 : is_marker are scalars).
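In terms of plain arrays, selecting a single entry of an axis corresponds to indexing along that dimension. The sketch below is plain Julia with made-up data, not a use of the package.

```julia
# Plain-Julia emulation; none of the package's functions are called here.
cells = ["C1", "C2", "C3"]
genes = ["FOX1", "SOX2"]
UMIs = [4 3; 2 0; 0 1]   # rows are cells, columns are genes

cell = findfirst(==("C1"), cells)
gene = findfirst(==("FOX1"), genes)

# One axis has a single selected entry: the matrix is reduced to a vector.
umis_of_fox1 = UMIs[:, gene]

# Both axes have a single selected entry: the result is a scalar.
umis_of_c1_fox1 = UMIs[cell, gene]
```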

Note

This, Names and Axis are the only QueryOperation s that also work as a complete Query .

DataAxesFormats.Queries.Names Type
Names(kind::Maybe{AbstractString} = nothing) <: Query

A query operation for looking up a set of names. In a string Query , this is specified using the ? operator, optionally followed by the kind of objects to name.

  • If the query state is empty, a kind must be specified, one of scalars or axes , and the result is the set of their names ( ? scalars , ? axes ).
  • If the query state contains a single axis (without any masks), the kind must not be specified, and the result is the set of names of vector properties of the axis (e.g., / cell ? ).
  • If the query state contains two axes (without any masks), the kind must not be specified, and the result is the set of names of matrix properties of the axes (e.g., / cell / gene ? ).

Note

This, Lookup and Axis are the only QueryOperation s that also work as a complete Query .

Comparison Operators

DataAxesFormats.Queries.IsEqual Type
IsEqual(value::StorageScalar) <: QueryOperation

Equality is used for two purposes:

  • As a comparison operator, similar to IsLess except that it uses = instead of < for the comparison.
  • To select a single entry from a vector. This allows a query to select a single scalar from a vector (e.g., / gene = FOX1 : is_marker ) or from a matrix (e.g., / cell = ATCG / gene = FOX1 : UMIs ); or to slice a single vector from a matrix (e.g., / cell = ATCG / gene : UMIs or / cell / gene = FOX1 : UMIs ).

DataAxesFormats.Queries.IsLess Type
IsLess(value::StorageScalar) <: QueryOperation

A query operation for converting a vector value to a Boolean mask by comparing it with some value. In a string Query , this is specified using the < operator, followed by the value to compare with.

A string value is automatically converted into the same type as the vector values (e.g., / cell & probability < 0.5 will restrict the result vector only to cells whose probability is less than half).
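The conversion and comparison can be pictured as follows. This is a plain-Julia sketch with made-up data; using parse for the conversion is an assumption made for illustration, not necessarily how the package does it.

```julia
# Plain-Julia emulation; none of the package's functions are called here.
probability_of_cell = Float32[0.1, 0.7, 0.4]
value = "0.5"   # the comparison value as written in the query string

# Convert the string to the element type of the vector (assumed to use `parse`).
threshold = parse(eltype(probability_of_cell), value)

# The comparison yields a Boolean mask with one entry per cell.
mask = probability_of_cell .< threshold
```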

DataAxesFormats.Queries.IsMatch Type
IsMatch(value::Union{AbstractString, Regex}) <: QueryOperation

Similar to IsLess except that the compared values must be strings, and the mask is of the values that match the given regular expression.

Mask Operators

DataAxesFormats.Queries.And Type
And(property::AbstractString) <: QueryOperation

A query operation for restricting the set of entries of an Axis . In a string Query , this is specified using the & operator, followed by the name of an axis property to look up to compute the mask.

The mask may be just the fetched property (e.g., / gene & is_marker will restrict the result vector to only marker genes). If the value of the property is not Boolean, it is automatically compared to 0 or the empty string, depending on its type (e.g., / cell & type will restrict the result vector to only cells which were given a non-empty-string type annotation). It is also possible to fetch properties from other axes, and use an explicit ComparisonOperation to compute the Boolean mask (e.g., / cell & batch => age > 1 will restrict the result vector to cells whose batch has an age larger than 1).

DataAxesFormats.Queries.AndNot Type
AndNot(property::AbstractString) <: QueryOperation

Same as And but use the inverse of the mask. In a string Query , this is specified using the &! operator, followed by the name of an axis property to look up to compute the mask.

DataAxesFormats.Queries.Or Type
Or(property::AbstractString) <: QueryOperation

A query operation for expanding the set of entries of an Axis . In a string Query , this is specified using the | operator, followed by the name of an axis property to look up to compute the mask.

This works similarly to And , except that it adds to the mask (e.g., / gene & is_marker | is_noisy will restrict the result vector to either marker or noisy genes).

DataAxesFormats.Queries.OrNot Type
OrNot(property::AbstractString) <: QueryOperation

Same as Or but use the inverse of the mask. In a string Query , this is specified using the |! operator, followed by the name of an axis property to look up to compute the mask.

DataAxesFormats.Queries.Xor Type
Xor(property::AbstractString) <: QueryOperation

A query operation for flipping the set of entries of an Axis . In a string Query , this is specified using the ^ operator, followed by the name of an axis property to look up to compute the mask.

This works similarly to Or , except that it flips entries in the mask (e.g., / gene & is_marker ^ is_noisy will restrict the result vector to either marker or noisy genes, but not both).

DataAxesFormats.Queries.XorNot Type
XorNot(property::AbstractString) <: QueryOperation

Same as Xor but use the inverse of the mask. In a string Query , this is specified using the ^! operator, followed by the name of an axis property to look up to compute the mask.
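The six mask operations map onto element-wise Boolean logic. The sketch below shows this in plain Julia with made-up properties; it does not use the package, and starting from an all-true mask for the axis is an assumption.

```julia
# Plain-Julia emulation; none of the package's functions are called here.
is_marker = [true, true, false, false]
is_noisy = [true, false, true, false]
all_genes = trues(4)   # assumed starting mask: every entry of the axis

and_mask = all_genes .& is_marker        # And: only marker genes
and_not_mask = all_genes .& .!is_marker  # AndNot: only non-marker genes

or_mask = and_mask .| is_noisy           # Or: marker or noisy genes
or_not_mask = and_mask .| .!is_noisy     # OrNot: marker or non-noisy genes

xor_mask = xor.(and_mask, is_noisy)      # Xor: marker or noisy, but not both
xor_not_mask = xor.(and_mask, .!is_noisy)  # XorNot: flip using the inverse mask
```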

DataAxesFormats.Queries.MaskSlice Type
MaskSlice(axis_name::AbstractString) <: QueryOperation

A query operation for using a slice of a matrix as a mask, when the other axis of the matrix is different from the mask axis. This needs to be followed by the axis entry to slice. In a string Query , this is specified using the ; operator, followed by the name of the axis for looking up the matrix, then followed by = and the value identifying the slice.

That is, suppose we have a UMIs matrix per cell per gene, and we'd like to select all the cells which have non-zero UMIs for the FOX1 gene. Then we can say / cell & UMIs ; gene = FOX1 > 0 (or just / cell & UMIs ; gene = FOX1 ).
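In plain-array terms, this takes the FOX1 slice of the matrix and turns it into a mask over the cells. The following sketch uses plain Julia with made-up data and does not call the package.

```julia
# Plain-Julia emulation; none of the package's functions are called here.
cells = ["C1", "C2", "C3"]
genes = ["FOX1", "SOX2"]
UMIs = [0 3; 2 0; 5 1]   # rows are cells, columns are genes

# Slice the matrix at the FOX1 entry of the gene axis.
fox1_umis = UMIs[:, findfirst(==("FOX1"), genes)]

# Compare the slice to zero to obtain a mask over the cell axis.
mask = fox1_umis .> 0
selected_cells = cells[mask]
```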

DataAxesFormats.Queries.SquareMaskColumn Type
SquareMaskColumn(comparison_value::AbstractString) <: QueryOperation

Similar to MaskSlice but is used when the mask matrix is square and we'd like to use a column as a mask. This therefore only needs specifying the column to slice. In a string Query , this is specified using the ;= operator followed by the value identifying the slice.

That is, suppose we have a KNN graph between cells as a cell-cell matrix where each column contains the weights of the outgoing edges from each cell to the rest. To select all the cells reachable from a particular one, we can say / cell & outgoing ;= ATCG > 0 (or just / cell & outgoing ;= ATCG ). If we also want to include the source cell we'd need to say / cell & name = ATCG | outgoing ;= ATCG , etc.

DataAxesFormats.Queries.SquareMaskRow Type
SquareMaskRow(comparison_value::AbstractString) <: QueryOperation

Similar to SquareMaskColumn but is used when the mask matrix is square and we'd like to use a row as a mask. This therefore only needs specifying the row to slice. In a string Query , this is specified using the ,= operator followed by the value identifying the slice.

That is, suppose we have a KNN graph as above and we'd like to select all cells that can reach a particular one. Then we can say / cell & outgoing ,= ATCG > 0 (or just / cell & outgoing ,= ATCG ). If we also want to include the target cell itself we'd need to say / cell & name = ATCG | outgoing ,= ATCG , etc.
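The difference between the column and row variants can be sketched with a small weights matrix. This is plain Julia with made-up cells and weights, not a use of the package. It assumes the layout described above, where the column of a cell holds the weights of its outgoing edges.

```julia
# Plain-Julia emulation; none of the package's functions are called here.
cells = ["ATCG", "GGTA", "TTAC"]

# outgoing[target, source]: the column of a cell holds its outgoing edge weights.
outgoing = [0.0 0.5 0.7;
            1.0 0.0 0.0;
            0.0 0.5 0.0]

atcg = findfirst(==("ATCG"), cells)

# Column slice: the cells reachable from ATCG.
reachable_from_atcg = cells[outgoing[:, atcg] .> 0]

# Row slice: the cells that can reach ATCG.
can_reach_atcg = cells[outgoing[atcg, :] .> 0]
```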

Index