Helpers

This page documents the additional helper functions in this package.

JudiLingMeasures.compute_all_measures - Method
compute_all_measures(data_val::DataFrame,
                     cue_obj_train::JudiLing.Cue_Matrix_Struct,
                     cue_obj_val::JudiLing.Cue_Matrix_Struct,
                     Chat_val::Union{JudiLing.SparseMatrixCSC, Matrix},
                     S_val::Union{JudiLing.SparseMatrixCSC, Matrix},
                     Shat_val::Union{JudiLing.SparseMatrixCSC, Matrix},
                     res_learn::Array{Array{JudiLing.Result_Path_Info_Struct,1},1},
                     gpi_learn::Array{JudiLing.Gold_Path_Info_Struct,1},
                     rpi_learn::Array{JudiLing.Gold_Path_Info_Struct,1})

Compute all measures currently available in JudiLingMeasures for the data of interest.

Arguments

  • data_val::DataFrame: The data for which measures should be calculated (the data of interest).
  • cue_obj_train::JudiLing.Cue_Matrix_Struct: The cue object of the training data.
  • cue_obj_val::JudiLing.Cue_Matrix_Struct: The cue object of the data of interest.
  • Chat_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The Chat matrix of the data of interest.
  • S_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The S matrix of the data of interest.
  • Shat_val::Union{JudiLing.SparseMatrixCSC, Matrix}: The Shat matrix of the data of interest.
  • res_learn::Array{Array{JudiLing.Result_Path_Info_Struct,1},1}: The first output of JudiLingMeasures.learn_paths_rpi (with check_gold_path=true).
  • gpi_learn::Array{JudiLing.Gold_Path_Info_Struct,1}: The second output of JudiLingMeasures.learn_paths_rpi (with check_gold_path=true).
  • rpi_learn::Array{JudiLing.Gold_Path_Info_Struct,1}: The third output of JudiLingMeasures.learn_paths_rpi (with check_gold_path=true).

Returns

  • results::DataFrame: A dataframe with all information in data_val plus all the computed measures.

JudiLingMeasures.correlation_diagonal_rowwise - Method
correlation_diagonal_rowwise(S1, S2)

Compute the correlation between row i of S1 and row i of S2 for every i, i.e. only the diagonal of the row-wise correlation matrix.

Example

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> correlation_diagonal_rowwise(ma1, ma4)
3-element Array{Float64,1}:
 0.8660254037844387
 0.9607689228305228
 0.9819805060619657
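
To illustrate what "only the diagonal" means, the following minimal sketch pairs row i of one matrix with row i of the other and calls `cor` from Julia's Statistics standard library. The name `diag_cor_sketch` is hypothetical, and this is not necessarily how the package computes the result.

```julia
using Statistics

# Hypothetical helper: correlate row i of A with row i of B.
# Assumes A and B have the same number of rows and columns.
diag_cor_sketch(A, B) = [cor(A[i, :], B[i, :]) for i in axes(A, 1)]

ma1 = [1 2 3; -1 -2 -3; 1 2 3]
ma4 = [1 2 2; 1 -2 -3; 0 2 3]
r = diag_cor_sketch(ma1, ma4)   # one correlation per row pair
```

Computing only these row pairs avoids building the full correlation matrix when the off-diagonal entries are not needed.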

JudiLingMeasures.correlation_rowwise - Method
correlation_rowwise(S1::Union{JudiLing.SparseMatrixCSC, Matrix},
                    S2::Union{JudiLing.SparseMatrixCSC, Matrix})

Compute the correlation between each row of S1 and every row of S2.

Example

julia> ma2 = [[1 2 1 1]; [1 -2 3 1]; [1 -2 3 3]; [0 0 1 2]]
julia> ma3 = [[-1 2 1 1]; [1 2 3 1]; [1 2 0 1]; [0.5 -2 1.5 0]]
julia> correlation_rowwise(ma2, ma3)
4×4 Matrix{Float64}:
  0.662266   0.174078    0.816497  -0.905822
 -0.41762    0.29554    -0.990148   0.988623
 -0.308304   0.0368355  -0.863868   0.862538
  0.207514  -0.0909091  -0.426401   0.354787
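
For comparison, a similar matrix can be obtained with the Statistics standard library alone. This is a sketch under the assumption that plain Pearson correlation is meant; it does not show the package's own code.

```julia
using Statistics

ma2 = Float64[1 2 1 1; 1 -2 3 1; 1 -2 3 3; 0 0 1 2]
ma3 = Float64[-1 2 1 1; 1 2 3 1; 1 2 0 1; 0.5 -2 1.5 0]

# dims=2 treats each row as one variable, so C[i, j] is the
# Pearson correlation between row i of ma2 and row j of ma3.
C = cor(ma2, ma3; dims=2)
```

Here entry `C[i, j]` corresponds to the value in row i, column j of the output shown above.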

JudiLingMeasures.cosine_similarity - Method
cosine_similarity(s_hat_collection, S)

Calculate the cosine similarity between all predicted and all target semantic vectors.

Example

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> cosine_similarity(ma1, ma4)
3×3 Array{Float64,2}:
  0.979958  -0.857143   0.963624
 -0.979958   0.857143  -0.963624
  0.979958  -0.857143   0.963624
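
The computation can be sketched as follows: scale every row to unit length, then take all pairwise dot products. `cosine_sketch` is a hypothetical name used only for illustration, and the sketch assumes that no row is all zeros.

```julia
# Hypothetical helper: cosine similarity between every row of A and every row of B.
function cosine_sketch(A, B)
    An = A ./ sqrt.(sum(abs2, A; dims=2))   # scale each row of A to unit length
    Bn = B ./ sqrt.(sum(abs2, B; dims=2))   # scale each row of B to unit length
    return An * Bn'                          # dot products of unit-length rows
end

ma1 = [1 2 3; -1 -2 -3; 1 2 3]
ma4 = [1 2 2; 1 -2 -3; 0 2 3]
sims = cosine_sketch(ma1, ma4)
```

Because the second row of ma1 is the negation of the first, the second row of the result is the negation of the first, as in the output above.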

JudiLingMeasures.count_rows - Method
count_rows(dat::DataFrame)

Get the number of rows in dat.

Examples

julia> dat = DataFrame("text"=>[1,2,3])
julia> count_rows(dat)
 3

JudiLingMeasures.entropy - Method
entropy(ps::Union{Missing, Array, SubArray})

Compute the Shannon entropy of the values in ps that are greater than 0.

Note: the result of this entropy function differs from other entropy measures because a) the values are first scaled to lie between 0 and 1, and b) log2 is used instead of log.

Examples

julia> ps = [0.1, 0.2, 0.9]
julia> entropy(ps)
1.0408520829727552
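
The documented result is consistent with dividing the positive values by their sum before applying the entropy formula. The sketch below makes that assumption explicit. `entropy_sketch` is a hypothetical name, the scaling step is inferred from the example rather than taken from the package source, and handling of missing or empty input is left out.

```julia
# Hypothetical helper, not the package source.
function entropy_sketch(ps)
    pos = ps[ps .> 0]            # keep only strictly positive values
    p = pos ./ sum(pos)          # assumption: scale so that the values sum to 1
    return -sum(p .* log2.(p))   # Shannon entropy in bits (log base 2)
end

h = entropy_sketch([0.1, 0.2, 0.9])
```

With this scaling the input becomes 1/12, 1/6 and 3/4, which yields the value shown in the example.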

JudiLingMeasures.euclidean_distance_array - Method
euclidean_distance_array(Shat::Union{JudiLing.SparseMatrixCSC, Matrix},
                         S::Union{JudiLing.SparseMatrixCSC, Matrix})

Calculate the pairwise Euclidean distances between all rows in Shat and S.

Throws an error if either matrix contains missing values.

Examples

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> euclidean_distance_array(ma1, ma4)
3×3 Matrix{Float64}:
 1.0     7.2111  1.0
 6.7082  2.0     7.28011
 1.0     7.2111  1.0
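
As an illustration of the layout of the result, a minimal sketch using `norm` from the LinearAlgebra standard library is given below. `eucl_sketch` is a hypothetical name, and the package may compute the distances differently.

```julia
using LinearAlgebra

# Hypothetical helper: D[i, j] is the Euclidean distance
# between row i of A and row j of B.
eucl_sketch(A, B) = [norm(A[i, :] .- B[j, :]) for i in axes(A, 1), j in axes(B, 1)]

ma1 = [1 2 3; -1 -2 -3; 1 2 3]
ma4 = [1 2 2; 1 -2 -3; 0 2 3]
D = eucl_sketch(ma1, ma4)
```

Rows of the result index the first argument and columns index the second, matching the output above.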

JudiLingMeasures.get_avg_levenshtein - Method
get_avg_levenshtein(targets::Array, preds::Array)

Get the average Levenshtein distance between two lists of strings, compared element by element.

Examples

julia> targets = ["abc", "abc", "abc"]
julia> preds = ["abd", "abc", "ebd"]
julia> get_avg_levenshtein(targets, preds)
 1.0
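
For readers unfamiliar with the measure, the sketch below implements the standard dynamic-programming Levenshtein distance and averages it over the string pairs. `levenshtein_sketch` is a hypothetical helper written for this illustration; the package may rely on a different implementation.

```julia
# Hypothetical helper: classic dynamic-programming edit distance,
# counting insertions, deletions and substitutions of single characters.
function levenshtein_sketch(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    prev = collect(0:length(t))              # cost of building each prefix of b from ""
    for i in eachindex(s)
        curr = similar(prev)
        curr[1] = i                          # cost of deleting the first i characters of a
        for j in eachindex(t)
            cost = s[i] == t[j] ? 0 : 1
            curr[j + 1] = min(prev[j + 1] + 1,   # deletion
                              curr[j] + 1,       # insertion
                              prev[j] + cost)    # substitution or match
        end
        prev = curr
    end
    return prev[end]
end

targets = ["abc", "abc", "abc"]
preds = ["abd", "abc", "ebd"]
dists = levenshtein_sketch.(targets, preds)   # one distance per target/prediction pair
avg = sum(dists) / length(dists)
```

The three pairs have distances 1, 0 and 2, so the average is 1.0, as in the example.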

JudiLingMeasures.get_nearest_neighbour_eucl - Method
get_nearest_neighbour_eucl(eucl_sims::Matrix)

Get the distance to the nearest neighbour for each row in eucl_sims, i.e. the smallest value in each row.

Examples

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> ma4 = [[1 2 2]; [1 -2 -3]; [0 2 3]]
julia> eucl_sims = euclidean_distance_array(ma1, ma4)
julia> get_nearest_neighbour_eucl(eucl_sims)
3-element Vector{Float64}:
 1.0
 2.0
 1.0
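
The output above matches the row-wise minimum of the distance matrix. A minimal sketch, assuming that this is all the function does, with the matrix copied from the euclidean_distance_array example:

```julia
# Distance matrix as printed in the euclidean_distance_array example (rounded values).
D = [1.0 7.2111 1.0; 6.7082 2.0 7.28011; 1.0 7.2111 1.0]

# Smallest distance in each row, flattened to a plain vector.
nearest = vec(minimum(D; dims=2))
```

Note that this gives the distance to the closest row, not its index.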

JudiLingMeasures.l1_rowwise - Method
l1_rowwise(M::Union{JudiLing.SparseMatrixCSC, Matrix})

Compute the L1 Norm of each row of M.

Example

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> l1_rowwise(ma1)
3×1 Matrix{Int64}:
 6
 6
 6

JudiLingMeasures.l2_rowwise - Method
l2_rowwise(M::Union{JudiLing.SparseMatrixCSC, Matrix})

Compute the L2 Norm of each row of M.

Example

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> l2_rowwise(ma1)
3×1 Matrix{Float64}:
 3.7416573867739413
 3.7416573867739413
 3.7416573867739413
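
Both row-wise norms (this one and l1_rowwise) can be sketched with reductions along the second dimension. This is an illustration in plain Julia and not necessarily the package's implementation.

```julia
ma1 = [1 2 3; -1 -2 -3; 1 2 3]

l1 = sum(abs, ma1; dims=2)            # 3×1 matrix: sum of absolute values per row
l2 = sqrt.(sum(abs2, ma1; dims=2))    # 3×1 matrix: square root of the sum of squares per row
```

Using `dims=2` keeps the result as an n×1 matrix, which matches the shape of the outputs shown for both functions.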

JudiLingMeasures.learn_paths_rpi - Method
learn_paths_rpi(data_train, data_val, C_train, S_val, F_train, Chat_val, A, i2f, f2i)

Run learn_paths and additionally return the indices and supports of the results.

Note: this is a patched copy of JudiLing.learn_paths_rpi, kept here until the function is fixed in JudiLing. Once it is fixed there, it will be removed from this package.


JudiLingMeasures.make_measure_preparations - Method
make_measure_preparations(data_val, S_val, Shat_val,
                          res_learn, cue_obj_train, cue_obj_val,
                          rpi_learn)

Returns all additional objects needed for measure calculations. The data for which measures are to be calculated is called "data of interest".

Arguments

  • data_val: The data for which the measures are to be calculated (data of interest).
  • S_val: The semantic matrix of the data of interest.
  • Shat_val: The predicted semantic matrix of the data of interest.
  • res_learn: The first object returned by the learn_paths_rpi algorithm for the data of interest.
  • cue_obj_train: The cue object of the training data.
  • cue_obj_val: The cue object of the data of interest.
  • rpi_learn: The second object returned by the learn_paths_rpi algorithm for the data of interest.

Returns

  • results::DataFrame: A deepcopy of data_val.
  • cor_s::Matrix: Correlation matrix between Shat_val and S_val.
  • df::DataFrame: The output of res_learn (of the data of interest) in the form of a dataframe.
  • rpi_df::DataFrame: Stores the path information about the predicted forms (from learn_paths), which is needed to compute things like PathSum, PathCounts and PathEntropies.

JudiLingMeasures.max_rowwise - Method
max_rowwise(S::Union{JudiLing.SparseMatrixCSC, Matrix})

Get the maximum of each row in S.

Examples

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> max_rowwise(ma1)
3×1 Matrix{Int64}:
  3
 -1
  3

JudiLingMeasures.mean_rowwise - Method
mean_rowwise(S::Union{JudiLing.SparseMatrixCSC, Matrix})

Calculate the mean of each row in S.

Examples

julia> ma1 = [[1 2 3]; [-1 -2 -3]; [1 2 3]]
julia> mean_rowwise(ma1)
3×1 Matrix{Float64}:
  2.0
 -2.0
  2.0

JudiLingMeasures.safe_length - Method
safe_length(x::Union{Missing, String})

Compute the length of x; return missing if x is missing.

Example

julia> safe_length(missing)
missing
julia> safe_length("abc")
3

JudiLingMeasures.safe_sum - Method
safe_sum(x::Array)

Compute the sum of all elements of x; return missing if x is empty.

Example

julia> safe_sum([])
missing
julia> safe_sum([1,2,3])
6
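
The behaviour of the two safe_ helpers (this one and safe_length) can be sketched with multiple dispatch and a guard for the empty case. The `_sketch` names are hypothetical, and the code only mirrors the documented behaviour.

```julia
# Hypothetical helpers illustrating the documented behaviour.
safe_length_sketch(x::Missing) = missing              # propagate missing
safe_length_sketch(x::AbstractString) = length(x)     # number of characters

# Return missing for an empty array instead of raising an error or returning 0.
safe_sum_sketch(x::AbstractArray) = isempty(x) ? missing : sum(x)
```

Returning missing instead of throwing lets such values pass through dataframe columns without special-casing every row.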

JudiLingMeasures.sem_density_mean - Method
sem_density_mean(s_cor::Union{JudiLing.SparseMatrixCSC, Matrix},
                 n::Int)

Compute the average semantic density of the predicted semantic vector with its n most correlated semantic neighbours.

Arguments

  • s_cor::Union{JudiLing.SparseMatrixCSC, Matrix}: The correlation matrix between S and Shat.
  • n::Int: The number of most highly correlated semantic neighbours to take into account.

Example

julia> ma2 = [[1 2 1 1]; [1 -2 3 1]; [1 -2 3 3]; [0 0 1 2]]
julia> ma3 = [[-1 2 1 1]; [1 2 3 1]; [1 2 0 1]; [0.5 -2 1.5 0]]
julia> cor_s = correlation_rowwise(ma2, ma3)
julia> sem_density_mean(cor_s, 2)
4-element Vector{Float64}:
 0.7393813797301239
 0.6420816485652429
 0.4496869233815781
 0.281150888376636
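
The values above equal the mean of the n largest correlations in each row. A minimal sketch of that reading follows; `density_sketch` is a hypothetical name, and the input matrix is the rounded correlation_rowwise output from this page.

```julia
using Statistics

# Hypothetical helper: mean of the n largest values in each row of C.
density_sketch(C, n) =
    [mean(partialsort(collect(row), 1:n; rev=true)) for row in eachrow(C)]

# Rounded correlation matrix as printed in the correlation_rowwise example.
C = [ 0.662266   0.174078    0.816497  -0.905822;
     -0.41762    0.29554    -0.990148   0.988623;
     -0.308304   0.0368355  -0.863868   0.862538;
      0.207514  -0.0909091  -0.426401   0.354787]

dens = density_sketch(C, 2)
```

`partialsort` with `rev=true` returns only the n largest entries, so the whole row does not have to be sorted.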