JudiLingMeasures.jl
This is code for JudiLingMeasures.
Requires JudiLing 0.5.5. Update your JudiLing version by running
using Pkg
Pkg.update("JudiLing")
If this step does not work, i.e. the version of JudiLing is still not 0.5.5, refer to this forum post for a workaround.
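To see whether the update took effect, the installed version can be read from the active environment. The following is a minimal sketch that assumes Julia 1.4 or later, where `Pkg.dependencies()` is available; `installed_version` is a hypothetical helper and not part of JudiLing or JudiLingMeasures.

```julia
using Pkg

# Hypothetical helper: look up the installed version of a package
# in the active environment. Returns `nothing` if it is not installed.
function installed_version(name::AbstractString)
    for info in values(Pkg.dependencies())
        info.name == name && return info.version
    end
    return nothing
end

installed_version("JudiLing")
```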
How to use
For a demo of this package, please see notebooks/measures_demo.ipynb.
Calculating measures in this package
The following gives an overview of all measures available in this package. For a more detailed description of the parameters, please refer to the documentation provided with the code. All measures come with examples. To run them, first run the following piece of code, taken from the Readme of the JudiLing package. For a detailed explanation of this code, please refer to the JudiLing Readme and documentation.
using JudiLing
using CSV # read csv files into dataframes
using DataFrames # parse data into dataframes
using JudiLingMeasures
# if you haven't downloaded this file already, get it here:
download("https://osf.io/2ejfu/download", "latin.csv")
latin =
DataFrame(CSV.File(joinpath(@__DIR__, "latin.csv")));
cue_obj = JudiLing.make_cue_matrix(
latin,
grams = 3,
target_col = :Word,
tokenized = false,
keep_sep = false
);
n_features = size(cue_obj.C, 2);
S = JudiLing.make_S_matrix(
latin,
["Lexeme"],
["Person", "Number", "Tense", "Voice", "Mood"],
ncol = n_features
);
G = JudiLing.make_transform_matrix(S, cue_obj.C);
F = JudiLing.make_transform_matrix(cue_obj.C, S);
Chat = S * G;
Shat = cue_obj.C * F;
A = cue_obj.A;
max_t = JudiLing.cal_max_timestep(latin, :Word);
At the moment, there is a bug in JudiLing.learn_paths_rpi. We therefore use the patched version from JudiLingMeasures. Make sure that you set check_gold_path=true.
res_learn, gpi_learn, rpi_learn = JudiLingMeasures.learn_paths_rpi(
latin,
latin,
cue_obj.C,
S,
F,
Chat,
A,
cue_obj.i2f,
cue_obj.f2i, # api changed in 0.3.1
gold_ind = cue_obj.gold_ind,
Shat_val = Shat,
check_gold_path = true,
max_t = max_t,
max_can = 10,
grams = 3,
threshold = 0.05,
tokenized = false,
sep_token = "_",
keep_sep = false,
target_col = :Word,
issparse = :dense,
verbose = false,
);
All available measures can be computed with
all_measures = JudiLingMeasures.compute_all_measures(latin, # the data of interest
cue_obj, # the cue_obj of the training data
cue_obj, # the cue_obj of the data of interest
Chat, # the Chat of the data of interest
S, # the S matrix of the data of interest
Shat, # the Shat matrix of the data of interest
res_learn, # the output of learn_paths for the data of interest
gpi_learn, # the gpi_learn object of the data of interest
rpi_learn); # the rpi_learn object of the data of interest
Overview of all available measures
Measures capturing comprehension (processing on the semantic side of the network)
Measures of semantic vector length/uncertainty/activation
L1Norm
Computes the L1-Norm (city-block distance) of the predicted semantic vectors $\hat{S}$:
Example:
JudiLingMeasures.L1Norm(Shat)
Used in Schmitz et al. (2021), Stein and Plag (2021) (called Semantic Vector length in their paper)
L2Norm
Computes the L2-Norm (euclidean distance) of the predicted semantic vectors $\hat{S}$:
Example:
JudiLingMeasures.L2Norm(Shat)
Used in Schmitz et al. (2021)
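To make the two norms concrete, here is a minimal sketch on a toy matrix, assuming that each row holds one predicted semantic vector. The helpers `l1_rows` and `l2_rows` are hypothetical and only illustrate the computation; they are not the package's implementation.

```julia
# Hypothetical helpers, not part of JudiLingMeasures: row-wise vector norms.
l1_rows(M::AbstractMatrix) = vec(sum(abs, M; dims = 2))         # sum of absolute values
l2_rows(M::AbstractMatrix) = vec(sqrt.(sum(abs2, M; dims = 2))) # square root of summed squares

# Toy stand-in for Shat: two word forms, three semantic dimensions.
Shat_toy = [1.0 -2.0  2.0;
            0.0  3.0 -4.0]

l1_rows(Shat_toy)  # [5.0, 7.0]
l2_rows(Shat_toy)  # [3.0, 5.0]
```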
Measures of semantic neighbourhood
Density
Computes the average correlation/cosine similarity of each predicted semantic vector in $\hat{S}$ with the $n$ most correlated/closest semantic vectors in $S$:
Example:
_, cor_s = JudiLing.eval_SC(Shat, S, R=true)
correlation_density = JudiLingMeasures.density(cor_s, 10)
cosine_sims = JudiLingMeasures.cosine_similarity(Shat, S)
cosine_density = JudiLingMeasures.density(cosine_sims, 10)
Used in Heitmeier et al. (2022) (called Semantic Density, based on Cosine Similarity), Schmitz et al. (2021), Stein and Plag (2021) (called Semantic Density, based on correlation)
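The idea behind density can be sketched as taking, for each row of a similarity matrix, the mean of its n largest entries. The helper `top_n_mean` below is hypothetical. Whether the package's `density` handles the target vector itself in exactly this way is an assumption, so treat this as an illustration only.

```julia
using Statistics

# Hypothetical helper: mean of the n largest entries in each row
# of a similarity matrix (rows = predicted vectors, columns = vectors in S).
function top_n_mean(sim::AbstractMatrix, n::Integer)
    return [mean(partialsort(collect(row), 1:n; rev = true)) for row in eachrow(sim)]
end

# Toy similarity matrix: two predicted vectors, four candidate neighbours.
sim_toy = [1.0  0.5  0.25 0.0;
           0.25 0.75 1.0  0.0]

top_n_mean(sim_toy, 2)  # [0.75, 0.875]
```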
ALC
Average Lexical Correlation. Computes the average correlation between each predicted semantic vector and all semantic vectors in $S$.
Example:
_, cor_s = JudiLing.eval_SC(Shat, S, R=true)
JudiLingMeasures.ALC(cor_s)
Used in Schmitz et al. (2021), Chuang et al. (2020)
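As a rough illustration of ALC, the sketch below builds a row-by-row Pearson correlation matrix with `Statistics.cor` and averages each row. It assumes that rows are word forms; `mean_row_correlation` and the toy matrices are hypothetical stand-ins.

```julia
using Statistics

# Toy stand-ins for Shat and S (rows are word forms).
Shat_toy = [1.0 2.0 3.0;
            3.0 1.0 2.0]
S_toy    = [1.0 2.0 3.0;
            3.0 2.0 1.0]

# Entry (i, j) is the correlation between row i of Shat_toy and row j of S_toy.
cor_toy = cor(Shat_toy, S_toy; dims = 2)

# Hypothetical helper: average each row of the correlation matrix.
mean_row_correlation(c::AbstractMatrix) = vec(mean(c; dims = 2))

mean_row_correlation(cor_toy)
```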
EDNN
Euclidean Distance Nearest Neighbour. Computes the euclidean distance between each predicted semantic vector and all semantic vectors in $S$ and returns for each predicted semantic vector the distance to the closest neighbour.
Example:
JudiLingMeasures.EDNN(Shat, S)
Used in Schmitz et al. (2021), Chuang et al. (2020)
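The nearest-neighbour distance can be sketched as follows, assuming rows are vectors and every row of S counts as a potential neighbour. `nearest_euclidean` is a hypothetical helper, not the package's implementation.

```julia
using LinearAlgebra

# Hypothetical helper: Euclidean distance from each row of Shat
# to its closest row in S.
function nearest_euclidean(Shat::AbstractMatrix, S::AbstractMatrix)
    return [minimum(norm(s_hat - s) for s in eachrow(S)) for s_hat in eachrow(Shat)]
end

Shat_toy = [0.0 0.0;
            3.0 4.0]
S_toy    = [0.0 1.0;
            3.0 0.0]

nearest_euclidean(Shat_toy, S_toy)  # [1.0, 4.0]
```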
NNC
Nearest Neighbour Correlation. Computes the correlation between each predicted semantic vector and all semantic vectors in $S$ and returns for each predicted semantic vector the correlation to the closest neighbour.
Example:
_, cor_s = JudiLing.eval_SC(Shat, S, R=true)
JudiLingMeasures.NNC(cor_s)
Used in Schmitz et al. (2021), Chuang et al. (2020)
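Given a correlation matrix with one row per predicted vector, the nearest neighbour correlation amounts to a row-wise maximum. A minimal sketch with a hand-written toy matrix and a hypothetical helper `nearest_correlation`:

```julia
# Toy correlation matrix: two predicted vectors, three vectors in S.
cor_toy = [0.9 0.2 0.4;
           0.1 0.3 0.8]

# Hypothetical helper: highest correlation in each row.
nearest_correlation(c::AbstractMatrix) = vec(maximum(c; dims = 2))

nearest_correlation(cor_toy)  # [0.9, 0.8]
```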
Measures of comprehension accuracy
TargetCorrelation
Correlation between each predicted semantic vector and its target semantic vector in $S$.
Example:
_, cor_s = JudiLing.eval_SC(Shat, S, R=true)
JudiLingMeasures.TargetCorrelation(cor_s)
Used in Stein and Plag (2021)
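If row i of the predicted matrix and row i of S belong to the same word (an assumption that holds when both are built from the same dataset in the same order), the target correlations sit on the diagonal of the correlation matrix. A sketch with a toy matrix:

```julia
using LinearAlgebra

# Toy square correlation matrix: entry (i, j) relates predicted vector i to target j.
cor_toy = [0.9 0.2;
           0.1 0.8]

# Under the alignment assumption above, the target correlations are the diagonal.
target_correlations = diag(cor_toy)  # [0.9, 0.8]
```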
Rank
Rank of the correlation with the target semantics among the correlations between the predicted semantic vector and all semantic vectors in $S$.
Example:
_, cor_s = JudiLing.eval_SC(Shat, S, R=true)
JudiLingMeasures.rank(cor_s)
Recognition
Whether a word form was correctly comprehended. Not currently implemented.
NOT YET IMPLEMENTED
Measures of production accuracy/support/uncertainty for the predicted form
SCPP
The correlation between the predicted semantics of the word form produced by the path algorithm and the target semantics.
Example:
df = JudiLingMeasures.get_res_learn_df(res_learn, latin, cue_obj, cue_obj)
JudiLingMeasures.SCPP(df, latin)
Used in Chuang et al. (2020) (based on WpmWithLDL)
PathSum
The summed path supports for the highest supported predicted form, produced by the path algorithm. Path supports are taken from the $\hat{Y}$ matrices.
Example:
pred_df = JudiLing.write2df(rpi_learn)
JudiLingMeasures.path_sum(pred_df)
Used in Schmitz et al. (2021) (but based on WpmWithLDL)
TargetPathSum
The summed path supports for the target word form, produced by the path algorithm. Path supports are taken from the $\hat{Y}$ matrices.
Example:
JudiLingMeasures.target_path_sum(gpi_learn)
Used in Chuang et al. (2022) (but called Triphone Support)
PathSumChat
The summed path supports for the highest supported predicted form, produced by the path algorithm. Path supports are taken from the $\hat{C}$ matrix.
Example:
JudiLingMeasures.path_sum_chat(res_learn, Chat)
C-Precision
Correlation between the predicted form vector and the target form vector.
Example:
JudiLingMeasures.c_precision(Chat, cue_obj.C)
Used in Heitmeier et al. (2022), Gahl and Baayen (2022) (called Semantics to Form Mapping Precision)
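Conceptually, this pairs each predicted form vector with its target form vector and correlates them row by row. The sketch below assumes dense matrices with aligned rows; `row_correlations` is a hypothetical helper and the toy values are made up.

```julia
using Statistics

# Hypothetical helper: Pearson correlation between corresponding rows of A and B.
row_correlations(A::AbstractMatrix, B::AbstractMatrix) =
    [cor(a, b) for (a, b) in zip(eachrow(A), eachrow(B))]

# Toy stand-ins for Chat (predicted) and C (target), rows are word forms.
Chat_toy = [0.9 0.1 0.0;
            0.2 0.7 0.1]
C_toy    = [1.0 0.0 0.0;
            0.0 1.0 0.0]

row_correlations(Chat_toy, C_toy)
```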
L1Chat
L1-Norm of the predicted $\hat{c}$ vectors.
Example:
JudiLingMeasures.L1Norm(Chat)
Used in Heitmeier et al. (2022)
Semantic Support for Form
Sum of activation of ngrams in the target wordform.
Example:
JudiLingMeasures.semantic_support_for_form(cue_obj, Chat)
Used in Gahl and Baayen (2022) (unclear which package this was based on?)
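One way to picture this measure: for each word, sum the predicted activations in Chat at the columns where the target cue matrix marks an ngram as present. The sketch assumes a binary cue matrix with rows aligned to Chat; `summed_target_activation` is a hypothetical helper.

```julia
# Hypothetical helper: sum of Chat activations over the ngrams present in each target form.
summed_target_activation(Chat::AbstractMatrix, C::AbstractMatrix) =
    [sum(Chat[i, C[i, :] .> 0]) for i in axes(C, 1)]

# Toy stand-ins: two word forms, three ngram columns.
Chat_toy = [0.5  0.125 0.25;
            0.25 0.5   0.0]
C_toy    = [1 0 1;
            0 1 0]

summed_target_activation(Chat_toy, C_toy)  # [0.75, 0.5]
```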
Measures of support for the predicted path, focusing on the path transitions and components of the path
LastSupport
The support for the last trigram of each target word in the Chat matrix.
Example:
JudiLingMeasures.last_support(cue_obj, Chat)
Used in Schmitz et al. (2021) (called Support in their paper).
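A sketch of the lookup, assuming that each word comes with an ordered list of ngram column indices (similar in spirit to `cue_obj.gold_ind`, although its exact layout is an assumption here). `last_ngram_support` and the toy data are hypothetical.

```julia
# Hypothetical helper: Chat activation of the final ngram of each target word.
last_ngram_support(Chat::AbstractMatrix, ngram_inds) =
    [Chat[i, last(inds)] for (i, inds) in enumerate(ngram_inds)]

Chat_toy = [0.5  0.125 0.25;
            0.25 0.75  0.0]
ngram_inds_toy = [[1, 3], [2]]  # ordered ngram column indices per word

last_ngram_support(Chat_toy, ngram_inds_toy)  # [0.25, 0.75]
```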
WithinPathEntropies
The entropy over path supports for the highest supported predicted form, produced by the path algorithm. Path supports are taken from the $\hat{Y}$ matrices.
Example:
pred_df = JudiLing.write2df(rpi_learn)
JudiLingMeasures.within_path_entropies(pred_df)
MeanWordSupport
Summed path support divided by each word form's length. Path supports are taken from the $\hat{Y}$ matrices.
Example:
pred_df = JudiLing.write2df(rpi_learn)
JudiLingMeasures.mean_word_support(res_learn, pred_df)
MeanWordSupportChat
Summed path support divided by each word form's length. Path supports are taken from the $\hat{C}$ matrix.
Example:
JudiLingMeasures.mean_word_support_chat(res_learn, Chat)
Used in Stein and Plag (2021) (but based on WpmWithLDL)
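For a single word, the computation can be sketched as the summed Chat support along the predicted path divided by the path length. Measuring length as the number of ngrams on the path is an assumption; `mean_path_support` is a hypothetical helper.

```julia
# Hypothetical helper: average Chat support over the ngrams on a predicted path.
mean_path_support(chat_row::AbstractVector, path::AbstractVector{<:Integer}) =
    sum(chat_row[path]) / length(path)

chat_row_toy = [0.5, 0.25, 0.75, 0.0]  # one row of Chat
path_toy = [1, 2, 3]                   # ngram column indices of the predicted form

mean_path_support(chat_row_toy, path_toy)  # 0.5
```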
lwlr
The ratio between the predicted form's length and its weakest support from the production algorithm. Supports taken from the $\hat{Y}$ matrices.
Example:
pred_df = JudiLing.write2df(rpi_learn)
JudiLingMeasures.lwlr(res_learn, pred_df)
lwlrChat
The ratio between the predicted form's length and its weakest support. Supports taken from the $\hat{C}$ matrix.
Example:
JudiLingMeasures.lwlr_chat(res_learn, Chat)
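The ratio itself is simple once the per-ngram supports of a predicted form are known. A sketch, assuming length is counted in ngrams and all supports are positive; `length_weakest_ratio` is a hypothetical name.

```julia
# Hypothetical helper: number of ngrams divided by the smallest support on the path.
length_weakest_ratio(supports::AbstractVector{<:Real}) = length(supports) / minimum(supports)

length_weakest_ratio([0.5, 0.25, 0.75])  # 12.0
```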
Measures of support for competing forms
PathCounts
The number of candidates predicted by the path algorithm.
Example:
df = JudiLingMeasures.get_res_learn_df(res_learn, latin, cue_obj, cue_obj)
JudiLingMeasures.PathCounts(df)
Used in Schmitz et al. (2021) (but based on WpmWithLDL)
PathEntropiesChat
The entropy over the summed path supports for the candidate forms produced by the path algorithm. Path supports are taken from the $\hat{C}$ matrix.
Example:
JudiLingMeasures.path_entropies_chat(res_learn, Chat)
Used in Schmitz et al. (2021) (but based on WpmWithLDL), Stein and Plag (2021) (but based on WpmWithLDL)
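As an illustration of an entropy over candidate supports, the sketch below normalises non-negative supports to sum to one and computes the Shannon entropy in bits. The normalisation and the base-2 logarithm are assumptions; the package may differ on both. `shannon_entropy` is a hypothetical helper.

```julia
# Hypothetical helper: Shannon entropy (in bits) of non-negative supports,
# after normalising them to a probability distribution.
function shannon_entropy(supports::AbstractVector{<:Real})
    p = supports ./ sum(supports)
    p = p[p .> 0]                # zero entries contribute nothing
    return -sum(p .* log2.(p))
end

shannon_entropy([0.5, 0.5])            # 1.0
shannon_entropy([1.0, 1.0, 1.0, 1.0])  # 2.0
```

Equal supports give the maximum entropy for a given number of candidates, while a single dominant candidate pushes the value towards zero.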
PathEntropiesSCP
The entropy over the semantic supports for the candidate forms produced by the path algorithm.
Example:
df = JudiLingMeasures.get_res_learn_df(res_learn, latin, cue_obj, cue_obj)
JudiLingMeasures.path_entropies_scp(df)
ALDC
Average Levenshtein Distance of Candidates. Average of the Levenshtein distance between each predicted word form candidate and the target word form.
Example:
df = JudiLingMeasures.get_res_learn_df(res_learn, latin, cue_obj, cue_obj)
JudiLingMeasures.ALDC(df)
Used in Schmitz et al. (2021), Chuang et al. (2020) (both based on WpmWithLDL)
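The averaging step can be sketched with a plain dynamic-programming Levenshtein distance. Both `levenshtein` and `mean_distance` are hypothetical helpers written for this illustration, and the candidate strings are made up; the package may rely on a different implementation.

```julia
using Statistics

# Hypothetical helper: Levenshtein distance using two rows of the edit-distance table.
function levenshtein(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    prev = collect(0:length(t))          # distances from the empty prefix of `a`
    for i in eachindex(s)
        curr = similar(prev)
        curr[1] = i                      # delete the first i characters of `a`
        for j in eachindex(t)
            cost = s[i] == t[j] ? 0 : 1
            curr[j + 1] = min(prev[j + 1] + 1,  # deletion
                              curr[j] + 1,      # insertion
                              prev[j] + cost)   # substitution or match
        end
        prev = curr
    end
    return prev[end]
end

# Hypothetical helper: average distance between each candidate and the target.
mean_distance(candidates, target) = mean(levenshtein(c, target) for c in candidates)

levenshtein("kitten", "sitting")          # 3
mean_distance(["vocoo", "voco"], "voco")  # 0.5
```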