matchClasses              package:e1071              R Documentation

_F_i_n_d _s_i_m_i_l_a_r _c_l_a_s_s_e_s _i_n _t_w_o-_w_a_y _c_o_n_t_i_n_g_e_n_c_y _t_a_b_l_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Try to find a mapping between the two groupings, such that as many
     cases as possible are in one of the matched pairs.
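
     The quantity being maximized can be written down directly. The
     following base R sketch uses a hypothetical helper
     'matched_fraction()' (not part of e1071). It assumes that a mapping
     is given as an integer vector whose i-th entry is the column matched
     to row i.

```r
## Hypothetical helper (not part of e1071): share of all cases that fall
## into the matched (row, column) pairs of a candidate mapping.
## map[i] is the column index assigned to row i of 'tab'.
matched_fraction <- function(tab, map) {
  tab <- as.matrix(tab)
  stopifnot(length(map) == nrow(tab))
  # Take one cell per row: (1, map[1]), (2, map[2]), ...
  sum(tab[cbind(seq_len(nrow(tab)), map)]) / sum(tab)
}

# Toy 3 x 3 table (rows: first grouping, columns: second grouping)
tab <- matrix(c(5, 1, 0,
                0, 2, 8,
                1, 6, 1), nrow = 3, byrow = TRUE)
matched_fraction(tab, c(1, 3, 2))  # (5 + 8 + 6) / 24
matched_fraction(tab, 1:3)         # identity mapping: (5 + 2 + 1) / 24
```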

_U_s_a_g_e:

     matchClasses(tab, method="rowmax", iter=1, maxexact=9, verbose=TRUE)
     compareMatchedClasses(x, y, method="rowmax", iter=1,
                           maxexact=9, verbose=FALSE)

_A_r_g_u_m_e_n_t_s:

     tab: Two-way contingency table of class memberships.

  method: One of '"rowmax"', '"greedy"' or '"exact"'.

    iter: Number of iterations used in greedy search.

 verbose: If 'TRUE', display some status messages during computation.

maxexact: Maximum number of remaining rows (and columns) for which all
          possible permutations are computed; see Details.

    x, y: Vectors or matrices with class memberships.

_D_e_t_a_i_l_s:

     If 'method="rowmax"', then each class defining a row in the
     contingency table is mapped to the column of the corresponding row
     maximum. Hence, a column may be matched to more than one row, and
     some columns may not be matched at all (while each row is mapped
     to exactly one column).
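
     A minimal base R sketch of the rowmax rule as described above. The
     name 'rowmax_map()' is hypothetical, and ties are resolved by
     'which.max()', which returns the first maximum. That tie handling is
     an assumption of this sketch.

```r
## Hypothetical helper (not part of e1071): the "rowmax" mapping.
rowmax_map <- function(tab) {
  # For each row, the index of the column holding the row maximum
  # (which.max() returns the first maximum in case of ties).
  apply(as.matrix(tab), 1, which.max)
}

tab <- matrix(c(5, 1, 0,
                7, 2, 1,
                1, 6, 1), nrow = 3, byrow = TRUE)
rowmax_map(tab)  # rows 1 and 2 both map to column 1; column 3 is unused
```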

     If 'method="greedy"' or 'method="exact"', then the contingency
     table must be a square matrix and a unique (one-to-one) mapping is
     computed. This corresponds to a permutation of rows and columns
     such that the sum of the main diagonal, i.e., the trace of the
     matrix, is as large as possible. For both methods, all cells that
     are the maximum of both their row and their column, and that are
     larger than the sum of all other elements in that row and column
     together, are first located and fixed (this is a necessary
     condition for maximal trace).
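
     The fixing step described above can be sketched as follows. The
     helper 'fixed_pairs()' is hypothetical. It applies the condition
     exactly as stated in the text and is not taken from the e1071
     source.

```r
## Hypothetical helper (not part of e1071): locate the pairs that must lie
## on the diagonal of any maximal-trace permutation, per the text above.
fixed_pairs <- function(tab) {
  tab <- as.matrix(tab)
  rmax <- apply(tab, 1, which.max)   # column of each row maximum
  cmax <- apply(tab, 2, which.max)   # row of each column maximum
  fixed <- rep(NA_integer_, nrow(tab))
  for (i in seq_len(nrow(tab))) {
    j <- rmax[i]
    if (cmax[j] != i) next           # row and column maxima must coincide
    # Sum of all other elements in row i and column j together
    others <- (sum(tab[i, ]) - tab[i, j]) + (sum(tab[, j]) - tab[i, j])
    if (tab[i, j] > others) fixed[i] <- j
  }
  fixed                              # NA = row left for the search step
}

tab <- matrix(c(20, 1, 0,
                 0, 2, 8,
                 1, 6, 4), nrow = 3, byrow = TRUE)
fixed_pairs(tab)
```

     Here rows 1 and 2 are fixed to columns 1 and 3. For row 3, the
     value 6 does not exceed 1 + 4 + 1 + 2 = 8, so row 3 is left for the
     exact or greedy search.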

     If 'method="exact"', then all possible permutations of the
     remaining rows and columns are evaluated and the optimum is
     returned. Since there are n! permutations for n remaining rows,
     this quickly becomes computationally infeasible (with the default
     'maxexact=9', already 9! = 362880 permutations). If more than
     'maxexact' rows and columns remain after applying the necessary
     condition, then 'method' is reset to '"greedy"'. If
     'method="greedy"', then a greedy heuristic is run 'iter' times:
     rows are picked one at a time in random order, and each is matched
     to the free column with the largest entry in that row.
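
     Both search strategies can be sketched in base R. The helpers
     'perms()', 'exact_map()' and 'greedy_map()' are hypothetical. For
     brevity they search the whole square table rather than only the
     rows and columns left after the fixing step. 'greedy_map()' also
     assumes that the best of the 'iter' runs (largest trace) is kept.

```r
## Hypothetical helpers (not part of e1071), base R only.

# All permutations of the vector v, one per row (n! rows).
perms <- function(v) {
  if (length(v) <= 1) return(matrix(v, nrow = 1))
  do.call(rbind, lapply(seq_along(v), function(i)
    cbind(v[i], perms(v[-i]))))
}

# Exact: compute the trace for every column permutation, keep the largest.
exact_map <- function(tab) {
  tab <- as.matrix(tab)
  n <- nrow(tab)
  P <- perms(seq_len(n))
  traces <- apply(P, 1, function(p) sum(tab[cbind(seq_len(n), p)]))
  P[which.max(traces), ]
}

# Greedy: visit rows in random order and give each the free column with
# the largest entry in that row; repeat 'iter' times, keep the best trace.
greedy_map <- function(tab, iter = 1) {
  tab <- as.matrix(tab)
  n <- nrow(tab)
  best <- NULL
  best_trace <- -Inf
  for (it in seq_len(iter)) {
    map <- integer(n)
    free <- seq_len(n)
    for (i in sample(n)) {
      j <- free[which.max(tab[i, free])]
      map[i] <- j
      free <- setdiff(free, j)
    }
    tr <- sum(tab[cbind(seq_len(n), map)])
    if (tr > best_trace) {
      best <- map
      best_trace <- tr
    }
  }
  best
}

tab <- matrix(c(10, 9, 0,
                 9, 0, 0,
                 0, 0, 5), nrow = 3, byrow = TRUE)
exact_map(tab)             # c(2, 1, 3): trace 9 + 9 + 5 = 23
greedy_map(tab, iter = 10)
```

     For this table a single greedy pass ends with trace 15 whenever row
     1 is visited before row 2, and with the optimal 23 otherwise. This
     is why several greedy runs ('iter' > 1) can help.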

     'compareMatchedClasses()' computes the contingency table for each
     combination of columns from 'x' and 'y' and applies 'matchClasses'
     to that table. The columns of the table are permuted accordingly,
     and the table is then passed to 'classAgreement'. The resulting
     agreement coefficients (diag, kappa, ...) are returned as a list
     containing one matrix per coefficient, where element (k,l)
     corresponds to the k-th column of 'x' and the l-th column of 'y'.
     If 'y' is missing, the columns of 'x' are compared with each other.
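
     A simplified sketch of the pairwise comparison. It computes only
     the share of cases in matched pairs (the analogue of the 'diag'
     coefficient) using the rowmax mapping. 'compare_diag()' is
     hypothetical. It does not call 'classAgreement' and returns no
     other coefficients.

```r
## Hypothetical helper (not part of e1071): for every pair (k, l) of
## columns of x and y, the share of cases in rowmax-matched pairs.
## Because rowmax may reuse a column, this can exceed a one-to-one match.
compare_diag <- function(x, y = x) {
  x <- as.matrix(x)
  y <- as.matrix(y)
  out <- matrix(NA_real_, ncol(x), ncol(y),
                dimnames = list(colnames(x), colnames(y)))
  for (k in seq_len(ncol(x))) {
    for (l in seq_len(ncol(y))) {
      # Contingency table for column k of x and column l of y
      tab <- unclass(table(x[, k], y[, l]))
      map <- apply(tab, 1, which.max)     # rowmax mapping
      out[k, l] <- sum(tab[cbind(seq_len(nrow(tab)), map)]) / sum(tab)
    }
  }
  out
}

x <- cbind(a = c(1, 1, 2, 2, 3, 3),
           b = c(2, 2, 3, 3, 1, 1))   # 'b' is a relabelling of 'a'
compare_diag(x)                       # 2 x 2 matrix, all entries 1
```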

_A_u_t_h_o_r(_s):

     Friedrich Leisch

_S_e_e _A_l_s_o:

     'classAgreement'

_E_x_a_m_p_l_e_s:

     library(e1071)

     ## a stupid example with no class correlations:
     g1 <- sample(1:5, size=1000, replace=TRUE)
     g2 <- sample(1:5, size=1000, replace=TRUE)
     tab <- table(g1, g2)
     matchClasses(tab, "exact")

     ## let pairs (g1=1,g2=4) and (g1=3,g2=1) agree better
     k <- sample(1:1000, size=200)
     g1[k] <- 1
     g2[k] <- 4

     k <- sample(1:1000, size=200)
     g1[k] <- 3
     g2[k] <- 1

     tab <- table(g1, g2)
     matchClasses(tab, "exact")

     ## get agreement coefficients:
     compareMatchedClasses(g1, g2, method="exact")

