classAgreement             package:e1071             R Documentation

_C_o_e_f_f_i_c_i_e_n_t_s _c_o_m_p_a_r_i_n_g _c_l_a_s_s_i_f_i_c_a_t_i_o_n _a_g_r_e_e_m_e_n_t

_D_e_s_c_r_i_p_t_i_o_n:

     'classAgreement()' computes several coefficients of agreement
     between the columns and rows of a 2-way contingency table.

_U_s_a_g_e:

     classAgreement(tab, match.names=FALSE)

_A_r_g_u_m_e_n_t_s:

     tab: A 2-dimensional contingency table.

match.names: Flag whether rows and columns should be matched by name.

_D_e_t_a_i_l_s:

     Suppose we want to compare two classifications summarized by the
     contingency table T=[t_{ij}] where i,j=1,...,K and t_{ij} denotes
     the number of data points which are in class i in the first
     partition and in class j in the second partition. If both
     classifications use the same labels, then the two classifications
     obviously agree completely if only elements on the main diagonal
     of the table are non-zero. On the other hand, large
     off-diagonal elements correspond to smaller agreement between the
     two classifications. If 'match.names' is 'TRUE', the class labels
     as given by the row and column names are matched, i.e., only
     rows and columns with the same dimnames are used for the
     computation.
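
     The diagonal proportion described above can be sketched in base R
     alone (the variable names here are illustrative, and 'diag' in
     the sketch refers to the base R function, not the list component):

```r
## A minimal sketch, base R only: with identical labelings every
## observation falls on the main diagonal, so the diagonal
## proportion (the 'diag' component of the return value) is 1.
g1 <- rep(1:3, times = c(10, 20, 30))
g2 <- g1
tab <- table(g1, g2)
sum(diag(tab)) / sum(tab)   # 1: perfect agreement
```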

     If the two classifications do not use the same set of labels, or
     if identical labels can have different meanings (e.g., two
     outcomes of cluster analysis on the same data set), the situation
     is a bit more complicated. Let A denote the number of all pairs
     of data points which are either put into the same cluster by both
     partitions or put into different clusters by both partitions.
     Conversely, let D denote the number of all pairs of data points
     that are put into the same cluster by one partition but into
     different clusters by the other.  Hence, the partitions agree on
     the A pairs and disagree on the D pairs. We can measure the
     agreement by the Rand index A/(A+D), which is invariant with
     respect to permutations of the rows or columns of T.
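
     The pair-counting definition above can be implemented directly in
     base R (the function name 'rand_index' is illustrative, not part
     of e1071):

```r
## Sketch: the Rand index computed from its pairwise definition.
rand_index <- function(x, y) {
  same_x <- outer(x, x, "==")          # TRUE if a pair shares a class in x
  same_y <- outer(y, y, "==")          # ... and in y
  lt <- lower.tri(same_x)              # count each unordered pair once
  A <- sum(same_x[lt] == same_y[lt])   # pairs treated alike by both
  D <- sum(same_x[lt] != same_y[lt])   # pairs treated differently
  A / (A + D)
}
x <- c(1, 1, 2, 2)
y <- c(2, 2, 1, 1)    # same partition, labels permuted
rand_index(x, y)      # 1: invariant under relabeling
```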

     Both the diagonal proportion and the Rand index have to be
     corrected for agreement by chance if the class sizes are not
     uniform.
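
     As a sketch of such a chance correction, Cohen's kappa (the
     reference for the 'kappa' component below) can be written in base
     R; the function name and the example table are illustrative:

```r
## Sketch of Cohen's kappa: the observed diagonal proportion
## corrected for the agreement expected from the marginals alone.
cohen_kappa <- function(tab) {
  p  <- tab / sum(tab)
  po <- sum(diag(p))                   # observed agreement
  pe <- sum(rowSums(p) * colSums(p))   # agreement expected by chance
  (po - pe) / (1 - pe)
}
tab <- table(c(1, 1, 2, 2, 2), c(1, 2, 2, 2, 2))
cohen_kappa(tab)
```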

_V_a_l_u_e:

     A list with components 

    diag: Percentage of data points in the main diagonal of 'tab'.

   kappa: 'diag' corrected for agreement by chance.

    rand: Rand index.

   crand: Rand index corrected for agreement by chance.

_A_u_t_h_o_r(_s):

     Friedrich Leisch

_R_e_f_e_r_e_n_c_e_s:

     J. Cohen. A coefficient of agreement for nominal scales.
     Educational and Psychological Measurement, 20, 37-46, 1960.

     Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal
     of Classification, 2, 193-218, 1985.

_S_e_e _A_l_s_o:

     'matchClasses'

_E_x_a_m_p_l_e_s:

     ## no class correlations: both kappa and crand almost zero
     g1 <- sample(1:5, size=1000, replace=TRUE)
     g2 <- sample(1:5, size=1000, replace=TRUE)
     tab <- table(g1, g2)
     classAgreement(tab)

     ## let pairs (g1=1,g2=1) and (g1=3,g2=3) agree better
     k <- sample(1:1000, size=200)
     g1[k] <- 1
     g2[k] <- 1

     k <- sample(1:1000, size=200)
     g1[k] <- 3
     g2[k] <- 3

     tab <- table(g1, g2)
     ## both kappa and crand should be significantly larger than before
     classAgreement(tab)

