fclustIndex              package:e1071              R Documentation

_F_u_z_z_y _C_l_u_s_t_e_r _I_n_d_e_x_e_s (_V_a_l_i_d_i_t_y/_P_e_r_f_o_r_m_a_n_c_e _M_e_a_s_u_r_e_s)

_D_e_s_c_r_i_p_t_i_o_n:

     Calculates the values of several fuzzy validity measures. The
     values of the indexes can be independently used in order to
     evaluate and compare clustering partitions or even to determine
     the number of clusters existing in a data set.

_U_s_a_g_e:

     fclustIndex(y, x, index = "all")

_A_r_g_u_m_e_n_t_s:

       y: An object of a fuzzy clustering result of class '"fclust"'

       x: Data matrix

   index: The validity measures used: '"gath.geva"', '"xie.beni"',
          '"fukuyama.sugeno"', '"partition.coefficient"',
          '"partition.entropy"', '"proportion.exponent"',
          '"separation.index"' and '"all"' for all the indexes.

_D_e_t_a_i_l_s:

     The validity measures and a short description of each follow,
     where N is the number of data points, u_{ij} the values of the
     membership matrix, v_j the centers of the clusters and k the
     number of clusters.

          *  *gath.geva*: Gath and Geva introduced 2 main criteria for
             comparing and finding optimal partitions, based on the
             heuristic that a better clustering shows clear separation
             between the clusters, minimal volume of the clusters and
             a maximal number of data points concentrated in the
             vicinity of the cluster centroids. These indexes are
             valid only for the cmeans clustering algorithm. For the
             first, the ``fuzzy hypervolume'', we have:
             F_{HV}=\sum_{j=1}^{k}{[\det(F_j)]}^{1/2}, where
             F_j=\frac{\sum_{i=1}^N
             u_{ij}(x_i-v_j)(x_i-v_j)^T}{\sum_{i=1}^{N}u_{ij}}, for the
             case when the defuzzification parameter is 2. For the
             second, the ``average partition density'':
             D_{PA}=\frac{1}{k}\sum_{j=1}^k\frac{S_j}{{[\det(F_j)]}^{1/2}},
             where S_j=\sum_{i=1}^N u_{ij}. Moreover, the ``partition
             density'', which expresses the general partition density
             according to the physical definition of density, is
             calculated by: P_D=\frac{S}{F_{HV}}, where
             S=\sum_{j=1}^k\sum_{i=1}^N u_{ij}.

          *  *xie.beni*: This index is a function of the data set and
             the centroids of the clusters. Xie and Beni explained this
             index by writing it as the ratio of the total variation of
             the partition and the centroids (U,V) to the separation
             of the centroid vectors. The minimum values of this index
             under comparison indicate the best partitions.
             u_{XB}(U,V;X)=\frac{\sum_{j=1}^k\sum_{i=1}^N
             u_{ij}^2{||x_i-v_j||}^2}{N(\min_{j\neq l}{{||v_j-v_l||}^2})}

          *  *fukuyama.sugeno*: This index consists of the difference
             of two terms. The first combines the fuzziness in the
             membership matrix with the geometrical compactness of the
             representation of the data set via the prototypes. The
             second combines the same fuzziness in the partition matrix
             with the distance from the jth prototype to the grand
             mean of the data. The minimum values of this index also
             indicate a good partition.
             u_{FS}(U,V;X)=\sum_{i=1}^{N}\sum_{j=1}^k
             (u_{ij}^2)^q(||x_i-v_j||^2-||v_j-\bar v||^2)

          *  *partition.coefficient*: An index which measures the
             fuzziness of the partition without considering the data
             set itself. It is a heuristic measure since it has no
             connection to any property of the data. Its maximum values
             imply a good partition in the sense of a least fuzzy
             clustering. F(U;k)=\frac{tr
             (UU^T)}{N}=\frac{<U,U>}{N}=\frac{||U||^2}{N}

             *  F(U;k) shows the fuzziness or the overlap of the
                partition and depends on kN elements.

             *  1/k\leq F(U;k)\leq 1, where if F(U;k)=1 then U is a
                hard partition and if F(U;k)=1/k then U=[1/k] is the
                centroid of the fuzzy partition space P_{fk}. The
                converse is also valid.

          *  *partition.entropy*: A measure that provides information
             about the membership matrix, again without considering
             the data itself. The minimum values imply a good
             partition in the sense of a more crisp partition.
             H(U;k)=\sum_{i=1}^{N} h(u_i)/N, where
             h(u)=-\sum_{j=1}^{k} u_j\,\log_a (u_j) is Shannon's
             entropy.

             *  H(U;k) shows the uncertainty of a fuzzy partition and
                also depends on kN elements. Specifically, h(u_i) is
                interpreted as the amount of fuzzy information about
                the membership of x_i in k classes that is retained by
                column u_i. Thus, at U=[1/k] the most information is
                withheld since the membership is the fuzziest possible.

             *  0\leq H(U;k)\leq \log_a(k), where for H(U;k)=0 U is a
                hard partition and for H(U;k)=\log_a(k) U=[1/k].

          *  *proportion.exponent*: A measure P(U;k) of fuzziness
             suited to detecting structural variations in the partition
             matrix as it becomes fuzzier. A crisp cluster in the
             partition matrix can drive it to infinity, whereas the
             partition coefficient and the partition entropy are more
             sensitive to small changes when approaching a hard
             partition. Its evaluation also does not involve the data
             or the algorithm used to partition them. Its maximum
             implies the optimal partition, but there is no way of
             knowing which maximum is statistically significant.

             *  0\leq P(U;k)<\infty, since the [0,1] values explode to
                [0,\infty) due to the natural logarithm. Specifically,
                P=0 if and only if U=[1/k], while P\rightarrow\infty
                when any column of U is crisp.

             *  P(U;k) can easily explode. It works well for partitions
                with large column maximums and for detecting structural
                variations.

          *  *separation.index (known as CS Index)*: This index
             identifies unique cluster structure with well-defined
             properties that depend on the data and a measure of
             distance. It answers the question of whether the clusters
             are compact and separated, but it seems computationally
             infeasible for big data sets, since a distance matrix
             between all the data membership values has to be
             calculated. It also presupposes that a hard partition is
             derived from the fuzzy one.

             D_1(U;k;X,d)=\min_{i+1\,\leq\,l\,\leq\,k-1}\left\{
             \min_{1\,\leq\,j\,\leq\,k}\left\{
             \frac{dis(u_j,u_l)}{\max_{1\leq m\leq k}\{dia(u_m)\}}
             \right\}\right\}, where dia is the diameter of the
             subset, dis the distance of two subsets, and d a metric.
             U is a CS partition of X \Leftrightarrow D_1>1. When this
             holds then U is unique.
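
     The formulas for the partition coefficient, the partition entropy
     and the Xie-Beni index can be sketched in a few lines of base R.
     The helper names below ('partition_coefficient',
     'partition_entropy', 'xie_beni') and the toy data are hypothetical
     and not part of e1071. The sketch assumes that u is an N x k
     membership matrix, x an N x p data matrix and v a k x p matrix of
     centers. It is meant to illustrate the formulas, not to reproduce
     the internals of 'fclustIndex'.

```r
# Hypothetical helpers (not part of e1071).
# u: N x k membership matrix, x: N x p data matrix, v: k x p centers.

partition_coefficient <- function(u) {
  # F(U;k) = sum of squared memberships divided by N
  sum(u^2) / nrow(u)
}

partition_entropy <- function(u, base = exp(1)) {
  # Convention 0 * log(0) = 0: zero memberships contribute nothing.
  terms <- ifelse(u > 0, u * log(u, base = base), 0)
  -sum(terms) / nrow(u)
}

xie_beni <- function(u, x, v) {
  k <- nrow(v)
  # Squared Euclidean distance from every point to every center (N x k).
  d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, v[j, ])^2))
  # Smallest squared distance between two distinct centers.
  min_sep <- min(dist(v))^2
  sum(u^2 * d2) / (nrow(x) * min_sep)
}

# Toy data (assumed for illustration): two tight groups of two points.
x <- rbind(c(0, 0), c(0, 1), c(4, 0), c(4, 1))
v <- rbind(c(0, 0.5), c(4, 0.5))
u_hard <- rbind(c(1, 0), c(1, 0), c(0, 1), c(0, 1))  # crisp partition
u_flat <- matrix(0.5, nrow = 4, ncol = 2)            # U = [1/k]

partition_coefficient(u_hard)  # 1, the upper bound
partition_coefficient(u_flat)  # 0.5, the lower bound 1/k
partition_entropy(u_hard)      # 0, the lower bound
partition_entropy(u_flat)      # log(2), the upper bound log_a(k)
xie_beni(u_hard, x, v)         # 1/64 = 0.015625
```

     With a 'cmeans' result cl, the same helpers could be applied to
     cl$membership and cl$centers. The values should be comparable to
     those reported by 'fclustIndex', although exact agreement is not
     guaranteed by this sketch.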

_V_a_l_u_e:

     Returns a vector with the values of the validity measures.

_A_u_t_h_o_r(_s):

     Evgenia Dimitriadou

_R_e_f_e_r_e_n_c_e_s:

     James C. Bezdek, _Pattern Recognition with Fuzzy Objective
     Function Algorithms_, Plenum Press, 1981, NY.

     X. L. Xie and G. Beni, _A validity measure for fuzzy clustering_,
     IEEE Transactions on Pattern Analysis and Machine Intelligence,
     vol. *13*, n. 8, p. 841-847, 1991.

     I. Gath and A. B. Geva, _Unsupervised Optimal Fuzzy Clustering_,
     IEEE Transactions on Pattern Analysis and Machine Intelligence,
     vol. *11*, n. 7, p. 773-781, 1989.

     Y. Fukuyama and M. Sugeno, _A new method of choosing the number
     of clusters for the fuzzy c-means method_, Proc. 5th Fuzzy Syst.
     Symp., p. 247-250, 1989 (in Japanese).

_S_e_e _A_l_s_o:

     'cmeans'

_E_x_a_m_p_l_e_s:

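     # a 2-dimensional example: two Gaussian clusters of 50 points each
     library(e1071)
     x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
     # fuzzy c-means with 2 centers and at most 20 iterations
     cl <- cmeans(x, 2, 20, verbose = TRUE, method = "cmeans")
     # compute all validity measures for this partition
     resultindexes <- fclustIndex(cl, x, index = "all")
     resultindexes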
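
     Since the indexes can also help to determine the number of
     clusters, one possible workflow is to fit 'cmeans' for several
     candidate values of k and compare a single index across them. The
     sketch below is only an illustration: the candidate range 2:5, the
     seed and the variable names are assumptions, and it assumes that
     'fclustIndex' returns a single value when one index is requested.
     The e1071 calls are guarded so that the code is skipped when the
     package is not installed.

```r
# Two Gaussian clusters, as in the example above (seed is an assumption).
set.seed(1)
x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

ks <- 2:5  # candidate numbers of clusters (assumed range)

if (requireNamespace("e1071", quietly = TRUE)) {
  pc <- sapply(ks, function(k) {
    cl <- e1071::cmeans(x, k, 20, method = "cmeans")
    # Partition coefficient only; larger values mean a less fuzzy partition.
    unname(e1071::fclustIndex(cl, x, index = "partition.coefficient"))
  })
  names(pc) <- ks
  print(pc)
  # Candidate k with the largest partition coefficient.
  best_k <- ks[which.max(pc)]
  print(best_k)
}
```

     The partition coefficient ignores the data and tends to favour
     small k, so it is usually worth comparing the result with an index
     that uses the data, such as '"xie.beni"', before settling on a
     number of clusters.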

