stability              package:clValid              R Documentation

_S_t_a_b_i_l_i_t_y _M_e_a_s_u_r_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Calculates the stability measures: the average proportion of
     non-overlap (APN), the average distance (AD), the average distance
     between means (ADM), and the figure of merit (FOM).

_U_s_a_g_e:

     stability(mat, Dist=NULL, del, cluster, clusterDel, method="euclidean")

_A_r_g_u_m_e_n_t_s:

     mat: The data matrix of the clustered observations.

    Dist: The distance matrix (as a matrix or dist object) of the
          clustered observations.  If NULL then 'method' is used with
          'mat' to determine the distance matrix.

     del: An integer giving the index of the column that was removed.

 cluster: An integer vector indicating the cluster partitioning based
          on all the data

clusterDel: An integer vector indicating the cluster partitioning based
          on the data with column 'del' removed.

  method: The metric used to determine the distance matrix.  Not used
          if 'Dist' is provided.

_D_e_t_a_i_l_s:

     The stability measures evaluate the stability of a clustering
     result by comparing it with the clusters obtained by removing one
     column at a time. These measures include the average proportion of
     non-overlap (APN), the average distance (AD), the average distance
     between means (ADM), and the figure of merit (FOM).  The APN, AD,
     and ADM are all based on the cross-classification table of the
     original clustering with the clustering based on the removal of
     one column.  The APN measures the average proportion of
     observations not placed in the same cluster under both cases.
     The AD measures the average distance between observations
     placed in the same cluster under both cases, and the ADM measures
     the average distance between cluster centers for observations
     placed in the same cluster under both cases.  The FOM measures the
     average intra-cluster variance of the deleted column, where the
     clustering is based on the remaining (undeleted) columns.  In all
     cases the average is taken over all the deleted columns, and all
     measures should be minimized. For details see the package
     vignette.
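
     The cross-classification idea behind the APN can be sketched in
     base R.  This is a minimal illustration under the definition
     given above, not the package's own code: 'apn_one_column' is a
     hypothetical helper, and it covers a single removed column only.

```r
## Sketch only: APN for one removed column, from two partition vectors.
## 'apn_one_column' is a hypothetical helper, not a clValid function.
apn_one_column <- function(cluster, clusterDel) {
  stopifnot(length(cluster) == length(clusterDel))
  ## Cross-classification table: rows are the original clusters,
  ## columns are the clusters found with one column removed.
  overlap <- table(cluster, clusterDel)
  ## Size of each original cluster (one entry per table row).
  rowSize <- rowSums(overlap)
  ## An observation in cell (i, j) keeps overlap[i, j] of the
  ## rowSize[i] members of its original cluster.  Summing
  ## overlap^2 / rowSize adds up these proportions over observations.
  meanOverlap <- sum(overlap^2 / rowSize) / length(cluster)
  1 - meanOverlap
}

cluster    <- c(1, 1, 2, 2)
clusterDel <- c(1, 2, 1, 2)
apn_one_column(cluster, cluster)      # identical partitions: 0
apn_one_column(cluster, clusterDel)   # every pair is split: 0.5
```

     The value depends only on which observations are grouped
     together, so relabeling the clusters in either vector leaves it
     unchanged.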

     NOTE: The 'stability' function calculates these measures only for
     the single removed column specified by 'del'.  To get the
     overall scores, the user must average the measures over all
     removed columns.
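
     As an illustration of a single-column measure, the FOM
     contribution of one removed column might be sketched as follows.
     This assumes the root-mean-squared-deviation form of the
     intra-cluster variance; 'fom_one_column' is a hypothetical
     helper, and any adjustment the package may apply is not
     reproduced here, so values need not match 'stability' exactly.

```r
## Sketch only: FOM for one removed column.
## 'fom_one_column' is a hypothetical helper, not a clValid function.
fom_one_column <- function(mat, del, clusterDel) {
  mat <- as.matrix(mat)
  stopifnot(nrow(mat) == length(clusterDel))
  ## Values of the left-out column; the clustering in 'clusterDel'
  ## was built without them.
  x <- mat[, del]
  ## Mean of the left-out column within each cluster, repeated so
  ## that every observation carries its own cluster's mean.
  clusterMean <- ave(x, clusterDel)
  ## Root of the average squared deviation from the cluster mean.
  sqrt(sum((x - clusterMean)^2) / length(x))
}

toy <- cbind(a = c(1, 3, 10, 14), b = c(0, 0, 1, 1))
fom_one_column(toy, del = 1, clusterDel = c(1, 1, 2, 2))
```

     Averaging such per-column values over every choice of 'del'
     gives an overall score in the sense of the note above.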

_V_a_l_u_e:

     Returns a numeric vector with the APN, AD, ADM, and FOM measures
     corresponding to the particular column that was removed.

_N_o_t_e:

     The main function for cluster validation is 'clValid', and users
     should call 'clValid' directly where possible rather than
     calling 'stability'.

     To get the overall values, the stability measures corresponding to
     each removed column should be averaged (see the examples below).

_A_u_t_h_o_r(_s):

     Guy Brock, Vasyl Pihur, Susmita Datta, Somnath Datta

_R_e_f_e_r_e_n_c_e_s:

     Datta, S. and Datta, S. (2003). Comparisons and validation of
     statistical clustering techniques for microarray gene expression
     data. Bioinformatics 19(4): 459-466.

_S_e_e _A_l_s_o:

     For a description of the function 'clValid' see 'clValid'.

     For a description of the class 'clValid' and all available methods
     see 'clValidObj' or 'clValid-class'.

     For additional help on the other validation measures see
     'connectivity', 'dunn', 'BSI', and 'BHI'.

_E_x_a_m_p_l_e_s:

     library(clValid)
     data(mouse)
     express <- mouse[1:25,c("M1","M2","M3","NC1","NC2","NC3")]
     rownames(express) <- mouse$ID[1:25]
     ## hierarchical clustering
     Dist <- dist(express,method="euclidean")
     clusterObj <- hclust(Dist, method="average")
     nc <- 4 ## number of clusters
     cluster <- cutree(clusterObj,nc)

     stab <- matrix(0,nrow=ncol(express),ncol=4)
     colnames(stab) <- c("APN","AD","ADM","FOM")

     ## Loop over every column, removing one at a time
     for (del in 1:ncol(express)) {
       matDel <- express[,-del]
       DistDel <- dist(matDel,method="euclidean")
       clusterObjDel <- hclust(DistDel, method="average")
       clusterDel <- cutree(clusterObjDel,nc)
       stab[del,] <- stability(express, Dist, del, cluster, clusterDel)
     }
     colMeans(stab)

