BSI                 package:clValid                 R Documentation

_B_i_o_l_o_g_i_c_a_l _S_t_a_b_i_l_i_t_y _I_n_d_e_x

_D_e_s_c_r_i_p_t_i_o_n:

     Calculates the biological stability index (BSI) for a given
     statistical clustering partition and biological annotation.

_U_s_a_g_e:

     BSI(statClust, statClustDel, annotation, names = NULL,
         category = "all", goTermFreq = 0.05)

_A_r_g_u_m_e_n_t_s:

statClust: An integer vector indicating the statistical cluster
          partitioning

statClustDel: An integer vector indicating the statistical cluster
          partitioning obtained with one column removed

annotation: Either a character string naming the Bioconductor
          annotation package for mapping genes to GO categories, or a
          list with the names of the functional classes and the
          observations belonging to each class.

   names: An optional vector of names for the observations

category: Indicates the GO categories to use for biological
          validation.  Can be one of "BP", "MF", "CC", or "all".

goTermFreq: Threshold frequency of GO terms to use for functional
          annotation.
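
     As an illustration of the list form of 'annotation', the sketch
     below builds such a list in base R.  The gene names and class
     labels are made up for this example, and 'split' is used here
     only as one convenient way to group names by class.  The entries
     are presumably expected to match the observation names (the names
     of 'statClust' or the 'names' argument).

```r
## Hypothetical gene names and functional class labels (not from clValid)
genes   <- c("g1", "g2", "g3", "g4", "g5")
classes <- c("Growth", "Growth", "Unknown", "Stress", "Stress")

## One list element per functional class, holding the member names
fc <- split(genes, classes)

## Drop classes that carry no functional information
fc <- fc[setdiff(names(fc), "Unknown")]
str(fc)
```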

_D_e_t_a_i_l_s:

     The BSI inspects the consistency of clustering for genes with
     similar biological functionality.  Each sample is removed in turn, and
     the cluster membership for genes with similar functional
     annotation is compared with the cluster membership using all
     available samples. The BSI is in the range [0,1], with larger
     values corresponding to more stable clusters of the functionally
     annotated genes. For details see the package vignette.
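
     The comparison described above can be sketched in base R for the
     case where 'annotation' is a list of functional classes.  The
     function name 'bsi_sketch' is hypothetical, and the normalisation
     follows one reading of Datta and Datta (2006): for each pair of
     genes i and j in the same functional class, it takes the fraction
     of the full-data cluster of i that overlaps the reduced-data
     cluster of j.  It is not the package's implementation and may
     differ from 'BSI' in edge cases.

```r
## Hypothetical helper, not part of clValid.
## clust, clustDel: named integer vectors of cluster labels
##   (all columns / one column removed)
## classes: named list of functional classes (character vectors of names)
bsi_sketch <- function(clust, clustDel, classes) {
  per_class <- vapply(classes, function(members) {
    members <- intersect(members, names(clust))
    n <- length(members)
    if (n < 2) return(NA_real_)          # no pairs to compare
    total <- 0
    for (i in members) {
      ## cluster containing i when all columns are used
      full_i <- names(clust)[clust == clust[[i]]]
      for (j in setdiff(members, i)) {
        ## cluster containing j when one column is removed
        del_j <- names(clustDel)[clustDel == clustDel[[j]]]
        total <- total + length(intersect(full_i, del_j)) / length(full_i)
      }
    }
    total / (n * (n - 1))                # average over ordered pairs
  }, numeric(1))
  mean(per_class, na.rm = TRUE)          # average over functional classes
}

clust    <- c(a = 1L, b = 1L, c = 2L, d = 2L)
clustDel <- c(a = 1L, b = 2L, c = 2L, d = 1L)
bsi_sketch(clust, clustDel, list(f1 = c("a", "b")))
```

     Identical partitions give a value of 1 under this sketch, and
     partitions that split functionally related genes give smaller
     values.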

     NOTE: The 'BSI' function calculates the measure for only one
     removed column at a time.  To get the overall score, the user
     must average the values obtained for each removed column.
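
     The averaging step can be wrapped in a small helper, sketched
     here under the assumption of average-linkage hierarchical
     clustering on Euclidean distances.  The name
     'loo_column_average' and its 'score' argument are hypothetical;
     'score' stands in for a call such as
     'function(a, b) BSI(a, b, fc)'.

```r
## Hypothetical helper, not part of clValid.
## mat:   numeric matrix or data frame (rows = genes, columns = samples)
## nc:    number of clusters
## score: function(full, reduced) returning one number per removed column
loo_column_average <- function(mat, nc, score) {
  ## partition based on all columns
  full <- cutree(hclust(dist(mat), method = "average"), nc)
  vals <- vapply(seq_len(ncol(mat)), function(del) {
    ## partition with column 'del' removed
    reduced <- cutree(hclust(dist(mat[, -del, drop = FALSE]),
                             method = "average"), nc)
    score(full, reduced)
  }, numeric(1))
  mean(vals)
}

## Toy usage with a placeholder score (fraction of unchanged labels)
set.seed(1)
m <- matrix(rnorm(40), nrow = 10, ncol = 4)
loo_column_average(m, 3, function(a, b) mean(a == b))
```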

_V_a_l_u_e:

     Returns the BSI value corresponding to the particular column that
     was removed.

_N_o_t_e:

     The main function for cluster validation is 'clValid', and users
     should call this function directly if possible.

     To get the overall BSI value, the BSI values corresponding to each
     removed column should be averaged (see the examples below).

_A_u_t_h_o_r(_s):

     Guy Brock, Vasyl Pihur, Susmita Datta, Somnath Datta

_R_e_f_e_r_e_n_c_e_s:

     Datta, S. and Datta, S. (2006). Methods for evaluating clustering
     algorithms for gene expression data using a reference set of
     functional classes. BMC Bioinformatics 7:397.

_S_e_e _A_l_s_o:

     For a description of the function 'clValid' see 'clValid'.

     For a description of the class 'clValid' and all available methods
     see 'clValidObj' or 'clValid-class'.

     For additional help on the other validation measures see
     'connectivity', 'dunn', 'stability', and 'BHI'.

_E_x_a_m_p_l_e_s:

     data(mouse)
     express <- mouse[1:25,c("M1","M2","M3","NC1","NC2","NC3")]
     rownames(express) <- mouse$ID[1:25]
     ## hierarchical clustering
     Dist <- dist(express,method="euclidean")
     clusterObj <- hclust(Dist, method="average")
     nc <- 4 ## number of clusters      
     cluster <- cutree(clusterObj,nc)

     ## first way - functional classes predetermined
     fc <- tapply(rownames(express),mouse$FC[1:25], c)
     fc <- fc[-match( c("EST","Unknown"), names(fc))]
     bsi <- numeric(ncol(express))
     ## BSI() scores one removed column at a time, so loop over all columns
     for (del in 1:ncol(express)) {
       matDel <- express[,-del]
       DistDel <- dist(matDel,method="euclidean")
       clusterObjDel <- hclust(DistDel, method="average")
       clusterDel <- cutree(clusterObjDel,nc)
       bsi[del] <- BSI(cluster, clusterDel, fc)
     }
     mean(bsi)

     ## second way - using Bioconductor
     if(require("Biobase") && require("annotate") && require("GO") &&
        require("moe430a")) {
       bsi <- numeric(ncol(express))
       for (del in 1:ncol(express)) {
         matDel <- express[,-del]               
         DistDel <- dist(matDel,method="euclidean")
         clusterObjDel <- hclust(DistDel, method="average")
         clusterDel <- cutree(clusterObjDel,nc)
         bsi[del] <- BSI(cluster, clusterDel, annotation="moe430a",
                         names=rownames(express), category="all")
       }
       mean(bsi)
     }

