RankAggreg            package:RankAggreg            R Documentation

_W_e_i_g_h_t_e_d _R_a_n_k _A_g_g_r_e_g_a_t_i_o_n _o_f _p_a_r_t_i_a_l _o_r_d_e_r_e_d _l_i_s_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     Performs aggregation of ordered lists based on the ranks
     (optionally with additional weights) via the Cross-Entropy Monte
     Carlo algorithm or the Genetic Algorithm.

_U_s_a_g_e:

     RankAggreg(x, k, index.weights = NULL, use.weights = FALSE,
                method = c("CrossEntropy", "GeneticAlgorithm"),
                distance = c("Spearman", "Kendall"), rho = 0.01,
                weight = 0.5,
                N = 5 * k * length(unique(sort(as.vector(x)))),
                error = 0.001, maxIter = 100, popSize = 100, CP = 1,
                MP = 0.001, informative = FALSE, v1 = NULL,
                verbose = TRUE)

_A_r_g_u_m_e_n_t_s:

       x: a matrix of ordered lists to be combined (lists must be in
          rows)

       k: size of the top-k list

index.weights: a matrix of scores (weights) to be used in the
          aggregation process. Weights in each row must be sorted in
          either decreasing or increasing order and must correspond
          to the elements in x

use.weights: boolean, if weights are to be used

  method: method to be used to perform rank aggregation: Cross Entropy
          Monte Carlo or Genetic Algorithm (GA)

distance: distance to be used which "measures" the similarity of
          ordered lists

     rho: (rho*N) is the "quantile" of candidate lists sorted by the
          function values. Used only by the Cross-Entropy algorithm

  weight: a learning factor used in the probability update procedure of
          the algorithm. Used only by the Cross-Entropy algorithm

       N: the number of samples to be generated by the MCMC; default:
          5nk, where n is the number of unique elements in x. Used
          only by the Cross-Entropy algorithm

   error: convergence criterion for the Cross-Entropy algorithm

 maxIter: the maximum number of iterations allowed; can be used as a
          stopping criterion for the Genetic Algorithm

 popSize: population size in each generation of Genetic Algorithm; has
          no effect if method="CrossEntropy"

      CP: Cross-over probability for the GA; the default value is 1. It
          is usually greater than 0.5.

      MP: Mutation probability. This value should be small. For a
          population of size popSize and k features, the number of
          mutations is computed as popSize*k*MP. Used only by the GA

informative: boolean, if informative is TRUE, use v1 as the initial
          probability matrix

      v1: optional, can be used to specify the initial probability
          matrix; if v1=NULL, the initial probability matrix is set to
          1/n, where n is the number of unique elements in x

 verbose: boolean, if console output is to be displayed at each
          iteration
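
     The arithmetic behind several of the defaults above can be
     checked by hand. The following is a minimal sketch in base R
     that does not call the package; the variable names and the
     n-by-k shape given to the uniform initial matrix are assumptions
     made for illustration.

```r
# Toy input: four ordered lists (rows) over five items
x <- matrix(c("A", "B", "C", "D", "E",
              "B", "D", "A", "E", "C",
              "B", "A", "E", "C", "D",
              "A", "D", "B", "C", "E"), byrow = TRUE, ncol = 5)
k <- 5

# n: number of unique elements in x
n <- length(unique(as.vector(x)))

# Default N = 5 * k * n, as written in the usage line
N_default <- 5 * k * n

# rho * N is the "quantile" of candidate lists described under rho
rho <- 0.01
quantile_size <- rho * N_default

# Number of mutations for the GA: popSize * k * MP
popSize <- 100
MP <- 0.001
n_mutations <- popSize * k * MP

# Uniform starting probabilities, every entry 1/n
# (the n-by-k shape is an assumption, not taken from the text above)
v1_uniform <- matrix(1 / n, nrow = n, ncol = k)
```

     With these toy values n is 5, the default N is 125 and the
     expected number of mutations per generation is 0.5.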

_D_e_t_a_i_l_s:

     The function performs rank aggregation via the Cross-Entropy Monte
     Carlo algorithm or the Genetic Algorithm. Both approaches can and
     should be used when k is relatively large (k > 10), where
     enumerating every candidate list is impractical. If k is small,
     one can enumerate all possible candidate lists and find the
     minimum directly using the BruteAggreg function available in this
     package.
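
     To illustrate why direct enumeration is only practical for small
     k, here is a minimal base R sketch that lists every ordering of
     five items and picks the one minimizing the summed, unweighted
     Spearman footrule distance to the rows of x. It does not call
     BruteAggreg and is not that function's implementation; the
     helpers perms, footrule and objective are hypothetical names
     introduced for this example.

```r
# All permutations of a vector, built recursively (fine for small inputs)
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Unweighted Spearman footrule between two full lists of the same items
footrule <- function(a, b) sum(abs(seq_along(a) - match(a, b)))

# Objective: total distance from a candidate list to every row of x
objective <- function(cand, x) {
  as.numeric(sum(apply(x, 1, function(row) footrule(cand, row))))
}

x <- matrix(c("A", "B", "C", "D", "E",
              "B", "D", "A", "E", "C",
              "B", "A", "E", "C", "D",
              "A", "D", "B", "C", "E"), byrow = TRUE, ncol = 5)

candidates <- perms(sort(unique(as.vector(x))))   # 5! = 120 lists
values <- vapply(candidates, objective, numeric(1), x = x)
best <- candidates[[which.min(values)]]
```

     The number of candidates grows factorially (120 for five items,
     3628800 for ten), which is why a stochastic search is preferred
     once k exceeds roughly 10.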

     The Cross-Entropy Monte Carlo algorithm is an iterative procedure
     for solving difficult combinatorial problems in which it is
     computationally infeasible to find the solution directly. In the
     context of rank aggregation, the algorithm searches for the
     "super"-list which is as close as possible to the ordered lists in
     x. The "closeness" of any two ordered lists is measured by either
     the Spearman footrule distance or Kendall's tau distance, or by
     our modified, weighted versions of these distances. Please refer
     to the paper in the references for further details.
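
     For readers unfamiliar with the two distances, the sketch below
     computes the plain, unweighted Spearman footrule and Kendall's
     tau distance between two full lists containing the same items.
     It is an illustration in base R only; the function names are
     hypothetical, and the weighted and top-k variants used by the
     package are described in the referenced paper and are not
     reproduced here.

```r
# Spearman footrule: sum over items of |position in a - position in b|
footrule_dist <- function(a, b) {
  stopifnot(setequal(a, b), length(a) == length(b))
  sum(abs(seq_along(a) - match(a, b)))
}

# Kendall's tau distance: number of item pairs ordered differently
kendall_dist <- function(a, b) {
  stopifnot(setequal(a, b), length(a) == length(b))
  pos_b <- match(a, b)          # positions in b, taken in a's order
  n <- length(pos_b)
  if (n < 2) return(0)
  pairs <- combn(n, 2)          # every pair i < j in a's order
  sum(pos_b[pairs[1, ]] > pos_b[pairs[2, ]])
}

a <- c("A", "B", "C", "D", "E")
b <- c("B", "D", "A", "E", "C")

footrule_dist(a, b)   # 2 + 1 + 2 + 2 + 1 = 8
kendall_dist(a, b)    # 4 discordant pairs
```

     Both functions return 0 for identical lists and grow as the two
     orderings disagree more.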

     The Genetic Algorithm requires setting the CP and MP parameters,
     which affect the degree of "evolution" in the population. If both
     CP and MP are small, the algorithm is very conservative and may
     take a long time to search the solution space of all ordered
     candidate lists. On the other hand, setting CP and MP (especially
     MP) large will introduce a large number of mutations in the
     population, which can result in a local optimum. Two convergence
     criteria are used to stop the algorithm. The first is the
     repetition of the same minimum value of the objective function in
     five consecutive iterations. The second is the iteration limit:
     if the first condition is not met within maxIter iterations, the
     algorithm stops regardless.
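
     The two stopping conditions can be sketched as a small base R
     function applied to a trace of per-iteration minimum objective
     values. This is one reading of the rule described above, not the
     package's code; stop_iteration, best_values and patience are
     hypothetical names.

```r
# Return the iteration at which the search would stop, given the minimum
# objective value observed at each iteration.
stop_iteration <- function(best_values, maxIter = 100, patience = 5) {
  run <- 1                                  # length of the current run of equal minima
  for (t in seq_along(best_values)) {
    same <- t > 1 && isTRUE(all.equal(best_values[t], best_values[t - 1]))
    run <- if (same) run + 1 else 1
    # Stop on five equal minima in a row, or on reaching maxIter
    if (run >= patience || t >= maxIter) return(t)
  }
  length(best_values)                       # trace ended before either condition
}

trace <- c(10, 9, 9, 8, 8, 8, 8, 8, 7)
stop_iteration(trace)               # 8: the value 8 repeats in iterations 4 to 8
stop_iteration(trace, maxIter = 3)  # 3: the iteration limit is reached first
```

     A small maxIter therefore caps the run time even when the
     minimum keeps changing.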

_V_a_l_u_e:

top.list: Top-k aggregated list

optimal.value: the minimum value of the objective function
          corresponding to the top-k list

sample.size: the number of samples generated by the MCMC at each
          iteration

num.iter: the number of iterations until convergence

  method: which algorithm was used

distance: which distance was used

_A_u_t_h_o_r(_s):

     Vasyl Pihur, Somnath Datta, Susmita Datta

_R_e_f_e_r_e_n_c_e_s:

     Pihur, V., Datta, S., and Datta, S. (2007) "Weighted rank
     aggregation of cluster validation measures: a Monte Carlo
     cross-entropy approach" Bioinformatics, 23(13):1607-1615

_S_e_e _A_l_s_o:

     'BruteAggreg'

_E_x_a_m_p_l_e_s:

     # rank aggregation without weights
     x <- matrix(c("A", "B", "C", "D", "E",
             "B", "D", "A", "E", "C",
             "B", "A", "E", "C", "D",
             "A", "D", "B", "C", "E"), byrow=TRUE, ncol=5)

     toplist <- RankAggreg(x, 5, rho=.1)

     # weighted rank aggregation
     set.seed(100)
     w <- matrix(rnorm(20), ncol=5)
     w <- t(apply(w, 1, sort))

     # using the Cross-Entropy Monte-Carlo algorithm
     toplistS <- RankAggreg(x, 5, rho=.1, index.weights=w, use.weights=TRUE)
     toplistK <- RankAggreg(x, 5, rho=.1, index.weights=w, use.weights=TRUE, distance="Kendall")

     # using the Genetic algorithm
     toplistS <- RankAggreg(x, 5, rho=.1, index.weights=w, use.weights=TRUE, method="Genetic")
     toplistK <- RankAggreg(x, 5, index.weights=w, use.weights=TRUE, distance="Kendall", method="Genetic")

