logitboost            package:LogitBoost            R Documentation

_L_o_g_i_t_B_o_o_s_t

_D_e_s_c_r_i_p_t_i_o_n:

     An implementation of the LogitBoost classification algorithm with
     decision stumps as weak learners. Additionally, a feature
     preselection method for handling datasets with many explanatory
     variables and an estimation of the stopping parameter via v-fold
     cross validation are provided.

_U_s_a_g_e:

     logitboost(xlearn, ylearn, xtest, mfinal, presel = 0, estimate = 0,
     verbose = FALSE)

_A_r_g_u_m_e_n_t_s:

  xlearn: A matrix, whose n rows contain the training instances.

  ylearn: A vector of length n containing the class labels from
          individuals of K different classes. The labels need to be
          coded by consecutive integers from 0 to (K-1).

   xtest: A matrix, whose rows contain the test instances.

  mfinal: An integer, describing the number of iterations for which
          boosting should be run. 

  presel: An integer, giving the number of features to be used for
          classification. If presel=0, no feature preselection is
          carried out.

estimate: An integer, specifying the v of an additional, internal
          v-fold cross validation on the respective training data for
          stopping parameter estimation. Please note that this is
          (especially for larger values of 'estimate') extremely
          time-consuming. The default value estimate=0 means no
          stopping parameter estimation.

 verbose: Logical, indicating whether progress messages should be
          printed.

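     Note that 'ylearn' must be coded as consecutive integers from 0 to
     (K-1). As a brief illustration (the variable names below are
     hypothetical, not part of the package), a factor of class labels
     can be recoded into this form:

     ```r
     ## Hypothetical sketch: recode a factor into 0..(K-1) integer labels.
     ## Factor levels are numbered 1..K internally, so subtract 1.
     ylab  <- factor(c("ALL", "AML", "ALL", "AML"))
     ycode <- as.integer(ylab) - 1L   # ALL -> 0, AML -> 1
     ```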
_V_a_l_u_e:

   probs: An array whose rows contain the out-of-sample probabilities
          that the class label is predicted as 1, for every boosting
          iteration. For multiclass problems, the third dimension of
          the array holds the probabilities for the K binary
          one-against-all partitions of the data.

loglikeli: An array containing the log-likelihood across the training
          instances, used to determine the stopping parameter when
          estimate>0. For multiclass problems, the third dimension of
          the array contains the values for the K binary
          one-against-all partitions of the data.
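     The 'summarize' function (see below) is the supported way to
     evaluate a fit. Purely as a minimal sketch, assuming a two-class
     problem where the rows of 'probs' index test instances and the
     columns index boosting iterations (toy values, not package
     output), class predictions at a chosen iteration can be read off
     by thresholding at 0.5:

     ```r
     ## Hypothetical sketch: toy probability matrix, 3 test instances,
     ## 2 boosting iterations.
     probs <- matrix(c(0.9, 0.2, 0.6,
                       0.8, 0.1, 0.7), nrow = 3)
     m    <- 2                              # chosen stopping iteration
     yhat <- as.integer(probs[, m] > 0.5)   # predicted 0/1 labels
     ```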

_A_u_t_h_o_r(_s):

     Marcel Dettling

_R_e_f_e_r_e_n_c_e_s:

     See "Boosting for Tumor Classification of Gene Expression Data",
     Dettling and Buhlmann (2002), available on the web page
     http://stat.ethz.ch/~dettling/boosting.html

_S_e_e _A_l_s_o:

     'crossval', 'summarize'

_E_x_a_m_p_l_e_s:

     data(leukemia)

     ## Dividing the leukemia dataset into training and test data
     xlearn <- leukemia.x[c(1:20, 34:38),]
     ylearn <- leukemia.y[c(1:20, 34:38)]
     xtest  <- leukemia.x[21:33,]
     ytest  <- leukemia.y[21:33]

     ## An example without stopping parameter estimation
     fit <- logitboost(xlearn, ylearn, xtest, mfinal=100, presel=75, verbose=TRUE)
     summarize(fit, ytest)

     ## Now with stopping parameter estimation by 4-fold cross validation
     fit <- logitboost(xlearn, ylearn, xtest, mfinal=100, presel=75, estimate=4, verbose=TRUE)
     summarize(fit, ytest)

