inbagg                 package:ipred                 R Documentation

_I_n_d_i_r_e_c_t _B_a_g_g_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     Function to perform indirect bagging and subagging.

_U_s_a_g_e:

     inbagg.data.frame(formula, data, pFUN=NULL, 
       cFUN=list(model = NULL, predict = NULL, training.set = NULL), 
       nbagg = 25, ns = 0.5, replace = FALSE, ...)

_A_r_g_u_m_e_n_t_s:

 formula: formula. A 'formula' specified as 'y~w1+w2+w3~x1+x2+x3'
          describes how to model the intermediate variables 'w1', 'w2',
          'w3' and the response variable 'y' if no other formula is
          specified by the elements of 'pFUN' or in 'cFUN'.

    data: data frame of explanatory, intermediate and response
          variables.

    pFUN: list of lists, which describe models for the intermediate
          variables, details are given below.

    cFUN: either a fixed function with argument 'newdata' and returning
          the class membership by default, or a list specifying a
          classifying model, similar to one element of 'pFUN'. Details
          are given below.

   nbagg: number of bootstrap samples.

      ns: proportion of sample to be drawn from the learning sample. By
          default, subagging with 50% is performed, i.e. draw 0.5*n out
          of n without replacement.

 replace: logical. Draw with or without replacement.

     ...: additional arguments (e.g. 'subset').

_D_e_t_a_i_l_s:

     A given data set is subdivided into three types of variables:
     explanatory, intermediate and response variables.
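
     The three-part formula can be taken apart with base R to see which
     variables fall into each group. This is only an illustration of
     the formula's structure (it relies on '~' grouping from left to
     right), not necessarily how 'inbagg' processes it internally; the
     name 'f' is chosen for this example.

          f <- y ~ w1 + w2 + w3 ~ x1 + x2 + x3
          all.vars(f[[3]])        # explanatory variables
          all.vars(f[[2]][[3]])   # intermediate variables
          all.vars(f[[2]][[2]])   # response variable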

     Each specified intermediate variable is modelled separately
     following 'pFUN', a list of lists with elements specifying an
     arbitrary number of models for the intermediate variables and an
     optional element 'training.set = c("oob", "bag", "all")'. The
     element 'training.set' determines whether the predictive models
     for the intermediate variables are fitted on the out-of-bag
     sample ('"oob"', the default), on the bag sample ('"bag"') or on
     all available observations ('"all"'). The elements of 'pFUN'
     that specify the models for the intermediate variables are lists
     as described in 'inclass'. If no formula is given in these
     elements, the functional relationship of 'formula' is used.
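
     As a rough illustration of the three 'training.set' options, the
     base R sketch below splits the row indices for a single bag. It is
     not the package's internal code: the names 'bag_rows', 'oob_rows'
     and 'all_rows' are chosen for this example, the use of 'floor' is
     an assumption, and the draw simply mirrors the defaults 'ns = 0.5'
     and 'replace = FALSE'.

          n  <- 100                          # size of the learning sample
          ns <- 0.5                          # proportion drawn per bag
          bag_rows <- sample(n, size = floor(ns * n), replace = FALSE)
          oob_rows <- setdiff(seq_len(n), bag_rows)  # rows left out ("oob")
          all_rows <- seq_len(n)                     # every row ("all")
          length(bag_rows)
          length(oob_rows)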

     The response variable is modelled following 'cFUN'. This can
     either be a fixed classifying function as described in Peters et
     al. (2003) or a list that specifies the modelling technique to be
     applied. The list contains the arguments 'model' (which model to
     fit), 'predict' (optional, how to predict), 'formula' (optional,
     of type 'y~w1+w2+w3+x1+x2', determines the variables the
     classifying function is based on) and the optional argument
     'training.set = c("fitted.bag", "original", "fitted.subset")',
     which specifies whether the classifying function is trained on the
     predicted observations of the bag sample ('"fitted.bag"'), on the
     original observations ('"original"') or on the predicted
     observations not included in a defined subset ('"fitted.subset"').
     By default, the formula specified in 'formula' determines the
     variables on which the classifying function is based.

     Note that the default of 'cFUN = list(model = NULL, training.set =
     "fitted.bag")' uses the function 'rpart' and the predict function
     'predict(object, newdata, type = "class")'.

_V_a_l_u_e:

     An object of class '"inbagg"', that is a list with elements 

  mtrees: a list of length 'nbagg', describing the prediction models
          corresponding to each bootstrap sample. Each element of
          'mtrees' is a list with elements 'bindx' (observations of bag
          sample), 'btree' (classifying function of bag sample) and
          'bfct' (predictive models for intermediates of bag sample).

       y: vector of response values.

       W: data frame of intermediate variables.

       X: data frame of explanatory variables.
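
     A minimal sketch of inspecting these elements, assuming the
     packages 'ipred', 'MASS' and 'rpart' are installed. The simulated
     data and the names 'DATA' and 'fit' are illustrative only, and
     'nbagg = 5' is chosen merely to keep the run short.

          library(ipred); library(MASS); library(rpart)
          y <- as.factor(sample(1:2, 200, replace = TRUE))
          W <- mvrnorm(n = 200, mu = rep(0, 3), Sigma = diag(3))
          X <- mvrnorm(n = 200, mu = rep(2, 3), Sigma = diag(3))
          colnames(W) <- c("w1", "w2", "w3")
          colnames(X) <- c("x1", "x2", "x3")
          DATA <- data.frame(y, W, X)
          fit <- inbagg(y~w1+w2+w3~x1+x2+x3, data = DATA, nbagg = 5,
                        pFUN = list(list(formula = w1~x1+x2, model = lm,
                                         predict = mypredict.lm),
                                    list(model = rpart)))
          class(fit)                # class of the returned object
          length(fit$mtrees)        # one entry per bag
          names(fit$mtrees[[1]])    # elements stored for the first bag
          str(fit$W)                # intermediate variables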

_A_u_t_h_o_r(_s):

     Andrea Peters <Peters.Andrea@imbe.imed.uni-erlangen.de>

_R_e_f_e_r_e_n_c_e_s:

     David J. Hand, Hua Gui Li, Niall M. Adams (2001), Supervised
     classification with structured class definitions. _Computational
     Statistics & Data Analysis_ *36*, 209-225.

     Andrea Peters, Berthold Lausen, Georg Michelson and Olaf Gefeller
     (2003), Diagnosis of glaucoma by indirect classifiers. _Methods of
     Information in Medicine_ *42*(1), 99-103.

_S_e_e _A_l_s_o:

     'rpart', 'bagging', 'lm'

_E_x_a_m_p_l_e_s:

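     library(ipred)  # inbagg(), mypredict.lm()
     library(MASS)   # mvrnorm()
     library(rpart)  # rpart()

     # simulate 200 observations: response, intermediates, explanatory
     y <- as.factor(sample(1:2, 200, replace = TRUE))
     W <- mvrnorm(n = 200, mu = rep(0, 3), Sigma = diag(3))
     X <- mvrnorm(n = 200, mu = rep(2, 3), Sigma = diag(3))
     colnames(W) <- c("w1", "w2", "w3")
     colnames(X) <- c("x1", "x2", "x3")
     DATA <- data.frame(y, W, X)

     # first model has its own formula; the second falls back on 'formula'
     pFUN <- list(list(formula = w1~x1+x2, model = lm, predict = mypredict.lm),
                  list(model = rpart))

     inbagg(y~w1+w2+w3~x1+x2+x3, data = DATA, pFUN = pFUN)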

