inclass                package:ipred                R Documentation

_I_n_d_i_r_e_c_t _C_l_a_s_s_i_f_i_c_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     A framework for the indirect classification approach.

_U_s_a_g_e:

     ## S3 method for class 'data.frame':
     inclass(formula, data, pFUN = NULL, cFUN = NULL, ...)

_A_r_g_u_m_e_n_t_s:

 formula: formula. A 'formula' specified as 'y~w1+w2+w3~x1+x2+x3'
          models each intermediate variable 'w1, w2, w3' by
          'wi~x1+x2+x3' and the response by 'y~w1+w2+w3' if no other
          formulas are given in 'pFUN' or 'cFUN'.

    data: data frame of explanatory, intermediate and response
          variables.

    pFUN: list of lists, which describe models for the intermediate
          variables, see below for details.

    cFUN: either a function or a list which describes the model for the
          response variable. The function has the argument 'newdata'
          only.

     ...: additional arguments, passed to model fitting of the response
          variable.
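
     The two-tilde form of 'formula' can be taken apart with base R
     tools. The following is a minimal sketch of that decomposition,
     not the package's internal code; the variable names and the
     object names ('w.formulas', 'y.formula') are chosen for
     illustration only.

```r
# Hypothetical variable names; creating a formula does not evaluate them.
f <- y ~ w1 + w2 + w3 ~ x1 + x2 + x3

# `~` associates to the left, so f is (y ~ w1 + w2 + w3) ~ (x1 + x2 + x3).
resp.part <- f[[2]]                  # y ~ w1 + w2 + w3
w.names <- all.vars(resp.part[[3]])  # intermediate variables
x.names <- all.vars(f[[3]])          # explanatory variables
y.name  <- all.vars(resp.part[[2]])  # response variable

# One formula per intermediate variable: wi ~ x1 + x2 + x3
w.formulas <- lapply(w.names, function(w) reformulate(x.names, response = w))

# Formula for the response: y ~ w1 + w2 + w3
y.formula <- reformulate(w.names, response = y.name)
```

     These are the default formulas described above; they apply only
     when 'pFUN' or 'cFUN' do not supply their own.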

_D_e_t_a_i_l_s:

     A given data set is subdivided into three types of variables:
     those used to predict the class (explanatory variables), those
     used to define the class (intermediate variables), and the class
     membership variable itself (response variable). Intermediate
     variables are modelled based on the explanatory variables; the
     class membership variable is defined in terms of the intermediate
     variables.
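
     The idea can be sketched in base R without the package: model
     each intermediate variable from the explanatory variables,
     predict the intermediate variables, then apply the class
     definition to the predictions. The simulated data, the linear
     models and the function 'rule' below are assumptions made for
     illustration; this is not how 'inclass' is implemented.

```r
set.seed(1)
n <- 200

# Simulated explanatory (x1, x2) and intermediate (w1, w2) variables.
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$w1 <- d$x1 + rnorm(n, sd = 0.5)
d$w2 <- d$x2 + rnorm(n, sd = 0.5)

# Hypothetical class definition based on the intermediate variables only.
rule <- function(newdata) {
  factor(ifelse(newdata$w1 > 0 & newdata$w2 > 0, 1, 0), levels = c(0, 1))
}
d$y <- rule(d)

# Step 1: model each intermediate variable from the explanatory variables.
w.names <- c("w1", "w2")
fits <- lapply(w.names, function(w) {
  lm(reformulate(c("x1", "x2"), response = w), data = d)
})
names(fits) <- w.names

# Step 2: predict the intermediate variables.
w.hat <- as.data.frame(lapply(fits, predict, newdata = d))

# Step 3: apply the class definition to the predicted values.
y.hat <- rule(w.hat)

# Proportion of observations assigned to their true class.
accuracy <- mean(y.hat == d$y)
```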

     Each specified intermediate variable is modelled separately,
     following 'pFUN' and a formula specified by 'formula'. 'pFUN' is a
     list of lists; its maximum length is the number of intermediate
     variables. Each element of 'pFUN' is a list with the elements:

      'model' - a function with arguments 'formula' and 'data';

      'predict' - an optional function with arguments 'object' and
     'newdata' only; if 'predict' is not specified, the predict method
     of 'model' is used;

      'formula' - an optional formula for the corresponding 'model';
     if none is specified, the formula derived from
     'y~w1+w2+w3~x1+x2+x3' is used.
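
     The structure of 'pFUN' can be illustrated with a small sketch.
     The helper 'fit_and_predict' and the wrapper 'predict_numeric'
     are hypothetical names introduced here to show how the 'model'
     and 'predict' elements fit together; they are not part of the
     package, and 'mtcars' merely stands in for real data.

```r
# Hypothetical predict wrapper: takes 'object' and 'newdata' only.
predict_numeric <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata))
}

# Two entries: one with an explicit 'predict' element, one relying on
# the default predict method of the fitted model.
pFUN <- list(
  list(model = lm, predict = predict_numeric),
  list(model = lm)
)

# Illustration of the contract only; this is not ipred's internal code.
fit_and_predict <- function(spec, formula, data, newdata) {
  fit <- spec$model(formula, data = data)
  pred_fun <- if (is.null(spec$predict)) predict else spec$predict
  pred_fun(fit, newdata = newdata)
}

# mtcars ships with R and is used here only as placeholder data.
p1 <- fit_and_predict(pFUN[[1]], mpg ~ wt + hp, mtcars, mtcars[1:5, ])
p2 <- fit_and_predict(pFUN[[2]], qsec ~ wt + hp, mtcars, mtcars[1:5, ])
```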

     The response is classified following 'cFUN', which is either a
     fixed function or a list as described below. A fixed function
     'cFUN' assigns the intermediate (and explanatory) variables to a
     class membership. A list 'cFUN' has the elements 'formula',
     'model', 'predict' and 'training.set'. The elements 'formula',
     'model' and 'predict' are structured as described for 'pFUN'. The
     described model is trained on the original intermediate variables
     if 'training.set = "original"' or 'training.set = NULL', on the
     fitted values if 'training.set = "fitted"', or on observations not
     included in a specified subset if 'training.set = "subset"'.
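
     The two forms of 'cFUN' might look as follows. This is a sketch
     under assumptions: the variable names 'y', 'w1' and 'w2', the
     threshold rule and the logistic regression wrappers 'logit_fit'
     and 'logit_predict' are hypothetical and only illustrate the
     structure described above.

```r
# Form 1: a fixed function with the single argument 'newdata'.
cFUN_fixed <- function(newdata) {
  factor(ifelse(newdata$w1 > 0 & newdata$w2 > 0, "yes", "no"),
         levels = c("no", "yes"))
}

# Form 2: a list with the elements formula, model, predict, training.set.
# Hypothetical model wrapper with arguments 'formula' and 'data'.
logit_fit <- function(formula, data) {
  glm(formula, data = data, family = binomial())
}

# Hypothetical predict wrapper with arguments 'object' and 'newdata' only;
# it turns fitted probabilities into class labels.
logit_predict <- function(object, newdata) {
  p <- predict(object, newdata = newdata, type = "response")
  factor(ifelse(p > 0.5, "yes", "no"), levels = c("no", "yes"))
}

cFUN_list <- list(
  formula = y ~ w1 + w2,
  model = logit_fit,
  predict = logit_predict,
  training.set = "fitted"  # train on fitted intermediate values
)
```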


     The returned object contains a list of prediction models, one for
     each intermediate variable, a predictive function for the
     response, and the specifications for the intermediate variables
     and for the response. For a detailed description of indirect
     classification see Hand et al. (2001).

_V_a_l_u_e:

     An object of class 'inclass', consisting of a list of  

model.intermediate: list of fitted models for each intermediate
          variable.

model.response: predictive model for the response variable.

para.intermediate: list, where each element is again a list and
          specifies the model for each intermediate variable.

para.response: a list which specifies the model for the response
          variable.

_A_u_t_h_o_r(_s):

     Andrea Peters <Peters.Andrea@imbe.imed.uni-erlangen.de>

_R_e_f_e_r_e_n_c_e_s:

     David J. Hand, Hua Gui Li, Niall M. Adams (2001), Supervised
     classification with structured class definitions. _Computational
     Statistics & Data Analysis_ *36*, 209-225.

     Andrea Peters, Berthold Lausen, Georg Michelson and Olaf Gefeller
     (2003), Diagnosis of glaucoma by indirect classifiers. _Methods of
     Information in Medicine_ *1*, 99-103.

_S_e_e _A_l_s_o:

     'bagging', 'inclass'

_E_x_a_m_p_l_e_s:

     data(Smoking)
     # Set three groups of variables:
     # 1) explanatory variables are: TarY, NicY, COY, Sex, Age
     # 2) intermediate variables are: TVPS, BPNL, COHB
     # 3) response (resp) is defined by:

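     # Class rule: 1 if all three intermediate variables exceed
     # their thresholds, 0 otherwise.
     classify <- function(data){
       data <- data[,c("TVPS", "BPNL", "COHB")]
       # Compare each column with its threshold.
       res <- t(t(data) > c(4438, 232.5, 58))
       res <- as.factor(ifelse(apply(res, 1, sum) > 2, 1, 0))
       res
     }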

     response <- classify(Smoking[ ,c("TVPS", "BPNL", "COHB")])
     smoking <- data.frame(Smoking, response)

     formula <- response~TVPS+BPNL+COHB~TarY+NicY+COY+Sex+Age

     inclass(formula, data = smoking,
             pFUN = list(list(model = lm, predict = mypredict.lm)),
             cFUN = classify)

