rfImpute            package:randomForest            R Documentation

_M_i_s_s_i_n_g _V_a_l_u_e _I_m_p_u_t_a_t_i_o_n_s _b_y _r_a_n_d_o_m_F_o_r_e_s_t

_D_e_s_c_r_i_p_t_i_o_n:

     Impute missing values in predictor data using proximity from
     randomForest.

_U_s_a_g_e:

     ## Default S3 method:
     rfImpute(x, y, iter=5, ntree=300, ...)
     ## S3 method for class 'formula':
     rfImpute(x, data, ..., subset)

_A_r_g_u_m_e_n_t_s:

       x: A data frame or matrix of predictors, some containing 'NA's,
          or a formula.

       y: Response vector ('NA's not allowed).

    data: A data frame containing the predictors and response.

    iter: Number of iterations to run the imputation.

   ntree: Number of trees to grow in each iteration of randomForest.

     ...: Other arguments to be passed to 'randomForest'.

  subset: A logical vector indicating which observations to use.

_D_e_t_a_i_l_s:

     The algorithm starts by imputing 'NA's using 'na.roughfix'.  Then
     'randomForest' is called with the completed data.  The proximity
     matrix from the randomForest is used to update the imputation of
     the 'NA's.  For continuous predictors, the imputed value is the
     weighted average of the non-missing observations, where the weights
     are the proximities.  For categorical predictors, the imputed
     value is the category with the largest average proximity.  This
     process is iterated 'iter' times.
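
     The two update rules above can be sketched on toy data.  This is
     a minimal illustration, not the package's internal code: the
     proximity matrix 'prox' is hand-made rather than taken from a
     fitted forest, and the names 'impute.num' and 'impute.cat' are
     hypothetical helpers introduced only for this sketch.

```r
## Hypothetical toy inputs: 4 observations and a symmetric proximity
## matrix (in practice this would come from a fitted randomForest).
prox <- matrix(c(1.0, 0.8, 0.1, 0.1,
                 0.8, 1.0, 0.2, 0.0,
                 0.1, 0.2, 1.0, 0.6,
                 0.1, 0.0, 0.6, 1.0), nrow = 4, byrow = TRUE)
x.num <- c(NA, 2, 10, 12)              # continuous predictor with one NA
x.cat <- factor(c("a", NA, "b", "b"))  # categorical predictor with one NA

## Continuous case: proximity-weighted average of the observed values.
impute.num <- function(x, prox) {
  miss <- is.na(x)
  obs <- x[!miss]
  for (i in which(miss)) {
    w <- prox[i, !miss]                # proximities to observed cases
    x[i] <- sum(w * obs) / sum(w)
  }
  x
}

## Categorical case: category with the largest average proximity.
impute.cat <- function(x, prox) {
  miss <- is.na(x)
  obs <- x[!miss]
  for (i in which(miss)) {
    avg <- tapply(prox[i, !miss], obs, mean)
    x[i] <- names(avg)[which.max(avg)]
  }
  x
}

num.filled <- impute.num(x.num, prox)  # first entry becomes about 3.8
cat.filled <- impute.cat(x.cat, prox)  # second entry becomes "a"
```

     In the sketch, observation 1 is closest to observation 2, so its
     imputed value is pulled toward 2 rather than toward the plain
     mean of the observed values.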

     Note: Imputation has not (yet) been implemented for the
     unsupervised case.  Also, Breiman (2003) notes that the OOB
     estimate of error from randomForest tends to be optimistic when run
     on the data matrix with imputed values.

_V_a_l_u_e:

     A data frame or matrix containing the completed data matrix, where
     'NA's are imputed using proximity from randomForest.  The first
     column contains the response.

_A_u_t_h_o_r(_s):

     Andy Liaw

_R_e_f_e_r_e_n_c_e_s:

     Leo Breiman (2003).  Manual for Setting Up, Using, and
     Understanding Random Forest V4.0.
     <URL: http://oz.berkeley.edu/users/breiman/Using_random_forests_v4.0.pdf>

_S_e_e _A_l_s_o:

     'na.roughfix'.

_E_x_a_m_p_l_e_s:

     data(iris)
     iris.na <- iris
     set.seed(111)
     ## artificially drop some data values: in each of the four
     ## predictors, set a random number (1 to 20) of entries to NA.
     for (i in 1:4) iris.na[sample(150, sample(20, 1)), i] <- NA
     set.seed(222)
     iris.imputed <- rfImpute(Species ~ ., iris.na)
     set.seed(333)
     iris.rf <- randomForest(Species ~ ., iris.imputed)
     print(iris.rf)

