matchControls             package:e1071             R Documentation

_F_i_n_d _m_a_t_c_h_e_d _c_o_n_t_r_o_l _g_r_o_u_p

_D_e_s_c_r_i_p_t_i_o_n:

     Finds controls matching the cases as well as possible.

_U_s_a_g_e:

     matchControls(formula, data = list(), subset, contlabel = "con",
                    caselabel = NULL, dogrep = TRUE, replace = FALSE)

_A_r_g_u_m_e_n_t_s:

 formula: A formula indicating cases, controls and the variables to be
          matched. Details are described below.

    data: an optional data frame containing the variables in the model.
          By default the variables are taken from the environment from
          which 'matchControls' is called.

  subset: an optional vector specifying a subset of observations to be
          used in the matching process.

contlabel: A string giving the label of the control group.

caselabel: A string giving the labels of the cases.

  dogrep: If 'TRUE', then 'contlabel' and 'caselabel' are matched using
          'grep', else string comparison (exact equality) is used.

 replace: If 'FALSE', then every control is used at most once; if
          'TRUE', the same control may be matched to several cases.

_D_e_t_a_i_l_s:

     The left hand side of the 'formula' must be a factor determining
     whether an observation belongs to the case or the control group.
     By default, all observations whose label matches a grep of
     'contlabel' are used as possible controls, and the rest are taken
     as cases.  If 'caselabel' is given, then only observations
     matching it are taken as cases.  If 'dogrep = TRUE', then both
     'contlabel' and 'caselabel' can be regular expressions.
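
     The difference between grep-based and exact label matching can be
     illustrated with a small base R sketch. The vector 'group' and its
     labels are made up for this illustration, and the code only mimics
     the selection rule described above; it does not call
     'matchControls' itself.

```r
## Hypothetical grouping factor with several control-like labels
group <- factor(c("case", "control_a", "control_b", "other"))
labels <- as.character(group)

## Pattern matching, as with dogrep = TRUE and contlabel = "con":
## every label containing "con" counts as a possible control
is_control_grep <- grepl("con", labels)

## Exact comparison, as with dogrep = FALSE and contlabel = "con":
## only a label that is exactly "con" would count as a control
is_control_exact <- labels == "con"

## With caselabel = NULL, everything that is not a control is a case
cases_by_default <- labels[!is_control_grep]
```

     Under the default pattern matching, the label "other" would end up
     among the cases unless 'caselabel' is given to restrict them.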

     The right hand side of the 'formula' gives the variables that
     should be matched.  The matching is done using the 'daisy'
     distance from the 'cluster' package, i.e., a model frame is built
     from the formula and used as input for 'daisy'. For each case, the
     nearest control is selected. If 'replace = FALSE', each control is
     used only once.
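
     The selection step can be sketched in base R as a greedy
     nearest-neighbour search. This is a simplified illustration, not
     the package's code: the helper 'greedy_match' and the toy 'age'
     vector are hypothetical, 'dist' stands in for 'daisy', and the
     assumptions that cases are processed in order and that ties go to
     the first minimum are ours.

```r
## Hypothetical helper: for each case in turn, pick the closest control
## that is still available in the distance matrix 'd'.
greedy_match <- function(d, cases, controls, replace = FALSE) {
  matched <- character(length(cases))
  for (k in seq_along(cases)) {
    if (length(controls) == 0L)
      stop("no controls left to match")
    ## distances from the current case to the remaining controls
    dk <- d[cases[k], controls]
    matched[k] <- controls[which.min(dk)]
    ## without replacement, a used control is removed from the pool
    if (!replace)
      controls <- setdiff(controls, matched[k])
  }
  matched
}

## Toy data: two cases (a, b) and two possible controls (x, y)
age <- c(a = 30, b = 32, x = 31, y = 49)
d <- as.matrix(dist(age))   # stand-in for the 'daisy' dissimilarity

once  <- greedy_match(d, c("a", "b"), c("x", "y"), replace = FALSE)
reuse <- greedy_match(d, c("a", "b"), c("x", "y"), replace = TRUE)
```

     In this toy example both cases are closest to control "x". Without
     replacement the second case has to fall back to "y"; with
     replacement "x" is used twice.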

_V_a_l_u_e:

     Returns a list with components:

   cases: Row names of cases.

controls: Row names of matched controls.

  factor: A factor with 2 levels indicating cases and controls (the
          rest is set to 'NA').

_A_u_t_h_o_r(_s):

     Friedrich Leisch

_E_x_a_m_p_l_e_s:

     Age.case <- 40 + 5 * rnorm(50)
     Age.cont <- 45 + 10 * rnorm(150)
     Age <- c(Age.case, Age.cont)

     Sex.case <- sample(c("M", "F"), 50, prob = c(.4, .6), replace = TRUE)
     Sex.cont <- sample(c("M", "F"), 150, prob = c(.6, .4), replace = TRUE)
     Sex <- as.factor(c(Sex.case, Sex.cont))

     casecont <- as.factor(c(rep("case", 50), rep("cont", 150)))

     ## now look at the group properties:
     boxplot(Age ~ casecont)
     barplot(table(Sex, casecont), beside = TRUE)

     m <- matchControls(casecont ~ Sex + Age)

     ## properties of the new groups:
     boxplot(Age ~ m$factor)
     barplot(table(Sex, m$factor))

