basicTextGatherer           package:RCurl           R Documentation

_C_u_m_u_l_a_t_e _t_e_x_t _a_c_r_o_s_s _c_a_l_l_b_a_c_k_s (_f_r_o_m _a_n _H_T_T_P _r_e_s_p_o_n_s_e)

_D_e_s_c_r_i_p_t_i_o_n:

     These functions create callback functions that can be used with
     the libcurl engine, which passes information to them as it becomes
     available as part of the HTTP response.

     'basicTextGatherer' is a generator function that returns a closure
     which is used to cumulate text provided in callbacks from the
     libcurl engine when it reads the response from an HTTP request.

     'debugGatherer' can be used with the 'debugfunction' libcurl
     option in a call. Its 'update' function is called whenever libcurl
     has header, data or general messages to report about the request.

     These functions return a list of functions. Each time one calls
     'basicTextGatherer' or 'debugGatherer', one gets a new, separate
     collection of functions.  However, each collection of functions
     (or instance) shares the variables across the functions and across
     calls. This allows them to store data persistently across the
     calls without using a global variable. In this way, we can have
     multiple instances of the collection of functions, with each
     instance updating its own local state and not interfering with
     those of the others.
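
     The shared-state idea described above can be sketched in base R.
     The generator name 'makeGatherer' is hypothetical and this is not
     RCurl's own code; it only illustrates how two instances keep
     separate state without a global variable.

```r
# Hypothetical generator, written in base R only; not RCurl's implementation.
makeGatherer <- function(txt = character()) {
  update <- function(str) {
    txt <<- c(txt, str)          # modify 'txt' in the enclosing environment
    nchar(str, type = "bytes")   # report how many bytes were consumed
  }
  value <- function(collapse = "") paste(txt, collapse = collapse)
  list(update = update, value = value)
}

a <- makeGatherer()
b <- makeGatherer()

a$update("first chunk, ")
a$update("second chunk")
b$update("unrelated text")

a$value()   # "first chunk, second chunk"
b$value()   # "unrelated text"
```

     Each call to the generator creates a new environment, so 'a' and
     'b' never see each other's text.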

     We use an S3 class named 'RCurlCallbackFunction' to indicate that
     the collection of functions can be used as a callback. The 'update'
     function is the one that is actually used as the callback function
     in the CURL option. The 'value' function can be invoked to get the
     current state that has been accumulated by the 'update' function. 
     This is typically used when the request is complete. One can reuse
     the same collection of functions across different requests. The
     information will be cumulated. Sometimes it is convenient to reuse
     the object but reset the state to its original empty value, as if
     it had been created afresh. The 'reset' function in the collection
     permits this.
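
     As a rough illustration of reuse and reset, the following base R
     sketch builds a classed list of three closures. The generator
     name and the class names 'DemoTextHandler' and
     'DemoCallbackFunction' are made up for this example and stand in
     for the real RCurl classes.

```r
# Hypothetical stand-in for a text gatherer; class names are invented.
makeResettableGatherer <- function() {
  txt <- character()
  ans <- list(
    update = function(str) { txt <<- c(txt, str); nchar(str, type = "bytes") },
    value  = function() paste(txt, collapse = ""),
    reset  = function() { txt <<- character(); invisible(NULL) }
  )
  # Tag the list with S3 classes so other code can recognise it as a callback.
  class(ans) <- c("DemoTextHandler", "DemoCallbackFunction")
  ans
}

h <- makeResettableGatherer()
h$update("page one;")
h$update("page two")
h$value()    # "page one;page two"  (text cumulates across calls)

h$reset()
h$value()    # ""  (state is back to its original empty value)

inherits(h, "DemoCallbackFunction")   # TRUE
```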

     'multiTextGatherer' is used when we are downloading multiple URIs
     concurrently in a single libcurl operation.  This merely uses the
     tools of 'basicTextGatherer' applied to each of several URIs. See
     'getURIAsynchronous'.

_U_s_a_g_e:

     basicTextGatherer(txt = character(), max = NA, value = NULL)
     multiTextGatherer(uris)
     debugGatherer()

_A_r_g_u_m_e_n_t_s:

     txt: an initial character vector to start things. We allow this to
          be specified so that one can initialize the content.  

     max: if specified as an integer, this controls the total number of
          characters that will be read. If more are read, the function
          tells libcurl to stop.

    uris: for 'multiTextGatherer', this is either the number or the
          names of the uris being downloaded and for which we need a
          separate writer function. 

   value: if specified, a function that is called when retrieving the
          text, usually after the completion of the request and the
          processing of the response. This function can be used to
          convert the result into a different format, e.g. parse an XML
          document or read values from a table in the text.
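
     The 'max' and 'value' arguments can be pictured with a small base
     R sketch. The generator 'makeLimitedGatherer' is hypothetical.
     Returning 0 to request a stop is an assumption based on libcurl's
     write-callback contract, under which a return count that differs
     from the number of bytes delivered aborts the transfer; how RCurl
     itself signals this is not shown here.

```r
# Hypothetical generator illustrating 'max' and 'value'; not RCurl's code.
makeLimitedGatherer <- function(max = NA, value = NULL) {
  txt <- character()
  update <- function(str) {
    txt <<- c(txt, str)
    if (!is.na(max) && sum(nchar(txt)) >= max)
      return(0L)                 # assumption: a mismatched count asks the engine to stop
    nchar(str, type = "bytes")   # otherwise report the chunk as fully consumed
  }
  val <- function() {
    ans <- paste(txt, collapse = "")
    if (is.null(value)) ans else value(ans)   # optional post-processing step
  }
  list(update = update, value = val)
}

g <- makeLimitedGatherer(max = 10, value = toupper)
g$update("abcde")    # 5: still under the limit
g$update("fghijk")   # 0: limit reached, signal a stop
g$value()            # "ABCDEFGHIJK"
```

     Here 'toupper' merely stands in for a real converter such as an
     XML parser.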

_D_e_t_a_i_l_s:

     The 'update' function is called when the libcurl engine finds
     sufficient data on the stream from which it is reading the
     response. libcurl cumulates these bytes and hands them to a C
     routine in this package, which calls the actual gathering function
     (or a suitable replacement) returned as the 'update' component
     from these functions.

_V_a_l_u_e:

     Both the 'basicTextGatherer' and 'debugGatherer' functions return
     an object of class 'RCurlCallbackFunction'. 'basicTextGatherer'
     extends this with the class 'RCurlTextHandler' and 
     'debugGatherer' extends this with the class 'RCurlDebugHandler'.
     Each of these has the same basic structure, being a list of 3
     functions. 

  update: the function that is called with the text from the callback
          routine and which processes this text by accumulating it into
          a vector

   value: a function that returns the text cumulated across the
          callbacks. This takes an argument 'collapse' (and additional
          ones) that is handed to 'paste'. If the value of 'collapse'
          is given as 'NULL', the vector of elements containing the
          different text for each callback is returned. This is
          convenient when debugging or if one knows something about the
          nature of the callbacks, e.g. a regular chunk size that
          causes the elements to identify records in a natural way.

   reset: a function that resets the internal state to its original,
          empty value. This can be used to reuse the same object across
          requests but to avoid cumulating new input with the material
          from previous requests.
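
     The effect of the 'collapse' argument of the 'value' function can
     be imitated in base R. The helper 'valueOf' and the sample chunks
     are invented for this sketch; it only mirrors the behaviour
     described above using 'paste'.

```r
# Text as it might arrive over three callbacks (made-up sample data).
chunks <- c("id,name\n", "1,alpha\n", "2,beta\n")

# Hypothetical helper mimicking a gatherer's 'value' function.
valueOf <- function(txt, collapse = "", ...) {
  if (is.null(collapse))
    return(txt)                        # keep one element per callback
  paste(txt, collapse = collapse, ...) # otherwise join into a single string
}

valueOf(chunks)                    # one string holding the whole response
valueOf(chunks, collapse = NULL)   # character vector of length 3
```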


     'multiTextGatherer' returns a list with an element corresponding
     to each URI. Each element is an object obtained by calling
     'basicTextGatherer', i.e. a collection of 3 functions with shared
     state.
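
     The structure of such a result can be sketched in base R as a
     list holding one independent gatherer per URI. Both helper names
     and the example.org addresses are hypothetical; this is not the
     package's own code.

```r
# Hypothetical helpers written in base R; not RCurl's implementation.
makeGatherer <- function() {
  txt <- character()
  list(
    update = function(str) { txt <<- c(txt, str); nchar(str, type = "bytes") },
    value  = function() paste(txt, collapse = ""),
    reset  = function() { txt <<- character(); invisible(NULL) }
  )
}

makeMultiGatherer <- function(uris) {
  if (is.numeric(uris)) {
    # A count: build that many unnamed gatherers.
    lapply(seq_len(uris), function(i) makeGatherer())
  } else {
    # A character vector: one gatherer per URI, named by the URI.
    structure(lapply(uris, function(u) makeGatherer()), names = uris)
  }
}

g <- makeMultiGatherer(c("http://example.org/a", "http://example.org/b"))
g[["http://example.org/a"]]$update("body of a")

# Each element keeps its own text; the second one is still empty.
sapply(g, function(h) h$value())
```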

_A_u_t_h_o_r(_s):

     Duncan Temple Lang <duncan@wald.ucdavis.edu>

_R_e_f_e_r_e_n_c_e_s:

     Curl homepage <URL: http://curl.haxx.se>

_S_e_e _A_l_s_o:

     'getURL'

_E_x_a_m_p_l_e_s:

       txt = getURL("http://www.omegahat.org/RCurl/index.html", write = basicTextGatherer())

       h = basicTextGatherer()
       txt = getURL("http://www.omegahat.org/RCurl/index.html", write = h$update)
         # Cumulate across pages.
       txt = getURL("http://www.omegahat.org/index.html", write = h$update)

       headers = basicTextGatherer()
       txt = getURL("http://www.omegahat.org/RCurl/index.html", header = TRUE, headerfunction = headers$update)

          # Now read the headers.
       headers$value()
       headers$reset()

         # Debugging callback
       d = debugGatherer()
       x = getURL("http://www.omegahat.org/RCurl/index.html", debugfunction = d$update, verbose = TRUE)
       names(d$value())
       d$value()[["headerIn"]]

       uris = c("http://www.omegahat.org/RCurl/index.html", "http://www.omegahat.org/RCurl/philosophy.html")
       g = multiTextGatherer(uris)
       txt = getURIAsynchronous(uris,  write = g)
       names(txt)
       nchar(txt)

        # Now don't use names for the gatherer elements.
       g = multiTextGatherer(length(uris))
       txt = getURIAsynchronous(uris,  write = g)
       names(txt)
       nchar(txt)

     ## Not run: 
     Sys.setlocale(,"en_US.latin1")
     Sys.setlocale(,"en_US.UTF-8")
     uris = c("http://www.omegahat.org/RCurl/index.html", "http://www.omegahat.org/RCurl/philosophy.html")
     g = multiTextGatherer(uris)
     txt = getURIAsynchronous(uris,  write = g)
     ## End(Not run)

