getURIAsynchronous           package:RCurl           R Documentation

_D_o_w_n_l_o_a_d _m_u_l_t_i_p_l_e _U_R_I_s _c_o_n_c_u_r_r_e_n_t_l_y, _w_i_t_h _i_n_t_e_r-_l_e_a_v_e_d _d_o_w_n_l_o_a_d_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function allows the caller to specify multiple URIs to
     download at the same time. All the requests are submitted and then
     the replies are processed as data becomes available on each
     connection. In this way, the responses are processed in an
     interleaved fashion: a chunk of the response to one request is
     processed, followed by a chunk of the response to a different
     request.

     Downloading documents asynchronously involves some trade-offs.
     Switching between the different streams and detecting when input
     is available on any of them involves a little more processing and
     so increases the consumption of CPU cycles. On the other hand,
     there is a potentially large saving in the total time taken to
     download all the documents. This is a common trade-off that
     arises in concurrent/parallel/asynchronous computing. See <URL:
     http://www.omegahat.org/RCurl/concurrent.xml> for more details.

     'getURI' calls this function if more than one URI is specified and
     'async' is 'TRUE', which is the default in this case. One can also
     download the contents of multiple URIs serially, i.e. one after
     the other, by calling 'getURI' with 'async = FALSE'.
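
     The trade-off described above can be measured directly. The
     following is a minimal sketch, not part of the package:
     'compare_downloads' is a hypothetical helper name, and it
     assumes the RCurl package is installed and the URIs are
     reachable when it is called. It only times the two modes of
     'getURI'; the figures will vary with the network and the server.

```r
# Sketch only: compare concurrent and serial downloads of the same URIs.
# 'compare_downloads' is a hypothetical helper, not an RCurl function.
# It needs the RCurl package and network access when it is called.
compare_downloads <- function(uris) {
  stopifnot(is.character(uris), length(uris) > 1)

  # async = TRUE hands the work to getURIAsynchronous().
  concurrent <- system.time(RCurl::getURI(uris, async = TRUE))[["elapsed"]]

  # async = FALSE fetches the documents one after the other.
  serial <- system.time(RCurl::getURI(uris, async = FALSE))[["elapsed"]]

  # Elapsed wall-clock time, in seconds, for each mode.
  c(concurrent = concurrent, serial = serial)
}
```

     Calling 'compare_downloads(uris)' with a character vector of two
     or more URIs returns the two elapsed times for comparison.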

_U_s_a_g_e:

     getURIAsynchronous(url, ..., .opts = list(), write = multiTextGatherer(url),
                        curl = getCurlHandle(),
                         multiHandle = getCurlMultiHandle(), perform = Inf,
                          .encoding = integer())

_A_r_g_u_m_e_n_t_s:

     url: a character vector identifying the URIs to download.

     ...: named arguments to be passed to 'curlSetOpt' when creating
          each of the different 'curlHandle' objects.

   .opts: a named list or 'CURLOptions' object identifying the curl
          options for the handle. This is merged with the values of ...
          to create the actual options for the curl handle in the
          request.

   write: an object giving the functions or routines that are to be
          called when input is waiting on the different HTTP response
          streams. By default, a separate callback function is
          associated with each input stream. This is necessary for the
          results to be meaningful: if a single reader is used, it
          will be called for all streams in a haphazard order and the
          content will be interleaved. One can, however, do
          interesting things using a single object.

    curl: the prototypical 'curlHandle' that is duplicated and used
          for each of the requests.

multiHandle: the curl multi-handle used to perform the asynchronous
          requests.

 perform: a number which specifies the maximum number of calls to
          'curlMultiPerform' that are to be made in this function call.
          This is typically either 0 for no calls or 'Inf' meaning
          process the requests until completion. One may find
          alternative values useful, such as 1 to ensure that the
          requests are dispatched. 

.encoding: an integer or a string that explicitly identifies the
          encoding of the content that is returned by the HTTP server
          in its response to our query. The possible strings are
          'UTF-8' or 'ISO-8859-1' and the integers should be specified
          symbolically as 'CE_UTF8' and 'CE_LATIN1'. Note that, by
          default, the package attempts to process the header of the
          HTTP response to determine the encoding. This argument is
          used when such information is erroneous and the caller knows
          the correct encoding.
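
     The reason 'write' defaults to one callback per stream can be
     seen without any network access. The following is a minimal
     simulation in base R, not RCurl code: 'make_gatherer' and the
     'chunks' data are hypothetical stand-ins for the package's text
     gatherers and for chunks arriving from two connections.

```r
# Hypothetical chunks arriving in interleaved order from two streams.
chunks <- list(
  list(stream = "a", text = "<a>one"),
  list(stream = "b", text = "<b>uno"),
  list(stream = "a", text = " two</a>"),
  list(stream = "b", text = " dos</b>")
)

# A minimal gatherer: a closure that accumulates the text it is given.
make_gatherer <- function() {
  buf <- character()
  list(update = function(txt) buf <<- c(buf, txt),
       value  = function() paste(buf, collapse = ""))
}

# One gatherer per stream keeps each response intact.
per_stream <- list(a = make_gatherer(), b = make_gatherer())
for (ch in chunks) per_stream[[ch$stream]]$update(ch$text)
separate <- vapply(per_stream, function(g) g$value(), character(1))

# A single shared gatherer sees the chunks in arrival order,
# so the two responses end up mixed together.
shared <- make_gatherer()
for (ch in chunks) shared$update(ch$text)
mixed <- shared$value()

print(separate)
print(mixed)
```

     Here 'separate' holds two well-formed documents, while 'mixed'
     is a single string in which the two responses are interleaved.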

_D_e_t_a_i_l_s:

     This uses 'curlMultiPerform' and the multi/asynchronous interface
     for libcurl.

_V_a_l_u_e:

     The return value depends on the run-time characteristics of the
     call. If the call merely specifies the URIs to be downloaded, the
     result is a named character vector. The names identify the URIs
     and the elements of the vector are the contents of the
     corresponding URI.

     If the requests are not performed or completed (i.e. 'perform' is
     zero or too small a value to process all the chunks), a list with
     2 elements is returned. These elements are:

multiHandle: the curl multi-handle, of class 'MultiCURLHandle'.
          This can be used in further calls to 'curlMultiPerform'.

   write: the 'write' argument (after it was potentially expanded to a
          list). This can then be used to fetch the results of the
          requests when the requests are completed in the future. 
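
     The two-element return value can be used to set up the requests
     now and finish them later. The following is a sketch under
     stated assumptions rather than a definitive recipe:
     'fetch_deferred' is a hypothetical helper, it assumes the
     default 'write' argument (so that each element of 'write'
     provides a 'value' function), and it assumes RCurl's 'complete'
     function is available to drive the multi-handle to completion.
     It needs the RCurl package and network access when called.

```r
# Sketch only: queue the requests now and finish them later.
# 'fetch_deferred' is a hypothetical helper, not an RCurl function.
fetch_deferred <- function(uris) {
  # perform = 0: set up the requests but make no calls to
  # curlMultiPerform, so a list(multiHandle, write) is returned.
  pending <- RCurl::getURIAsynchronous(uris, perform = 0)

  # ... other work could be done here before the downloads run ...

  # Assumption: complete() processes the multi-handle until all
  # of the transfers have finished.
  RCurl::complete(pending$multiHandle)

  # Assumption: with the default 'write', each element is a text
  # gatherer whose value() returns the accumulated response.
  sapply(pending$write, function(w) w$value())
}
```

     A value of 'perform' between these extremes, such as 1, would
     dispatch the requests before returning, leaving the remaining
     processing to the later call.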

_A_u_t_h_o_r(_s):

     Duncan Temple Lang <duncan@wald.ucdavis.edu>

_R_e_f_e_r_e_n_c_e_s:

     Curl homepage <URL: http://curl.haxx.se>

_S_e_e _A_l_s_o:

     'getURL', 'getCurlMultiHandle', 'curlMultiPerform'

_E_x_a_m_p_l_e_s:

       # Download two documents concurrently.
       uris = c("http://www.omegahat.org/RCurl/index.html",
                "http://www.omegahat.org/RCurl/philosophy.xml")
       txt = getURIAsynchronous(uris)
       names(txt)   # the URIs
       nchar(txt)   # number of characters in each document

