# csv-export-bolt

[![Build Status](https://magnum.travis-ci.com/shareablee/cassandra-bolt.svg?token=NU2eMZobEmxbYse4grEj&branch=dev)](https://magnum.travis-ci.com/shareablee/cassandra-bolt)

A reusable Storm bolt for exporting data in CSV format. It accumulates vectors of values, each representing one CSV row, and formats them into a CSV string using Hive escaping rules.

Data is accumulated in temporary files named after the `partition-key` for 300 seconds (by default). It is then emitted in CSV format and the temporary files are deleted. All data received with a given partition key is guaranteed to end up in the same output, which lets the caller specify how the data should be partitioned.
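
The accumulate-then-emit behaviour described above can be sketched in plain Clojure. This is only an illustration, not the bolt's implementation: it buffers lines in an in-memory map instead of temporary files, and the names `accumulate` and `flush-all` are hypothetical.

```clojure
(require '[clojure.string :as str])

;; Hypothetical in-memory stand-in for the bolt's temporary files:
;; a map from partition key to the CSV lines received so far.
(defn accumulate
  "Adds one CSV line to the buffer for `partition-key`."
  [buffers partition-key csv-line]
  (update buffers partition-key (fnil conj []) csv-line))

(defn flush-all
  "Returns one [partition-key csv-content] pair per buffered key."
  [buffers]
  (for [[partition-key lines] buffers]
    [partition-key (str/join "\n" lines)]))

;; Two lines share the key "2015-01-01", so they end up in the same output.
(def buffers
  (-> {}
      (accumulate "2015-01-01" "a,1")
      (accumulate "2015-01-02" "b,2")
      (accumulate "2015-01-01" "c,3")))

(flush-all buffers)
```

Each pair returned by `flush-all` has the same shape as the bolt's output tuple: the partition key, followed by newline-separated CSV content.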

## Usage

Include the following in `project.clj`:

```
[csv-export-bolt "0.1.2"]
```

Input:

```
["partition-key" "coll"]
```

- `partition-key`: Name (string) of the file to accumulate data to. All data in `coll` will be accumulated to the same file.
- `coll`: A collection of values that will be joined by commas and escaped into CSV format using Hive escaping rules
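
As a rough illustration of how one `coll` could become one CSV line, the sketch below joins values with commas and backslash-escapes special characters. The exact escaping rules the bolt applies are not spelled out here, so the set of escaped characters (backslash, comma, line breaks) is an assumption, and `escape-field` and `row->csv` are hypothetical names.

```clojure
(require '[clojure.string :as str])

(defn escape-field
  "Backslash-escapes backslashes, commas, and line breaks in one value.
  Assumption: this character set approximates Hive-style escaping; it is
  not confirmed to match what the bolt does."
  [value]
  (str/escape (str value)
              {\\       "\\\\"
               \,       "\\,"
               \newline "\\n"
               \return  "\\r"}))

(defn row->csv
  "Joins the escaped values of one row with commas."
  [coll]
  (str/join "," (map escape-field coll)))

;; The comma inside the second value is escaped, so it is not read as a delimiter.
(row->csv ["page-1" "hello, world" 42])
;; => "page-1,hello\\, world,42"
```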


Output:

```
["partition-key" "csv-content"]
```

- `partition-key`: Name of the file the data was accumulated to
- `csv-content`: Newline-separated CSV data

In a Topology:

```
(bolt-spec {"some-stream" :shuffle} (csv-export true))
```

## Changing the frequency of emitting files

Define a new bolt with `defcsvexport`, passing the accumulation interval (the default bolt uses 300 seconds):

```
(defcsvexport longer-csv-export 100)

;; In your topology
(bolt-spec {"some-stream" :shuffle} (longer-csv-export true))
```

## Changing the directory used for accumulated temp files

To override the JVM's default temporary directory, add the following to the Storm config:

```
CSV_EXPORT_TEMP_FILE_PATH=/my/tmp/file/path
```
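
When the topology is submitted from Clojure, the config is usually an ordinary map. A minimal sketch, assuming the bolt looks up the key `"CSV_EXPORT_TEMP_FILE_PATH"` in the topology configuration map (the name `storm-conf` is hypothetical):

```clojure
;; Hypothetical topology configuration map. Assumption: the bolt reads the
;; temp directory from this string key in the config it receives from Storm.
(def storm-conf
  {"CSV_EXPORT_TEMP_FILE_PATH" "/my/tmp/file/path"})
```

A map like this would be passed as the `conf` argument when submitting the topology, for example to `StormSubmitter/submitTopology`.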

## Running Tests

```
lein test
```
