# jawsome-core

A Clojure library of functions useful for JSON manipulation and analysis
of collections of JSON documents.

## Not just JSON: read-phase vs transform-phase

When we say "JSON", what we really mean is "hierarchical / possibly
nested data structures, including but not limited to JSON and
XML". Another way of saying this is "data structure which can be
parsed into a Clojure map", or "tree-like data structures".

Example of non-nested JSON:

`{"key": "value"}`

Example of nested JSON:

`{"key": {"nested_key": "nested_value"}}`

Example of non-nested XML:

`<my-elem> value </my-elem>`

Example of nested XML:

`<my-elem> <my-nested-elem> nested-value </my-nested-elem> </my-elem>`

How can jawsome-core be useful for arbitrary "hierarchical" data
structures in general? By virtue of its two-phase composition:

1. First are the *read phase* functions, which are useful for
   operating on raw text. Taken as a whole, the read phase takes raw
   text (JSON, XML, etc.) and returns a Clojure map.
2. Second are the *transform phase* functions, which are useful for
   operating on Clojure maps. Taken as a whole, the transform phase
   takes a Clojure map and returns a Clojure map.

Factoring the read-phase out is the powerful design choice here. It
means that you can write a JSON-reader, an XML-reader, or any other
arbitrary data structure reader, and then share all the transform
phase functionality for free.
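
To make the split concrete, here is a minimal sketch of two readers sharing one transform pipeline. Everything in it is hypothetical and is not jawsome-core's API: the function names are made up, and EDN plus an invented `k=v;k=v` line format stand in for JSON and XML so that the example needs nothing beyond Clojure itself.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as str])

;; Hypothetical read-phase functions: raw text in, Clojure map out.
(defn read-edn-record [text]
  (edn/read-string text))

(defn read-kv-record
  "Reads a made-up `k=v;k=v` line format into a map with keyword keys."
  [text]
  (into {}
        (for [pair (str/split text #";")
              :let [[k v] (str/split pair #"=" 2)]]
          [(keyword k) v])))

;; Hypothetical transform-phase functions: Clojure map in, Clojure map out.
(defn add-source [record] (assoc record :source "example"))
(defn drop-debug [record] (dissoc record :debug))

;; The transform phase is written once...
(def transform (comp drop-debug add-source))

;; ...and shared by every reader.
(def process-edn (comp transform read-edn-record))
(def process-kv  (comp transform read-kv-record))
```

Supporting a new input format then only means writing one more reader; `transform` is reused unchanged.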

# The life cycle of a (JSON) record

Let's walk through the life cycle of a hypothetical JSON record, so
you can get a sense for what kind of transformations we're talking
about here.

We will use a real-life example, so it's not as contrived as it seems
;)

## Read-phase (Raw text cleanup)

While JSON can originate in many places, it often comes directly
from a file. Files can often be very dirty. They might contain
comments, they might contain unreadable lines of JSON, they might be
filled with errors from the process that generated the log
files. Maybe the file had a few lines corrupted, or a couple of
unprintable ASCII characters slipped in for unknown reasons. Maybe the
JSON was generated by an application written by very human developers,
and is slightly malformed... _Ultimately the JSON data may be in a
state where it cannot be parsed, because for one reason or another
it's not valid JSON._

### Nested escaped JSON forms

Oftentimes, applications that write out JSON data have components
that also write out JSON data and return it as strings. These
applications yield JSON records where some property's value may be an
escaped JSON string. Such records may well be parseable as-is, but
the inner JSON then comes through as one opaque string, and we really
want to be able to traverse into those nested property paths. Having
to write code to specifically remove the escaping, so that the JSON is
parseable with its structure intact, is very tedious. As such, we
assume that you would want to unescape any inner nested JSON.
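
As a rough illustration of the idea (not the library's implementation), the sketch below unwraps one level of escaped JSON objects at the raw-text level. The name `unescape-nested-json` is hypothetical, and the regex approach is deliberately naive: it would also unwrap a string value that merely looks like `"{...}"`, and it only handles escaped quotes and backslashes.

```clojure
(require '[clojure.string :as str])

(defn unescape-nested-json
  "Naive sketch: finds JSON string literals whose content starts with `{`
  and ends with `}`, drops the surrounding quotes, and removes one level
  of backslash escaping from quotes and backslashes inside."
  [text]
  (str/replace text
               ;; a quoted string literal of the form "{ ... }"
               #"\"(\{(?:[^\"\\]|\\.)*\})\""
               (fn [[_ inner]]
                 ;; turn \" into " and \\ into \
                 (str/replace inner #"\\([\"\\])" "$1"))))
```

With this, the raw text `{"key": "{\"nested_key\": \"nested_value\"}"}` becomes `{"key": {"nested_key": "nested_value"}}`, which parses into a nested map rather than a map holding a string.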

### Handling of Unicode characters

Different systems encode Unicode in a number of different ways. We
need to ensure that the various encodings are readable so that we
neither encounter exceptions when attempting to parse the JSON nor
lose the data captured by the Unicode characters.

### Extensibility

We also want to make sure this phase is extensible so that users can
supply their own raw text cleanups to ensure the files are parseable
or correct errors outside of JSON.

The general idea is, users can provide a function that takes a string
and returns a string (or, theoretically and probably less frequently,
a function that takes a string representing their custom data
structure and returns a Clojure map).

In practice, this extensibility occurs at the pipeline-level in
jawsome-dsl... see the jawsome-dsl README for more details.
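
For a sense of what such a user-supplied cleanup might look like, here is a small sketch of a string-to-string function. The name `strip-comment-lines` and the assumption that comments start with `#` are purely illustrative; how the function is actually registered is up to jawsome-dsl.

```clojure
(require '[clojure.string :as str])

(defn strip-comment-lines
  "Hypothetical cleanup: drops blank lines and lines starting with `#`,
  returning the remaining lines joined by newlines."
  [text]
  (->> (str/split-lines text)
       (remove #(or (str/blank? %)
                    (str/starts-with? (str/triml %) "#")))
       (str/join "\n")))
```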


## Transform-phase (Meatier, more meaningful transformations)

Once the JSON data is parseable, it's much easier to work with as an
in-memory map. Here are the common transformations that we offer,
since they are commonly recurring patterns.

NB: I will use the phrases "key", "path", and "property path"
interchangeably in the discussion below. If I were being pedantic, I
would describe the difference using Clojure syntax: `(get some-map
some-key)`, vs `(get-in some-map some-path)`.

### Property hoisting, renaming, remapping, and pruning

"Hoisting" nested maps up one level of nesting. e.g. if you take
the Clojure map `{:k1 "top-level-value", :k2 {:nested_k1 "foo",
:nested_k2 "bar"}}` and hoist `:k2`, you would get `{:k1
"top-level-value", :nested_k1 "foo", :nested_k2 "bar"}`. Currently
(2014/04/21), you can hoist a property one level at a time, supplying
an optional prefix/suffix to concatenate to the keys in the map being
hoisted. See `make-hoist`.
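
The hoisting behaviour described above can be sketched in plain Clojure as follows. This is not `make-hoist` itself (its signature is not shown here); `hoist-one-level` is a hypothetical helper that assumes keyword keys and supports only a prefix.

```clojure
(defn hoist-one-level
  "Hypothetical helper: merges the map found at key `k` into its parent,
  prepending `prefix` to each hoisted key. Leaves the record untouched
  when the value at `k` is not a map."
  ([record k] (hoist-one-level record k ""))
  ([record k prefix]
   (let [nested (get record k)]
     (if (map? nested)
       (reduce-kv (fn [acc nk nv]
                    (assoc acc (keyword (str prefix (name nk))) nv))
                  (dissoc record k)
                  nested)
       record))))
```

A prefix is useful when a hoisted key would otherwise collide with a key that already exists at the top level.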

Renaming keys or altering the property paths of values, on an
individual-path level. (e.g. you could just remap `[:k2 :nested_k1]`
to the top-level, while still leaving `:nested_k2` nested). See
`make-property-remapper`.

You can prune property paths out of maps entirely (i.e. dissoc the
key/path from the record). See `make-prune-paths` (specifying the list
of paths to dissoc-in) or `make-keep-paths` (specifying the list of
paths to retain, dissocing-in all other paths in the record).
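
Remapping and pruning both boil down to `get-in`, `assoc-in`, and a `dissoc-in` helper (which `clojure.core` does not provide). The sketch below uses hypothetical names (`dissoc-in`, `remap-path`, `prune-paths`, `keep-paths`) and is only meant to illustrate the semantics, not the signatures of `make-property-remapper`, `make-prune-paths`, or `make-keep-paths`.

```clojure
(defn dissoc-in
  "Removes the value at `path`, leaving any now-empty parent maps in place."
  [m [k & ks]]
  (if ks
    (if (map? (get m k))
      (assoc m k (dissoc-in (get m k) ks))
      m)
    (dissoc m k)))

(defn remap-path
  "Moves the value at path `from` to path `to`, if it is present."
  [record from to]
  (let [v (get-in record from ::missing)]
    (if (= v ::missing)
      record
      (-> record (dissoc-in from) (assoc-in to v)))))

(defn prune-paths
  "Removes every path in `paths` from the record."
  [record paths]
  (reduce dissoc-in record paths))

(defn keep-paths
  "Builds a new record containing only the paths in `paths`."
  [record paths]
  (reduce (fn [acc path]
            (let [v (get-in record path ::missing)]
              (if (= v ::missing) acc (assoc-in acc path v))))
          {}
          paths))
```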

### String value reification (Nullify, Numberify, Boolify, Arrayify, Mapify)

Parsing values to determine if they can have their type simplified
from simply being a string. This step will also unbox stringified
nested maps, nested arrays, inner escaped JSON, booleans, nulls,
and numbers up to 19 digits.

e.g. if you reify the values `{:k1 "42", :k2 "false", :k3 "null"}` you
would get `{:k1 42, :k2 false, :k3 nil}`.

See `make-reify-values`.
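
A much-reduced sketch of reification, covering only nulls, booleans, and integers, might look like the following. The names `reify-value` and `reify-values` are hypothetical; the sketch caps integers at 18 digits so that `Long/parseLong` cannot overflow, and it does not attempt the arrays, maps, or inner JSON mentioned above.

```clojure
(defn reify-value
  "Converts a string to nil, a boolean, or a long when it looks like one;
  returns every other value unchanged."
  [v]
  (cond
    (not (string? v))            v
    (= v "null")                 nil
    (= v "true")                 true
    (= v "false")                false
    (re-matches #"-?\d{1,18}" v) (Long/parseLong v)
    :else                        v))

(defn reify-values
  "Applies `reify-value` to every value in the record, recursing into
  nested maps."
  [record]
  (reduce-kv (fn [acc k v]
               (assoc acc k (if (map? v)
                              (reify-values v)
                              (reify-value v))))
             {}
             record))
```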

### Value-synonym mapping (or "synonym-translation")

Converting synonyms for values to those literal values (e.g. `"-"` =>
`null`, `"yes"` => `true`). You can do this globally for every path in
every record (see `make-value-synonymizer`) or on a per-path basis
(see `make-path-specific-synonymizer`).
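
The global variant can be pictured as a lookup applied to every value. In this sketch the synonym table and the name `synonymize` are illustrative assumptions, not the arguments or behaviour of `make-value-synonymizer`.

```clojure
;; Hypothetical synonym table: value as found => value it should become.
(def synonyms {"-" nil, "yes" true, "no" false})

(defn synonymize
  "Replaces every value that appears as a key in `synonyms`, recursing
  into nested maps. `contains?` is used so that a synonym may map to nil."
  [synonyms record]
  (reduce-kv (fn [acc k v]
               (assoc acc k (cond
                              (map? v)               (synonymize synonyms v)
                              (contains? synonyms v) (get synonyms v)
                              :else                  v)))
             {}
             record))
```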

### Static-value injection

Associng static values into every record. E.g. if you are using
jawsome for batch-processing, you may want to inject a batch-id into
every record. See `static-value-merge-fn` (injects values, overwriting
existing values in the record) and `default-value-merge-fn` (injects
values only if the path does not yet appear in the record).
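
The difference between the two flavours comes down to which side wins a merge. A minimal sketch for top-level keys, using hypothetical names rather than the real `static-value-merge-fn` and `default-value-merge-fn` signatures:

```clojure
(defn inject-static
  "Static values overwrite whatever the record already holds."
  [statics record]
  (merge record statics))

(defn inject-defaults
  "Default values are used only for keys the record does not have."
  [defaults record]
  (merge defaults record))
```

Note that a key present in the record with a `nil` value still counts as present for `inject-defaults`.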

### Logging

Logging the record passed in and returning it unmodified. See `make-log`.

### Value-based pruning

Removing paths whose corresponding values are `null`. See
`make-prune-nils`.

Removing paths whose corresponding values don't match the configured
expected type (e.g. strings, numbers, booleans). See
`make-value-type-filter`. (This is sometimes referred to as "type
enforcement".)

Removing paths whose corresponding values are the empty string
`""`. See `make-remove-empty-string-fields`.
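
All three kinds of value-based pruning share one shape: walk the record and drop entries whose value fails some test. The sketch below uses hypothetical helper names and, for type enforcement, an assumed configuration format (a map from top-level key to predicate); the real functions may be configured differently.

```clojure
(defn prune-values
  "Removes entries whose value satisfies `pred`, recursing into nested maps."
  [pred record]
  (reduce-kv (fn [acc k v]
               (cond
                 (map? v) (assoc acc k (prune-values pred v))
                 (pred v) acc
                 :else    (assoc acc k v)))
             {}
             record))

(def prune-nils          (partial prune-values nil?))
(def prune-empty-strings (partial prune-values #(= "" %)))

(defn enforce-types
  "Drops top-level entries whose value fails the predicate configured for
  that key in `type-preds`; keys without a predicate are kept."
  [type-preds record]
  (reduce-kv (fn [acc k v]
               (let [pred (get type-preds k)]
                 (if (and pred (not (pred v)))
                   acc
                   (assoc acc k v))))
             {}
             record))
```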

### Sanitize field names

Renaming all keys/paths to consist only of alphanumeric + underscore
characters. See `make-sanitize-field-names`.
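
One plausible reading of this is sketched below: every character outside `[A-Za-z0-9_]` is replaced with an underscore. That replacement rule is an assumption (the library might drop such characters instead), and `sanitize-key` and `sanitize-field-names` are hypothetical names that assume keyword keys.

```clojure
(require '[clojure.string :as str])

(defn sanitize-key
  "Replaces every non-alphanumeric, non-underscore character in the key's
  name with an underscore."
  [k]
  (keyword (str/replace (name k) #"[^A-Za-z0-9_]" "_")))

(defn sanitize-field-names
  "Sanitizes every key in the record, recursing into nested maps."
  [record]
  (reduce-kv (fn [acc k v]
               (assoc acc
                      (sanitize-key k)
                      (if (map? v) (sanitize-field-names v) v)))
             {}
             record))
```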

### Denormalize

Two parts to this:

1. Flattening nested maps, joining keys in the path with the string
   "_dot_" (or whatever other joiner you want). e.g. `{:k1
   {:nested_now "value"}}` => `{:k1_dot_nested_now "value"}`
2. Denormalizing on arrays. e.g. `{:a "foo", :b
   ["x" "y"]}` => `{:a "foo", :b_idx 0, :b_arr "x"}, {:a "foo", :b_idx
   1, :b_arr "y"}`
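
Both parts can be sketched in a few lines of plain Clojure. The names `flatten-map` and `denormalize-array` are hypothetical, keyword keys are assumed, and the `_idx` / `_arr` suffixes are taken from the example above.

```clojure
(defn flatten-map
  "Flattens nested maps into a single level, joining the keys along each
  path with `joiner` (default \"_dot_\")."
  ([record] (flatten-map record "_dot_"))
  ([record joiner]
   (letfn [(walk [prefix m]
             (reduce-kv (fn [acc k v]
                          (let [path (if prefix
                                       (str prefix joiner (name k))
                                       (name k))]
                            (if (map? v)
                              (merge acc (walk path v))
                              (assoc acc (keyword path) v))))
                        {}
                        m))]
     (walk nil record))))

(defn denormalize-array
  "Emits one record per element of the array at key `k`, recording the
  element's index and value under `<k>_idx` and `<k>_arr`."
  [record k]
  (let [base    (dissoc record k)
        idx-key (keyword (str (name k) "_idx"))
        arr-key (keyword (str (name k) "_arr"))]
    (map-indexed (fn [i v] (assoc base idx-key i arr-key v))
                 (get record k))))
```

Note that denormalizing turns one record into a sequence of records, so anything downstream has to be prepared to handle several outputs per input.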

### Order

The thing about these transformations is that they are very sensitive
to the order in which they are performed. It wouldn't make sense to
remove all the values that are null, before we've applied a
transformation that maps all the synonyms for null values to
null. Similarly, it wouldn't make sense to walk each record and turn
all of the strings that appear as valid numbers or synonyms for
boolean values into numbers or booleans AFTER removing any key-value
pairs that don't match a configured type requirement for those fields.
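
The first of these pitfalls is easy to reproduce with two toy xforms (hypothetical stand-ins, not the library's functions): one maps the synonym `"-"` to `nil`, the other removes `nil` values.

```clojure
(defn dash->nil
  "Toy synonymizer: turns the value \"-\" into nil."
  [record]
  (reduce-kv (fn [acc k v] (assoc acc k (if (= v "-") nil v)))
             {}
             record))

(defn prune-nils
  "Toy pruner: removes top-level entries whose value is nil."
  [record]
  (into {} (remove (comp nil? val)) record))

(def record {:a "-" :b "kept"})

;; Synonymize first, then prune: the "-" is recognized and removed.
((comp prune-nils dash->nil) record)   ;; => {:b "kept"}

;; Prune first, then synonymize: a nil is left behind in the output.
((comp dash->nil prune-nils) record)   ;; => {:a nil, :b "kept"}
```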

Anyway. This is just a library of useful functions -- the order of
transformations is enforced in jawsome-dsl, so head over to that README
for details on how we (attempt to) make your life easier by imposing a
partial ordering on these xforms :)

### Extensibility

Just like the read-phase, we want to allow users to supply their own
custom xforms (i.e. functions that take a Clojure map and return a
Clojure map). Again, that's handled in jawsome-dsl -- this is just a
library.

## License

Copyright © 2013 One Kings Lane

Distributed under the Eclipse Public License, the same as Clojure.
