# jawsome-core

A Clojure library of functions useful for JSON manipulation and analysis
of collections of JSON documents.

There are many parts to the JSON life cycle. JSON often comes out of
a store or API that guarantees well-formed output, but it's not
unusual to need to process JSON that was generated by an application
written by very human developers and, as such, may be less than
perfect, or not even parsable. Recognizing this, we view the life
cycle of JSON as follows:

1. Read from some sort of storage, possibly as raw text
2. Requires raw text processing in order to make it "readable"
3. Requires manipulation to ensure records are consistent
4. Needs to have a schema defined for the collection


#### Raw text cleanup

While JSON can originate in many places, it often comes directly from
a file, and files can be very dirty. They might contain comments,
unreadable lines of JSON, or errors from the process that generated
the log files. _Ultimately, the JSON data may be in such a state that
it cannot be parsed because, for one reason or another, it's not valid
JSON._
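
As a rough illustration of this kind of cleanup, the sketch below keeps only the lines that look like JSON objects. It assumes one record per line and that comment lines start with `#` or `//`; the function name `candidate-json-lines` is hypothetical and not part of this library's API.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper for illustration only.
;; Assumes one JSON record per line and # or // comment prefixes.
(defn candidate-json-lines
  "Returns the trimmed lines of `text` that look like JSON objects,
  dropping blank lines, comment lines, and any line not starting with {."
  [text]
  (->> (str/split-lines text)
       (map str/trim)
       (remove str/blank?)
       (remove #(re-find #"^(#|//)" %))
       (filter #(str/starts-with? % "{"))))

(candidate-json-lines
 "# log start\n{\"a\": 1}\n\nERROR: disk full\n{\"b\": 2}")
;; => ("{\"a\": 1}" "{\"b\": 2}")
```

A line that starts with `{` can still be invalid JSON, so a filter like this only narrows down what is handed to the parser.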

##### Nested, escaped, JSON forms

Applications that write out JSON often have components that also
write JSON and return it as strings. These applications yield JSON
records where some property's value is an escaped JSON string.
Writing code specifically to remove the escaping, so that the JSON
parses with its structure intact, is very tedious. Leaving the escaped
string in place may still result in parseable JSON records, but we
really want to be able to traverse into those nested property paths.
As such, we assume that any inner nested JSON should be unescaped.


##### Handling of Unicode characters

Systems encode Unicode in a number of different ways. We need to
ensure that the various encodings are readable so that we neither
encounter exceptions when attempting to parse the JSON nor lose the
data captured by the Unicode characters.

##### Extensibility

We also want to make sure this phase is extensible so that users can
supply their own raw text cleanups to ensure the files are parseable
or correct errors outside of JSON.

As such, users can provide a function that takes a string and returns
a string.
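
A minimal sketch of how such string-to-string functions might be chained. The names `strip-trailing-comma` and `apply-cleanups` are hypothetical; how jawsome-core actually accepts user cleanups is not shown here.

```clojure
(require '[clojure.string :as str])

;; Hypothetical user-supplied cleanup: takes a string, returns a string.
(defn strip-trailing-comma [s]
  (str/replace s #",\s*$" ""))

;; Hypothetical driver: threads a line through each cleanup in order.
(defn apply-cleanups
  [cleanups line]
  (reduce (fn [s f] (f s)) line cleanups))

(apply-cleanups [str/trim strip-trailing-comma] "  {\"a\": 1},  ")
;; => "{\"a\": 1}"
```

Because every cleanup has the same shape, the order of the vector is the only thing a user needs to reason about.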


#### Transformations

Once the JSON data is parsable, it's much easier to work with as an
in-memory map. Many transformations are common, recurring patterns, so
we would like to offer them through configuration.

##### Property Renaming, Remapping and Pruning
  Renaming keys, altering the property paths of values, or pruning
  them altogether.
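
  A small sketch of the three operations on a plain Clojure map. The `remap` helper is hypothetical and only moves a top-level key to a nested path; the library's own configuration format is not shown here.

```clojure
(require '[clojure.set :as set])

;; Hypothetical helper: moves the value under top-level key `k`
;; to the nested `path`, leaving the map untouched if `k` is absent.
(defn remap
  [m k path]
  (if (contains? m k)
    (-> m (dissoc k) (assoc-in path (get m k)))
    m))

(-> {"usr" "ann" "ts" 1 "debug" "x"}
    (set/rename-keys {"usr" "user"})    ; rename a key
    (remap "ts" ["meta" "timestamp"])   ; change a value's property path
    (dissoc "debug"))                   ; prune a key
;; => {"user" "ann", "meta" {"timestamp" 1}}
```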

##### String Value Reification (Nullify, Numberify, Boolify, Arrayify, Mapify) #####

  Parsing values to determine if they can have their type simplified
  from simply being a string. This step will also unbox stringified
  nested maps, nested arrays, inner escaped JSON, booleans, nulls,
  and numbers up to 19 digits.
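
  The scalar part of this step can be sketched as follows. The function names are hypothetical, and stringified maps, arrays, and inner escaped JSON are left out because they need a JSON parser. Integers are limited to 19 digits as described above, and a value that still overflows a `long` is left as a string.

```clojure
;; Hypothetical sketch of scalar reification; not the library's implementation.
(defn reify-string
  "Returns a simpler typed value for `s` when it is a string that spells
  null, a boolean, or a number; otherwise returns `s` unchanged."
  [s]
  (cond
    (not (string? s))             s
    (= s "null")                  nil
    (= s "true")                  true
    (= s "false")                 false
    (re-matches #"-?\d{1,19}" s)  (try (Long/parseLong s)
                                       (catch NumberFormatException _ s))
    (re-matches #"-?\d+\.\d+" s)  (Double/parseDouble s)
    :else                         s))

(defn reify-values
  "Applies reify-string to every value in `m`, recursing into nested maps."
  [m]
  (into {}
        (map (fn [[k v]]
               [k (if (map? v) (reify-values v) (reify-string v))]))
        m))

(reify-values {"n" "42" "b" "true" "z" "null" "s" "hello" "in" {"d" "1.5"}})
;; => {"n" 42, "b" true, "z" nil, "s" "hello", "in" {"d" 1.5}}
```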

##### Value Synonym Mapping
  Converting synonyms for values to those literal values (e.g. `"-"` =>
  `null`, `"yes"` => `true`, `""` => `""`)
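
  One way to picture this step is a lookup table applied to every value. The `map-synonyms` function and the shape of the synonym table are assumptions made for illustration.

```clojure
;; Hypothetical sketch: `synonyms` maps a synonym to its literal value.
(defn map-synonyms
  "Replaces each value in `m` that appears as a key in `synonyms`,
  recursing into nested maps."
  [synonyms m]
  (into {}
        (map (fn [[k v]]
               [k (cond
                    (map? v)               (map-synonyms synonyms v)
                    (contains? synonyms v) (get synonyms v)
                    :else                  v)]))
        m))

(map-synonyms {"-" nil, "yes" true}
              {"a" "-" "b" "yes" "c" "ok" "d" {"e" "-"}})
;; => {"a" nil, "b" true, "c" "ok", "d" {"e" nil}}
```

  The `contains?` check matters because a synonym may map to `nil`, which a plain truthiness test on the lookup would miss.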

##### Null Pruning
  Removing keys whose corresponding values are `null`
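
  A minimal sketch, assuming pruning should also descend into nested maps (the text above does not say either way). The name `prune-nils` is hypothetical.

```clojure
;; Hypothetical sketch of null pruning.
(defn prune-nils
  "Removes entries whose value is nil, recursing into nested maps.
  Values of false are kept."
  [m]
  (into {}
        (keep (fn [[k v]]
                (cond
                  (nil? v) nil
                  (map? v) [k (prune-nils v)]
                  :else    [k v])))
        m))

(prune-nils {"a" nil "b" 1 "c" {"d" nil "e" false}})
;; => {"b" 1, "c" {"e" false}}
```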

##### Property path value-type filtering (i.e. type enforcement) #####

  For particular fields that we always expect to have a value in a
  particular range of values (e.g. strings, numbers, booleans), remove
  any fields that do not have an expected value.
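
  As a simplified sketch, the rules below map top-level keys (rather than full property paths) to predicates. The `enforce-types` name and the rule format are assumptions, not the library's configuration.

```clojure
;; Hypothetical sketch: `rules` maps a key to a predicate its value must satisfy.
(defn enforce-types
  "Removes each key in `m` whose value fails the predicate configured
  for it in `rules`. Keys without a rule are left alone."
  [rules m]
  (reduce-kv (fn [acc k pred]
               (if (and (contains? acc k) (not (pred (get acc k))))
                 (dissoc acc k)
                 acc))
             m
             rules))

(enforce-types {"age" number?, "name" string?}
               {"age" "unknown" "name" "ann" "x" 1})
;; => {"name" "ann", "x" 1}
```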

##### Static value injection
We may want to inject static values for various reasons.


These transformations are very sensitive to the order in which they
are performed. It wouldn't make sense to remove all the values that
are null before we've applied the transformation that maps all the
synonyms for null values to null. Similarly, it wouldn't make sense to
walk each record and turn all of the strings that appear as valid
numbers or synonyms for boolean values into numbers or booleans after
removing any key-value pairs that don't match a configured type
requirement for those fields.

This means all of these transformations themselves form their own
ordered pipeline.  As such, we propose that the most reasonable order
for these transformations to take place in is as follows:

1. String Value Reification (Nullify, Numberify, Boolify, Arrayify, Mapify)
2. Value Synonym Mapping
3. Property Renaming, Remapping and Pruning
4. Null pruning
5. Property path value-type filtering (i.e. type enforcement)
6. Static value injection

###### Extensibility

Sandwiched around this pipeline, the end user can supply their own
transformations, enabling any preprocessing or post-processing they
like. "Sandwiched" is meant literally: the user can provide a function
that takes a Clojure map and returns a Clojure map to run before the 6
steps above, and another to run after them.

It's important to note that *every step in the pipeline accepts a map and returns a map, nil, or false.*

This has two important implications:

1. If `nil` or `false` is returned, it's assumed that this element should be filtered out
and not make it into the final data set.
2. A user should be careful when supplying their own transformation
functions to ensure they do not return `nil` or `false` when they mean
to take no action.
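
The contract above can be sketched with a short reduction that stops as soon as a step returns `nil` or `false`. The names `run-pipeline` and `process` are hypothetical, and the two example steps merely stand in for a user-supplied pre-processing function and the built-in steps.

```clojure
;; Hypothetical sketch of the map -> map/nil/false step contract.
(defn run-pipeline
  "Threads `record` through `steps` in order. Returns nil as soon as
  any step returns nil or false; otherwise returns the final map."
  [steps record]
  (reduce (fn [m step]
            (if-let [result (step m)]
              result
              (reduced nil)))
          record
          steps))

(defn process
  "Runs every record through `steps`, dropping filtered-out records."
  [steps records]
  (keep #(run-pipeline steps %) records))

(def example-steps
  [(fn [m] (when (contains? m "id") m))     ; pre step: drop records without an id
   (fn [m] (assoc m "source" "example"))])  ; stands in for the built-in steps

(process example-steps [{"id" 1} {"name" "no id"}])
;; => ({"id" 1, "source" "example"})
```

The first example step shows the hazard from point 2: `when` returns `nil` if its test fails, so a step meant to "do nothing" in that case must return `m` explicitly.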


During this transformation phase, there are a number of places where a
user may want to inject their own custom transformations, as this is
the prime time to specify transformation functions that operate on and
return a JSON map at every step of the way.


-- Pre denorm
8. Property Name Hoisting
-- Post denorm
8. Property Name special character handling -- move out?



## License

Copyright © 2013 One Kings Lane

Distributed under the Eclipse Public License, the same as Clojure.
