# dbwalk

A Clojure library for inserting and querying nested maps to/from a database or other sources of relational data.

NOTE: This library is in alpha and is not to be used for anything. 
If you want to participate in its development, go to #clojure on Slack or contact Sami Sandqvist.

At the moment, it can do deep inserts and run queries by following foreign keys.
Automatic database schema detection is supported for PostgreSQL.

## Overview

Dbwalk is based on the idea from [Objection.js](https://www.vincit.fi/en/blog/nested-eager-loading-and-inserts-with-objection-js/), namely that given a set of database rows, 
it is possible to extract primary/foreign key columns and fetch a set of related rows from another table 
using one WHERE *foreign key column* IN (*list of primary key column values*) or vice versa.
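
The idea above can be sketched in plain Clojure. This is only an illustration and not dbwalk's implementation: the tables `owners` and `items`, the column `owner_id`, and the function `in-query` are all hypothetical.

```clojure
(require '[clojure.string :as str])

;; Hypothetical rows that have already been read from "owners" and "items".
(def owners [{:id 1 :name "Ann"} {:id 2 :name "Bob"}])
(def items  [{:id 10 :owner_id 1} {:id 11 :owner_id 1}])

(defn in-query
  "Builds a parameterized SQL vector that selects every row of `table`
  whose `fk-column` equals one of the `pk-column` values found in `rows`.
  Assumes `rows` is not empty, since an empty IN () list is not valid SQL."
  [table fk-column pk-column rows]
  (let [ks           (distinct (map pk-column rows))
        placeholders (str/join ", " (repeat (count ks) "?"))]
    (into [(str "SELECT * FROM " (name table)
                " WHERE " (name fk-column) " IN (" placeholders ")")]
          ks)))

;; One query fetches the related rows for all owners at once.
(in-query :items :owner_id :id owners)
;; => ["SELECT * FROM items WHERE owner_id IN (?, ?)" 1 2]

;; The fetched rows can then be matched back to their owners by key.
(group-by :owner_id items)
;; => {1 [{:id 10, :owner_id 1} {:id 11, :owner_id 1}]}
```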

As this is a Clojure library, we do not use Objects and therefore dbwalk is not an ORM. 
The relations in dbwalk are the foreign key relations of the database tables themselves and there is no abstraction over the data contained in the database.

### Core idea, simplified

The "main" namespace is v.d.crawler (v.d is short for vincit.dbwalk). It contains functions that take a database description and a nested query, read the database, and 
produce a [Loom](https://github.com/aysylu/loom) graph of the results.

The database description is a listing of all primary and foreign keys in the database. See v.d.schema-detect and v.d.config for PostgreSQL autodetection and further formatting. 
This is a two-part process so that implementing schema autodetection for other databases is simpler.

The nested query is a tree in which every node corresponds to a single SELECT to that node's table.

The result (a Loom graph) contains a node for each row read from the database. The edges in the graph contain the database relations that were used to travel between tables and rows.

Most of the codebase consists of helper functions for generating the input data and formatting the output graph. Their use is encouraged but not required, as everything is just data.
The user is expected to compose their own API. See v.d.api.sweet for examples. Note that the schema detection should be run only once in actual use.

The idea of a Datasource can be simply thought of as "the database" if only one database is used. See v.d.api.sweet/filtered->graph for how to write a single-database API function.

## Concepts in more detail

### QueryTree

Dbwalk's query format is a tree of SQL queries, each to one table.

For each level in the query tree, :query selects properties from that level's table and :eager contains the tables to branch the query to.

The query functions take HoneySQL queries. The FROM clause must have only one table, as it is currently used to determine the next table to walk to.
You can use WHERE, ORDER BY and pretty much everything provided by HoneySQL. 
Selecting only some columns is allowed, but dbwalk will automatically add all columns required for the relations in the query tree.

    {:query (-> (s/select :*)
                (s/from :owners))              ;; Select from "owners"
     :eager [{:query (-> (s/select :*)
                         (s/from :items))}]}   ;; Then query "items" and match owners to items.

The above query is presented without the required datasource information, as it should be added using vincit.dbwalk.utils/with-datasource. 
See the examples in utils_test.
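
Because a query tree is plain data, it can be inspected with ordinary Clojure functions. The sketch below assumes each node holds a HoneySQL-style map under :query and its children under :eager. The function `tables-in-order` and the tables `tags` and `addresses` are hypothetical and not part of dbwalk.

```clojure
(defn tables-in-order
  "Returns the tables of a query tree in depth-first order.
  Relies on each :from clause naming exactly one table."
  [{:keys [query eager]}]
  (cons (first (:from query))
        (mapcat tables-in-order eager)))

;; A query tree written directly as data, without the HoneySQL helpers.
(def tree
  {:query {:select [:*] :from [:owners]}
   :eager [{:query {:select [:*] :from [:items]}
            :eager [{:query {:select [:id] :from [:tags]}}]}
           {:query {:select [:*] :from [:addresses]}}]})

;; Every node corresponds to one SELECT, so four tables mean four queries.
(tables-in-order tree)
;; => (:owners :items :tags :addresses)
```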

A simple helper for generating query trees can be found in v.d.api.vector. It transforms a nested vector into a query tree.

Inside a vector,

* the first keyword is the target table
* the first set of keywords contains the columns to select (default :*)
* the first map contains a base query to start from. SELECT and FROM clauses will be overwritten. Note that HoneySQL's helpers return a map.
* contained vectors are handled recursively and placed under the :eager key
    
As an example,

    [:my-table #{:id} 
     [:another-table #{:name}]
     [:third-table #{:foo} (sql/where [:= :name "bar"])]]

becomes

    {:query {:select (:id), :from (:my-table)}
     :eager [{:query {:select (:name)
                      :from   (:another-table)}}
             {:query {:select (:foo)
                      :from   (:third-table)
                      :where  [:= :name "bar"]}}]}
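
The four rules listed above are enough to write a small transformer by hand. The following is a sketch of those rules only, not the code in v.d.api.vector. The name `vector->query-tree` is hypothetical, and the sketch emits vectors for :select and :from where the printed result above shows lists.

```clojure
(defn vector->query-tree
  "Turns a nested vector into a query tree:
  first keyword = table, first set = columns (default :*),
  first map = base query, nested vectors = children under :eager."
  [v]
  (let [table    (first (filter keyword? v))
        columns  (or (first (filter set? v)) #{:*})
        base     (or (first (filter map? v)) {})
        children (filter vector? v)]
    ;; SELECT and FROM of the base query are always overwritten.
    (cond-> {:query (assoc base
                           :select (vec columns)
                           :from   [table])}
      (seq children) (assoc :eager (mapv vector->query-tree children)))))

;; The base query is given as a plain map here, which is what the
;; HoneySQL helpers return.
(vector->query-tree
 [:my-table #{:id}
  [:another-table #{:name}]
  [:third-table #{:foo} {:where [:= :name "bar"]}]])
;; => {:query {:select [:id], :from [:my-table]},
;;     :eager [{:query {:select [:name], :from [:another-table]}}
;;             {:query {:where [:= :name "bar"], :select [:foo], :from [:third-table]}}]}
```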

### QueryPath

The problem with the QueryTree format is that it complects the data you want to get with the path used to gather the data. 
When using SQL JOINs, the type of join affects the format and content of your result data. 
With the dbwalk method of getting data, when you go from table A to B you will always get 0..n rows of B in relation to one row from A.

If you consider the structure of a database as an undirected graph, a QueryPath is one of its directed acyclic subgraphs. 
As duplicate nodes in a QueryPath are also forbidden, a QueryPath is in fact a tree. 

QueryPaths can be generated in the REPL using the helpers in v.d.api.subgraph. 
It also contains a function which will generate a QueryPath for the entire database structure using a breadth-first search starting from a given table.
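
A breadth-first search of this kind can be sketched as follows. The adjacency map `schema-graph` and the function `bfs-tree` are hypothetical, and the child -> parent map returned here is only one possible representation of the resulting tree. It is not the format used by v.d.api.subgraph.

```clojure
(def schema-graph
  "Hypothetical undirected table graph: table -> set of adjacent tables.
  Tables :a, :b and :c form a cycle."
  {:a #{:b :c}
   :b #{:a :c}
   :c #{:a :b :d}
   :d #{:c}})

(defn bfs-tree
  "Walks `graph` breadth-first from `root` and returns a map of
  child table -> parent table. Each table is visited once, so the
  result is a tree even when the graph contains cycles."
  [graph root]
  (loop [queue   (conj clojure.lang.PersistentQueue/EMPTY root)
         visited #{root}
         parents {}]
    (if (empty? queue)
      parents
      (let [table (peek queue)
            ;; Sorting only makes the visiting order deterministic.
            nexts (remove visited (sort (graph table)))]
        (recur (into (pop queue) nexts)
               (into visited nexts)
               (into parents (map (fn [n] [n table]) nexts)))))))

(bfs-tree schema-graph :a)
;; => {:b :a, :c :a, :d :c}
```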

The primary advantage of using QueryPaths is that in a "normal" database there are only a few directions of travel that produce meaningful data.
If, for example, your database queries join tables A -> B -> C -> D for one query and C -> D for another, you are using only one direction of travel.
Using a QueryPath, you can just state that the path is A -> B -> C -> D and request data from tables A and D in one query 
and data from C and D in another without repeating the path used to gather the data.

In practice, a QueryPath and a list of required data such as [:A/* :C/id :C/name] are enough to generate a query tree
from the smallest subtree of the QueryPath that still covers the requested tables. However, the helpers will not 
return a query tree unless one of the given tables is at the root of the minimal subtree. 
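
The minimal subtree and its root can be computed with a few lines of Clojure. This sketch assumes the QueryPath is represented as a child -> parent map, which is a hypothetical representation chosen for the example. The functions `path-to-root` and `minimal-subtree` are likewise not dbwalk helpers.

```clojure
(defn path-to-root
  "Returns the tables from `table` up to the root of the path."
  [parents table]
  (vec (take-while some? (iterate parents table))))

(defn minimal-subtree
  "Finds the smallest subtree that covers all requested `tables`.
  Its root is the deepest table shared by every root-to-table path."
  [parents tables]
  (let [paths  (map #(reverse (path-to-root parents %)) tables)
        common (apply map (fn [& xs] (when (apply = xs) (first xs))) paths)
        root   (last (take-while some? common))
        up-to  (fn [t]
                 (conj (vec (take-while #(not= root %)
                                        (path-to-root parents t)))
                       root))]
    {:root            root
     :tables          (set (mapcat up-to tables))
     ;; A query tree can only be generated when this is true.
     :root-requested? (contains? (set tables) root)}))

;; The path A -> B -> C -> D as a child -> parent map.
(def path {:b :a, :c :b, :d :c})

(minimal-subtree path [:c :d])
;; => {:root :c, :tables #{:c :d}, :root-requested? true}

(minimal-subtree path [:a :d])
;; => {:root :a, :tables #{:a :b :c :d}, :root-requested? true}

;; With a branching path, two sibling tables share a root that was not requested.
(minimal-subtree {:b :a, :c :a} [:b :c])
;; => {:root :a, :tables #{:a :b :c}, :root-requested? false}
```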

See also Output formats/Filtered for the companion output formatter. Implementations for both are in v.d.output.filtered.

### Relation

A Relation in dbwalk is based on the idea of a foreign key in databases. 
It describes a unique identifier for some data item and the way it is contained in another item. The only Relations
currently implemented are

* source table's primary key in a column in the target table (OneToMany relation)
* a column in the source table, containing the target table's primary key (ManyToOne relation)
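
Both relation types come down to matching one column of the source row against one column of the target rows. The sketch below uses hypothetical relation maps and a hypothetical `follow` function to show this on in-memory data. The actual abstractions in vincit.dbwalk.relations may look different.

```clojure
(def owners [{:id 1 :name "Ann"} {:id 2 :name "Bob"}])
(def items  [{:id 10 :owner_id 1} {:id 11 :owner_id 1} {:id 12 :owner_id 2}])

;; OneToMany: the source table's primary key is stored in the target table.
(def owner->items
  {:type :one-to-many :source-column :id :target-column :owner_id})

;; ManyToOne: a column in the source table holds the target's primary key.
(def item->owner
  {:type :many-to-one :source-column :owner_id :target-column :id})

(defn follow
  "Returns the rows in `target-rows` that `relation` links to `source-row`."
  [{:keys [source-column target-column]} source-row target-rows]
  (filterv #(= (get source-row source-column) (get % target-column))
           target-rows))

(follow owner->items (first owners) items)
;; => [{:id 10, :owner_id 1} {:id 11, :owner_id 1}]

(follow item->owner (last items) owners)
;; => [{:id 2, :name "Bob"}]
```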

A join table as commonly used is walked as a OneToMany relation (into the join table) followed by a ManyToOne relation (out of it).

Currently an SQL endpoint has been implemented, so most SQL databases should be walkable. 
There is no restriction that both ends of a relation must be in the same database.

Future Relation endpoints will be at least
 
* resource filenames so that a table in a database can be linked to, for example, a JSON file which can be deserialized automatically as part of the query

### Datasource

A Datasource is anything that holds relational data. When building queries, each database is its own Datasource. 
Datasources are abstracted by using multimethods. See vincit.dbwalk.relations for the abstractions.
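
As a rough illustration of this kind of abstraction, the sketch below dispatches on a `:type` key in the datasource description. The multimethod `fetch-rows`, the `:type` key and the `:in-memory` datasource are all hypothetical and are not the multimethods defined in vincit.dbwalk.relations.

```clojure
(defmulti fetch-rows
  "Fetches the rows of `table` whose `column` is one of `values`."
  (fn [datasource _table _column _values] (:type datasource)))

;; A datasource backed by plain Clojure data instead of a database.
(defmethod fetch-rows :in-memory
  [{:keys [tables]} table column values]
  (let [wanted (set values)]
    (filterv #(contains? wanted (get % column)) (get tables table))))

;; Unknown datasource types fail loudly instead of returning nothing.
(defmethod fetch-rows :default
  [datasource _table _column _values]
  (throw (ex-info "No fetch-rows implementation for datasource"
                  {:type (:type datasource)})))

(def memory-db
  {:type   :in-memory
   :tables {:items [{:id 10 :owner_id 1} {:id 12 :owner_id 2}]}})

(fetch-rows memory-db :items :owner_id [1])
;; => [{:id 10, :owner_id 1}]
```

Supporting another kind of datasource then means adding one more `defmethod` without touching the callers.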

The datasource description for an SQL db can be anything that the functions in clojure.java.jdbc accept. 
The user is expected to handle transaction and connection pool management.


## Output formats

Look at the tests in writer_test to see inserting and querying examples for Tree output. 
Note that running the tests requires an empty PostgreSQL database. See test_setup.clj. 
This is because JDBCMetaData for H2 does not contain correct data.

### Tree

The Tree format returns all of the data read from the database as a vector of top-level row items, each of which is a map. 

Each map in the result represents a row in the database and its related rows in other tables. 
The related rows are placed in a vector and assoc'ed to the row either replacing the foreign key or, 
when the foreign key is in the other table, under a new keyword key named after the database table the related rows came from.

The map will contain the requested columns plus all foreign and primary keys required for dbwalk to function.
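
The two placement rules can be illustrated on in-memory rows. This is a sketch of the resulting shape as described above, not the formatter itself. The function names and the `owners`/`items` data are hypothetical.

```clojure
(def owners [{:id 1 :name "Ann"}])
(def items  [{:id 10 :owner_id 1} {:id 11 :owner_id 1}])

(defn nest-under-table
  "The foreign key is in the other table: related rows go under a new
  key named after that table."
  [row pk-column table related fk-column]
  (assoc row table
         (filterv #(= (get row pk-column) (get % fk-column)) related)))

(defn nest-over-foreign-key
  "The foreign key is in this row: related rows replace its value."
  [row fk-column related pk-column]
  (update row fk-column
          (fn [fk] (filterv #(= fk (get % pk-column)) related))))

(nest-under-table (first owners) :id :items items :owner_id)
;; => {:id 1, :name "Ann", :items [{:id 10, :owner_id 1} {:id 11, :owner_id 1}]}

(nest-over-foreign-key (first items) :owner_id owners :id)
;; => {:id 10, :owner_id [{:id 1, :name "Ann"}]}
```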

### Flat

A one-level map of tablename -> vector of rows read from the table.
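
The shape of this format can be shown with a short sketch that groups rows by table. The input here, a sequence of `[table row]` pairs, and the function `flat-output` are hypothetical. The real formatter works on the result graph.

```clojure
(defn flat-output
  "Collects [table row] pairs into a map of table -> vector of rows."
  [table-rows]
  (reduce (fn [acc [table row]]
            (update acc table (fnil conj []) row))
          {}
          table-rows))

(flat-output [[:owners {:id 1}]
              [:items  {:id 10 :owner_id 1}]
              [:items  {:id 11 :owner_id 1}]])
;; => {:owners [{:id 1}], :items [{:id 10, :owner_id 1} {:id 11, :owner_id 1}]}
```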

### Filtered

This is an input generator/output formatter pair. Note that they can and should also be used separately.

See filtered_test for usage. The query generator (query-for-columns) takes a QueryPath and a list of namespaced keywords,
each of which describes a column. For example, :items/id is the column "id" in table "items". 
It is also possible to add HoneySQL queries (without SELECT or FROM clauses), which keeps filtering and sorting simple.
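
Namespaced keywords carry both the table and the column, so they are easy to split with core functions. The helper `columns-by-table` below is a hypothetical sketch of that first step and is not part of query-for-columns.

```clojure
(defn columns-by-table
  "Groups namespaced column keywords by table:
  the namespace is the table, the name is the column."
  [columns]
  (reduce (fn [acc column]
            (update acc
                    (keyword (namespace column))
                    (fnil conj #{})
                    (keyword (name column))))
          {}
          columns))

(columns-by-table [:owners/* :items/id :items/name])
;; => {:owners #{:*}, :items #{:id :name}}
```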

v.d.o.filtered/query-for-columns returns a QueryTree which walks a minimal set of tables to be able to SELECT all of the requested columns.
Note that one of the selected columns must be in a table that will be the root of the QueryTree.

The filtered output is similar to the Tree output except it contains only the requested columns 
and all related items are placed in a vector under a key named after the related items' source table. 

## Writing
  
Writing operations are under development. The implementation is divided using the same principle as 
querying.
 
The writer component (v.d.a.writer) takes a Loom Digraph that is somewhat similar to the output graph
except the edges always go towards the node that has the foreign key column. The writer component
always applies the requested operations to nodes without incoming edges, updates their successors with
the generated foreign keys and removes the nodes. This is done recursively until the graph has no nodes.
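
The procedure described above can be sketched without Loom, using a map of nodes and a set of edges. Everything in this example is hypothetical: the function `write-graph`, the `[from to fk-column]` edge format, and `fake-insert!`, which stands in for a real database insert. It is not the code in v.d.a.writer.

```clojure
(defn write-graph
  "nodes: node id -> row map. edges: set of [from to fk-column], each
  pointing towards the node that holds the foreign key column.
  `insert!` takes a row and returns its generated primary key.
  Returns the node ids in the order they were written."
  [nodes edges insert!]
  (loop [nodes nodes, edges edges, written []]
    (if (empty? nodes)
      written
      (let [targets (set (map second edges))
            ;; Nodes without incoming edges do not wait for any key.
            ready   (remove targets (sort (keys nodes)))]
        (when (empty? ready)
          (throw (ex-info "Cycle in write graph" {:remaining (keys nodes)})))
        (let [ids    (into {} (map (fn [n] [n (insert! (nodes n))]) ready))
              ;; Copy each generated key into the successors' fk columns.
              nodes' (reduce (fn [m [from to fk]]
                               (if (contains? ids from)
                                 (assoc-in m [to fk] (ids from))
                                 m))
                             (apply dissoc nodes ready)
                             edges)
              edges' (set (remove (fn [[from]] (contains? ids from)) edges))]
          (recur nodes' edges' (into written ready)))))))

;; A stand-in for the database: rows get consecutive ids.
(def db (atom []))
(defn fake-insert! [row]
  (let [id (inc (count @db))]
    (swap! db conj (assoc row :id id))
    id))

(write-graph {:owner  {:name "Ann"}
              :item-1 {:label "book"}
              :item-2 {:label "pen"}}
             #{[:owner :item-1 :owner_id] [:owner :item-2 :owner_id]}
             fake-insert!)
;; => [:owner :item-1 :item-2]

@db
;; => [{:name "Ann", :id 1}
;;     {:label "book", :owner_id 1, :id 2}
;;     {:label "pen", :owner_id 1, :id 3}]
```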

There is currently only one option for generating the graph, v.d.a.full-map. It takes a nested map in the
Tree output format with operations set as metadata for the maps. See the test in v.d.a.operations-test.

## Plans

Everything will be moved to clojure.spec when it becomes stable. 

Tests will be improved before publishing.


## License

Copyright © 2016 Vincit

## Changelog

### 0.1.6-SNAPSHOT

Major refactoring of the "simple" ns (previously "sweet").
Filtered ns was split into input/output.
RuntimeExceptions are thrown where appropriate.

Added update and delete to the insert ns and renamed it to vincit.dbwalk.action. It will hopefully be renamed again before the first release.

### 0.1.5

Last release that is compatible with older versions.

