# dbwalk

A Clojure library for inserting and querying nested maps to/from a database or other sources of relational data.

NOTE: This library is in alpha and is not to be used for anything. 
If you want to participate in its development, go to #clojure on Slack or contact Sami Sandqvist.

At the moment, it can do a deep insert and query by following foreign keys.
Automatic database schema detection is supported for PostgreSQL.

## Overview

Dbwalk is based on an idea from [Objection.js](https://www.vincit.fi/en/blog/nested-eager-loading-and-inserts-with-objection-js/): given a set of database rows,
it is possible to extract their primary/foreign key columns and fetch all related rows from another table
with a single query of the form WHERE *foreign key columns* IN (*list of primary key column values*).
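
A minimal sketch of that batching step in plain Clojure. `in-query` is a hypothetical helper, not part of dbwalk's API; it only builds a clojure.java.jdbc-style `[sql & params]` vector, so one query fetches the related rows for every parent row at once instead of one query per parent.

```clojure
(require '[clojure.string :as str])

(defn in-query
  "Hypothetical helper (not dbwalk's API): builds a [sql & params] vector that
  fetches every row of `table` whose `fk-col` holds one of the `pk-col` values
  found in `parent-rows`. Returns nil when there is nothing to fetch."
  [parent-rows pk-col table fk-col]
  (let [ids (vec (distinct (keep pk-col parent-rows)))]   ; unique, non-nil keys
    (when (seq ids)                                       ; "IN ()" is not valid SQL
      (into [(str "SELECT * FROM " (name table)
                  " WHERE " (name fk-col)
                  " IN (" (str/join ", " (repeat (count ids) "?")) ")")]
            ids))))

;; (in-query [{:id 1} {:id 2} {:id 1}] :id :items :owner_id)
;; returns ["SELECT * FROM items WHERE owner_id IN (?, ?)" 1 2]
```

The fetched rows can then be matched back to their parents in memory by grouping them on the foreign key column.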

As this is a Clojure library, we do not use objects, and therefore dbwalk is not an ORM.
The relations in dbwalk are the foreign key relations of the database tables themselves, and there is no abstraction over the data contained in the database.

## Concepts

### Relation

A Relation in dbwalk is based on the idea of a foreign key in databases.
It describes a unique identifier for some data item and the way that identifier is contained in another item. The only Relations
currently implemented are:

* the source table's primary key stored in a column of the target table (OneToMany relation)
* a column in the source table containing the target table's primary key (ManyToOne relation)

Note that a join table, as commonly used, is seen as a OneToMany relation (source table to join table) followed by a ManyToOne relation (join table to target table).
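
Both Relation types describe a single foreign key; which one you get depends only on the end you start walking from. The sketch below uses a hypothetical map shape (not dbwalk's internal representation) for a foreign key `items.owner_id` pointing at `owners.id`, and flips its direction.

```clojure
;; Hypothetical data shape, for illustration only.
;; owners.id is stored in items.owner_id, so walking owners -> items is OneToMany.
(def owners->items
  {:type :one-to-many
   :from {:table :owners :column :id}
   :to   {:table :items  :column :owner_id}})

(defn reverse-relation
  "Walks the same foreign key in the opposite direction."
  [{:keys [type from to]}]
  {:type (case type
           :one-to-many :many-to-one
           :many-to-one :one-to-many)
   :from to
   :to   from})

;; Walking items -> owners: items.owner_id contains the owners primary key,
;; which is the ManyToOne case.
(reverse-relation owners->items)
```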

Currently an SQL endpoint has been implemented, so most SQL databases should be walkable. 
Note that there is no restriction that both ends of a relation must be in the same database.

Future Relation endpoints will include at least:

* resource filenames so that a table in a database can be linked to e.g. a JSON file which can be deserialized automatically as part of the query.

### Datasource

A Datasource is anything that holds relational data. When building queries, each database is its own Datasource.

The datasource description for an SQL db can be anything that the functions in clojure.java.jdbc accept. 
The user is expected to manage transactions and connection pools.
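
For reference, these are common db-spec shapes that clojure.java.jdbc accepts. The database names and credentials are placeholders; two separate databases would simply be two Datasources described by two such specs.

```clojure
;; A db-spec built from parts (placeholder names and credentials).
(def app-db
  {:dbtype   "postgresql"
   :dbname   "app"
   :host     "localhost"
   :user     "app_user"
   :password "secret"})

;; A second, separate database described by a JDBC URI.
(def archive-db
  {:connection-uri "jdbc:postgresql://localhost:5432/archive?user=app_user&password=secret"})

;; If you manage a connection pool yourself, pass its javax.sql.DataSource:
;;   {:datasource pooled-datasource}
;; Transactions are also up to you, e.g. (jdbc/with-db-transaction [tx app-db] ...)
```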

### QueryTree

Dbwalk's query format is a tree of SQL queries, each against a single table.

At each level of the query tree, :query selects columns from that level's table and :eager contains the subqueries for the tables the query branches to.

The query functions take HoneySQL queries. The FROM clause must name only one table, because it is currently used to determine the next table to walk to.
You can use WHERE, ORDER BY and most other clauses provided by HoneySQL.
Selecting only some columns is allowed, but dbwalk will automatically add all columns required for the relations in the query tree.

    (require '[honeysql.helpers :as s])            ;; assumes s is honeysql.helpers

    {:query (-> (s/select :*)
                (s/from :owners))                  ;; Select from "owners"
     :eager [{:query (-> (s/select :*)
                         (s/from :items))}]}       ;; Then query "items" and match owners to items.

The above query is presented without the required datasource information, as that can (and should) be added using vincit.dbwalk.utils/with-datasource.
See the examples in utils_test.
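
Because a QueryTree is plain data, the same tree can be written without the HoneySQL helpers. The sketch below assumes the helpers produce maps of the form `{:select [...] :from [...]}`; `walked-tables` is a hypothetical helper that lists the tables the tree visits and rejects a FROM clause naming more than one table.

```clojure
;; The QueryTree from above as literal maps (datasource information omitted).
(def query-tree
  {:query {:select [:*] :from [:owners]}
   :eager [{:query {:select [:*] :from [:items]}}]})

(defn walked-tables
  "Hypothetical helper: tables visited by a QueryTree, depth first.
  Throws when a FROM clause names more than one table."
  [{:keys [query eager]}]
  (let [[table & more] (:from query)]
    (when (seq more)
      (throw (ex-info "FROM must name exactly one table" {:from (:from query)})))
    (into [table] (mapcat walked-tables eager))))

;; (walked-tables query-tree) returns [:owners :items]
```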

### QueryPath

The problem with the QueryTree format is that it complects the data you want to get with the path used to gather it. 
When using SQL JOINs, the type of join affects the format of your result data. 
With the dbwalk method of getting data, when you go from table A to B you will always get 0..n rows of B in relation to one row from A.
Therefore, only the direction of travel is significant for any relation.

If you view the structure of a database as an undirected graph, a QueryPath is one of its directed acyclic subgraphs.
Since duplicate nodes in a QueryPath are also forbidden, a QueryPath is in fact a tree. It can be seen as representing a direction of travel through the database
or the desired cardinalities between rows of the result data.

Tech note: duplicate nodes in a QueryPath are currently forbidden because of the pathfinding algorithm, which will be improved.
In the future, a QueryPath may include more than one incoming link per relation, but still only one outgoing link.
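
The current tree constraint can be checked mechanically. This sketch assumes a hypothetical representation of a QueryPath as a seq of directed `[from-table to-table]` edges; it accepts the edges only if every table is entered at most once, there is exactly one root, and every table is reachable from that root.

```clojure
(defn reachable
  "Set of tables reachable from `root` by following [from to] edges."
  [edges root]
  (let [adjacent (group-by first edges)]
    (loop [seen #{root}
           frontier [root]]
      (if (empty? frontier)
        seen
        (let [fresh (->> frontier
                         (mapcat adjacent)     ; outgoing edges of the frontier
                         (map second)          ; their target tables
                         (remove seen)
                         distinct)]
          (recur (into seen fresh) (vec fresh)))))))

(defn query-path-tree?
  "True when `edges` (a non-empty seq of [from to] pairs) form a tree."
  [edges]
  (let [targets (map second edges)
        tables  (set (mapcat identity edges))
        roots   (remove (set targets) tables)]
    (boolean
     (and (seq edges)
          (apply distinct? targets)            ; no table entered twice
          (= 1 (count roots))                  ; a single starting table
          (= tables (reachable edges (first roots)))))))

;; (query-path-tree? [[:owners :items] [:items :tags]]) returns true
;; (query-path-tree? [[:owners :items] [:tags :items]]) returns false
```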


## Output formats

Look at the tests in writer_test to see inserting and querying examples for Tree output.
Note that running the tests requires an empty PostgreSQL database (see test_setup.clj),
because the JDBC metadata reported by H2 does not contain correct data.

### Tree

All of the queried data as a vector of top-level rows, each of which is a map.

Each map in the result represents a row in the database and its related rows in other tables. 
The related rows are placed in a vector and assoc'ed to the row either replacing the foreign key or, 
when the foreign key is in the other table, under a new keyword key named after the database table the related rows came from.

The map will contain the requested columns plus all foreign and primary keys required for dbwalk to function.
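
A minimal sketch of the two nesting rules in plain Clojure. The function names are hypothetical, and the choice of an empty vector when no related rows exist is an assumption made for illustration, not documented dbwalk behaviour.

```clojure
(defn attach-one-to-many
  "The related rows hold this row's key: store them in a vector under `table-key`."
  [parents pk children fk table-key]
  (let [by-fk (group-by fk children)]
    (mapv (fn [row] (assoc row table-key (get by-fk (get row pk) []))) parents)))

(defn attach-many-to-one
  "This row holds the related row's key: replace the foreign key with a vector of related rows."
  [rows fk targets pk]
  (let [by-pk (group-by pk targets)]
    (mapv (fn [row] (update row fk #(get by-pk % []))) rows)))

;; (attach-one-to-many [{:id 1 :name "Ann"}] :id
;;                     [{:id 10 :owner_id 1} {:id 11 :owner_id 1}] :owner_id :items)
;; returns [{:id 1 :name "Ann" :items [{:id 10 :owner_id 1} {:id 11 :owner_id 1}]}]
```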

### Flat

A one-level map of tablename -> vector of rows read from the table.
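
An illustration of that shape, assuming table names appear as keywords. Because the rows keep their primary and foreign keys, lookups are easy to build on top of it; `index-by` is a hypothetical helper.

```clojure
;; Hypothetical Flat result (keyword table names are an assumption).
(def flat-result
  {:owners [{:id 1 :name "Ann"}]
   :items  [{:id 10 :owner_id 1 :title "lamp"}
            {:id 11 :owner_id 1 :title "desk"}]})

(defn index-by
  "Map from the value of `k` to the row holding it."
  [k rows]
  (into {} (map (juxt k identity)) rows))

;; (:title (get (index-by :id (:items flat-result)) 11)) returns "desk"
```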

### Filtered

See filtered_test for usage. The query generator (query-for-columns) takes a QueryPath and a list of namespaced keywords,
each of which describes a column. E.g. :items/id is the column "id" in table "items". 
HoneySQL clauses (without SELECT or FROM) can also be added, which keeps filtering and sorting simple.

v.d.o.filtered/query-for-columns returns a QueryTree which walks a minimal set of tables to be able to SELECT all of the requested columns.
Note that one of the selected columns must be in a table that will be the root of the QueryTree.

The filtered output will be similar to the Tree output except it will have only the requested columns 
and all related items are placed in a vector under a key named after the related items' source table. 
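
Each namespaced keyword carries both a table and a column, so a generator like query-for-columns first needs to know which columns to select from which table. A minimal sketch of that grouping step; `columns-by-table` is hypothetical and not taken from dbwalk.

```clojure
(defn columns-by-table
  "Groups namespaced column keywords such as :items/id by table."
  [columns]
  {:pre [(every? namespace columns)]}          ; every column must name its table
  (reduce (fn [acc col]
            (update acc (keyword (namespace col)) (fnil conj []) (keyword (name col))))
          {}
          columns))

;; (columns-by-table [:owners/name :items/id :items/title])
;; returns {:owners [:name], :items [:id :title]}
```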

## Inserting

The insert functions expect the data in the same format as the select functions output it. 
Note that inserting removes duplicates, so you cannot insert the exact same row into one table twice in a single insert.
This may become optional in the future.
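
"The exact same row" means two maps that are equal in every column. A minimal sketch of that deduplication on a hypothetical table to rows map; this is not dbwalk's insert code.

```clojure
(defn dedupe-rows
  "Hypothetical sketch: drops exact duplicate rows per table, keeping first-seen order."
  [rows-by-table]
  (into {} (map (fn [[table rows]] [table (vec (distinct rows))])) rows-by-table))

;; Rows that differ in any column are both kept:
;; (dedupe-rows {:items [{:id 1 :title "lamp"} {:id 1 :title "lamp"} {:id 1 :title "desk"}]})
;; returns {:items [{:id 1 :title "lamp"} {:id 1 :title "desk"}]}
```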

## Plans

Add a Prismatic Schema (prismatic/schema) for the database schema description, so it can be written manually or generated for other databases.


## License

Copyright © 2016 Vincit

