A Datomic-inspired Clojure library for entity extraction (the big E in ETL) from a relational database.
Give credit where credit is due: the relational algebra is pretty great. It's solid, relatively easy to reason about, and we've got 50 years and billions of dollars of engineering effort in making fantastic relational database implementations.
Its great strength is in its unifying abstraction, the relation. Basically a table with named columns. This is also its great weakness: everything is a table. Your data is a table, and the answer to any question you ask the database will be a table, a rectangular prison that you cannot escape.
The pervasive rectangleness of relational data is somewhat inconvenient for us as Clojure developers, since most interesting data that we work with is tree-shaped, not rectangle-shaped. It's so easy and natural for us to create and work with nested data structures, and they are often the most natural representation of the data models that we're working with.
Say we have a database of music recordings, and we're interested in albums. An album has one or more album artists (which could be bands that have one or more members) and one or more tracks, each of which has properties including one or more artists, possibly one or more songwriters, possibly one or more producers, etc.
The natural representation of this kind of data for us is something like:
```clojure
{:name "Abbey Road"
 :artist [{:name "The Beatles"}]
 :tracks [{:name "Come Together"
           :number 1
           :artist [{:name "The Beatles"}]
           :songwriter [{:name "John Lennon"}
                        {:name "Paul McCartney"}]
           :producer [{:name "George Martin"}]}
          {:name "Something"
           :number 2
           :artist [{:name "The Beatles"}]
           :songwriter [{:name "George Harrison"}]
           ...}
          ... etc
          ]}
```
But let me tell you what, it sure is annoying to get to this point if you're starting from a relational database. Because, again, it's the rectangles. Whenever you're dealing with nested zero-or-more things, you can't get it all at once. You have to do a bunch of queries to get little rectangular pieces and then stitch them together in a tree yourself.
A typical way to model this in a relational database might have the following tables:
To get "Abbey Road", we'd need to do the following queries:
After doing all these queries, we need to take the data we get back and put it together in our application with code that understands each relationship and how it goes into the output.
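To give a feel for that hand-written stitching code, here is a minimal sketch. It doesn't touch a database at all: the row maps stand in for the results of separate queries, and the ids, the key names (`:album-id`, `:track-id`), and the helper names (`children-by`, `stitch-album`) are all made up for illustration, not part of this library. It only handles album artists and songwriters, but every additional relationship needs the same kind of treatment.

```clojure
;; Hypothetical rows, standing in for the results of separate SQL queries.
;; The ids and key names are assumptions for illustration only.
(def album-row {:id 1 :name "Abbey Road"})

(def album-artist-rows
  [{:album-id 1 :name "The Beatles"}])

(def track-rows
  [{:id 10 :album-id 1 :number 1 :name "Come Together"}
   {:id 11 :album-id 1 :number 2 :name "Something"}])

(def songwriter-rows
  [{:track-id 10 :name "John Lennon"}
   {:track-id 10 :name "Paul McCartney"}
   {:track-id 11 :name "George Harrison"}])

(defn children-by
  "Groups child rows by the foreign key `fk`, keeping only each row's :name."
  [fk rows]
  (reduce (fn [acc row]
            (update acc (get row fk) (fnil conj []) {:name (:name row)}))
          {}
          rows))

(defn stitch-album
  "Assembles one nested album map out of the flat row collections."
  [album artists tracks songwriters]
  (let [artists-by-album (children-by :album-id artists)
        writers-by-track (children-by :track-id songwriters)]
    {:name   (:name album)
     :artist (get artists-by-album (:id album) [])
     :tracks (->> tracks
                  (filter #(= (:album-id %) (:id album)))
                  (sort-by :number)
                  (mapv (fn [track]
                          {:name       (:name track)
                           :number     (:number track)
                           :songwriter (get writers-by-track (:id track) [])})))}))

(def abbey-road
  (stitch-album album-row album-artist-rows track-rows songwriter-rows))
```

Note that `stitch-album` has to know which foreign key links which rows and where each group lands in the output. That knowledge is exactly the per-relationship code described above.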
While it's true that there are "ORM" tools that let you do this without writing all the code yourself, let's just say it's a matter of some controversy whether they deliver on their promises or meaningfully improve the experience of building systems.
Copyright © 2017 Ladders
Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.