NoSQL: What’s the Buzz About Graph Databases?

I attended the NoSQLNow conference in San Jose and had the opportunity to speak one-on-one with a number of principals of NoSQL database concerns, including Emil Eifrem of Neo Technology. For those of you who aren’t familiar with the concept, graph databases are built from nodes, the edges (relationships) that connect them, and properties attached to both, not rows and columns with primary and foreign key relationships. In practice this lets them traverse connected information by following edges from node to node, which is often more efficient than reading pages of data and finding the rows that match the query.

Interestingly, graph theory (Euler, 1736) predates set theory (Cantor/Dedekind, 1870s), on which the relational model is based, by well over a century. Of historical interest, the relational database at IBM was conceived as a method to get data out of databases, not to get data in. This turned out, in the early ’70s, to be a problem for IBM, so they redirected Ted Codd’s efforts to making relational databases fast transaction processors. Enter the concept of “normal form,” a horribly misleading term that has sidetracked a zillion projects, with data modelers who had only a thin understanding of the concept insisting on “normal” purity no matter the cost. The rest is history. The whole DSS/BI/Analytics movement grew out of the fact that relational databases were poor performers at anything other than transaction processing.

According to the NoSQL movement (and I’m not entirely convinced of this, but I’m listening), the rigidity of the physical schema that relational databases require is their undoing in an era of agility, speed and volume. Here is a quote from Wikipedia:

Compared with relational databases, graph databases are often faster for associative data sets, and map more directly to the structure of object-oriented applications. They can scale more naturally to large data sets as they do not typically require expensive join operations. As they depend less on a rigid schema, they are more suitable to manage ad-hoc and changing data with evolving schemas. Conversely, relational databases are typically faster at performing the same operation on large numbers of data elements.

The key characteristic of graph databases is the notion of index-free adjacency: each node knows the location of its adjacent nodes, so no index is needed to find them. One semantic reading of this is that the graph itself is a representation of relationships. Paradoxically, there are no stored relationships in a “relational” database; they are applied at run time by the query.
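To make index-free adjacency a little more concrete, here is a minimal Python sketch. It is only an illustration, not how Neo4j or any other product is implemented; the `Node` class, the function names and the sample people are all made up for this example. The first function walks direct references from node to node, while the second answers the same question from a table of edge rows, where each hop has to search the table (or an index over it).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical graph node that holds direct references to its neighbors."""
    name: str
    # repr/compare are disabled so cyclic graphs do not recurse forever.
    neighbors: list["Node"] = field(default_factory=list, repr=False, compare=False)


def friends_of_friends_graph(start: Node) -> list[str]:
    """Two-hop traversal by following references; no lookup structure is consulted."""
    return sorted({
        fof.name
        for friend in start.neighbors
        for fof in friend.neighbors
        if fof is not start
    })


def friends_of_friends_table(start_name: str, edge_rows: list[tuple[str, str]]) -> list[str]:
    """Same question against (src, dst) rows; each hop scans the whole edge table."""
    first_hop = {dst for src, dst in edge_rows if src == start_name}
    return sorted({
        dst for src, dst in edge_rows
        if src in first_hop and dst != start_name
    })


# Illustrative data only: the same five people stored both ways.
alice, bob, carol, dave, erin = (Node(n) for n in ["alice", "bob", "carol", "dave", "erin"])
alice.neighbors = [bob, erin]
bob.neighbors = [alice, carol, dave]
erin.neighbors = [carol]

edge_rows = [
    ("alice", "bob"), ("alice", "erin"),
    ("bob", "alice"), ("bob", "carol"), ("bob", "dave"),
    ("erin", "carol"),
]

graph_result = friends_of_friends_graph(alice)
table_result = friends_of_friends_table("alice", edge_rows)
print(graph_result, table_result)
```

Both functions return the same names. The difference is in the work per hop: the graph version touches only the nodes it passes through, while the table version touches every edge row unless an index is added.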

Emil seems to think that graph databases are superior to RDBs in every way and will eventually supplant them. The argument that RDBs are based on sound and proven mathematical principles is interesting, but relational theory dates only to 1970. Graph theory goes back to the 18th century!

This all sounds good, but there are only a billion applications out there that rely on things NOT changing and for which the relational model is well-suited. As William McKnight said in his keynote, look to NoSQL as additive not replacement technology.

For those old enough to remember, lots of database systems in the ’80s tried to get on the relational bandwagon by getting certified as relational, and we’re already seeing the same thing with graph databases. To name just two: Twitter’s FlockDB is only a thin graph layer on top of MySQL and therefore lacks index-free adjacency, and Microsoft’s Trinity does not store graphs natively.

Most of the NoSQL vendors are pushing the notion that their products are far less complex than current RDBs. This is undoubtedly true, but they probably also lack much of the functionality that has been built on the RDB model over the decades. In fact, RDBs were pretty simple in the beginning too.

To sum it up, most of the NoSQL products I’ve seen are clearly aimed at high-speed, low-complexity transaction or streaming processing, usually with unconventional data. They are not analytical tools. But they could play a very useful, even indispensable, role in analytics: getting meaning into the process.

There has been a schism between semantic technology and graph databases, probably because the former still can’t figure out how to market their technologies while simultaneously trying to prove how smart they are. Their message is muddled and their most visible promoters are not, shall we say, enterprise ready. Oddly, the notion of a triple is fundamental to graphs, but graph database vendors are steering clear of the whole ontology/RDF/OWL thing and finding their customers in other pursuits. Good move.
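For readers who haven’t run into triples, here is a small Python sketch of why they map so naturally onto graphs. This is an illustration only, not RDF syntax or any vendor’s API; the predicate names and the helper functions are invented for the example. Each triple is a (subject, predicate, object) statement, which is exactly one labeled edge from one node to another.

```python
from collections import defaultdict

# Illustrative triples; the predicate names are made up for this sketch.
triples = [
    ("Emil", "works_for", "Neo Technology"),
    ("Twitter", "uses", "FlockDB"),
    ("FlockDB", "built_on", "MySQL"),
]


def build_graph(statements):
    """Turn (subject, predicate, object) triples into labeled outgoing edges per node."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return graph


def objects(graph, subject, predicate):
    """Follow every edge with the given label out of one node."""
    return [obj for label, obj in graph.get(subject, []) if label == predicate]


graph = build_graph(triples)
print(objects(graph, "FlockDB", "built_on"))
```

The subject and object become nodes and the predicate becomes the edge label, so a list of triples and a labeled graph are two views of the same data.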

