I have been struggling for a while to understand, and to depict in an easily shareable way, the organizational structure of my employer. It may sound silly, but it turns out to be surprisingly hard to describe or represent our rather three-dimensional management structure in EMC presales. I found that Neo4J is capable not only of constructing the necessary relationship models, but also of making it easy to traverse the graph and ‘walk’ the EMC presales organization. Our presales organization is not only somewhat three-dimensional; it is also not uniform from division to division, theater to theater, or business unit to business unit. A question such as “who is my peer in that other division?” (or worse, “who is my peer’s manager in that other division?”) is convoluted to express in an RDBMS, but relatively straightforward in Neo4J.
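To make the “peer’s manager” question a little more concrete, here is a minimal Python sketch of the traversal logic. It is not Neo4J code, and the org chart is entirely hypothetical: every name, role, and division is invented, and the rule that a “peer” is someone holding the same role in the other division is my own assumption for illustration.

```python
# Hypothetical org chart: every name, role, and division here is invented.
people = {
    "alice": {"role": "Systems Engineer", "division": "Division A"},
    "bob":   {"role": "Systems Engineer", "division": "Division B"},
    "carol": {"role": "SE Manager",       "division": "Division A"},
    "dave":  {"role": "SE Manager",       "division": "Division B"},
}

# Directed REPORTS_TO edges: employee -> manager.
reports_to = {"alice": "carol", "bob": "dave"}


def peers_in(person: str, division: str) -> list[str]:
    """Assumed definition: a peer holds the same role in the given division."""
    role = people[person]["role"]
    return [
        name
        for name, attrs in people.items()
        if name != person and attrs["role"] == role and attrs["division"] == division
    ]


def peer_managers_in(person: str, division: str) -> list[str]:
    """Find the peers, then follow one REPORTS_TO hop to each peer's manager."""
    return [reports_to[p] for p in peers_in(person, division) if p in reports_to]


print(peers_in("alice", "Division B"))          # ['bob']
print(peer_managers_in("alice", "Division B"))  # ['dave']
```

The second question is just the first one plus a single hop along a relationship, which is exactly the kind of step a graph database makes cheap to express.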
Then I read an article, How to Use Neo4J for Natural Language Search, in which the author makes the point that the value a graph brings to a complex data-modeling problem isn’t just that it can model relationships in a more agile fashion than an RDBMS; it is the metadata that holds the value. That is to say, the questions a user asks of a graph database are answered by the nature of the relationships themselves. Put another way, it is not necessarily the objects / nodes that are of interest, but the relationships between them.
Let’s consider a common game that many folks are familiar with: Rock Paper Scissors (also known as Roshambo, though I will refer to it hereafter as RPS). The game is played using hand symbols to represent objects (ummmm…, a rock, paper, and scissors, maybe?). For an explanation of how the game is played, please see Wikipedia here: Rock Paper Scissors.
Ready to get a bit metaphysical?
(Bet you never thought you would read that in the context of Rock Paper Scissors…)
While it is necessary to have representative objects for the players to use to play the game, the objects themselves are immaterial; it is the relationships between the objects that impart significance to them. That is to say, we are interested in how the objects are used, rather than in the rock, the paper, and the scissors themselves.
If we wanted to represent this in a traditional RDBMS, what would we need? Well, we would need a table, and that table would have one or more rows of data, each row representing a single record. It would be easy enough to use a single table with 3 rows, one each for the rock, the paper, and the scissors. How would we then represent the relationships?
Relationships in an RDBMS are formed by using identifying ‘keys’ to tie a record in one table to one or more records in another. The ‘key’ is used to avoid duplicating data. In an effort to represent the relationships for RPS using an RDBMS, consider the following diagram:
Note the complexity inherent in trying to force the issue: every record needs an identifying ‘primary’ key in the ‘object’ table, and corresponding ‘foreign’ keys are used in the ‘relationship’ table to tie one object to another. Using a single table for the objects is the cleanest choice, since it reduces structural redundancy in the database, but it then becomes necessary to create a ‘join’ or ‘bridge’ table to construct the relationships. Furthermore, all we have accomplished so far is to establish that there IS a relationship. Because a bare pairing of keys doesn’t inherently carry the NATURE of the relationship with it, the design still falls short of representing what actually happens during the game. How would we denote that the rock ‘crushes’ the scissors, and so on? We would have to add another field to the existing table, or perhaps even another table…
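A minimal sketch of the relational approach described above, using Python’s built-in sqlite3 module with an in-memory database. The table and column names (object, relationship, winner_id, loser_id, verb) are my own assumptions, not taken from the diagrams: objects get a primary key, a bridge table pairs them through foreign keys, and a separate verb column is bolted on just to say how one object beats another.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One table holds the objects, each identified by a primary key.
    CREATE TABLE object (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Bridge table: two foreign keys tie one object to another, and an
    -- extra 'verb' column is needed to record the nature of the link.
    CREATE TABLE relationship (
        winner_id INTEGER NOT NULL REFERENCES object(id),
        loser_id  INTEGER NOT NULL REFERENCES object(id),
        verb      TEXT    NOT NULL,
        PRIMARY KEY (winner_id, loser_id)
    );
    """
)
conn.executemany(
    "INSERT INTO object (id, name) VALUES (?, ?)",
    [(1, "rock"), (2, "paper"), (3, "scissors")],
)
conn.executemany(
    "INSERT INTO relationship (winner_id, loser_id, verb) VALUES (?, ?, ?)",
    [(1, 3, "crushes"), (2, 1, "covers"), (3, 2, "cuts")],
)


def outcome(a: str, b: str) -> str:
    """Resolve one round; the object table is joined twice to get names back."""
    row = conn.execute(
        """
        SELECT w.name, r.verb, l.name
        FROM relationship AS r
        JOIN object AS w ON w.id = r.winner_id
        JOIN object AS l ON l.id = r.loser_id
        WHERE (w.name = ? AND l.name = ?) OR (w.name = ? AND l.name = ?)
        """,
        (a, b, b, a),
    ).fetchone()
    return " ".join(row) if row else "tie"


print(outcome("scissors", "rock"))  # rock crushes scissors
print(outcome("paper", "paper"))    # tie
```

Even for three objects, answering “what happens when scissors meets rock?” takes two joins just to turn keys back into names, and the verb only exists because we added a column for it.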
In its full instantiation, it might appear as follows:
On my favorite show, The Big Bang Theory, Sheldon and the gang use a ‘nerdified’ version of RPS that they call “Rock Paper Scissors Lizard Spock” (click the link for an explanation). In this game, the rock, the paper, and the scissors are complemented by the addition of a lizard, and by Spock (hence the name of the game!).
So what’s the point?
The point, briefly, is simply to drive home that while some systems fit a relational management scheme well, others don’t. In the case of the game above, a relational system just doesn’t capture the important information (the relationships) in a way that makes sense or is useful to humans. Consider the following diagram instead:
In this diagram, we can clearly capture the relevant information about the relationships between the objects. What’s more, this can be represented easily in a graph database such as Neo4J, because such databases are designed to be ‘schema-less’: inherently flexible, and capable of treating complex relationships as first-class citizens.
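As a rough illustration (plain Python, not Neo4J or Cypher), here is a sketch in which each relationship is a value in its own right that carries its verb, instead of a pair of foreign keys. The Relationship class and the beats() helper are hypothetical names of my own. The point to notice is that extending RPS to Rock Paper Scissors Lizard Spock only means adding edges; nothing about the “schema” changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """A first-class edge: the verb lives on the relationship itself."""
    source: str
    verb: str
    target: str


# Rock Paper Scissors: three objects, three relationships.
edges = [
    Relationship("rock", "crushes", "scissors"),
    Relationship("paper", "covers", "rock"),
    Relationship("scissors", "cuts", "paper"),
]

# Rock Paper Scissors Lizard Spock: no schema change, just seven more edges.
edges += [
    Relationship("rock", "crushes", "lizard"),
    Relationship("lizard", "poisons", "spock"),
    Relationship("spock", "smashes", "scissors"),
    Relationship("scissors", "decapitates", "lizard"),
    Relationship("lizard", "eats", "paper"),
    Relationship("paper", "disproves", "spock"),
    Relationship("spock", "vaporizes", "rock"),
]


def beats(obj: str) -> dict[str, str]:
    """Everything `obj` defeats, keyed by target, with the verb as the value."""
    return {e.target: e.verb for e in edges if e.source == obj}


print(beats("rock"))  # {'scissors': 'crushes', 'lizard': 'crushes'}
```

In the five-object game every object defeats exactly two others, and that structure is visible directly in the edges rather than being reconstructed through joins.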
Furthermore, in the aforementioned article on Natural Language Search, the author makes the point that the value of a graph database lies in being able to identify and cache the relationships between objects, because queries to the database will take the form “what is the relationship of X to Y?” rather than the kind of query you might see in a traditional RDBMS (“how many objects of type X are there?”). Since the nature of the questions being asked differs, the nature of the system used to store and retrieve the data must necessarily differ as well.
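The difference between the two kinds of question can be sketched in a few lines of Python over a hypothetical in-memory list of (source, verb, target) triples. This only illustrates the shape of each query; it says nothing about how Neo4J actually stores or caches relationships, and the function names are my own.

```python
from collections import Counter

# Hypothetical store: a type label per node plus (source, verb, target) triples.
node_type = {"rock": "Object", "paper": "Object", "scissors": "Object"}
triples = [
    ("rock", "crushes", "scissors"),
    ("paper", "covers", "rock"),
    ("scissors", "cuts", "paper"),
]


def how_many(kind: str) -> int:
    """RDBMS-style question: how many objects of type X are there?"""
    return Counter(node_type.values())[kind]


def relationship_of(x: str, y: str) -> str:
    """Graph-style question: what is the relationship of X to Y (either direction)?"""
    for source, verb, target in triples:
        if {source, target} == {x, y} and source != target:
            return f"{source} {verb} {target}"
    return "no relationship"


print(how_many("Object"))                   # 3
print(relationship_of("scissors", "rock"))  # rock crushes scissors
```

The first function only counts nodes; the second can only be answered by looking at the relationships themselves.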
That’s enough for now… I will pursue the nature of the above in a future post. Thanks for reading!