While they sound fancy and complex, graph databases are actually a pretty simple idea. A graph, as applied to databases, is just a different way of structuring data from the one most of us are used to.
Most of us have used or are familiar with traditional "relational" databases. These databases are modeled after ledgers and forms, and store our data in tables. To break it down in a simplified way:
- A row in a table represents a thing
- Properties of the thing are columns in the table
- Different types of things are stored in different tables
- Two things are said to be related when they have the same values in "key" columns. So an item in table Foo is related to an item in table Bar when Foo.Bar_ID holds the same value as the matching key column in Bar.
This is how most database work has been done for a long time. It's familiar to us, and you can do a lot with it.
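To make the relational picture concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables follow the Foo and Bar naming above; the column names (id, name, bar_id) and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Each type of thing gets its own table; each column is a property.
# Column names (id, name, bar_id) are assumed for illustration.
conn.execute("CREATE TABLE bar (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT, bar_id INTEGER)"
)

# Each row is one thing.
conn.execute("INSERT INTO bar (id, name) VALUES (1, 'first bar')")
conn.execute("INSERT INTO foo (id, name, bar_id) VALUES (10, 'a foo', 1)")
conn.execute("INSERT INTO foo (id, name, bar_id) VALUES (11, 'another foo', 1)")

# Two rows are related when their key columns hold the same value,
# so finding related things means joining the tables on those columns.
rows = conn.execute(
    "SELECT foo.name, bar.name FROM foo "
    "JOIN bar ON foo.bar_id = bar.id ORDER BY foo.id"
).fetchall()
print(rows)
```

Note that the relationship is never stored as a thing of its own here: it only exists as a matching pair of values that the join rediscovers at query time.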
Graph databases are actually a bit simpler. There are only two things to worry about: nodes (sometimes called vertices) and relationships (sometimes called edges). Nodes are dots, and relationships are lines between the dots. It's all dots and lines.
Nodes are the things in graph databases. In most graph databases, nodes have properties, which are a set of keys and values. In many, nodes also have labels, which are used to categorize and group nodes. This way you know what type of thing a node is.
So, to break it down:
- A node represents a thing
- Properties of the thing are properties on the node
- The type of a thing is set by a label on the node
- Two things are related when a relationship is created between them – a line is drawn between the nodes
It's pretty cool, because it's close to how we often think about entities and the relationships between them. We don't have to translate between how we would draw it on a whiteboard and how it will actually work in the database.
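The dots-and-lines model is small enough to sketch in a few lines of plain Python. This is not how any particular graph database is implemented; the Node, Relationship, and Graph classes, the KNOWS relationship type, and the sample people are all hypothetical names chosen for illustration. The sketch also assumes relationships have a direction and a type, which is common but not universal.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A dot: labels say what type of thing it is, properties describe it."""
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A line between two dots (directed and typed, by assumption)."""
    start: Node
    kind: str
    end: Node


class Graph:
    """A collection of nodes connected by relationships."""

    def __init__(self):
        self.nodes = []
        self.relationships = []

    def add_node(self, *labels, **properties):
        node = Node(set(labels), properties)
        self.nodes.append(node)
        return node

    def relate(self, start, kind, end):
        rel = Relationship(start, kind, end)
        self.relationships.append(rel)
        return rel

    def neighbors(self, node, kind):
        # Follow the lines of the given kind that leave this node.
        return [
            r.end
            for r in self.relationships
            if r.start is node and r.kind == kind
        ]


graph = Graph()
alice = graph.add_node("Person", name="Alice")
bob = graph.add_node("Person", name="Bob")
graph.relate(alice, "KNOWS", bob)

names = [n.properties["name"] for n in graph.neighbors(alice, "KNOWS")]
print(names)
```

Unlike the table version, the relationship here is stored directly as its own object, so finding related things means following a line rather than matching key values.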
That's really all a graph is: a collection of nodes connected by relationships. That simplicity is what lets it scale up to big sets of data and adapt to changing needs very well.