Automatic property-based schema and inheritance detection in Structr

A little more than two weeks have passed since we introduced the GraphGist importer as our contribution to the Neo4j GraphGist Challenge. You can read all about it in Axel's previous blog post.

We received a lot of positive feedback, and many people have since experimented with the new feature, including ourselves of course. One thing that many people noted when they tried to import their existing domain models into Structr was the lack of type inheritance.

So we sat down and thought about how to make type inheritance possible, because it would be a fantastic addition to the existing importer.

Labels

When we look at the GraphGist Challenge submissions, most of them already use the new Neo4j 2.0 labels feature to convey type information, for example the Single Malt Scotch Whisky GraphGist by Patrick Baumgartner:

[source,cypher]
----
CREATE
    (w_aberlour : Whisky { name: 'Aberlour' }),
    (w_ardbeg :   Whisky { name: 'Ardbeg'   }),
    [...]
----

Extending this concept to multiple types is somewhat more difficult, because Neo4j does not preserve the order in which labels are set on a node. In our experiments, the labels of a node were always returned in alphabetical order, which is of little use if we want to use the labels to form a type hierarchy.

So we had to find a different way to specify and detect inheritance.

It turned out that a common set of properties is a very good indicator that two nodes share a common type. We experimented with different models, GraphGists and import formats, and found that we can quite reliably identify a type hierarchy in a graph by using the labels, properties and property types of individual nodes.

Even more labels

The key to successfully identifying the semantic structure of a graph is the consistent use of labels. Let's consider the following example, which we will extend to a complete GraphGist in the course of this blog post.

[source,cypher]
----
CREATE (becks : Beer : AlcoholicBeverage {
    brand: "Beck's",
    alcoholPercentage: 4.9,
    name: "Beck's"
})

CREATE (potts : Beer : AlcoholicBeverage {
    brand: "Pott's",
    alcoholPercentage: 4.8,
    name: "Pott's"
})

CREATE (auchentoshan : Whisky : AlcoholicBeverage {
    age: 10,
    alcoholPercentage: 40.0,
    name: 'Auchentoshan'
})

CREATE (dalmore : Whisky : AlcoholicBeverage {
    age: 12,
    alcoholPercentage: 40.0,
    name: 'Dalmore'
})
----

We use the GraphGist syntax with Cypher to create two beer nodes and two whisky nodes, each with a set of properties, using three labels in total: Beer, Whisky and AlcoholicBeverage. All four nodes share the label AlcoholicBeverage and the two properties alcoholPercentage and name, so Structr identifies AlcoholicBeverage as the common base type of Beer and Whisky. The other properties, brand and age, are each shared by only two of the nodes, so they are assigned to the more specific types Beer and Whisky respectively.

At this point in time, our schema looks like this:

AlcoholicBeverage
 + alcoholPercentage: Double
 + name: String

Beer extends AlcoholicBeverage
 + brand: String

Whisky extends AlcoholicBeverage
 + age: Long
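
The detection rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Structr's actual implementation: the function name `detect_schema` and the data layout are made up for this example, and the sketch assumes that a label is a supertype of another label when its set of nodes strictly contains the other label's set of nodes.

```python
from collections import defaultdict

# Each node is a (labels, properties) pair; the data mirrors the four nodes above.
NODES = [
    ({"Beer", "AlcoholicBeverage"},
     {"brand": "Beck's", "alcoholPercentage": 4.9, "name": "Beck's"}),
    ({"Beer", "AlcoholicBeverage"},
     {"brand": "Pott's", "alcoholPercentage": 4.8, "name": "Pott's"}),
    ({"Whisky", "AlcoholicBeverage"},
     {"age": 10, "alcoholPercentage": 40.0, "name": "Auchentoshan"}),
    ({"Whisky", "AlcoholicBeverage"},
     {"age": 12, "alcoholPercentage": 40.0, "name": "Dalmore"}),
]


def detect_schema(nodes):
    """Return {label: (parent or None, sorted list of the label's own properties)}."""
    # Indexes of the nodes that carry each label.
    members = defaultdict(set)
    for index, (labels, _) in enumerate(nodes):
        for label in labels:
            members[label].add(index)

    # Property names that are present on every node of a label.
    common = {
        label: set.intersection(*(set(nodes[i][1]) for i in indexes))
        for label, indexes in members.items()
    }

    schema = {}
    for label, indexes in members.items():
        # Assumed rule: a supertype's node set strictly contains this label's node set.
        supers = [other for other, other_idx in members.items() if indexes < other_idx]
        # The direct parent is the smallest such superset (ties are not resolved here).
        parent = min(supers, key=lambda s: len(members[s]), default=None)
        # Properties already provided by a supertype are not repeated on the subtype.
        inherited = set().union(*(common[s] for s in supers))
        schema[label] = (parent, sorted(common[label] - inherited))
    return schema


for label, (parent, properties) in sorted(detect_schema(NODES).items()):
    print(label, "extends", parent, properties)
```

For the four nodes above, the sketch places alcoholPercentage and name on AlcoholicBeverage and leaves brand on Beer and age on Whisky, which matches the schema shown before.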

Evolving the schema

As you probably noticed, the current graph does not contain any relationships, so we need to add some more types and relationships to move things forward. Please note that we modify the whole import set, so the above example becomes:

[source,cypher]
----
CREATE (becks : Beer : AlcoholicBeverage : Thing {
    brand: "Beck's",
    alcoholPercentage: 4.9,
    name: "Beck's"
})

CREATE (potts : Beer : AlcoholicBeverage : Thing {
    brand: "Pott's",
    alcoholPercentage: 4.8,
    name: "Pott's"
})

CREATE (auchentoshan : Whisky : AlcoholicBeverage : Thing {
    age: 10,
    alcoholPercentage: 40.0,
    name: 'Auchentoshan'
})

CREATE (dalmore : Whisky : AlcoholicBeverage : Thing {
    age: 12,
    alcoholPercentage: 40.0,
    name: 'Dalmore'
})

CREATE (axel : Person : Thing {
    age: 38,
    name: 'Axel'
})

CREATE (christian : Person : Thing {
    age: 32,
    name: 'Christian'
})

CREATE (christian)-[:LIKES]->(potts)
CREATE (christian)-[:LIKES]->(auchentoshan)
CREATE (axel)-[:LIKES]->(becks)
CREATE (axel)-[:LIKES]->(dalmore)
----

As you can see, we added two new labels to the graph. The first one is Thing, which becomes the new home of the name property, because all nodes are now created with this label (and all nodes have a name). The other one is Person, which carries an additional age property. We also added two persons, Axel and Christian, who can like things: alcoholic beverages in this case, which is a reasonable approximation of the real world.

Note that even though the LIKES relationships are created between persons and beers (or persons and whiskies), Structr identifies AlcoholicBeverage as the related type, because it is the common base class of Beer and Whisky.

Also note that the age property of the Person nodes is kept separate from the age property of the Whisky nodes. The only base class the two types share is Thing, and not all Thing nodes have an age property, so Structr correctly treats them as two properties with different semantics.
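
One way to picture how the related type of a relationship such as LIKES can be found is to look for the most specific type that all target nodes have in common. The following Python sketch does that on a hand-written parent map; the names `PARENT`, `ancestry` and `common_supertype` are hypothetical, and the approach is an assumption about the principle, not Structr's code.

```python
# Parent of each type, written down by hand from the labels in the example above.
PARENT = {
    "Thing": None,
    "AlcoholicBeverage": "Thing",
    "Beer": "AlcoholicBeverage",
    "Whisky": "AlcoholicBeverage",
    "Person": "Thing",
}


def ancestry(type_name, parent=PARENT):
    """Return the chain from a type up to its root, most specific type first."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = parent[type_name]
    return chain


def common_supertype(type_names, parent=PARENT):
    """Return the most specific type that all given types are or extend.

    Expects at least one type name.
    """
    chains = [ancestry(name, parent) for name in type_names]
    shared = set(chains[0]).intersection(*chains[1:])
    # Chains are ordered from specific to general, so the first shared entry wins.
    return next((name for name in chains[0] if name in shared), None)


# Types of the LIKES targets potts, auchentoshan, becks and dalmore.
print(common_supertype(["Beer", "Whisky", "Beer", "Whisky"]))  # AlcoholicBeverage
```

If a person later likes something that is not an alcoholic beverage, the same lookup moves up the hierarchy until it reaches a type that covers all targets.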

Adding more things

Finally, to complete this example, we add some more nodes and relationships:

[source,cypher]
----
CREATE (becks : Beer : AlcoholicBeverage : Thing {
    brand: "Beck's",
    alcoholPercentage: 4.9,
    name: "Beck's"
})

CREATE (potts : Beer : AlcoholicBeverage : Thing {
    brand: "Pott's",
    alcoholPercentage: 4.8,
    name: "Pott's"
})

CREATE (auchentoshan : Whisky : AlcoholicBeverage : Thing {
    age: 10,
    alcoholPercentage: 40.0,
    name: 'Auchentoshan'
})

CREATE (dalmore : Whisky : AlcoholicBeverage : Thing {
    age: 12,
    alcoholPercentage: 40.0,
    name: 'Dalmore'
})

CREATE (christian : Person : Thing {
    age: 32,
    name: 'Christian'
})

CREATE (axel : Person : Thing {
    age: 38,
    name: 'Axel'
})

CREATE (car1 : Car : Vehicle : Thing {
    licensePlate: 'AXL',
    wheels: 4,
    name: "Axel's car"
})

CREATE (car2 : Car : Vehicle : Thing {
    licensePlate: 'CH',
    wheels: 4,
    name: "Christian's car"
})

CREATE (bike1 : Bike : Vehicle : Thing {
    color: 'silver',
    wheels: 2,
    name: "Axel's bike"
})

CREATE (bike2 : Bike : Vehicle : Thing {
    color: 'blue',
    wheels: 2,
    name: "Christian's bike"
})

CREATE (christian)-[:LIKES]->(potts)
CREATE (christian)-[:LIKES]->(auchentoshan)
CREATE (christian)-[:LIKES]->(car2)
CREATE (axel)-[:LIKES]->(becks)
CREATE (axel)-[:LIKES]->(dalmore)
CREATE (axel)-[:LIKES]->(bike1)
----

When importing the above example, Structr will create the following schema:

Thing
 + name: String
 + personLikes: List<Person>

AlcoholicBeverage extends Thing
 + alcoholPercentage: Double

Whisky extends AlcoholicBeverage
 + age: Long

Beer extends AlcoholicBeverage
 + brand: String

Person extends Thing
 + likesThings: List<Thing>
 + age: Long

Vehicle extends Thing
 + wheels: Long

Car extends Vehicle
 + licensePlate: String

Bike extends Vehicle
 + color: String

The importer has correctly identified that the LIKES relationships connect persons and things, and it has correctly analyzed and mapped the implicit type hierarchy formed by the label and property sets.
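
The property types in the schema above follow the values in the CREATE statements: whole numbers become Long, decimal numbers become Double and text becomes String. A small Python sketch of such a mapping could look like this. The function name `schema_type` is made up, and widening a mix of whole and decimal numbers to Double is an assumption, since this post does not cover mixed values.

```python
def schema_type(values):
    """Pick a schema type name for all observed values of one property."""
    kinds = {type(value) for value in values}
    if kinds == {int}:
        return "Long"
    if kinds and kinds <= {int, float}:
        # Assumption: a mix of whole and decimal numbers is widened to Double.
        return "Double"
    if kinds == {str}:
        return "String"
    raise ValueError(f"no type mapping for {sorted(k.__name__ for k in kinds)}")


print(schema_type([10, 12]))            # Long
print(schema_type([4.9, 4.8, 40.0]))    # Double
print(schema_type(["silver", "blue"]))  # String
```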

Ok, so what can I do with that?

Structr is an application framework, so the obvious thing to do is create an application out of your data. The Structr REST backend makes that incredibly easy, since you can access the data from different clients, e.g. a web frontend or a mobile application.

Let's have a look at the REST output of some selected examples. You can see that you can query the database using any type or supertype.

/persons

persons.jpg

/beers

beers.jpg

/vehicles

vehicles.jpg

/things

things.jpg

You can even use more complex functions like range queries, inexact search, etc.

/things?name=an&loose=1

things2.jpg

/whiskys?age=[0 TO 11]

whiskys.jpg

And you can access nested collections directly like this:

/persons/beda386b6db34df2881952a45e328fc3/likesThings

likes.jpg
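
If you want to call these endpoints from a script, the query strings need to be URL-encoded, in particular the range expression. The following Python sketch only builds the URLs. The base URL is an assumption about a local installation and `rest_url` is a hypothetical helper, so adjust both to your setup.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL of a local Structr instance; adjust to your installation.
BASE_URL = "http://localhost:8082/structr/rest"


def rest_url(*path, **params):
    """Build a REST URL from path segments and optional query parameters."""
    url = BASE_URL + "/" + "/".join(quote(str(segment), safe="") for segment in path)
    if params:
        # quote_via=quote encodes spaces as %20 instead of '+'.
        url += "?" + urlencode(params, quote_via=quote)
    return url


print(rest_url("things", name="an", loose=1))
print(rest_url("whiskys", age="[0 TO 11]"))
print(rest_url("persons", "beda386b6db34df2881952a45e328fc3", "likesThings"))
```

The resulting strings can be passed to any HTTP client. Whether a request needs authentication depends on how your instance is configured.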

Conclusion

Property-based schema and inheritance detection is particularly well suited for structured data, for example data created from an existing hierarchical domain model. Such data usually has type information associated with it already, which can easily be transformed into labels.

The preferred way of creating a GraphGist for import into Structr can be summarized as follows:

  • generate Cypher CREATE statements for all of your nodes
  • add labels for all types and subtypes of each node
  • generate property entries for all properties of each node
  • generate Cypher CREATE statements for all relationships
  • use the following template and insert the generated statements
  • save the resulting file
  • use the GraphGist importer to import it using a file:// URL (see below)

Template

[source,cypher]
----
// your Cypher here..
----
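
The steps above can be automated with a short script. The following Python sketch is one possible way to do it: all function names are hypothetical, and the way your domain objects are turned into identifiers, labels and properties is up to you. Strings are written in double quotes so that apostrophes, as in Beck's, do not need special treatment.

```python
from pathlib import Path


def cypher_literal(value):
    """Render a Python value as a Cypher literal."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        # Escape backslashes and double quotes, then wrap in double quotes.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return repr(value)


def create_node(identifier, labels, properties):
    """Return a CREATE statement for one node with all of its labels."""
    label_part = "".join(" : " + label for label in labels)
    props = ",\n".join(
        f"    {key}: {cypher_literal(value)}" for key, value in properties.items()
    )
    return f"CREATE ({identifier}{label_part} {{\n{props}\n}})"


def create_relationship(source, rel_type, target):
    """Return a CREATE statement for one relationship between two identifiers."""
    return f"CREATE ({source})-[:{rel_type}]->({target})"


def render_gist(statements):
    """Insert the generated statements into the template shown above."""
    return "[source,cypher]\n----\n" + "\n\n".join(statements) + "\n----\n"


def write_gist(path, statements):
    """Save the gist and return a file:// URL that can be given to the importer."""
    path = Path(path).resolve()
    path.write_text(render_gist(statements), encoding="utf-8")
    return path.as_uri()


statements = [
    create_node("becks", ["Beer", "AlcoholicBeverage", "Thing"],
                {"brand": "Beck's", "alcoholPercentage": 4.9, "name": "Beck's"}),
    create_node("axel", ["Person", "Thing"], {"age": 38, "name": "Axel"}),
    create_relationship("axel", "LIKES", "becks"),
]
print(render_gist(statements))
```

Calling `write_gist("mygist.gist", statements)` saves the file and returns the matching file:// URL for the import.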

Did you know?

You can use the GraphGist importer to import files on your local disk! Just use a file:// URL, for example:

file:///home/user/mygist.gist
