The data model Syndeia uses for the Digital Thread is a graph: a collection of vertices and edges, each of which can have a name, a type and properties (edges can also have a direction). Our model uses all of these. Referring back to Figure 1, Part 1, the reader can see why this might be a good fit with our picture of the Digital Thread. A second advantage of graph databases is that the technology is highly scalable. Pushed by social networks like Facebook and LinkedIn, graph databases have been engineered to handle millions of nodes and connections. For a properly formed query, one that starts from a known vertex and follows edges outward, the time depends mainly on the part of the graph the query touches rather than on the total size of the dataset.
Figure 1 Graph characteristics
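To make the vocabulary concrete, here is a minimal sketch of a property graph in Python. It is not Syndeia's implementation; the class names, the example vertices and the "satisfiedBy" edge type are all assumptions made for illustration. It shows vertices and edges that each carry a name, a type and properties, with directed edges indexed by their source vertex.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    name: str
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    name: str
    type: str
    source: str  # id of the vertex the edge leaves (edges are directed)
    target: str  # id of the vertex the edge enters
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory graph; a real graph database adds persistence and indexing."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> Vertex
        self.out_edges = defaultdict(list)  # vertex id -> outgoing edges

    def add_vertex(self, vertex):
        self.vertices[vertex.id] = vertex

    def add_edge(self, edge):
        self.out_edges[edge.source].append(edge)

    def neighbors(self, vertex_id, edge_type=None):
        # Only the outgoing edges of one vertex are inspected, so the cost
        # depends on that vertex's edge count, not on the size of the graph.
        edges = self.out_edges.get(vertex_id, [])
        return [self.vertices[e.target] for e in edges
                if edge_type is None or e.type == edge_type]


# Illustrative data only; these names do not come from a real repository.
graph = PropertyGraph()
graph.add_vertex(Vertex("r1", "Max speed requirement", "Requirement", {"status": "approved"}))
graph.add_vertex(Vertex("b1", "Drive unit", "Block"))
graph.add_edge(Edge("r1-b1", "satisfiedBy", "r1", "b1", {"note": "illustrative"}))

satisfied_by = [v.name for v in graph.neighbors("r1", "satisfiedBy")]
```

The `neighbors` lookup is the kind of local traversal that keeps graph queries fast: it starts from a known vertex and touches only that vertex's edges.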
In our data model, repositories, containers and artifacts are all vertices and relations are edges, as illustrated in Figure 2. In addition, there are two special types of edges. One is the “ownedBy” hierarchical relationship. Artifacts and relations are owned by a container (for example, issues are owned by a project in JIRA), and containers are owned by a repository.
Figure 2 One view of the Common Data Model in Syndeia
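The “ownedBy” hierarchy can be sketched as a simple child-to-owner lookup. This is an illustration under stated assumptions, not Syndeia's storage format: the identifiers (`issue:DEMO-101`, `project:DEMO`, `repository:jira-demo`) and the function names are hypothetical.

```python
# Each entry is one "ownedBy" edge, stored as child -> owner.
# All identifiers are made up for illustration.
OWNED_BY = {
    "issue:DEMO-101": "project:DEMO",        # artifact -> container
    "issue:DEMO-102": "project:DEMO",
    "project:DEMO": "repository:jira-demo",  # container -> repository
}


def owner_chain(element, owned_by=OWNED_BY):
    """Follow ownedBy edges upward from an element to its repository."""
    chain = []
    current = element
    while current in owned_by:
        current = owned_by[current]
        chain.append(current)
    return chain


def owned_elements(owner, owned_by=OWNED_BY):
    """List the elements directly owned by a container or repository."""
    return sorted(child for child, parent in owned_by.items() if parent == owner)
```

Calling `owner_chain("issue:DEMO-101")` walks from the issue to its project and then to the repository, which mirrors the artifact, container, repository hierarchy described above.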
The second is a “hasType” relationship; for example, the edge between Repository and Repository Type in Figure 3. Each element has a type. A Teamwork Cloud (TWC) artifact could be a model, a branch, a revision, a block, and so forth, all artifact types specific to a particular TWC repository.
Figure 3 A second view of the Common Data Model in Syndeia
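As a rough sketch of how “hasType” edges might be used, the snippet below stores them as (source, label, target) triples and answers two questions: what type does an element have, and how many elements are there of each type? The element names, the use of "Teamwork Cloud" as a repository type and the helper functions are assumptions for illustration only.

```python
from collections import Counter

# (source, label, target) triples; all element names are hypothetical.
EDGES = [
    ("twc-demo", "hasType", "Teamwork Cloud"),  # repository -> repository type
    ("vehicle-model", "hasType", "Model"),
    ("main", "hasType", "Branch"),
    ("rev-1", "hasType", "Revision"),
    ("engine", "hasType", "Block"),
    ("chassis", "hasType", "Block"),
]


def type_of(element, edges=EDGES):
    """Return the target of the element's hasType edge, or None if it has none."""
    for source, label, target in edges:
        if source == element and label == "hasType":
            return target
    return None


def count_by_type(edges=EDGES):
    """Count how many elements point at each type vertex."""
    return Counter(target for _, label, target in edges if label == "hasType")
```

Treating types as vertices in their own right, rather than as plain string fields, means a query can start from a type and find every element that points to it.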
Graphically, our Digital Thread looks something like Figure 4. Each of the larger circles represents a repository. The smaller circles within are containers, which contain artifacts and intra-model relations. Syndeia creates a set of inter-model relations between them, which are collected in a Syndeia project container within the Syndeia repository. Note that Syndeia doesn’t try to store the artifact data itself, only the connections between artifacts, with enough information to identify and find the artifacts at each end.
Figure 4 How Repositories, Containers, Artifacts and Relations model the Digital Thread in Syndeia
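The idea that an inter-model relation holds references rather than artifact data can be sketched as follows. The field names, the "relatedTo" relation type and all identifiers are hypothetical; the actual Syndeia record layout is not described in this post.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ArtifactRef:
    """Just enough information to locate an artifact in its home repository."""
    repository: str
    container: str
    key: str


@dataclass(frozen=True)
class InterModelRelation:
    name: str
    type: str
    source: ArtifactRef
    target: ArtifactRef
    container: str  # the Syndeia project that owns this relation


# Illustrative identifiers only.
relation = InterModelRelation(
    name="DEMO-101 to engine",
    type="relatedTo",
    source=ArtifactRef("jira-demo", "DEMO", "DEMO-101"),
    target=ArtifactRef("twc-demo", "vehicle-model", "engine"),
    container="syndeia-project-demo",
)

# The stored record carries identifiers for both ends, not the issue text
# or the block definition, which stay in their own repositories.
record = asdict(relation)
```

Because the record contains only locators, the artifact content is always read from the tool that owns it, so the relation never holds a stale copy.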
In practice, mapping each repository structure to the common data model is not simple. Each specialized tool has its own set of standard and custom types. The mapping differs between Jama and JIRA, as suggested in Figure 5, where different terminology is used for Repositories, Containers and Artifacts. Some tools diverge quite strongly from the common model (for example, relations may be treated as attributes), which makes the mapping complex.
Figure 5 Mapping JIRA and Jama data models to the Syndeia common model
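One piece of such a mapping, the case where a tool stores relations as attributes, might be handled along these lines. This is a sketch under assumptions: the record layout, the `verified_by` attribute and the "verifiedBy" edge type are invented for illustration and do not describe any particular tool.

```python
def attribute_links_to_edges(records, attribute, edge_type):
    """Promote a link-valued attribute on each record to explicit graph edges."""
    edges = []
    for record in records:
        # Records without the attribute simply contribute no edges.
        for target in record.get(attribute, []):
            edges.append((record["id"], edge_type, target))
    return edges


# Hypothetical tool export in which links live inside the artifact record.
records = [
    {"id": "REQ-1", "title": "Top speed", "verified_by": ["TEST-7", "TEST-9"]},
    {"id": "REQ-2", "title": "Braking distance", "verified_by": []},
    {"id": "TEST-7", "title": "Track test"},
]

edges = attribute_links_to_edges(records, "verified_by", "verifiedBy")
```

After this step the links are ordinary edges and can be traversed like any other relation in the common model.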
In Part 4 (forthcoming), it’s finally time to do some Data Science. We will apply these concepts in generating Gremlin graph queries to analyze a Digital Thread involving seven separate repositories.
For more blogs in the series:
- Data Science and the Digital Thread | Part 1
- Data Science and the Digital Thread | Part 2
- Data Science and the Digital Thread | Part 3 (This Post)
- Data Science and the Digital Thread | Part 4
- Data Science and the Digital Thread | Part 5