In the first three parts of this blog series, I discussed the connections between Data Science and Digital Engineering through the idea of a Digital Thread. In this post and the next, I will show some examples from Syndeia, the digital thread platform from Intercax, that illustrate these ideas. We begin by applying graph analysis to a small Syndeia project based on the schema in Figure 1. Requirements in Jama Connect are linked to blocks and activities in a SysML model in Teamwork Cloud (TWC), which are linked in turn to software code files in GitLab and parts in Windchill, respectively. Further connections are made to test cases in TestRail, digital documents in Artifactory, and issues in JIRA, where progress can be logged.
Figure 1 Schema for Graph Analysis example
To illustrate the power of graph analysis, we will look at the digital thread at two points in time: Stage 1, where only 28 inter-model relations have been created, and Stage 2, with 78 relations. The ability to monitor the evolution of the system model over time is one of the critical strengths of digital threads. In the accompanying video, we ask the following questions using the Gremlin graph query language, submitted through the Syndeia Cloud Web Dashboard:
- count all relations in Stage 1 (Syndeia project DZSB15)
- show all relations in Stage 1
- show all Jama to TWC links in Stage 1
- show all Jama-TWC-GitLab connections in Stage 1
- show all Jama-TWC-JIRA connections in Stage 1
- count all relations in Stage 2 (Syndeia project DZSB19)
- count all Jama-TWC-GitLab connections in Stage 2
- show all Jama-TWC-GitLab connections in Stage 2
- show all Jama-TWC-JIRA connections in Stage 2
- count all Jama-TWC-X-JIRA connections in Stage 2
- show all Jama-TWC-X-JIRA connections in Stage 2
- show all Jama-TWC-Windchill-JIRA connections in Stage 2
Looking deeper at a few of these queries, we can understand the realization of the theoretical schema in Figure 1. Not all paths in this schema will be built out at the same rate. In Figure 2, we narrow the query to identify connections from Jama to TWC to GitLab at Stage 1. Only one such instance matches this pattern.
Figure 2 Graph query results, Jama to TWC to GitLab at Stage 1
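The kind of path pattern behind this query can be sketched in plain Python. This is a minimal illustration, not the Gremlin query used in the video: the relation data, the artifact identifiers, and the helper names `repo_of` and `match_paths` are all hypothetical, and the results do not reproduce the figures.

```python
from collections import defaultdict

# Hypothetical toy data: (source, target) relations between artifacts.
# Each artifact id is prefixed with its repository type for readability.
RELATIONS = [
    ("jama:REQ-1", "twc:Block-A"),
    ("jama:REQ-2", "twc:Block-B"),
    ("twc:Block-A", "gitlab:controller.py"),
    ("twc:Block-B", "jira:ISSUE-7"),
]


def repo_of(artifact_id):
    """Return the repository prefix, e.g. 'jama' for 'jama:REQ-1'."""
    return artifact_id.split(":", 1)[0]


def match_paths(relations, pattern):
    """Return every path whose repository types follow `pattern` in order."""
    outgoing = defaultdict(list)
    for source, target in relations:
        outgoing[source].append(target)

    # Start from every artifact that matches the first step of the pattern.
    artifacts = {a for pair in relations for a in pair}
    paths = [[a] for a in sorted(artifacts) if repo_of(a) == pattern[0]]

    # Extend each partial path one hop at a time, dropping paths that
    # cannot continue to the next repository type.
    for wanted in pattern[1:]:
        paths = [
            path + [nxt]
            for path in paths
            for nxt in outgoing.get(path[-1], [])
            if repo_of(nxt) == wanted
        ]
    return paths


jama_twc_gitlab = match_paths(RELATIONS, ["jama", "twc", "gitlab"])
print(len(jama_twc_gitlab), jama_twc_gitlab)
```

Changing the pattern list to `["jama", "twc", "jira"]` asks the companion question about JIRA connections against the same data.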
On the other hand, a query about Jama-TWC-JIRA connections returns eleven instances, shown in Figure 3. This part of the digital thread is more developed.
Note that even these simple queries are filtered by several conditions. The inter-model relations shown must (1) belong to Syndeia project DZSB15 (Stage 1) and (2) be the most recent version of those relations. The Gremlin .has('_isLatest','TRUE') condition eliminates older versions of the connections, which Syndeia Cloud, as a configuration-managed database, retains. Filtering by attributes of the repository artifacts, e.g. JIRA issues, would also be possible.
Figure 3 Graph query results, Jama to TWC to JIRA at Stage 1
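The effect of the two filter conditions can be illustrated with a small Python sketch. Only the `_isLatest` property name and the project keys come from the text above. The record layout, the other field names, and the `latest_in_project` helper are assumptions made for illustration, not the Syndeia Cloud data model.

```python
# Hypothetical relation records. Only `_isLatest` and the project keys
# appear in the article; `id`, `project`, and `version` are assumed fields.
RELATIONS = [
    {"id": "r1", "project": "DZSB15", "version": 1, "_isLatest": "FALSE"},
    {"id": "r1", "project": "DZSB15", "version": 2, "_isLatest": "TRUE"},
    {"id": "r2", "project": "DZSB15", "version": 1, "_isLatest": "TRUE"},
    {"id": "r3", "project": "DZSB19", "version": 1, "_isLatest": "TRUE"},
]


def latest_in_project(relations, project):
    """Keep relations that belong to `project` and are the latest version."""
    return [
        r
        for r in relations
        if r["project"] == project and r["_isLatest"] == "TRUE"
    ]


stage1 = latest_in_project(RELATIONS, "DZSB15")
print([(r["id"], r["version"]) for r in stage1])
```

Without the latest-version condition, the superseded version of `r1` would be counted alongside its replacement and inflate the totals.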
The second half of the video demonstrates a similar analysis of the same digital thread at a later point in time. It also provides two examples of how visual graphs help identify changes and anomalies in the data. As I discussed in Part 1, preparation of the data before analysis is an important step in Data Science. It cannot be assumed that a data set will be free from errors, gaps, and unexpected structural changes.
Figure 4 Graph query results, Jama to TWC to GitLab at Stage 2
Figure 4 repeats the query in Figure 2 at a later stage. Quantitatively, there are now five instances of the Jama-TWC-GitLab pattern at Stage 2, but one instance is visibly different in kind: it involves two GitLab connections from the same TWC element. This may or may not be an error, but it is now visible to the analyst, who can evaluate and, if necessary, repair it before it introduces an error into the conclusions.
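An anomaly of this kind, one model element with several connections into the same repository, can also be flagged programmatically. The following Python sketch uses hypothetical data and a hypothetical helper, `multi_target_sources`. It only surfaces candidates for review and does not decide whether they are errors.

```python
from collections import defaultdict

# Hypothetical toy data: (source, target) relations, ids prefixed by repository.
RELATIONS = [
    ("twc:Block-A", "gitlab:controller.py"),
    ("twc:Block-B", "gitlab:sensor.py"),
    ("twc:Block-B", "gitlab:sensor_test.py"),
    ("twc:Block-C", "jira:ISSUE-7"),
]


def multi_target_sources(relations, source_repo, target_repo):
    """Map each source in `source_repo` to its targets in `target_repo`,
    keeping only the sources that have more than one such target."""
    targets = defaultdict(list)
    for source, target in relations:
        if source.startswith(source_repo + ":") and target.startswith(target_repo + ":"):
            targets[source].append(target)
    return {s: t for s, t in targets.items() if len(t) > 1}


flagged = multi_target_sources(RELATIONS, "twc", "gitlab")
print(flagged)
```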
Another situation arises when we repeat the query in Figure 3. The Stage 2 digital thread returns zero Jama-TWC-JIRA connections. Referring to the schema in Figure 1, we see that there are alternative connection paths between TWC and JIRA via either GitLab or Windchill. The query in Figure 5 is modified to identify those possibilities, where X could be GitLab, Windchill or some third possibility.
Figure 5 Graph query results, Jama to TWC to X to JIRA at Stage 2
We can see from Figure 5 that there are examples of at least Jama-TWC-GitLab-JIRA and Jama-TWC-Windchill-JIRA, even though there are no Jama-TWC-JIRA instances. Clearly, the structure of the digital thread was revised between Stage 1 and Stage 2. The direct Jama-TWC-JIRA connections have been deleted (although records of the older configurations remain in the Syndeia Cloud database). Graph analysis has revealed this, protecting us from errors based on obsolete assumptions. We can now revise our analyses, as in Figure 6, which queries Jama-TWC-Windchill-JIRA connections specifically.
Figure 6 Graph query results, Jama to TWC to Windchill to JIRA at Stage 2
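The step from the open-ended Jama-TWC-X-JIRA query to a specific one can be sketched in Python by leaving the third hop unconstrained and then grouping the results by the repository that X turned out to be. The data and the helpers `repo_of` and `paths_via_any` are hypothetical, and the counts are not those of the Syndeia project.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (source, target) relations, ids prefixed by repository.
RELATIONS = [
    ("jama:REQ-1", "twc:Block-A"),
    ("jama:REQ-2", "twc:Activity-B"),
    ("twc:Block-A", "gitlab:controller.py"),
    ("twc:Activity-B", "windchill:PART-100"),
    ("gitlab:controller.py", "jira:ISSUE-7"),
    ("windchill:PART-100", "jira:ISSUE-8"),
]


def repo_of(artifact_id):
    """Return the repository prefix, e.g. 'jira' for 'jira:ISSUE-7'."""
    return artifact_id.split(":", 1)[0]


def paths_via_any(relations, first, second, last):
    """Find first -> second -> X -> last paths, where X is any repository."""
    outgoing = defaultdict(list)
    for source, target in relations:
        outgoing[source].append(target)

    found = []
    for a in sorted(outgoing):
        if repo_of(a) != first:
            continue
        for b in outgoing.get(a, []):
            if repo_of(b) != second:
                continue
            # The third hop is deliberately unconstrained.
            for x in outgoing.get(b, []):
                for d in outgoing.get(x, []):
                    if repo_of(d) == last:
                        found.append((a, b, x, d))
    return found


paths = paths_via_any(RELATIONS, "jama", "twc", "jira")
by_intermediate = Counter(repo_of(path[2]) for path in paths)
print(by_intermediate)

# Narrow to one intermediate repository once the wildcard shows it exists.
via_windchill = [p for p in paths if repo_of(p[2]) == "windchill"]
print(via_windchill)
```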
The general philosophy of Data Science stresses two principles. First, the Data Scientist should not accept the data unquestioningly, indifferent to its quality or provenance. Second, the questions asked should produce actionable intelligence of clear technical and ultimately "business" significance. In this part, we have tried to demonstrate how graph analysis supports both. In Part 5 (forthcoming), we'll demonstrate how to do this on a practical ongoing basis with a second data science tool, Jupyter notebooks.
Please note this video does not contain audio.
For more blogs in the series: