Data Science and the Digital Thread | Part 2

In the first post in this series, we considered the relationship between Data Science and the Digital Thread. In this post, we will discuss what we need to put these ideas into practice. The first need, obviously, is data: all the data about the system, and a way to get to it. For that, I’m going to use Syndeia™, the Digital Thread platform from Intercax. It builds and manages a network of connections between models in the engineering applications shown in Figure 1, as well as structured data in formats such as XML, SQL and CSV. It also provides a mechanism to get to that data.

Figure 1  The Digital Thread integrations in Syndeia Release 3.4

The good news for Data Science is that all this data is structured. The bad news is that there are a lot of different structures. Our second need is therefore a common data model that lets us, as data scientists, structure our queries across the entire Digital Thread, the full dataset. Syndeia uses four element types: Repositories, Containers, Artifacts and Relations, as shown in Figure 2. All data is treated as one of these elements or as an attribute of one.

Figure 2 Common Data Model in Syndeia
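
To make the idea of a common data model a little more concrete, here is a minimal Python sketch of the four element types. It is an illustration only, not Syndeia's actual schema or API: the field names (`key`, `name`, `kind`, `attributes`), the nesting of artifacts inside containers inside repositories, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """A tool or data store in the Digital Thread (hypothetical fields)."""
    key: str
    name: str


@dataclass
class Container:
    """A grouping inside a repository, such as a project or model (assumed)."""
    key: str
    name: str
    repository: Repository


@dataclass
class Artifact:
    """An individual element, such as a requirement, block or part (assumed)."""
    key: str
    name: str
    container: Container
    attributes: dict = field(default_factory=dict)


@dataclass
class Relation:
    """A directed connection between two artifacts (assumed)."""
    key: str
    kind: str
    source: Artifact
    target: Artifact


# Illustrative data only; none of these names come from a real project.
sysml_repo = Repository("R1", "SysML modeling tool")
plm_repo = Repository("R2", "PLM system")
model = Container("C1", "Vehicle model", sysml_repo)
bom = Container("C2", "Vehicle BOM", plm_repo)
block = Artifact("A1", "Brake Assembly", model, {"mass_kg": 12.5})
part = Artifact("A2", "Brake Assembly Part", bom)
link = Relation("X1", "reference", block, part)

# Because every element has the same shape regardless of its source tool,
# a cross-tool question becomes a single uniform lookup.
crosses_tools = (
    link.source.container.repository.key
    != link.target.container.repository.key
)
print(crosses_tools)  # True
```

The point of the sketch is the uniformity: once a SysML block and a PLM part are both just artifacts with attributes, the same query logic applies to either.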

Third, we need the analysis tools. Our objective at Intercax is to make the data available and let data scientists use the tools they already know and love. This series will consider two widely used open-source Data Science tools, TinkerPop graph analysis and Jupyter Notebooks (Figure 3).

TinkerPop originated in 2009 as an open-source software project and is now managed by the Apache Software Foundation. It offers a common interface supported by many open-source and proprietary graph databases, including a common graph traversal language, Gremlin, for searching and querying.
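
For readers new to graph queries, the sketch below mimics in plain Python the kind of step-by-step traversal that Gremlin expresses. It does not use TinkerPop or any Gremlin client library; the toy graph, the edge labels and the helper function `out` are all hypothetical and exist only for this example.

```python
# Toy property graph: vertices keyed by id, edges as (source, label, target).
vertices = {
    1: {"name": "Requirement A"},
    2: {"name": "Block B"},
    3: {"name": "Part C"},
}
edges = [
    (1, "satisfiedBy", 2),
    (2, "realizedBy", 3),
]


def out(vertex_ids, label=None):
    """Follow outgoing edges from the given vertices, optionally by label."""
    return [
        target
        for source, edge_label, target in edges
        if source in vertex_ids and (label is None or edge_label == label)
    ]


# Roughly what the Gremlin traversal
#   g.V().has('name', 'Requirement A').out().out().values('name')
# would express against a TinkerPop-enabled graph database.
start = [vid for vid, props in vertices.items() if props["name"] == "Requirement A"]
two_hops = out(out(start))
names = [vertices[vid]["name"] for vid in two_hops]
print(names)  # ['Part C']
```

A graph query language lets us chain such steps declaratively, so the question "what is two connections downstream of this requirement?" fits on one line.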

Our second tool will be the Jupyter Notebook, a web-based interactive computational notebook that emerged out of the IPython open-source project in 2014. A notebook consists of a set of ordered cells containing API calls, computation, text and visualization, and it can draw on many open-source data science libraries. We will use the Python language in our examples, although Jupyter supports other languages as well.

Figure 3 Two Open-Source tools for Data Science

In Part 3 (forthcoming), we will dive more deeply into data structures in our Digital Thread. The final posts will demonstrate some of these ideas in action.

Dirk Zwemer

Dr. Dirk Zwemer (dirk.zwemer@intercax.com) is President of Intercax LLC (Atlanta, GA), a supplier of MBE engineering software platforms like Syndeia and ParaMagic. He is an active teacher and consultant in the field and holds Level 4 Model Builder-Advanced certification as an OMG System Modeling Professional.