Time is on My Side: New Temporal Modeling Capabilities for Synthetic xAPI Data
One of the most interesting aspects of xAPI is the ability to model time.
Activity, of course, occurs over time, and events occur as sequences of activity. xAPI can track the time and duration of any given activity, and xAPI Profiles can model the sequences. Because DATASIM (the Data and Training Analytics Simulated Input Modeler) is based on xAPI Profiles, there has always been a desire to use these capabilities to model time more precisely in the synthetic xAPI data it generates.
Huh?
Quick Background
DATASIM is an Apache 2.0 open source software capability that was designed and developed by the team at Yet Analytics under a BAA engagement with the Advanced Distributed Learning Initiative (ADL).
What it does: It uses the patterns and sequences in xAPI Profiles to generate synthetic data in the format of xAPI.
Its purpose: To generate context-relevant synthetic xAPI data for stress-testing learning ecosystem infrastructure and for checking data design.
Example use case: How much would it cost to run our xAPI system at scale? Design an xAPI Profile that models your use case, and run it through DATASIM to produce any scale of data you want. In our test and evaluation (T&E) period, we were pushing out about 4 billion xAPI Profile-aligned statements every 3.5 hours.
Another use case: Quickly test whether your xAPI Profile produces data relevant to the metrics you are trying to report on. You can use the Centriph beta platform to create testing snippets. (DATASIM is baked into Centriph.)
What it does not do: DATASIM was never intended to model learning experiences in such a way as to make predictions about the use of certain instructional strategies. We’ll leave that for other people to do. DATASIM should be understood as a smart systems design and testing tool, not a predictive learning tool.
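To put the throughput figure from the example use case in perspective, here is a small back-of-the-envelope calculation. It uses only the numbers quoted above (about 4 billion statements every 3.5 hours); the function name and the one-billion-statement target are illustrative assumptions, and real throughput will depend on your hardware and Profile.

```python
# Figures quoted in the post: roughly 4 billion statements every 3.5 hours.
STATEMENTS = 4_000_000_000
HOURS = 3.5

per_hour = STATEMENTS / HOURS
per_second = STATEMENTS / (HOURS * 3600)


def hours_to_generate(target_statements):
    """Hours needed for a target volume at the quoted rate (hypothetical helper)."""
    return target_statements / per_hour


print(f"{per_second:,.0f} statements per second")
print(f"{hours_to_generate(1_000_000_000):.3f} hours for 1 billion statements")
```

At that rate, the quoted figure works out to a little over 317,000 statements per second.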
So What’s the Big News?
DATASIM was delivered pre-pandemic, and we always had one nagging issue that we never had time to work on. Basically, if you let DATASIM create a synthetic dataset, all of the time and duration attributes would just follow a deterministic pattern, and the result was unruly and patently low-fidelity. As a result, to model any complex scenario with DATASIM, you essentially had to run micro-simulations representing different “times” and then patch them together by hand to produce the final dataset.
Well, this new release addresses this issue. Now DATASIM is able to provide enhanced temporal modeling in xAPI, including:
Actor-specific or global time bounds for activities and xAPI Profile components
Resume (or restart) behaviors for when time bounds are hit
Event occurrence frequency modifications
Enhanced behavioral modeling
And, as a bonus, it features a reworked command line interface (CLI) with a more intuitive argument structure and new inputs.
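To make the ideas of time bounds, restart behavior, and event frequency more concrete, here is a toy sketch in Python. It is not DATASIM's implementation or input format: every name in it (simulate, in_bounds, next_allowed, mean_gap_minutes, restart) is a hypothetical stand-in, and the exponential gap between events is simply an assumed way to vary frequency.

```python
import random
from datetime import datetime, timedelta, timezone


def in_bounds(ts, hours, weekdays):
    """Return True when ts falls on an allowed weekday and hour."""
    return ts.hour in hours and ts.weekday() in weekdays


def next_allowed(ts, hours, weekdays):
    """Step forward hour by hour to the start of the next allowed hour."""
    ts = ts.replace(minute=0, second=0, microsecond=0)
    while True:
        ts += timedelta(hours=1)
        if in_bounds(ts, hours, weekdays):
            return ts


def simulate(pattern, start, count, hours, weekdays,
             mean_gap_minutes=30.0, restart=True, seed=0):
    """Emit (timestamp, step) pairs that respect the given time bounds.

    Hypothetical toy model, not DATASIM's algorithm.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    if not any(0 <= h <= 23 for h in hours) or not any(0 <= d <= 6 for d in weekdays):
        raise ValueError("bounds must allow at least one valid hour and weekday")
    rng = random.Random(seed)
    ts = start if in_bounds(start, hours, weekdays) else next_allowed(start, hours, weekdays)
    step, events = 0, []
    while len(events) < count:
        events.append((ts, pattern[step]))
        step = (step + 1) % len(pattern)
        # Event frequency: a smaller mean gap produces more frequent events.
        ts += timedelta(minutes=rng.expovariate(1.0 / mean_gap_minutes))
        if not in_bounds(ts, hours, weekdays):
            # Time bound hit: jump to the next allowed window.
            ts = next_allowed(ts, hours, weekdays)
            if restart:
                step = 0  # restart the pattern instead of resuming it
    return events


start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)  # a Monday
pattern = ["launched", "attempted", "completed"]
# Actor-specific bounds: one actor works weekday office hours, another evenings.
day_actor = simulate(pattern, start, 10, range(9, 17), range(0, 5), seed=1)
evening_actor = simulate(pattern, start, 10, range(18, 22), range(0, 7), seed=2)
for ts, step in day_actor:
    print(ts.isoformat(), step)
```

Passing restart=False makes the actor resume the pattern where it left off once the next window opens, which is the other behavior in the list above.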
Note that with progress comes change. The prior DATASIM UI has been deprecated, and the software is now available exclusively through the CLI. You will still be able to run the old DATASIM, but this version introduces breaking changes, so if you want the new capabilities, you have to use the new version. It’s still distributed as open source under the Apache 2.0 license, so you are welcome to use it for any purpose, including in commercial applications.
What’s Next
We’re currently using DATASIM in a new R&D project related to pilot training. We’ll be looking to share initial results towards the end of 2024.
If you have a project and would like help using DATASIM, please reach out and we can discuss.