The open-source solution for synthetic xAPI,
used to generate data for a variety of purposes.
Producing more xAPI than any other application on Earth.
DATASIM's development has been sponsored by the Advanced Distributed Learning Initiative and the Air Force Research Laboratory. Yet Analytics maintains the open-source project.
Why would I ever need synthetic data?
When we began work on DATASIM, the goal was to provide synthetic data with enough fidelity to stress-test learning technology infrastructure.
But people keep finding new uses for it.
It’s been used to help with cost calculations.
It’s been used to meet the challenges of modeling longitudinal development and to estimate the capabilities of a competency-assertion system.
It’s been used many, many times to test xAPI Profile design on Centriph.
And it’s been used to create synthetic data that helps move the needle on novel analytics and AI capabilities in the Air Force.
The image here shows a small portion of the hundreds of thousands of synthetic xAPI statements DATASIM created in seconds.
Get it on GitHub.
Pull it from Docker.
Let our team set it up.
This isn't your AI's synthetic data.
One of the greatest challenges of working with AI is that so much of it happens inside a black box, which makes troubleshooting cumbersome, if not downright impossible, when problems arise. In synthetic data generation, this could mean not understanding why a GPT keeps producing certain errors, and therefore being unable to fix them.
DATASIM takes a different approach. The data profiles that drive DATASIM are authored by humans and augmented by real-time machine validation. These profiles combine with modularly controlled variables in DATASIM to produce a transparent and accessible simulation specification. DATASIM uses that specification to generate a completely auditable stream of synthetic data. If problems arise, or if you want to tilt the scales of your simulation in a different direction, you have complete control over those decision points and full access to them.
Use Case: Synthetic Data Automation for Pilot Training
What if?
While pilot training could benefit from novel analytics and artificial intelligence innovations, much of the data needed to feed such applications is restricted by security and privacy requirements, especially in the DoD.
But what if the data were synthetically generated? GenAI is one option, though AI-driven synthetic data is often... problematic. If only there were a better approach that was fast, cost-effective, transparent, secure, and accurate...
A different approach...
In our approach, all of the patterns and sequences of synthetic activity are modeled in an xAPI Profile: a semantic data profile that can be designed by a human, run by a machine, and is defined by open global standards.
The profile is fed into DATASIM, where the researcher chooses parameters such as the number of learners being observed, the duration of the training session, and whether the activity should be weighted toward increased stress and difficulty. The profile and the parameter choices are saved together as a reusable simulation specification. Unlike black-box solutions, the behavior here is transparent, explainable, and open to auditing and modification by humans or machines.
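To make the idea of a reusable, auditable specification more concrete, here is a minimal Python sketch. It is not DATASIM's input format or API: the `SimulationSpec` fields, the `generate` function, the profile IRI, the activity IDs, and the use of a fixed random seed are all hypothetical, chosen only to illustrate the principle. The parameters named above plus a seed fully determine the output, so rerunning a saved specification reproduces the same statements, and changing a single parameter (here, the stress weighting) changes the stream in a traceable way. Consult the DATASIM repository for the real specification format.

```python
import random
from dataclasses import dataclass


# Hypothetical simulation specification. Field names are illustrative only,
# NOT DATASIM's actual input format.
@dataclass(frozen=True)
class SimulationSpec:
    profile_id: str       # IRI of the xAPI Profile being simulated (hypothetical)
    num_learners: int     # number of learners being observed
    session_events: int   # events per learner, standing in for session duration
    stress_weight: float  # 0.0 = baseline, 1.0 = fully weighted toward stress/difficulty
    seed: int             # assumed fixed seed so a run can be reproduced and audited


def generate(spec: SimulationSpec) -> list[dict]:
    """Return xAPI-shaped statements (actor, verb, object) determined entirely by spec."""
    rng = random.Random(spec.seed)  # private RNG: same spec, same stream
    # Higher stress_weight raises the probability of a "failed" outcome.
    p_fail = 0.1 + 0.4 * spec.stress_weight
    statements = []
    for learner in range(spec.num_learners):
        for event in range(spec.session_events):
            verb = "failed" if rng.random() < p_fail else "passed"
            statements.append({
                "actor": {"mbox": f"mailto:learner{learner}@example.com"},
                "verb": {"id": f"http://adlnet.gov/expapi/verbs/{verb}"},
                "object": {"id": f"{spec.profile_id}/activities/maneuver-{event}"},
            })
    return statements


if __name__ == "__main__":
    spec = SimulationSpec("https://example.org/profiles/pilot-eval", 3, 10, 0.5, 42)
    # Running the same saved spec twice yields identical output.
    print(generate(spec) == generate(spec))
```

Because every random choice flows from the seed and the declared parameters, an auditor can rerun the specification and inspect exactly which parameter produced which outcome, which is the property a black-box generator cannot easily offer.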
While attempts have been made to model airplane behavior and pilot activities in the cockpit, relatively little attention has been paid to modeling the pilot instructors themselves. We chose to create data profiles representing the act of evaluating pilot trainees. To do this, we created a digital representation of a pilot evaluation form and then modeled instructor behavior during an evaluation: not only how instructors score pilots on different items, but also how long they take to complete portions of the assessment, the order in which they observe events, and when they revise initial observations based on later ones.
Human-Computer Synthesis
By modeling the human pilot training experience in data, we give the machine the information it needs to create a simulated world in which the learning activity is carried out according to the possible, coherent patterns and sequences of activity described in the data profile.
The data produced by that simulated learning experience is then captured and validated just as though it came from a real-life human scenario. The key is to simulate the instrumentation of the digital interface points and human observation points that make a high-fidelity simulation possible. Because all of the mechanics are available in a machine-readable format, the simulation does not need to run in real time; ideally, though, the synthetic output from DATASIM looks the same as it would if the simulated activity had occurred in the real world.
Who cares?
In the near term, this data could be mined to better understand and predict the instructor behaviors that improve pilot training. The result could be novel analytics that increase insight into the process of training pilots. The longer-term implications are even more intriguing.
These analytics could leverage advances in learning science, such as our understanding of spaced repetition, personalization, and optimal zones of learning development, to increase the quality and efficiency of learning experiences. Such data could inform the design of AI instructors that augment human instructors as they meet the demands of a workforce experiencing generational change.
It's also an approach that can be applied to all training: every training event could be represented in a digital data profile. The synthetic data generated from those profiles would make it easier to experiment with and improve training systems at every level and in any vertical, including by connecting activity that occurs during a learning experience to the assertion of competency.
Learn more about DATASIM
Podcast (coming soon)
Research & Development
Capabilities Statement