05/07/2026
Space Domain Awareness has a data problem most teams won’t say out loud:
The real data is too sparse to train AI that operators can actually trust. So we stopped trying to fix the data — and started generating our own.
I gave this talk yesterday at the 2026 Department of the Air Force Modeling, Simulation & Analytics Summit in Colorado Springs, hosted by STARCOM, SAF/SA CMSO, and NTSA. It covered the Virtual Range Architecture we’ve built at MSBAI and how it maps to the Digital Space Range and NSTTC vision STARCOM laid out this week.
Three design choices that matter:
1. Generate the data you don’t have. TLE and EO inputs from the Unified Data Library are uneven and gap-ridden. We pre-train on millions of synthetic maneuver scenarios generated in NASA’s GMAT, then fine-tune on real ops data. It’s the same playbook Tesla uses for crash scenarios Autopilot has never seen.
2. Train at the embedding level, not the data level. Joint Embedding Predictive Architecture (JEPA), newer than transformers and applied here to time-series data, compresses inputs into embeddings before learning. Less noise into the weights, more semantic structure out. We’re hitting an AUC of 0.98 on maneuver detection across 14,710 space objects, and 94–96% classification accuracy.
3. Wrap the learned components in symbolic logic. A deterministic rules engine on top is what makes the system auditable to a Guardian or an accreditor. The LLM-only crowd cannot do this part. It’s the difference between a confident model and a defensible decision.
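To make the third choice concrete, here is a minimal sketch of a deterministic rule layer sitting on top of a learned score. It is an illustration only, not MSBAI's implementation: the `Detection` fields, the rule names, and every threshold below are assumptions invented for the example. The point it shows is that each decision carries a trace of which rules passed and which failed, and that trace is the part a reviewer can audit.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Detection:
    """One model output plus context (all fields are hypothetical)."""
    object_id: str
    maneuver_score: float        # learned model output in [0, 1]
    hours_since_last_obs: float  # age of the newest observation
    n_observations: int          # observations behind the score


# A rule is a name plus a predicate that returns True when satisfied.
Rule = Tuple[str, Callable[[Detection], bool]]

# Illustrative thresholds only; real values would come from operators.
RULES: List[Rule] = [
    ("score_above_threshold", lambda d: d.maneuver_score >= 0.90),
    ("observation_is_fresh", lambda d: d.hours_since_last_obs <= 24.0),
    ("enough_observations", lambda d: d.n_observations >= 3),
]


def decide(d: Detection) -> dict:
    """Flag a maneuver only if every rule passes; always return the trace."""
    trace = [(name, check(d)) for name, check in RULES]
    flagged = all(passed for _, passed in trace)
    return {"object_id": d.object_id, "flag_maneuver": flagged, "trace": trace}


if __name__ == "__main__":
    # High score, but the newest observation is 30 hours old.
    result = decide(Detection("OBJ-001", 0.97, 30.0, 5))
    print(result["flag_maneuver"])
    for name, passed in result["trace"]:
        print(f"  {name}: {'pass' if passed else 'FAIL'}")
```

In this toy case the model is confident (0.97), yet the object is not flagged because the freshness rule fails, and the trace says so explicitly. The same inputs always produce the same decision and the same trace.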
The system runs at ~2-minute end-to-end latency on 20,000+ objects, and JEPA training scales linearly to 4,000 nodes on Argonne National Laboratory’s Aurora.
Built under a CDAO contract administered by Air Force DTO, embedded with Space Systems Command at the SDA TAP Lab, and tested across HPCMP, Aurora (ANL), and Frontier (Oak Ridge Leadership Computing Facility).
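A note on reading the AUC figure quoted in the second design choice: AUC is the probability that a randomly chosen maneuvering object gets a higher score than a randomly chosen non-maneuvering one, with ties counted as half. The sketch below computes it by brute force over all pairs. The scores are toy numbers made up for illustration, not results from the system.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Toy scores (assumed): three maneuvering objects, four quiet ones.
    maneuvering = [0.9, 0.8, 0.4]
    quiet = [0.1, 0.3, 0.5, 0.2]
    print(round(auc(maneuvering, quiet), 3))  # 11 of 12 pairs ranked correctly
```

An AUC of 0.5 means the ranking is no better than chance, and 1.0 means every maneuvering object outscores every quiet one.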