Reinforcement-learning environments and datasets for safer frontier AI.
contact@sand-box-ai.com
Investors
We're raising a pre-seed to ship our first set of environments. Email for the deck or to set up a call.
Labs
Tell us what you're testing for — model family, behavior to surface, data format, timeline.
Frontier models ship on benchmarks that don't measure the failures we actually fear. Sandbox AI builds the missing infrastructure — environments, trajectories, and evaluations that turn safety claims into testable behavior.
Environments
Procedurally generated reinforcement-learning worlds engineered to elicit specific failure modes — deception, reward hacking, sycophancy, sandbagging, situational awareness.
Output
Gym-compatible env · containerized rollout
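For illustration, a minimal sketch of the reset/step interface a Gym-compatible environment exposes. The class name, toy task, and reward scheme below are invented for this example, not the shipped product: the agent observes a hidden value and is rewarded for reporting it honestly, a stand-in for how a reward function can be designed to surface a failure mode like deception.

```python
import random

class ToyDeceptionEnv:
    """Hypothetical sketch of a Gym-style environment (reset/step).
    The agent sees a hidden value and must report it to an overseer;
    the reward function pays for honest reports."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.hidden = None

    def reset(self):
        self.hidden = self.rng.randint(0, 9)
        obs = {"hidden_value": self.hidden}  # the agent observes the value
        return obs, {}

    def step(self, action):
        # action: the value the agent reports to the overseer
        honest = action == self.hidden
        reward = 1.0 if honest else -1.0     # honesty rewarded, deception penalized
        terminated = True                    # one-step episode
        return {"hidden_value": self.hidden}, reward, terminated, False, {}

env = ToyDeceptionEnv(seed=42)
obs, info = env.reset()
# An honest policy reports the value it observed:
obs, reward, terminated, truncated, info = env.step(obs["hidden_value"])
```

Because the interface matches the standard Gym loop, any agent harness that drives reset/step can run it unchanged.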
Versioned trajectory and evaluation datasets — audit-ready, reproducible — licensed to frontier labs and academic safety groups.
Output
Parquet · HF dataset card · evals manifest
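To show what consuming a trajectory table might look like, here is a sketch using pandas. The column names and rows are illustrative assumptions, not the actual release schema; in a shipped dataset the table would be read from the Parquet file rather than built inline.

```python
import pandas as pd

# Hypothetical trajectory schema: one row per (episode, step).
rows = [
    {"episode": 0, "step": 0, "action": "report_true",  "reward": 1.0},
    {"episode": 0, "step": 1, "action": "report_true",  "reward": 1.0},
    {"episode": 1, "step": 0, "action": "report_false", "reward": -1.0},
]
df = pd.DataFrame(rows)

# With the released dataset this would instead be:
# df = pd.read_parquet("trajectories-v1.parquet")  # filename is hypothetical

# Per-episode return, a typical first aggregation for an evaluation:
returns = df.groupby("episode")["reward"].sum()
print(returns.to_dict())  # {0: 2.0, 1: -1.0}
```

The same table loads identically via the Hugging Face `datasets` library, which reads Parquet natively.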
Bespoke environments built to your safety case. Joint research, internal evaluations, red-team infrastructure for in-house teams.
Output
6–12 weeks · NDA available · scoped proposal
Team

PhD with Yoshua Bengio at Mila. Prev Anthropic, Google DeepMind, Waabi.

PhD with Glen Berseth and Nikolay Malkin at Mila. Prev Mistral, CMU, NASA JPL.

PhD with Irina Rish and Eugene Belilovsky at Mila. Prev Meta, Waterloo.

PhD with Laurent Charlin at Mila. Prev Microsoft Research, ServiceNow, KU Leuven.