There is no safety without security.

The UChicago XLab AI Security Guide is built to prepare the next generation of AI researchers for the next generation of hackers.

About AI Security

While RLHF, constitutional AI, and other alignment methods are often effective at improving the safety of AI products, they are not robust across a wide range of edge cases. A single clever prompt or adversarial suffix can undo months of safety training, and fine-tuning open-weight models can easily bypass safeguards that appeared robust at release.

AI security research systematically exposes the gap between what AI models learn and what humans intend them to learn. Unlike other pillars of AI safety research, which are often pre-paradigm or theory-based, AI security research yields practical insights into how we can design safer models.

About XLab

Founded in 2022 at the University of Chicago, the Existential Risk Laboratory (XLab) is an interdisciplinary research organization dedicated to the analysis and mitigation of risks that threaten human civilization's long-term survival.

The legacy of existential risk work at the University of Chicago dates back to Enrico Fermi and the world's first nuclear chain reaction under the historic Stagg Field. XLab was founded in the same spirit of concern and commitment to mitigating the great threats of our time.