me

I founded and lead the Agent Security team at the Center for AI Standards and Innovation (CAISI), a hub of AI expertise in the U.S. government. Our team does research, evaluations, red teaming, and standards development to measure and improve the security of advanced AI systems.

I live in Princeton, NJ, where I am a Princeton University AI Lab Policy Fellow. I joined government in 2024 as a TechCongress AI Security Fellow after my PhD.

During my PhD in Computer Science at Harvard, I co-founded the ML Foundations research group and was supported by an NSF Graduate Research Fellowship. My advisors were Sham Kakade and Leslie Valiant; I was also privileged to have Boaz Barak, Cyril Zhang, and Surbhi Goel as mentors.

One theme of my doctoral research was the interplay between machine learning and strategic incentives. Another theme—the subject of my dissertation—was using simple, mathematically well-defined tasks as model systems to study training dynamics and inductive biases in neural networks.

Before that, I studied math as an undergraduate at Princeton and did research in computational complexity.

benedelman100@gmail.com | Google Scholar | LinkedIn | X

Research

Transcendence: Generative Models Can Outperform The Experts That Train Them
Edwin Zhang, Vincent Zhu, Naomi Saphra, Anat Kleiman, BE, Milind Tambe, Sham M. Kakade, and Eran Malach
NeurIPS 2024 | Blog post

The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
BE, Ezra Edelman, Surbhi Goel, Eran Malach, and Nikolaos Tsilivis
NeurIPS 2024 | Blog post

Foundational Challenges in Assuring Alignment and Safety of Large Language Models
Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, BE, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, and 27 others
TMLR, 2024 | Webpage

Distinguishing the Knowable from the Unknowable with Language Models
Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, and BE
ICML 2024 | Blog post

Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
Hanlin Zhang, BE, Danilo Francati, Daniele Venturi, Giuseppe Ateniese, and Boaz Barak
ICML 2024, and Secure & Trustworthy LLMs Workshop @ ICLR 2024 | Blog post

Feature Emergence via Margin Maximization: Case Studies in Algebraic Tasks
Depen Morwani, BE, Costin-Andrei Oncescu, Rosie Zhao, and Sham Kakade
ICLR 2024 (spotlight) | Blog post

Pareto Frontiers in Deep Feature Learning: Data, Compute, Width, and Luck
BE, Surbhi Goel, Sham Kakade, Eran Malach, and Cyril Zhang
NeurIPS 2023 (spotlight)

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Boaz Barak, BE, Surbhi Goel, Sham Kakade, Eran Malach, and Cyril Zhang
NeurIPS 2022

Inductive Biases and Variable Creation in Self-Attention Mechanisms
BE, Surbhi Goel, Sham Kakade, and Cyril Zhang
ICML 2022

The Multiplayer Colonel Blotto Game
Enric Boix-Adserà, BE, and Siddhartha Jayanti
Games and Economic Behavior (full version), EC 2020 (extended abstract)

Causal Strategic Linear Regression
Yonadav Shavit, BE, and Brian Axelrod
ICML 2020

SGD on Neural Networks Learns Functions of Increasing Complexity
Preetum Nakkiran, Gal Kaplun, Dimitris Kalimeris, Tristan Yang, BE, Fred Zhang, and Boaz Barak
NeurIPS 2019 (spotlight)

Matrix Rigidity and the Croot-Lev-Pach Lemma
BE and Zeev Dvir
Theory of Computing, 2019

Theses

Combinatorial Tasks as Model Systems of Deep Learning
PhD Thesis

A Proof of Strassen’s Degree Bound for Homogeneous Arithmetic Circuits
Undergraduate Senior Thesis

Teaching

Spring 2021 | Teaching fellow for CS 229br: Biology and Complexity
Received Certificate of Distinction in Teaching from Harvard University

Spring 2020 | Teaching fellow for CS 228: Computational Learning Theory
Gave three lectures on “Mysteries of Generalization in Deep Learning”

Tutorials

How to Achieve Both Transparency and Accuracy in Predictive Decision Making: An Introduction to Strategic Prediction
with Chara Podimata and Yonadav Shavit
FAccT 2021

Talks

January & March 2024 | Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
NYC Crypto Day, Boston Crypto Day

February 2023 | Studies in feature learning through the lens of sparse Boolean functions
Seminar in Mathematics, Physics and Machine Learning, University of Lisbon

November 2022 | Hidden progress in deep learning
Statistical Learning Theory and Applications, MIT course

September 2022 | Sparse feature emergence in deep learning
Alg-ML seminar, Princeton University

May 2022 | Towards demystifying the inductive bias of attention mechanisms
Collaboration on the Theoretical Foundations of Deep Learning

February 2022 | Towards demystifying transformers & attention
New Technologies in Mathematics Seminar, Harvard Center of Mathematical Sciences and Applications