About


I'm exploring the role of causality as a language for the explainability and oversight of large generative models like ChatGPT. Advised by Victor Veitch.
I study post-hoc, internal interpretability of large language models from a causal inference perspective: what does it mean to locate a human-understandable concept inside a model, and how can we validate such claims?
I'm particularly motivated by applications to AI safety, such as monitoring long-term planning and deception, but I'm also excited about applications to fairness and adversarial robustness.

Publications


Stability of Stochastically Switched and Stochastically Time-Delayed Systems


Camille Carter, Jacob Murri, David Reber, Benjamin Webb

arXiv, 2020


Dynamical Stability despite Time-Varying Network Structure


David Reber, Benjamin Webb

MAA Intermountain Section Conference, SIAM Discrete Mathematics Conference, SIAM Annual Meeting, 2018