Decoding gears, revealing minds, and pushing for safer AI systems.
Tracing how information flows through a model by swapping activations at specific sites.
Studying how individual neurons simultaneously respond to many different, unrelated concepts.
Finding the small subnetworks inside large models that are responsible for specific capabilities.
Exploring architectures in motion, with infinite possibilities.
Exploring how assigning energy scores to data can lead to richer, more structured representations.
Designing architectures that combine pattern learning with structured, rule-based reasoning.
Truly caring about correctness.
Turning informal mathematical arguments into fully verified, machine-checkable proofs.
Connecting language models to the Lean proof assistant for interactive, verified reasoning.
Developing smarter algorithms for navigating the vast space of possible proof strategies.