My contributions include leading research with the UK AISI on machine unlearning for AI safety, developing the N2G method adopted by OpenAI to evaluate Sparse Autoencoders, and leading the Alan Turing Institute's response to the UK House of Lords on LLMs, which informed parliamentary inquiries. I've also collaborated with Anthropic on papers investigating deception in LLMs and reward hacking, among other topics.
I'm affiliated with Cambridge's CSER, NTU's Digital Trust Centre, and Edinburgh's Informatics, and I'm a member of ELLIS. In 2024–2025, I served as a Research Consultant with Anthropic's Alignment team. Previously, I was a researcher at Amazon and Huawei, and Co-Director and Head of Research at Apart Research.
I'm looking for motivated people interested in AI Safety, Interpretability, and Technical AI Governance. I value collaboration and am especially committed to working with researchers from disadvantaged backgrounds. If this resonates with you, contact me: fazl[at]robots[dot]ox[dot]ac[dot]uk.