
Hi, I'm Fazl Barez

AI safety and interpretability researcher.

I’m a Research Fellow at Torr Vision Group (TVG), University of Oxford, where I lead safety research. Additionally, I’m a Research Scientist (consultant) at Anthropic and an advisor at Apart Research. My affiliations include two University of Cambridge institutions, the Centre for the Study of Existential Risk (CSER) and the Krueger AI Safety Lab (KASL), as well as the Future of Life Institute. I’m a member of ELLIS (European Laboratory for Learning and Intelligent Systems), a pan-European AI network of excellence. I also serve as an Adjunct Research Fellow at the Digital Trust Centre (Singapore AI Safety Institute) at Nanyang Technological University, and as a Visiting Researcher at the School of Informatics, University of Edinburgh.

Previously, I was a Technology and Security Policy Fellow at RAND Corporation. I have also worked on interpretability at Amazon and The DataLab, on recommender systems at Huawei, and on building a finance tool for budget management and economic scenario forecasting at RBS Group.

Nov 20, 2024 Excited to speak about unlearning and safety at the How To Evaluate AI Privacy Tutorial at NeurIPS 2024! 🍁🇨🇦
Sep 20, 2024 Our paper Interpreting LFPs in Large Language Models has been accepted at NeurIPS 2024! See you in Vancouver 🍁🇨🇦
Sep 20, 2024 Two papers accepted at EMNLP 2024! See you in Miami ❤️
Aug 11, 2024 Excited to be presenting our work on LLMs Relearn Removed Concepts at ACL 2024! See you all in Bangkok 🇹🇭!
Aug 10, 2024 Co-organised the first workshop on Mechanistic Interpretability at ICML 2024! 🎉

Research Interests


My research focuses on ensuring AI systems are safe, reliable, and beneficial as they grow more advanced. I work on techniques including but not limited to:

I take an interdisciplinary approach, drawing inspiration from fields like neuroscience and philosophy to enrich techniques for safe and beneficial AI. I share my work on these topics openly to advance AI safety through rigorous research and to facilitate positive real-world impact.

I am always eager to discuss ideas or opportunities for collaboration. Please feel free to send me an email if you would like to connect!

Publications

* = Equal Contribution

For a more complete list, visit my Google Scholar profile.

2023

  1. Understanding Addition in Transformers
    Philip Quirke and Fazl Barez
  2. Measuring Value Alignment
    Fazl Barez and Philip Torr
  3. The Alan Turing Institute’s response to the House of Lords Large Language Models Call for Evidence
    Fazl Barez, Philip H. S. Torr, Aleksandar Petrov, Carolyn Ashurst, Jennifer Ding, and 22 more authors
  4. The Larger they are, the Harder they Fail: Language Models do not Recognize Identifier Swaps in Python
    Antonio Valerio Miceli Barone*, Fazl Barez*, Ioannis Konstas, and Shay B Cohen
  5. Neuron to Graph: Interpreting Language Model Neurons at Scale
    Alex Foote*, Neel Nanda, Esben Kran, Ioannis Konstas, Shay Cohen, and 1 more author
  6. Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
    Jason Hoelscher-Obermaier*, Julia Persson*, Esben Kran, Ioannis Konstas, and Fazl Barez*
  7. Fairness in AI and Its Long-Term Implications on Society
    Ondrej Bohdal*, Timothy Hospedales, Philip H. S. Torr, and Fazl Barez*

2021

  1. Discovering topics and trends in the UK Government web archive
    David Beavan, Fazl Barez, M Bel, John Fitzgerald, Eirini Goudarouli, and 2 more authors

Keynotes and Invited Talks

Invited Panelist - Dialogue on Digital Trust and Safe AI 2024: Building bridges for a safe future (Singapore)

2024 - 🔗 Info

Invited Talk - Digital Trust Centre (Singapore AI Safety Institute) - Mechanistic Interpretability for AI Safety (NTU Singapore)

2024 - 🔗 Info

Invited Talk - Digital Trust Centre (Singapore AI Safety Institute) - Unlearning and Relearning in LLMs

2024 - 🔗 Info

Invited Talk - Foresight - AGI: Safety & Security Workshop (San Francisco 2024, USA)

2024 - 🔗 Info - 📄 Slides - 🎥 Presentation

Invited Panelist - Mechanistic Interpretability (ICLR 2024, Austria)

2024 - 🔗 Info

Invited Talk - Technical AI Safety Conference 2024 (TAIS Conference, Japan)

2024 - 🔗 Info - 🎥 Presentation

Invited Talk - Introduction to AI safety: Can we remove undesired behaviour from AI? (KAIST, South Korea)

2024 - 🔗 Info - 📄 Slides

Keynote Speaker - Personalization of Generative AI Systems (EACL 2024, Malta)

2024 - 🔗 Info

Invited Talk - Interpretability for Safety and Alignment (Foresight Institute, USA)

2023 - 🔗 Info - 🎥 Presentation


Selected Awards and Honors