
Hi, I'm Fazl Barez

AI safety and interpretability researcher.

I’m a Research Fellow at the Torr Vision Group (TVG), University of Oxford, where I lead safety research. I’m also a Research Advisor at Apart Research, and I hold affiliations with the Centre for the Study of Existential Risk at the University of Cambridge and the Future of Life Institute.

Previously, I worked as a Technology and Security Policy Fellow at RAND, on interpretability at Amazon and The DataLab, on safe recommender systems at Huawei, and on building a budget-management finance tool for economic scenario forecasting at NatWest Group.

Apr 18, 2024 Serving as a Programme Committee member for the upcoming ECAI 2024: 27th European Conference on Artificial Intelligence
Apr 2, 2024 Invited to give a talk at the AI Safety Colloquium: Introduction to AI safety: Can we remove undesired behaviour from AI?
Jan 17, 2024 Our paper has been accepted at ICLR 2024! 🤗 See you in Vienna 🇦🇹: Understanding Addition in Transformers
Jan 10, 2024 New Paper 🎉: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Jan 3, 2024 New Paper 🎉: Can language models relearn removed concepts?

Research Interests


My research focuses on ensuring AI systems are safe, reliable, and beneficial as they grow more advanced. I work on techniques including but not limited to:

I take an interdisciplinary approach, drawing on fields such as philosophy and cognitive science to enrich techniques for safe and beneficial AI. I share my work on these topics openly, aiming to advance AI safety through rigorous research and to facilitate positive real-world impact.

I am always eager to discuss ideas or opportunities for collaboration. Please feel free to send me an email if you would like to connect!

Publications

* = Equal Contribution

For a more complete list, visit my Google Scholar profile.

2023

  1. Measuring Value Alignment
    Fazl Barez and Philip Torr
    2023
  2. Understanding Addition in Transformers
    Philip Quirke and Fazl Barez
    2023
  3. The Alan Turing Institute’s response to the House of Lords Large Language Models Call for Evidence
    Fazl Barez, Philip H. S. Torr, Aleksandar Petrov, Carolyn Ashurst, Jennifer Ding, and 22 more authors
    2023
  4. The Larger they are, the Harder they Fail: Language Models do not Recognize Identifier Swaps in Python
    Antonio Valerio Miceli Barone*, Fazl Barez*, Ioannis Konstas, and Shay B Cohen
    2023
  5. Neuron to Graph: Interpreting Language Model Neurons at Scale
    Alex Foote*, Neel Nanda, Esben Kran, Ioannis Konstas, Shay Cohen, and 1 more author
    2023
  6. Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
    Jason Hoelscher-Obermaier*, Julia Persson*, Esben Kran, Ioannis Konstas, and Fazl Barez*
    2023
  7. Fairness in AI and Its Long-Term Implications on Society
    Ondrej Bohdal*, Timothy Hospedales, Philip H. S. Torr, and Fazl Barez*
    2023

2021

  1. Discovering topics and trends in the UK Government web archive
    David Beavan, Fazl Barez, M Bel, John Fitzgerald, Eirini Goudarouli, and 2 more authors
    2021

Talks

Invited Talk - Technical AI Safety Conference 2024 (International Conference Hall, Tokyo)

2024 - 🔗 Info - 🎥 Presentation

Invited Talk - Introduction to AI safety: Can we remove undesired behaviour from AI? (KAIST, South Korea)

2024 - 🔗 Info - 📄 Slides

Keynote Speaker - Personalization of Generative AI Systems (EACL 2024, Malta)

2024 - 🔗 Info

Invited Talk - Interpretability for Safety and Alignment (Intelligent Cooperation Workshop, SF, USA)

2023 - 🔗 Info - 🎥 Presentation


Selected Awards and Honors