
Hi, I'm Fazl Barez

AI safety and interpretability researcher.

I’m a Research Fellow at the Torr Vision Group (TVG), University of Oxford, where I lead safety research. Additionally, I’m a Research Advisor at Apart Research. I also hold affiliations with the Centre for the Study of Existential Risk at the University of Cambridge and the Future of Life Institute.

Previously, I was a Technology and Security Policy Fellow at RAND, worked on interpretability at Amazon and The DataLab, on safe recommender systems at Huawei, and on building a finance tool for budget management and economic scenario forecasting at NatWest Group.

I received my PhD in AI from the University of Edinburgh, where I am currently a visiting scholar collaborating closely with Shay Cohen.

Jan 17, 2024 Our paper has been accepted at ICLR 2024! 🤗 See you in Vienna 🇦🇹: Understanding Addition in Transformers
Jan 10, 2024 New Paper 🎉: Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
Jan 3, 2024 New Paper 🎉: Can language models relearn removed concepts?
Dec 23, 2023 Presented Measuring Value Alignment at NeurIPS 2023, New Orleans: Presentation

Research Interests


My research focuses on ensuring AI systems are safe, reliable, and beneficial as they grow more advanced. I work on a broad range of techniques toward this goal.

I take an interdisciplinary approach, drawing inspiration from fields such as philosophy and cognitive science to enrich techniques for safe and beneficial AI. I openly share my work on these topics to advance AI safety through rigorous research and to facilitate positive real-world impact.

I am always eager to discuss ideas or opportunities for collaboration. Please feel free to send me an email if you would like to connect!

Publications

* = Equal Contribution

2024

  1. Large Language Models Relearn Removed Concepts
    Michelle Lo*, Shay B. Cohen, and Fazl Barez*
    2024

2023

  1. Measuring Value Alignment
    Fazl Barez, and Philip Torr
    2023
  2. Locating Cross-Task Sequence Continuation Circuits in Transformers
    Michael Lan, and Fazl Barez
    Apart Research, 2023
  3. Understanding Addition in Transformers
    Philip Quirke, and Fazl Barez
    2023
  4. Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders
    Luke Marks, Amir Abdullah, Luna Mendez, Rauno Arike, Philip Torr, and 1 more author
    2023
  5. AI Systems of Concern
    Kayla Matteucci, Shahar Avin, Fazl Barez, and Seán Ó hÉigeartaigh
    2023
  6. DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
    Albert Garde, Esben Kran, and Fazl Barez
    2023
  7. The Alan Turing Institute’s response to the House of Lords Large Language Models Call for Evidence
    Fazl Barez, Philip H. S. Torr, Aleksandar Petrov, Carolyn Ashurst, Jennifer Ding, and 22 more authors
    2023
  8. The Larger they are, the Harder they Fail: Language Models do not Recognize Identifier Swaps in Python
    Antonio Valerio Miceli Barone*, Fazl Barez*, Ioannis Konstas, and Shay B Cohen
    2023
  9. Neuron to Graph: Interpreting Language Model Neurons at Scale
    Alex Foote*, Neel Nanda, Esben Kran, Ioannis Konstas, Shay Cohen, and 1 more author
    2023
  10. Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
    Jason Hoelscher-Obermaier*, Julia Persson*, Esben Kran, Ioannis Konstas, and Fazl Barez*
    2023
  11. Fairness in AI and Its Long-Term Implications on Society
    Ondrej Bohdal*, Timothy Hospedales, Philip H. S. Torr, and Fazl Barez*
    2023
  12. Exploring the Advantages of Transformers for High-Frequency Trading
    Fazl Barez, Paul Bilokon, Arthur Gervais, and Nikita Lisitsyn
    2023
  13. Benchmarking Specialized Databases for High-frequency Data
    Fazl Barez, Paul Bilokon, and Ruijie Xiong
    2023
  14. System III: Learning with Domain Knowledge for Safety Constraints
    Fazl Barez, Hosein Hasanbeig, and Alessandro Abate
    2023

2022

  1. PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration
    Pengyi Li, Hongyao Tang, Tianpei Yang, Xiaotian Hao, Tong Sang, and 6 more authors
    2022

2021

  1. ED2: An Environment Dynamics Decomposition Framework for World Model Construction
    Cong Wang, Tianpei Yang, Jianye Hao, Yan Zheng, Hongyao Tang, and 5 more authors
    2021
  2. Discovering topics and trends in the UK Government web archive
    David Beavan, Fazl Barez, M Bel, John Fitzgerald, Eirini Goudarouli, and 2 more authors
    2021

Talks

Interpretability for Safety and Alignment (Intelligent Cooperation Workshop, Foresight Institute, SF, USA).

2023 - 🔗 Info - 📄 Slides

An Introduction to AI Safety and Alignment - Deep Learning Indaba Conference, Tunis, Tunisia.

2022 - 🎥 Presentation

An Introduction to AI Safety and Alignment - AI Safety Israel Conference, Technion, Haifa, Israel.

2022 - 🎥 Presentation


Selected Awards and Honors