AI Safety and Alignment
October 13th–17th, 2025
Oxford
A 5-day intensive course on foundational concepts and frontier research in the field of AI safety and alignment.
Topics:
- The alignment problem, foundations and present day.
- Frontier alignment methods and evaluation.
- Interpretability and monitoring.
- Sociotechnical aspects of AI alignment.
Lecturer: Fazl Barez.
Teaching assistants: Hunar Batra, Matthew Farrugia-Roberts, James Oldfield, and Marta Ziosi (Guest Speaker/Tutor).
Guest lecturers:
- Yoshua Bengio, Université de Montréal, Mila, & LawZero.
- Neel Nanda, Google DeepMind.
- Joslyn Barnhart, Google DeepMind.
- Robert Trager, Oxford Martin AI Governance Initiative.
Schedule:
- Lectures: 10:00–13:00 Monday–Thursday. LR7, Information Engineering Building. Open to all Oxford students. Recorded.
- Labs: 14:00–17:00 Monday–Thursday. Eagle House. Reserved for AIMS CDT students. Lab materials will be made available.
- Guest speaker day: 10:00–16:30 Friday. Location TBD. Open to all Oxford students.
Prerequisites:
- Mathematical maturity with proficiency in probability, optimization, and machine learning.
- Familiarity with neural networks, gradient descent, and deep learning fundamentals.
- Python programming experience and ability to train basic neural networks.
Syllabus
The alignment problem (Monday, October 13th):
- Introduction to the alignment problem.
- Outer and inner alignment.
- Deceptive alignment, sandbagging, and scheming.
- Lab: Specification gaming and goal misgeneralisation in grid worlds.
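As a rough illustration of the specification-gaming idea covered in Monday's lab (this is not the lab material, just a minimal toy sketch): an agent that greedily maximises a misspecified proxy reward parks on a "reward tile" instead of ever reaching the intended goal.

```python
# Toy 1-D gridworld where the proxy reward differs from the intended goal.
# Intended behaviour: reach the flag at cell 4. Proxy reward: +1 per step
# spent on the reward tile at cell 1. A proxy-maximising agent never
# reaches the flag; it sits on the tile instead (specification gaming).

GRID = [" ", "R", " ", " ", "F"]  # R = reward tile, F = flag (intended goal)

def greedy_episode(start=0, steps=10):
    pos, proxy_reward = start, 0
    for _ in range(steps):
        # One-step lookahead on the proxy reward only.
        candidates = [max(pos - 1, 0), pos, min(pos + 1, len(GRID) - 1)]
        pos = max(candidates, key=lambda p: GRID[p] == "R")
        proxy_reward += GRID[pos] == "R"
    return pos, proxy_reward

final_pos, reward = greedy_episode()
print(final_pos, reward)  # → 1 10: high proxy reward, goal never reached
```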
Alignment methods and evaluation (Tuesday, October 14th):
- Frontier model training and alignment pipeline.
- Alignment methods: RLHF, constitutional AI, deliberative alignment, weak-to-strong generalisation.
- Benchmarks, evaluation, and scalable oversight.
- Lab: RLHF and Constitutional AI.
Interpretability and monitoring (Wednesday, October 15th):
- Mechanistic interpretability.
- Scheming, sandbagging, and monitoring.
- Case study: Sleeper agents.
- Lab: Activation patching in a large language model.
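The core move in Wednesday's lab, activation patching, can be sketched in a few lines (this is not the lab code; a minimal numpy illustration with a toy two-layer network standing in for a transformer block): cache a hidden activation from a "clean" run and substitute it into a "corrupted" run, then observe how the output changes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer MLP standing in for a transformer block.
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 2))

def forward(x, patch=None):
    """Run the model; optionally overwrite the hidden activation."""
    h = np.tanh(x @ W1)   # hidden activation we can intervene on
    if patch is not None:
        h = patch         # activation patching: swap in the cached value
    return h @ W2, h

clean = rng.normal(size=4)
corrupted = rng.normal(size=4)

clean_out, clean_h = forward(clean)
corr_out, _ = forward(corrupted)
# Patch the clean hidden state into the corrupted run.
patched_out, _ = forward(corrupted, patch=clean_h)

# Patching the full hidden layer restores the clean output, localising
# the behaviour difference to that activation.
assert np.allclose(patched_out, clean_out)
```

In the lab the same intervention is applied to individual components of a large language model (e.g. one attention head or layer at one token position) to localise which activations mediate a behaviour.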
Sociotechnical aspects of AI alignment (Thursday, October 16th):
- Governance frameworks and coordination.
- Economic impacts of transformative AI.
- Security applications and dual-use risks.
- Lab: Policy practicum.
Guest speaker day (Friday, October 17th):
- Robert Trager (topic TBA).
- Joslyn Barnhart (topic TBA).
- Neel Nanda (topic TBA).
- Yoshua Bengio (topic TBA).
Assessment
To pass the module, AIMS students must:
- Participate in labs Monday–Thursday.
- Complete a 1,500-word report on a chosen alignment failure mode. The report is to be completed during the labs (after completing the day's exercises) and outside of contact hours.
The deadline for the report is Monday, October 20th, 23:59.