Prerequisites
In this course, you will learn AI Security concepts from the ground up. We will not assume you know anything about adversarial examples, jailbreaks, etc. If you don’t know much about AI security, but would like to learn, you are in the right place.
We also will not assume that you know much about LLMs or their internals. In this course we will teach you what you need to know about LLM internals for basic security research. We will teach you how to load small language models locally, jailbreak them, and defend them against attacks.
We do, however, assume that you know Python, are familiar with PyTorch (with some basic proficiency in tensor operations), and have a background in linear algebra and multivariable calculus (don't worry about multivariable integration or vector calculus; gradients and differentiation are the most important). If you are unsure whether you have sufficient background to start this course, we recommend that you try it out and see how it goes. At the bottom of this page we provide some advice for navigating gaps in knowledge you may run into.
Python
This course assumes that you have a strong background in Python programming. For coding exercises, we aim to make our code as readable as possible, but you are expected to be able to understand classes, list comprehensions, etc.
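As a rough self-check, here is a short sketch using a class and a list comprehension; if it reads naturally to you, your Python background is likely sufficient (`TokenCounter` is just an illustrative name, not something from the course):

```python
class TokenCounter:
    """Counts how often each token appears in a list of strings."""

    def __init__(self):
        self.counts = {}

    def update(self, tokens):
        for token in tokens:
            self.counts[token] = self.counts.get(token, 0) + 1

    def most_common(self, n):
        # Sort tokens by count, descending, and keep the top n.
        return sorted(self.counts.items(), key=lambda kv: -kv[1])[:n]

counter = TokenCounter()
counter.update(["the", "cat", "sat", "on", "the", "mat"])

# A list comprehension that keeps only tokens seen more than once.
repeated = [tok for tok, c in counter.counts.items() if c > 1]
print(repeated)                 # ['the']
print(counter.most_common(1))   # [('the', 2)]
```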
If you are entirely new to programming, this course is not the right starting point; instead, we recommend this course from Giraffe Academy, which is a great resource for intro-level programming tutorials.
Machine Learning Framework
Because it is the dominant framework used for research, this course uses PyTorch. If you have experience in something like TensorFlow or JAX, you can likely start this course with minimal friction. As long as you reference the PyTorch documentation, or ask an LLM to explain syntax, you will catch on quickly.
If you have a strong background in Python, but have not had practice with a machine learning framework like PyTorch, TensorFlow or JAX, we recommend the following tutorials from Andrej Karpathy to get up to speed:
- The spelled-out intro to neural networks and backpropagation: building micrograd

  You will learn about backpropagation and build a toy machine learning framework that has a similar structure to PyTorch. This will give you useful intuition about what is happening when you call `loss.backward()`, `optimizer.step()`, and `optimizer.zero_grad()`.

- The spelled-out intro to language modeling: building makemore

  This video will give you good intuition about language modeling, loss functions for classification, and tensor operations in PyTorch.

- Building makemore Part 2: MLP

  This tutorial is the culmination of the previous two videos. You will build an MLP language model in PyTorch, building on the concepts from the first two videos.
The other videos in the series are also helpful but are generally outside of the scope of what we would expect you to know.
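To make the connection concrete, here is a minimal sketch of a PyTorch training step showing where `loss.backward()`, `optimizer.step()`, and `optimizer.zero_grad()` fit. The model and data are placeholders, not course materials:

```python
import torch
import torch.nn as nn

# A tiny placeholder model and random data, just to show the shape of a training loop.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(8, 4)           # batch of 8 examples, 4 features each
targets = torch.randint(0, 2, (8,))  # class labels in {0, 1}

for step in range(5):
    logits = model(inputs)
    loss = loss_fn(logits, targets)
    optimizer.zero_grad()  # clear gradients from the previous step
    loss.backward()        # backpropagation: populate .grad on each parameter
    optimizer.step()       # update parameters using those gradients
```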
Understanding Tensor Operations
We will expect you to be familiar with manipulating the shapes of tensors and computing sums or averages across dimensions. Here are a few ways to test your knowledge before proceeding:
- Given a simple tensor declaration, would you be able to tell its shape without printing it out?

  ```python
  >>> tensor = torch.tensor([[1, 2], [3, 4]])
  >>> tensor.shape
  torch.Size([2, 2])
  ```

- Do you understand what it means to compute a sum or mean over a dimension? Do the output dimensions below match what you would expect?

  ```python
  >>> tensor = torch.tensor([[1, 2], [3, 4]])
  >>> tensor.sum(dim=1)
  tensor([3, 7])
  >>> tensor.sum(dim=1, keepdim=True)
  tensor([[3],
          [7]])
  >>> tensor.sum(dim=0)
  tensor([4, 6])
  >>> tensor.sum(dim=0, keepdim=True)
  tensor([[4, 6]])
  ```
If you have worked with PyTorch or NumPy before, this will probably look somewhat familiar. If these kinds of operations are confusing to you, we recommend you read this page from the PyTorch documentation and then play around with PyTorch's sum, mean, and softmax operations across different dimensions.
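For instance, here is a small sketch you can run to compare mean and softmax across the two dimensions of the same tensor:

```python
import torch
import torch.nn.functional as F

tensor = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

print(tensor.mean(dim=0))  # average down the columns -> tensor([2., 3.])
print(tensor.mean(dim=1))  # average along the rows   -> tensor([1.5000, 3.5000])

# softmax normalizes along the given dimension, so each slice sums to 1
probs = F.softmax(tensor, dim=1)
print(probs.sum(dim=1))    # tensor([1., 1.]) (up to floating point)
```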
Math
We will also expect you to understand various concepts from linear algebra and multivariable calculus. For linear algebra, you should have a solid grasp of matrices, vectors, matrix multiplication, and norms. Understanding rank, subspaces, dot products, and the singular value decomposition will also help with certain sections. We won't have any explicitly mathematical exercises; an intuition about these topics is what matters most (for building that intuition, we recommend 3Blue1Brown's Essence of Linear Algebra series).
For multivariable calculus, you should understand gradients, gradient descent, and the multivariable chain rule. Once again, you won't have to do any pen-and-paper mathematical exercises, but none of these concepts should seem foreign. For a good background, we recommend 3Blue1Brown's Neural Networks series, specifically chapters 2, 3, and 4.
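As a quick sanity check of this background, here is a sketch of gradient descent on a single-variable function using PyTorch's autograd, minimizing f(x) = (x - 3)^2, whose derivative is 2(x - 3):

```python
import torch

# Minimize f(x) = (x - 3)^2 by gradient descent; the minimum is at x = 3.
x = torch.tensor(0.0, requires_grad=True)
lr = 0.1

for _ in range(100):
    loss = (x - 3) ** 2
    loss.backward()          # autograd computes d(loss)/dx = 2 * (x - 3)
    with torch.no_grad():
        x -= lr * x.grad     # gradient descent step
    x.grad.zero_()           # reset the gradient for the next iteration

print(x.item())  # close to 3.0
```

If you can predict roughly what this loop does before running it, you have the calculus intuition the course relies on.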
What Should You Do When You're Stuck?
This document is not an exhaustive list of everything you will need to know to complete this course. More likely than not, you will get stuck on a line or concept that you do not understand. When this happens, here is what we recommend:
- First, spend a few minutes playing around, trying to figure out the problem with only the PyTorch documentation. There is value in struggling through problems and you shouldn’t cheat yourself out of this experience.
- If you are truly stuck on something, reference the hints we provide in the notebooks if the problem you are working on has them. Where we anticipate that a part of a problem is particularly hard, we provide a hint to help you get through it.
- Next, we recommend you take a look at the solution we provide for the problem. The purpose of providing the solutions is so you can reference them to see where you may have gone wrong. To test your understanding before moving on, we recommend you type up a solution yourself without looking directly at ours.
Help From LLMs
In any of the three steps above, LLMs can be very useful! The key is to ask clear, specific questions that will help you learn something new about AI security or PyTorch. Here is an example of when we believe it would be a good time to ask a question:
```python
import torch.functional as F

tensor = torch.ones(1, 2)
F.softmax(tensor, dim=1)
```

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[74], line 5
      1 import torch.functional as F
      4 tensor = torch.ones(1,2)
----> 5 F.softmax(tensor, dim=1)

AttributeError: module 'torch.functional' has no attribute 'softmax'
```
If you ask an LLM what the problem is here, you will find that the import should be `import torch.nn.functional as F` rather than `import torch.functional as F`. This particular bug may take a minute to figure out even if you are looking at the documentation. Using an LLM in these cases lets you worry less about syntax and more about learning the core concepts of the course.
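For reference, the corrected snippet looks like this:

```python
import torch
import torch.nn.functional as F  # note: torch.nn.functional, not torch.functional

tensor = torch.ones(1, 2)
print(F.softmax(tensor, dim=1))  # tensor([[0.5000, 0.5000]])
```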