Models and Data

For the "Adversarial Basics" section of this course, you will use two datasets, train one model, and load one pretrained model from our Hugging Face page. Before you begin, this page gives an overview of the datasets and some context about the model you will load. We recommend that you do not skip it, because understanding the data and models you are working with is always important.

Dataset #1: The CIFAR 10 Dataset

The CIFAR 10 Dataset (Krizhevsky & Hinton, 2009) was created by Alex Krizhevsky and Geoffrey Hinton to study feature extraction for image classification.

The dataset has 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck. Each image is a 32 $\times$ 32 color image with three channels, so it is represented by $32\cdot32\cdot3 = 3072$ floating point values. Each value $x_{ijz}$ in an image $X$ is constrained such that $0 \leq x_{ijz} \leq 1$.
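
The size and range constraints above can be checked in a few lines of plain Python. This is only an illustrative sketch: the helper names `scale_pixel` and `is_valid_image` are hypothetical, and it assumes the raw pixels are 8-bit integers in [0, 255] that are divided by 255 to land in [0, 1], which is what common loaders such as torchvision's `ToTensor` do.

```python
from math import prod

# Channels x height x width for one CIFAR 10 image (channel-first is an assumption).
CIFAR10_SHAPE = (3, 32, 32)


def scale_pixel(raw):
    """Map an 8-bit pixel value (0..255) to a float in [0, 1]."""
    if not 0 <= raw <= 255:
        raise ValueError(f"expected an 8-bit value, got {raw}")
    return raw / 255.0


def is_valid_image(values, shape=CIFAR10_SHAPE):
    """Check a flat list of floats against the size and range constraints."""
    return len(values) == prod(shape) and all(0.0 <= v <= 1.0 for v in values)


# A hypothetical all-gray image: every channel of every pixel is 128.
gray = [scale_pixel(128)] * prod(CIFAR10_SHAPE)
print(prod(CIFAR10_SHAPE))   # 3072 values per image
print(is_valid_image(gray))  # True
```

A value outside [0, 1], or a list of the wrong length, makes `is_valid_image` return `False`.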

Dataset #2: MNIST Handwritten Digits

The MNIST dataset of handwritten digits (LeCun et al., 1998) is a modified version of the NIST dataset (Grother & Hanaoka, 2016), which was produced by the US government. Interestingly, although MNIST is incredibly popular, some important details of its construction were never documented and remain unknown (Yadav & Bottou, 2019).

The dataset contains 70,000 labeled 28 $\times$ 28 grayscale images of handwritten digits. Because the images are grayscale, there are only $28 \cdot 28$ values per image rather than $28 \cdot 28 \cdot 3$. As in CIFAR 10, each value in an image is constrained such that $0 \leq x_{ij} \leq 1$.
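
To put the two input sizes side by side, here is a small arithmetic sketch. The tuple names are illustrative, and the channel-first layout is an assumption borrowed from PyTorch conventions rather than something the datasets themselves prescribe.

```python
from math import prod

MNIST_SHAPE = (1, 28, 28)    # one grayscale channel
CIFAR10_SHAPE = (3, 32, 32)  # three color channels

mnist_values = prod(MNIST_SHAPE)
cifar_values = prod(CIFAR10_SHAPE)

print(mnist_values)                           # 784
print(cifar_values)                           # 3072
print(round(cifar_values / mnist_values, 2))  # 3.92
```

A CIFAR 10 image therefore carries roughly four times as many input values as an MNIST image.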

Model #1: CIFAR 10 Model

For sections that use the CIFAR 10 dataset, we provide access to our pretrained CIFAR 10 classifier via our Hugging Face page. The architecture is inspired by Wide Residual Networks (Zagoruyko & Komodakis, 2017) but is much more compact than a state-of-the-art model. We designed the model to be small and efficient so that it is easy to run regardless of your hardware.

Technical Details of the CIFAR 10 Model

The model has 165,722 parameters and was trained for 75 epochs on the CIFAR 10 training dataset. Training took a total of about 4 minutes and 20 seconds on a single H100 GPU. The final train accuracy was 86.66% and the final test accuracy was 83.86%. The figure below shows the loss and accuracy curves for both the train and test set for each epoch.
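
The figures above can be turned into a few derived quantities with simple arithmetic. The sketch below assumes each parameter is stored as a 4-byte float32 value, which the text does not state; the variable names are illustrative.

```python
NUM_PARAMS = 165_722
EPOCHS = 75
TRAIN_SECONDS = 4 * 60 + 20  # about 4 minutes and 20 seconds
TRAIN_ACC = 86.66            # final train accuracy, in percent
TEST_ACC = 83.86             # final test accuracy, in percent

# Assumption: each parameter is a 4-byte float32.
size_mb = NUM_PARAMS * 4 / 1e6
seconds_per_epoch = TRAIN_SECONDS / EPOCHS
gap = TRAIN_ACC - TEST_ACC

print(f"{size_mb:.2f} MB of weights")          # 0.66 MB of weights
print(f"{seconds_per_epoch:.2f} s per epoch")  # 3.47 s per epoch
print(f"{gap:.2f} point train/test gap")       # 2.80 point train/test gap
```

Under that assumption, the weights take well under a megabyte, which is consistent with the goal of a model that runs on modest hardware.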

To replicate these results, you may reference our code here.

Running the CIFAR 10 Model

A convenient part of using Hugging Face is that you do not have to download anything manually. The notebooks provide the code below for you; it is shown here so you can see how simple it is.

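import torch
from huggingface_hub import hf_hub_download

# Imported so torch.load can find the model classes when unpickling the checkpoint.
from xlab.models import MiniWideResNet, BasicBlock

# Download the checkpoint from the Hugging Face Hub (cached after the first call).
model_path = hf_hub_download(
    repo_id="uchicago-xlab-ai-security/tiny-wideresnet-cifar10",
    filename="adversarial_basics_cnn.pth"
)

# Assumption: the file stores the whole pickled model (hence the imports above),
# so PyTorch 2.6+ needs weights_only=False. Only do this for checkpoints you trust.
model = torch.load(model_path, map_location='cpu', weights_only=False)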
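
Once the model is loaded, its output for one image needs to be mapped back to a class name. The sketch below shows that step in plain Python. It assumes the model returns one score per class in the same order as the class list given for CIFAR 10 earlier on this page; the helper `predicted_class` and the example scores are hypothetical.

```python
# Assumed class order: the order in which the classes are listed on this page.
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def predicted_class(scores):
    """Return the class name with the highest score."""
    if len(scores) != len(CIFAR10_CLASSES):
        raise ValueError(f"expected {len(CIFAR10_CLASSES)} scores, got {len(scores)}")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return CIFAR10_CLASSES[best]


# Made-up scores for a single image; index 3 is the largest.
example_scores = [0.1, -1.2, 0.3, 2.5, 0.0, 1.9, -0.4, 0.2, -2.0, 0.7]
print(predicted_class(example_scores))  # cat
```

With a PyTorch output tensor, one row can be converted to a Python list with `.tolist()` before being passed to a helper like this.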

Model #2: The MNIST Model

In the defensive distillation and ensemble attack sections of this course, you will use a variety of compact models trained on the MNIST dataset. You can find training details for the ensemble attack models here and for the distillation model here. You will use MNIST in these sections for two reasons: it is much easier to train and run a small model for MNIST classification than for CIFAR classification, and it is useful to get hands-on experience with different datasets. As a side note, researchers today use MNIST sparingly for adversarial robustness research, because models trained on it are generally too easy to break and results do not generalize to other settings.

References

Grother, P. J., & Hanaoka, K. K. (2016). NIST Special Database 19: Handprinted Forms and Characters Database (Technical report). https://www.nist.gov/srd/nist-special-database-19

Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images (Technical report No. 4; Vol. 1, p. 7). University of Toronto.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324. https://doi.org/10.1109/5.726791

Yadav, C., & Bottou, L. (2019). Cold Case: The Lost MNIST Digits. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché Buc, E. Fox, & R. Garnett (Eds.), Advances in Neural Information Processing Systems (Vol. 32). Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2019/file/51c68dc084cb0b8467eafad1330bce66-Paper.pdf

Zagoruyko, S., & Komodakis, N. (2017). Wide Residual Networks. https://arxiv.org/abs/1605.07146