Square Attack

This section has a series of coding problems using PyTorch. As always, we highly recommend you read all the content on this page before starting the coding exercises.

Introduction

The Square Attack (Andriushchenko et al., 2020) is a black-box method used to generate adversarial samples. Unlike 'white-box' approaches such as PGD or FGSM, the Square Attack does not require knowing model weights or gradients; it only needs to query the model and observe its outputs.

Many other black-box attacks need a large number of queries to succeed, whereas the Square Attack requires relatively few. It works by repeatedly perturbing a randomly placed square region of the image and keeping each perturbation only if it increases the loss of the model. Upon release, the Square Attack was successful enough that it even outperformed some existing white-box approaches on benchmarks.

While both $L_\infty$ and $L_2$ variations of the attack exist, we focus on the $L_\infty$ attack. We believe that the $L_\infty$ attack is sufficient to learn the main concepts behind the attack. At the bottom of this writeup we include limited information about the $L_2$ attack as well, but we warn readers that this content is quite technical and not necessary to understand for future sections of the course.

Fig. 1
Source: (Andriushchenko et al., 2020)

The Square Attack Loop

The Square Attack works through a random sampling algorithm. First, the adversarial image $\hat{x}$ is initialized as the input image, and the loss is initialized as the loss of $\text{model}(x)$ with respect to the label $y$. At each iteration, a square of pixels is randomly chosen and perturbed. If adding this square to $\hat{x}$ increases the loss, the change is kept. If not, the square is rejected. The size of the square is controlled by the variable $h$, which is gradually reduced over time so that later iterations make smaller, more local changes as the attack converges.
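As a rough illustration of how $h$ might shrink over time, the sketch below halves the fraction of the image covered by the square at a few milestones. The function name `square_side`, the starting fraction `p_init`, and the milestone values are all assumptions made for this example, not the schedule used in the paper.

```python
import math


def square_side(i: int, n_iters: int, width: int, p_init: float = 0.1) -> int:
    """Hypothetical schedule: side length h of the square at iteration i."""
    # Illustrative milestones, given as fractions of the total iteration budget.
    # Each time i passes one, the fraction p of the image covered is halved.
    milestones = [0.05, 0.2, 0.5, 0.8]
    halvings = sum(i > m * n_iters for m in milestones)
    p = p_init / (2 ** halvings)

    # A square covering a fraction p of a width x width image has side sqrt(p) * width.
    h = round(math.sqrt(p * width * width))

    # Keep the side length between 1 pixel and the full image width.
    return max(1, min(h, width))


if __name__ == "__main__":
    for i in (1, 100, 400, 1000):
        print(i, square_side(i, n_iters=1000, width=32))
```

With these made-up settings, the square starts at roughly a tenth of the image area and ends at only a few pixels per side, which is one simple way to move from coarse to fine changes.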

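The algorithm for the attack loop from the paper is shown below. Although it is not necessary for you to understand everything now, we encourage you to try to parse each line and guess what is happening. This is good practice for future sections which use this kind of notation! One point to watch for: the paper writes the algorithm in terms of a loss $L$ that the attacker tries to minimize (a margin between the score of the correct class and the best score of any other class), so line 7 accepts a candidate when $L$ decreases. This is the same idea as keeping a square when it increases the model's classification loss.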

1  $\hat{x} \leftarrow \text{init}(x), \quad l^* \leftarrow L(f(x), y), \quad i \leftarrow 1$
2  while $i < N$ and $\hat{x}$ is not adversarial do
3      $h^{(i)} \leftarrow$ side length of the square to modify (according to some schedule)
4      $\delta \sim P(\epsilon, h^{(i)}, w, c, \hat{x}, x)$
5      $\hat{x}_{\text{new}} \leftarrow \text{Project } \hat{x} + \delta \text{ onto } \{z \in \mathbb{R}^d : \|z - x\|_p \le \epsilon\} \cap [0, 1]^d$
6      $l_{\text{new}} \leftarrow L(f(\hat{x}_{\text{new}}), y)$
7      if $l_{\text{new}} < l^*$ then $\hat{x} \leftarrow \hat{x}_{\text{new}}, \; l^* \leftarrow l_{\text{new}}$
8      $i \leftarrow i + 1$
9  end
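The loop above can be sketched in PyTorch roughly as follows. This is a minimal sketch for a single image, not the reference implementation: the names `margin_loss` and `square_attack_loop`, the `sample_delta(i, x_hat)` callback signature, and the choice of a margin loss that the attacker minimizes are assumptions made for illustration. The projection shown is the $L_\infty$ case.

```python
import torch


def margin_loss(logits: torch.Tensor, y: int) -> float:
    """Score of the true class minus the best other score (negative once misclassified)."""
    others = logits.clone()
    others[y] = float("-inf")  # exclude the true class from the max
    return (logits[y] - others.max()).item()


def square_attack_loop(model, x, y, sample_delta, eps, n_iters=1000):
    """Random-search loop for one image x of shape (c, h, w) with values in [0, 1].

    sample_delta(i, x_hat) is a hypothetical callback returning a perturbation
    with the same shape as x (lines 3-4 of the algorithm).
    """
    x_hat = x.clone()                                      # line 1: init
    with torch.no_grad():
        best = margin_loss(model(x_hat[None])[0], y)       # line 1: initial loss
        for i in range(1, n_iters):                        # line 2: while i < N ...
            if best < 0:                                   # ... and not yet adversarial
                break
            delta = sample_delta(i, x_hat)                 # lines 3-4
            # Line 5 (L_inf case): stay within eps of x, then within [0, 1].
            x_new = torch.min(torch.max(x_hat + delta, x - eps), x + eps)
            x_new = x_new.clamp(0.0, 1.0)
            new = margin_loss(model(x_new[None])[0], y)    # line 6: one model query
            if new < best:                                 # line 7: keep improvements only
                x_hat, best = x_new, new
    return x_hat, best
```

Note that the model is only ever called in the forward direction, once per iteration, which is what makes the attack black-box.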

$L_\infty$ Square Attack

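For the $L_\infty$ attack, the $\delta$ tensor from line 4 of the previous algorithm is generated by picking a random location for the square on the image. Then, for each color channel, a single value is chosen uniformly at random from the two options $-2\epsilon$ and $2\epsilon$, where $\epsilon$ is the $L_\infty$ budget, and every pixel of the square in that channel is set to that value. Outside the square, $\delta$ is zero. A magnitude of $2\epsilon$ is enough to move a pixel from one end of the allowed range $[x - \epsilon, x + \epsilon]$ to the other, and the projection step clips away any excess.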

More concretely, $\delta$ can be visualized below assuming an input image with $32 \times 32$ pixels and 3 color channels. Note that although each color channel has the same square location, the value of the change is sampled independently for each channel and so can differ between channels.

Fig. 2
Example $\delta$ for the $L_\infty$ square attack
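To make the shape of $\delta$ concrete, here is a minimal sketch of how such a tensor could be sampled. It assumes each channel's value is one of the two endpoints $-2\epsilon$ or $2\epsilon$ (as in the sampling procedure described in the paper) and that the square must fit entirely inside the image. The function name `sample_linf_delta` and its parameters are hypothetical.

```python
import torch


def sample_linf_delta(c: int, height: int, width: int, h: int, eps: float) -> torch.Tensor:
    """Sample a perturbation that is zero everywhere except one h x h square."""
    delta = torch.zeros(c, height, width)

    # Top-left corner of the square, chosen so the square stays inside the image.
    r = torch.randint(0, height - h + 1, (1,)).item()
    s = torch.randint(0, width - h + 1, (1,)).item()

    # One sign per channel: every pixel of the square in a given channel
    # receives the same value, either -2*eps or +2*eps.
    signs = torch.randint(0, 2, (c, 1, 1)) * 2 - 1

    # Same location for all channels, independently sampled value per channel.
    delta[:, r:r + h, s:s + h] = signs * 2 * eps
    return delta


if __name__ == "__main__":
    delta = sample_linf_delta(c=3, height=32, width=32, h=8, eps=8 / 255)
    print(delta.shape, delta.unique())
```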

Next, this delta is added to the current adversarial tensor, and the result is projected so that all values are between 0 and 1 and the adversarial image stays within the $L_\infty$ budget, meaning every pixel is within $\epsilon$ of the corresponding pixel of the original image $x$. For the $L_\infty$ attack this "projection" is done by clipping the image (similar to what you saw in the FGSM/PGD section).
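The clipping step can be sketched as two clamps, one on the perturbation and one on the pixel values. The helper name `project_linf` is an assumption for this example, and the sketch assumes the original image `x` already has values in $[0, 1]$.

```python
import torch


def project_linf(x_candidate: torch.Tensor, x: torch.Tensor, eps: float) -> torch.Tensor:
    """Clip x_candidate so it lies within eps of x (L_inf) and inside [0, 1]."""
    # First limit the total perturbation relative to the ORIGINAL image x.
    perturbation = (x_candidate - x).clamp(-eps, eps)
    # Then make sure the result is still a valid image.
    return (x + perturbation).clamp(0.0, 1.0)


if __name__ == "__main__":
    x = torch.tensor([0.0, 0.5, 1.0])
    candidate = x + torch.tensor([-0.2, 0.2, 0.2])
    print(project_linf(candidate, x, eps=0.1))
```

The order matters less than the reference point: the budget is always measured against the original image, not against the current adversarial image, so repeated updates cannot drift outside the allowed range.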

BONUS: $L_2$ Square Attack

Once again, we note that the $L_2$ attack is quite technical and not necessary to understand for future sections of this course. If you are interested, a brief writeup of the attack can be found below, with more details in the paper itself.

References

Andriushchenko, M., Croce, F., Flammarion, N., & Hein, M. (2020). Square Attack: a query-efficient black-box adversarial attack via random search. https://arxiv.org/abs/1912.00049