Selected Results in Additive Combinatorics: An Exposition

We give a self-contained exposition of selected results in additive combinatorics over the group GF(2)^n = {0,1}^n. In particular, we prove the celebrated theorems known as the Balog-Szemerédi-Gowers theorem (’94 and ’98) and the Freiman-Ruzsa theorem (’73 and ’99), leading to the remarkable result by Samorodnitsky (’07) that linear transformations are efficiently testable. No new result is proved here. However, we strip down the available proofs to the bare minimum needed to derive the efficient testability of linear transformations over {0,1}^n, thus hoping to provide a computer science-friendly introduction to the marvelous field of additive combinatorics.


Introduction
Additive combinatorics is a fascinating area of mathematics that has found several applications in theoretical computer science. In addition to the book by Tao and Vu [20], a number of expositions of various results in additive combinatorics are now available. See, for example, the survey by Trevisan [21] and the pointers therein.
In this survey we aim to provide a self-contained, friendly introduction to additive combinatorics. We cover a few selected results, stripped down to the minimum needed to obtain the following result by Samorodnitsky [17] that linear transformations are efficiently testable.
Theorem 1.1 (Testing linear transformations (Samorodnitsky)). For all ε > 0 there is ε' > 0 such that for all sufficiently large n and all functions f : F_2^n → F_2^n the following holds. If
    Pr_{x,x' ∈ F_2^n}[f(x) + f(x') = f(x + x')] ≥ ε
then there is an n × n matrix M such that
    Pr_{x ∈ F_2^n}[f(x) = Mx] ≥ ε'.
In the statement of this theorem, and throughout the remainder of the paper, F_2 denotes the set {0, 1} with mod 2 arithmetic.
Theorem 1.1 shows that the test "pick x, x' ∈ F_2^n uniformly at random, and accept if f(x) + f(x') = f(x + x')" is useful to check if a function f : F_2^n → F_2^n, given as a black-box, is close to a linear transformation. Indeed, if f is a linear transformation, i.e., f(x) = Mx for an n × n matrix M, then clearly the test always accepts; while on the other hand if the test accepts with probability ε then the theorem guarantees that there is a linear transformation (given by M) that agrees with f on an ε' fraction of the inputs.
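The test described above can be sketched in a few lines. The following is only an illustration under stated assumptions: vectors of F_2^n are packed into Python integers (so that addition is XOR), and the helper names `parity`, `apply_matrix`, and `acceptance_rate` are hypothetical and not taken from the paper.

```python
import random

def parity(v: int) -> int:
    """Return the sum mod 2 of the bits of v."""
    return bin(v).count("1") & 1

def apply_matrix(rows, x: int) -> int:
    """Compute Mx over F_2, where rows[i] is the i-th row of M packed as a bitmask."""
    y = 0
    for i, row in enumerate(rows):
        y |= parity(row & x) << i
    return y

def acceptance_rate(f, n: int, trials: int, rng: random.Random) -> float:
    """Estimate Pr[f(x) + f(x') = f(x + x')] by sampling; '+' in F_2^n is XOR."""
    hits = 0
    for _ in range(trials):
        x, x2 = rng.getrandbits(n), rng.getrandbits(n)
        hits += (f(x) ^ f(x2)) == f(x ^ x2)
    return hits / trials

n = 8
rng = random.Random(0)
M = [rng.getrandbits(n) for _ in range(n)]  # an arbitrary n x n matrix over F_2

# A linear transformation passes every trial of the test.
linear_rate = acceptance_rate(lambda x: apply_matrix(M, x), n, 2000, rng)

# The constant function f(x) = 1 fails every trial: f(x) + f(x') = 0 but f(x + x') = 1.
constant_rate = acceptance_rate(lambda x: 1, n, 2000, rng)
```

The sampled rate only estimates the acceptance probability ε; the content of Theorem 1.1 is the converse direction, which this sketch does not attempt to verify.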
This test was first proposed by Blum, Luby, and Rubinfeld [5], who analyze it for functions between finite groups. Bellare et al. [4] later refine the analysis in the case f : F_2^n → F_2. The proofs in [5,4] appear to break down in the setting of Theorem 1.1 where the range of f is F_2^n. The proof of Theorem 1.1 gives an exponential dependence of ε' on ε. The "polynomial Freiman-Ruzsa conjecture" states that ε' is polynomial in ε [10].
To introduce the subject of additive combinatorics and motivate the following sections, let us now informally see how the property-testing result in Theorem 1.1 follows from some known results in the subject, which will be presented along the way.
Proof idea for Theorem 1.1. We are interested in the additive combinatorics of the graph A of the function f:
    A := {(x, f(x)) : x ∈ F_2^n} ⊆ F_2^{2n}.
The approach to prove the theorem is to show that A is approximately a linear space. This approach is motivated by the observation that if A were exactly a linear space, then f would be a linear transformation, because in this case
    (x, f(x)) + (x', f(x')) = (x + x', f(x) + f(x')) ∈ A, and hence f(x) + f(x') = f(x + x').
We start by noting that our assumption can be written as
    Pr_{a,a' ∈ A}[a + a' ∈ A] ≥ ε.    (1.1)
Next, we apply our first result in additive combinatorics, namely the Balog-Szemerédi-Gowers (BSG) theorem [2,9]. This theorem states that if a set A satisfies (1.1) then it contains a large subset that is nearly closed under addition. More formally, defining 2S := {a + a' : a, a' ∈ S}, the BSG theorem says that there is a set A' ⊆ A such that
    both |A'|/|A| and |A|/|2A'| are bounded below by a positive quantity that depends only on ε.    (1.2)
(From Equation (1.1) we cannot in general conclude that |2A| ≈ |A|, which motivates considering the subset A' ⊆ A.)
THEORY OF COMPUTING LIBRARY, GRADUATE SURVEYS 3 (2011), pp. 1-15
At this point we apply our second result in additive combinatorics, namely Ruzsa's theorem [16], which is a finite-field analogue of an older theorem due to Freiman [8]. This theorem says that if a set A' satisfies (1.2) then it is approximately a linear space. Specifically, denoting by span(A') the vector space spanned by elements of A', Ruzsa's theorem states that
    |span(A')| is at most |A'| times a quantity that depends only on ε.    (1.3)
In other words, Ruzsa's theorem says that if linear combinations of length 2 (i.e., 2A') do not buy much size, then neither do linear combinations of arbitrary length (i.e., span(A')).
Finally, even though A may not be a linear space, from (1.3) one can still draw the conclusion that f is close to a linear transformation, thus concluding the proof of the theorem.
Before discussing how this exposition is organized, we stress that it focuses on additive combinatorics in the additive group of the vector space F_2^n. This choice is motivated by the importance of this space to computer science, and by the fact that the proofs of the relevant results in additive combinatorics appear to be cleanest over F_2^n. More work is needed to extend these results to more general domains, mainly because of the need to take care of the signs (as discussed in [20], for example). With that additional work, one can extend these results as follows. An analogue of Theorem 1.1 holds over F_p^n for any fixed prime p, see [17, Theorem 4.1]. The BSG theorem holds over any abelian group, see [20, Theorem 2.29]. The Freiman-Ruzsa theorem holds over any abelian group with bounded torsion, see [20, Theorem 5.27].
We also mention that Green and Tao [12] have recently given a new direct proof of the combination of the BSG and Ruzsa theorems over F_2^n using Fourier analysis. Going back to the proof of Theorem 1.1, their result goes directly from (1.1) to (1.3).
Organization. After some preliminaries in Section 2, we prove the BSG theorem in Section 3. In Section 4 we prove Ruzsa's theorem. In Section 5 we conclude the proof of the testability of linear transformations (Theorem 1.1). Our presentation of the BSG theorem in Section 3 follows the one by Sudakov, Szemerédi, and Vu [18], which relies on a graph-theoretic lemma regarding certain paths in dense graphs. In Section 6 we also present the proof of the optimality of the path length of this lemma, due to Kostochka and Sudakov [14]. This section is not needed for the proof of Theorem 1.1; we include it to provide a more complete picture of the proof techniques we present. For the same reason, in Section 7 we present a simpler proof of the testability of linear transformations in the case in which the agreement is large (corresponding to ε ≈ 1).

Preliminaries
In this work we are concerned with subsets of the vector space F_2^n, whose operation is the componentwise addition mod 2 denoted by "+." Throughout this survey, A denotes a subset of F_2^n.
For sets A, B ⊆ F_2^n, we denote by A + B the set {a + b : a ∈ A, b ∈ B}. For an integer ℓ we denote by ℓA the set A + A + ··· + A where the number of summands is ℓ. We denote by span(A) the span of the elements of A, i.e., span(A) = ∪_ℓ ℓA. We use several times the basic counting argument of Proposition 2.1, whose proof is straightforward. Finally, all the graphs in this paper are undirected and have no self-loops.
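To make the notation concrete, here is a small sketch of sumsets, ℓA, and span(A). It assumes that elements of F_2^n are encoded as Python integers, with XOR playing the role of "+"; the function names `sumset`, `multiple`, and `span` are illustrative choices, not notation from the paper.

```python
def sumset(A, B):
    """A + B = {a + b : a in A, b in B}, with addition in F_2^n implemented as XOR."""
    return {a ^ b for a in A for b in B}

def multiple(A, l):
    """lA = A + A + ... + A with l >= 1 summands."""
    result = set(A)
    for _ in range(l - 1):
        result = sumset(result, A)
    return result

def span(A):
    """The linear span of A over F_2, built by adding one generator at a time."""
    S = {0}
    for a in A:
        S |= {s ^ a for s in S}
    return S

# The three unit vectors of F_2^3.
A = {0b001, 0b010, 0b100}
two_A = multiple(A, 2)    # the zero vector and the three vectors of weight 2
three_A = multiple(A, 3)  # the vectors of odd weight
```

In this example span(A) is all of F_2^3 and equals the union of A, 2A, and 3A, in line with the identity span(A) = ∪_ℓ ℓA.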
One way to think of the BSG theorem is the following. For a subset E of the cartesian product A × A, let us denote its set of sums by ∑E := {a + b : (a, b) ∈ E}. Then the BSG theorem says that from a dense E ⊆ A × A such that ∑E is a subset of A and hence is small compared to |E|, we can obtain a dense product set A' × A' ⊆ A × A whose set of sums ∑(A' × A') = 2A' is also small.
The proof that we present of Theorem 3.1 (the BSG theorem) follows one due to Sudakov, Szemerédi, and Vu [18]. It makes use of a graph-theoretical statement, Lemma 3.2, which does not use any property of addition and only relies on the density of the graph. The use of the language of graph theory to prove results in additive combinatorics has been found fruitful, and it goes back at least to Szemerédi's theorem on arithmetic progressions [19].
Proof of Theorem 3.1, assuming Lemma 3.2. Consider the graph G = (A, E) on |A| nodes where two distinct nodes are adjacent if and only if their sum is in A, i.e., E := {{a, b} : a ≠ b, a + b ∈ A}. By the hypothesis of the theorem, |E| ≥ (ε/3)·|A|^2. (For large values of n, the factor 1/3 generously accounts for the translation between the hypothesis, which talks about pairs, and Lemma 3.2, which talks about edges.) Now let A' be the subset of A given by Lemma 3.2, and consider any two a, b ∈ A'. By Lemma 3.2, there are at least ε'·|A|^3 paths (a, c_1, c_2, c_3, b) of length 4 between a and b, where ε' > 0 depends only on ε. By the definition of E, the sum of two consecutive nodes in any path lies in A. Thus, considering the function that maps (x_1, x_2, x_3, x_4) ∈ A^4 to x_1 + x_2 + x_3 + x_4, each such path yields an input
    (x_1, x_2, x_3, x_4) := (a + c_1, c_1 + c_2, c_2 + c_3, c_3 + b) ∈ A^4
that is mapped to x_1 + x_2 + x_3 + x_4 = a + b. Note that distinct triples (c_1, c_2, c_3) give rise to distinct inputs (x_1, x_2, x_3, x_4), which follows from the fact that a and b are fixed. In other words, we have shown that each element of 2A' can be represented in at least ε'·|A|^3 different ways as a sum of 4 elements of A. Via Proposition 2.1 this leads to the following upper bound on |2A'|:
    |2A'| ≤ |A|^4 / (ε'·|A|^3) = |A|/ε',
concluding the proof.
Proof of Lemma 3.2. The idea is to exhibit a set A' ⊆ A such that every a ∈ A' shares Ω(N) neighbors with most nodes in A'. (We may think of "most" as a 0.9 fraction.) From this we infer that, for every two nodes a, b ∈ A', most nodes c_2 in A' share Ω(N) neighbors c_1 with a and also share Ω(N) neighbors c_3 with b, which implies the result. We now give the details.
For a node v ∈ G let us denote by N(v) ⊆ A the neighborhood of v. The set A' will be a subset of N(v*) for some v* given by a probabilistic argument. For this argument, let us call a pair {u, w} of distinct vertices bad if |N(u) ∩ N(w)| ≤ ε^3·N. Let, moreover, v ∈ G be a uniformly distributed random node of G. We are interested in the number of bad pairs inside N(v). Let B_{u,w} be the 0/1 indicator variable that is 1 when u, w ∈ N(v) and {u, w} is bad. For every bad pair {u, w} (not necessarily in N(v)) it holds that {u, w} ⊆ N(v) if and only if v is a common neighbor of u and w, which by the definition of "bad" happens with probability at most ε^3. Consequently, by the linearity of expectation, we have
    E_v[∑_{{u,w}} B_{u,w}] ≤ ε^3·(N choose 2) ≤ ε^3·N^2/2.    (3.1)
Let us now denote by S(v) the set of nodes u ∈ N(v) that form a bad pair with at least ε^2·N other nodes w ∈ N(v). Since there are always at least |S(v)|·ε^2·N/2 bad pairs inside N(v), where the factor 1/2 comes from the fact that each bad {u, w} is counted once for u and once for w, Equation (3.1) implies that
    E_v[|S(v)|] ≤ (ε^3·N^2/2)·2/(ε^2·N) = ε·N.
Therefore, using the fact that E_v[|N(v)|] = 2|E|/N ≥ 2ε·N, we have E_v[|N(v)| − |S(v)|] ≥ ε·N, and so there is a node v* such that the set A' := N(v*) \ S(v*) has size at least ε·N.
To see that A' satisfies the conclusion of the lemma, consider any a, b ∈ A'. Since we removed the nodes in S(v*), i.e., those that form a bad pair with at least ε^2·N other nodes w ∈ N(v*), both a and b form a good pair with all but at most ε^2·N nodes of A'. So there are at least
    |A'| − 2ε^2·N ≥ ε·N − 2ε^2·N ≥ ε^2·N
nodes c_2 ∈ A' that form a good pair with both a and b, where the last inequality holds if we assume that ε ≤ 1/3. (To optimize exponents, one can replace the last inequality with ε − 2ε^2 ≥ ε/3.) For every such c_2 we have, by the definition of "good," ε^3·N choices for c_1 and ε^3·N choices for c_3, for a total of at least ε^8·N^3 choices of (c_1, c_2, c_3). This proves the lemma under the assumption that ε ≤ 1/3. If ε > 1/3, the same proof works when ε is replaced with ε/2, which is at most 1/3 because any graph trivially has at most N^2/2 ≥ ε·N^2 edges.
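The selection of A' in the proof above can be made concrete with a short sketch. This is an assumption-laden illustration rather than the paper's procedure: instead of averaging over a random v, it tries every node v and keeps the largest set N(v) \ S(v), which is at least as large as the average. The function name `select_subset` and the adjacency-dictionary encoding of the graph are hypothetical.

```python
from itertools import combinations

def select_subset(adj, eps):
    """Return N(v*) minus S(v*) for a node v* that maximizes the size of this set.

    adj maps each node to the set of its neighbors; eps is the edge density |E| / N^2.
    """
    N = len(adj)
    bad_threshold = eps**3 * N    # {u, w} is bad if they share at most eps^3 * N neighbors
    heavy_threshold = eps**2 * N  # u is in S(v) if it lies in at least eps^2 * N bad pairs
    best = set()
    for v, nbrs in adj.items():
        bad_count = {u: 0 for u in nbrs}
        for u, w in combinations(nbrs, 2):
            if len(adj[u] & adj[w]) <= bad_threshold:
                bad_count[u] += 1
                bad_count[w] += 1
        kept = {u for u in nbrs if bad_count[u] < heavy_threshold}
        if len(kept) > len(best):
            best = kept
    return best

# Toy input: the complete graph on 6 nodes, which has 15 edges.
num_nodes = 6
adj = {v: set(range(num_nodes)) - {v} for v in range(num_nodes)}
eps = 15 / num_nodes**2
A_prime = select_subset(adj, eps)
```

In the complete graph every pair of nodes shares 4 neighbors, so no pair is bad and the selected set is a full neighborhood of 5 nodes, comfortably above the ε·N guaranteed by the averaging argument.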

Ruzsa's theorem
In this section we prove Ruzsa's theorem [16], which states that the span of A does not expand too much if 2A does not.
Theorem 4.1 (Ruzsa). For all c there is c' such that for all n and all sets A ⊆ F_2^n the following holds. If |2A| ≤ c·|A| then |span(A)| ≤ c'·|A|.
The core of the proof of Theorem 4.1 is Lemma 4.2, which states that 4A does not expand too much if 2A does not: if |2A| ≤ c·|A| then |4A|/|A| is bounded by a quantity that depends only on c.
Proof of Theorem 4.1 assuming Lemma 4.2. We start with the following covering claim showing that we can cover all of 4A by few translates of A: There is a set X ⊆ 3A whose size depends only on c such that for every b ∈ 3A we have
    (X + A) ∩ (b + A) ≠ ∅.    (4.1)
To prove the covering claim, initialize X to the empty set, and as long as there is some b ∈ 3A violating (4.1), add b to X. The resulting X satisfies the intersection requirement by construction. To verify the bound on the size of X, note that at each iteration the set X + A grows in size by |A|, but X + A ⊆ 4A always holds, and so at the end of the process |X| is at most |4A|/|A|. By Lemma 4.2, together with our assumption that |2A| ≤ c·|A|, it follows that this quantity depends only on c.
Now we show by induction that, for every ℓ ≥ 3, ℓA ⊆ (ℓ − 2)X + 2A. This will conclude the proof, as the size of X depends only on c. Specifically we obtain that span(A) ⊆ span(X) + 2A, and so
    |span(A)| ≤ |span(X)|·|2A| ≤ 2^{|X|}·c·|A|.
For the base case ℓ = 3 of the induction, take any b ∈ 3A. By (4.1), X + A intersects b + A, which means x + a = b + a' for some x ∈ X and a, a' ∈ A, and so b = x + a + a' ∈ X + 2A.
For the inductive step, write
    (ℓ + 1)A = ℓA + A ⊆ (ℓ − 2)X + 2A + A = (ℓ − 2)X + 3A ⊆ (ℓ − 2)X + X + 2A = (ℓ − 1)X + 2A,
where we apply the inductive hypothesis and then the base case.
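The greedy construction of the covering set X can be sketched as follows. The encoding of vectors as integers with XOR as addition, the single pass over 3A in sorted order, and the names `sumset` and `greedy_cover` are assumptions made for this illustration.

```python
def sumset(A, B):
    """A + B over F_2^n, with elements encoded as integers and addition as XOR."""
    return {a ^ b for a in A for b in B}

def greedy_cover(A):
    """Greedily pick X inside 3A so that X + A meets b + A for every b in 3A."""
    three_A = sumset(sumset(A, A), A)
    X, covered = set(), set()  # 'covered' always equals X + A
    for b in sorted(three_A):
        translate = {b ^ a for a in A}  # the translate b + A
        if covered.isdisjoint(translate):
            # b violates (4.1), so it is added; X + A grows by exactly |A| elements.
            X.add(b)
            covered |= translate
    return X

# A small test set in F_2^4.
A = {0b0001, 0b0010, 0b0100, 0b1000, 0b0011}
two_A = sumset(A, A)
three_A = sumset(two_A, A)
four_A = sumset(two_A, two_A)
X = greedy_cover(A)
```

One pass suffices because X + A only grows: an element b that already satisfies (4.1) when it is examined keeps satisfying it afterwards. The base case of the induction, 3A ⊆ X + 2A, and the size bound |X|·|A| ≤ |4A| can then be checked directly on the output.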
Proof of Lemma 4.2. We start with the following covering claim, whose statement and proof are very similar to those of the covering claim in the proof of Theorem 4.1 above. (We omit the proof.) For any set A ⊆ F_2^n there is a set X ⊆ A of size at most 2·|2A|/|A| such that for every b ∈ A we have
    |(X + A) ∩ (b + A)| ≥ |A|/2.
As a consequence of the covering claim, we have that for every b ∈ A there are at least |A|/2 triples (a_0, a_1, x) ∈ A × A × X such that b = x + a_0 + a_1. This is because each element y ∈ (X + A) ∩ (b + A) gives rise to, say, one such triple with a_1 := b + y. (This last requirement makes all the triples distinct.)
Now we use this implication to prove the lemma. The idea is to represent each element w of 4A in at least (|A|/2)^2 distinct ways as a sum of a quintuple (c, c', c'', x, x') with c, c', c'' ∈ 2A and x, x' ∈ X; then an application of Proposition 2.1 completes the proof. More specifically, write w ∈ 4A as w = z + z' for z, z' ∈ 2A. We first represent each of z and z' separately in many ways as a sum of a triple, then we represent the pair (z, z') in many ways as a sum of a sextuple, and finally we map the sextuples in a one-to-one fashion into quintuples that represent the sum z + z' = w in many ways. We now give the details.
Fix an arbitrary w ∈ 4A, and fix some z, z' ∈ 2A for which w = z + z'. Further write z = b_0 + b_1, for b_0, b_1 ∈ A. By the above, there are at least |A|/2 triples (a_0, a_1, x) ∈ A × A × X such that z = b_0 + a_0 + a_1 + x. Since b_0 + a_0 ∈ 2A, there are at least |A|/2 triples (c, a_1, x) ∈ (2A) × A × X such that z = c + a_1 + x. By repeating the argument for z' = b_0' + b_1', we obtain that there are at least (|A|/2)^2 sextuples (c, a_1, x, c', a_1', x') ∈ ((2A) × A × X)^2 such that
    z = c + a_1 + x and z' = c' + a_1' + x'.
Note that, in any solution to the above system, a_1 and a_1' are uniquely determined once c, x, c', x' are chosen (as z and z' are fixed). So, two different solutions cannot differ only in the two coordinates ranging in A. In particular, the map that takes a solution (c, a_1, x, c', a_1', x') to the quintuple (c, c', a_1 + a_1', x, x') ∈ (2A)^3 × X^2 is one-to-one. Moreover, such a quintuple sums up to z + z' = w. Therefore, similar to the proof of Theorem 3.1, we have a function f : (2A)^3 × X^2 → F_2^n, mapping each quintuple to the sum of its entries, such that, for every element w ∈ 4A, there are at least (|A|/2)^2 distinct inputs y such that f(y) = w. By Proposition 2.1, we have
    |4A| ≤ |2A|^3·|X|^2 / (|A|/2)^2.
Turning back to the main result of this section, Theorem 4.1, we note that the current best upper bound on c' is obtained by Green and Tao [11] who prove c' ≤ 2^{2c} modulo lower order factors. Taking A to be a set of 2c independent vectors one notes that c' ≤ 2^{2c} is the best possible, and in particular that c' in general must be exponential in c. However, if one is willing to settle for the span of a large subset A' of A, rather than all of A, in the same spirit as the BSG theorem, then it is conjectured [10] that c' can be made polynomial in c. This would imply a polynomial dependence of ε' on ε in Theorem 1.1.

Obtaining a linear transformation
In this section we conclude the proof of the property testing result in Theorem 1.1. The last component of the proof is the following linear-algebraic fact that states that if the span of (a large subset of) {(x, f(x)) : x ∈ F_2^n} does not grow much, then f is approximately a linear transformation.
Lemma 5.1. For all ε > 0, for all sufficiently large n, all functions f : F_2^n → F_2^n, and all sets A ⊆ {(x, f(x)) : x ∈ F_2^n} the following holds. If |A| ≥ ε·2^n and |span(A)| ≤ 2^n/ε, then there is an n × n matrix M such that f(x) = Mx for a fraction of the inputs x ∈ F_2^n that is bounded below by a positive quantity depending only on ε.
Proof. We start by finding an affine transformation Tx + u with the required property, then we observe how this implies the existence of a linear transformation. Let v_1, v_2, ..., v_a be a basis of span(A). By definition, every vector (x, f(x)) ∈ A is a linear combination of the v_i, i.e., for every (x, f(x)) ∈ A there exists w ∈ F_2^a such that, in matrix form,
    (x, f(x))^T = V_a·w,
where V_a is the 2n × a matrix whose columns are v_1, ..., v_a. Let us now add to our collection new vectors v_{a+1}, v_{a+2}, ..., v_k so that the projection onto the first n coordinates of span({v_1, ..., v_k}) is all of F_2^n. In order to bound k, note that since A ⊆ {(x, f(x)) : x ∈ F_2^n}, the projection of A on the first n coordinates has size ≥ ε·2^n. Iteratively adding vectors that double this projection, we see that we can choose k − a ≤ log(1/ε). Also note that a ≤ n + log(1/ε) by assumption, hence k ≤ n + 2 log(1/ε). Let V_k be the resulting matrix of the vectors v_1, ..., v_k. By performing Gaussian elimination on the columns, we can find an invertible transformation that brings V_k into the following canonical form:
    [ I  0 ]
    [ T  U ],
where I is the n × n identity matrix, T is also n × n, and U is n × (k − n). This is possible because the projection of the vectors v_1, ..., v_k onto the first n coordinates spans F_2^n. In other words, there is an invertible k × k matrix L such that V_k·L is in the canonical form. Since L is invertible, we still have the property that for every vector (x, f(x)) ∈ A there exists w ∈ F_2^k such that
    (x, f(x))^T = (V_k·L)·w.
This means that the first n coordinates of w must equal x. Consequently, for every (x, f(x)) ∈ A it holds f(x) = Tx + Uz, for some z of length k − n ≤ 2 log(1/ε). Therefore, by an averaging argument, there exists a fixed u = Uz so that
    Pr_x[f(x) = Tx + u] ≥ ε·2^{−(k−n)} ≥ ε^3,
where we also use the fact that the projection of A on the first n coordinates has size ≥ ε·2^n.
This gives us an affine transformation, and in what follows we show how one can get a linear transformation, i.e., get rid of the 'u' above, with only a slight loss in probability. We claim that there is an index i ≤ n such that
    Pr_x[f(x) = Tx + u | x_i = 1] ≥ 0.99·ε^3.    (5.1)
Such a claim lets us construct a linear transformation M by summing u to the i-th column of T (in other words, Mx = Tx + x_i·u), concluding the proof of the lemma. The different factor in the conclusion of the lemma generously accounts for the probability that x_i = 1.
It remains to prove (5.1). For this, let X be the uniform distribution over F_2^n and let Y be the distribution on F_2^n that is obtained by selecting a random index i ≤ n, setting to 1 the i-th bit, and choosing the other bits uniformly at random. We would like to argue that there is not much difference in working with X or Y. This is useful because if we work with Y we easily obtain (5.1) by an appropriate choice of the index i in the definition of Y. To formalize this, consider the statistical distance between X and Y, i.e., the maximum over all sets S of |Pr[X ∈ S] − Pr[Y ∈ S]|. (While we are interested in S := {x : f(x) = Tx + u}, the following applies to any S.)
Note that this distance is maximized by the set S of strings of weight at most n/2, which is the set of strings having larger probability according to X than Y. Also, by Stirling's approximation [6, Lemma 17.5.1], both Pr[X ∈ S] and Pr[Y ∈ S] are 1/2 + Θ(1/√n). Therefore for large n their distance is at most 0.01·ε^3, and in particular, for S := {x : f(x) = Tx + u},
    Pr[Y ∈ S] ≥ Pr[X ∈ S] − 0.01·ε^3 ≥ 0.99·ε^3,
which proves (5.1).
We can now paste everything together for a quick conclusion of the proof of Theorem 1.1 about testability of linear transformations.
Proof of Theorem 1.1. The proof amounts to defining

Optimality of the path length in Lemma 3.2
In this section we discuss the optimality of the path length in the graph-theoretic Lemma 3.2 that is the core of the proof of the BSG theorem. Recall that the lemma establishes that every dense graph contains a large subset of nodes such that every two nodes in the subset are connected by many paths of length 4. It is natural to ask if the path length can be reduced from 4, and one can quickly see that it cannot be set to 3: the graph could be bipartite, for instance, and any set of at least 3 nodes would have two nodes on the same side, which cannot be connected by any path of odd length 3. We now state and prove a result by Kostochka and Sudakov [14] that also rules out path length 2. Thus, path length 4 is optimal in Lemma 3.2.
Theorem 6.1 (Kostochka-Sudakov). For all ε > 0 there exist arbitrarily large values of N such that there is a graph on N vertices with at least N^2/4 edges such that in every set of ε·N nodes there are two nodes with less than ε·N common neighbors.
In fact, this is true for all sufficiently large values of N but we only prove it for certain powers of 2.

Proof of Theorem 6.1
Let n be a sufficiently large even integer, and let N := 2^n. Identify the set of N nodes with the binary strings of length n. Let ∆(u, v) denote the Hamming distance between nodes u and v, i.e., the number of positions i such that u_i ≠ v_i, and connect two nodes u ≠ v if and only if ∆(u, v) ≤ n/2. Each node has at least N/2 neighbors, and thus the graph has at least N(N/2)/2 = N^2/4 edges. We now show that it also has the desired property. The main idea is that any set of ε·N nodes must contain two nodes at Hamming distance at least n − O(√n), but two such nodes have less than ε·N common neighbors. We now present the formal proof, starting with the next claim that gives us two distant nodes.
Claim 6.2. Let S be any set of ε·N nodes. Then S contains two nodes at Hamming distance at least n − c·√n, where c is a constant that only depends on ε.
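For very small n the graph in this construction can be built by brute force. The sketch below is only an illustration of the definition, using the assumed encoding of nodes as n-bit integers and a hypothetical helper name `hamming_graph`; it says nothing about the asymptotic behavior that the theorem is about.

```python
def hamming_graph(n):
    """Adjacency sets of the graph on {0,1}^n joining distinct nodes at Hamming distance <= n/2."""
    N = 1 << n
    return {
        u: {v for v in range(N) if v != u and bin(u ^ v).count("1") <= n // 2}
        for u in range(N)
    }

n = 4
N = 1 << n
adj = hamming_graph(n)

# Every edge is counted twice when summing the degrees.
num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2

# Common neighbors of the all-zero and the all-one strings, which are at distance n.
common = adj[0] & adj[N - 1]
```

For n = 4 every node has 10 ≥ N/2 neighbors, so there are 80 ≥ N^2/4 edges, while the two antipodal nodes share only the 6 strings of weight exactly 2.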
Let us give some details on how the claim is proved. (For a somewhat different argument, see [22].) The proof will use the following lemma from a set of notes by Barvinok. The history of this lemma goes back to Harper [13], and similar lemmas can be found elsewhere, see for example [15, Theorem 14.2.3]. The following formulation is particularly useful to us because it is not limited to sets of measure 1/2.
Lemma 6.3 (Corollary 4.4 in [3]). Let S ⊆ F_2^n be a non-empty set. Then, for any b > 0, the fraction of points of F_2^n at Hamming distance greater than b·√n from S is at most a quantity that depends only on b and on |S|/2^n, and that tends to 0 as b grows.
Proof sketch of Claim 6.2. First, note that if ε > 1/2 then the claim is easily proved with maximal Hamming distance n, that is, we can find two nodes u and v such that ∆(u, v) = n. This is because we can pair off each node with its complement at distance n, and a set S of size |S| ≥ ε·N > N/2 must take both nodes from some pair.
To handle the case ε ≤ 1/2, we apply the argument to the set S' of nodes u such that there exists v ∈ S at distance ∆(u, v) ≤ b·√n. Specifically, by applying Lemma 6.3 with a large enough constant b depending only on ε, we obtain that this set S' has size > 2^n/2. We can now apply the previous "ε > 1/2" argument to S', from which the claim follows with c = 2·b.
Now that we have these two nodes at distance n − c·√n, we conclude the proof of Theorem 6.1 by showing that the number of their common neighbors is less than ε·N. Without loss of generality, let these two nodes, which we denote by u_1 and u_2, be the all-zero vector and the vector that is 0 exactly in the first k := c·√n coordinates, respectively. Let us now see what nodes are common neighbors of u_1 and u_2. Let X ∈ F_2^n be a node, and let P = P(X) be its number of 1's in the first k coordinates, and Q = Q(X) its number of 1's in the other n − k coordinates. The node X is a common neighbor of u_1 and u_2 precisely when P + Q ≤ n/2 and P + (n − k − Q) ≤ n/2. By combining the inequalities, we obtain that
    n/2 − k + P ≤ Q ≤ n/2 − P,    (6.1)
i.e., for a given P, if X is a neighbor of both u_1 and u_2 then Q has to lie in a set of k − 2·P + 1 integers. For a typical P this set has O(√k) elements, and so by a union bound Q falls in the set of O(√k) integers with probability tending to 0, and this proves the theorem.
More formally, let d = d(ε) be a sufficiently large constant to be determined later. Let us choose a random node X, and let P = P(X) and Q = Q(X) respectively denote the Hamming weight of its first k and last n − k bits. We have:
    Pr[X is a common neighbor of u_1 and u_2] ≤ Pr[|P − k/2| > d·√k] + Pr[(6.1) holds and |P − k/2| ≤ d·√k].
To bound the term Pr[(6.1) holds and |P − k/2| ≤ d·√k] we use a union bound (where actually a factor 2 could be saved noting that if P > k/2 then (6.1) does not hold) and the fact that Pr[Q = q] = O(1/√(n − k)) for every integer q. The resulting bound O(d·√k/√(n − k)) tends to 0 as n grows, since k = c·√n, while the first term can be made smaller than ε/2 by choosing d large enough.
In this section we show a simpler proof of Theorem 1.1 in the case in which the agreement is large. This proof was communicated to us by Shachar Lovett; it uses the ideas in [1].
Theorem 7.1. For all γ ∈ [0, 1/8), all n, and all functions f : F_2^n → F_2^n, the following holds. If
    Pr_{x,y}[f(x) + f(y) = f(x + y)] ≥ 1 − γ
then there is an n × n matrix M such that
    Pr_x[f(x) = Mx] ≥ 1 − 2·γ.
The constant 2 in Theorem 7.1 can be somewhat reduced at the cost of a more complicated proof.
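Proof of Theorem 7.1
Define g(x) to be the "majority vote," i.e., a value that maximizes Pr_y[f(x + y) + f(y) = g(x)]. First, we claim that
    Pr_x[f(x) = g(x)] ≥ 1 − 2·γ.
To see this, note that the hypothesis gives E_x[Pr_y[f(x + y) + f(y) ≠ f(x)]] ≤ γ, and so by Markov's inequality
    Pr_x[ Pr_y[f(x + y) + f(y) ≠ f(x)] ≥ 1/2 ] ≤ 2·γ.
For every other x the value f(x) receives more than half of the votes, and consequently g(x) = f(x).
To complete the proof, it remains to see that g is linear. The first step is to show that the majority in the definition of g is always overwhelming.
Claim 7.2. For every x, Pr_y[f(x + y) + f(y) = g(x)] ≥ 1 − 2·γ.
To see this, fix x and choose y, z uniformly at random. The pairs (x + y, x + z) and (y, z) are each uniformly distributed, so by the theorem's hypothesis and a union bound, except with probability 2·γ we have f(x + y) + f(x + z) = f(y + z) = f(y) + f(z), i.e., f(x + y) + f(y) = f(x + z) + f(z). So there is a fixed z such that Pr_y[f(x + y) + f(y) ≠ f(x + z) + f(z)] ≤ 2·γ. Since γ < 1/4, g(x) = f(x + z) + f(z).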
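To see that g is linear, fix any x, x', choose y, y' uniformly at random, and consider the following 5 equations (3 rows and 2 columns):
    g(x)      = f(x + y)           + f(y)
    g(x')     = f(x' + y')         + f(y')
    g(x + x') = f(x + x' + y + y') + f(y + y')
Each row is one equation, and each of the two columns on the right-hand side gives one more equation, stating that its first two entries sum to the third: f(x + y) + f(x' + y') = f(x + x' + y + y') and f(y) + f(y') = f(y + y'). By applying Claim 7.2 to the rows, and the theorem's hypothesis to the columns, and using a union bound, we see that with probability at least 1 − 3·2·γ − 2·γ = 1 − 8·γ > 0 all the 5 equations hold, which means that g(x) + g(x') = g(x + x'). Since x and x' were arbitrary, g is linear and one can write g(x) = Mx for an n × n matrix M.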
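The majority-vote function g from this proof can be computed by brute force for small n. The sketch below is a toy check under assumptions of our own: vectors are n-bit integers with XOR as addition, the linear map `linear` and the single corrupted input 5 are arbitrary illustrative choices, and `majority_vote` is a hypothetical helper name.

```python
from collections import Counter

def majority_vote(f, n):
    """Return the table of g, where g(x) is the most frequent value of f(x + y) + f(y) over all y."""
    N = 1 << n
    g = []
    for x in range(N):
        votes = Counter(f(x ^ y) ^ f(y) for y in range(N))
        g.append(votes.most_common(1)[0][0])
    return g

n = 6
N = 1 << n

def linear(x):
    """An arbitrary linear map on F_2^6: shift, XOR, and truncation are all linear."""
    return ((x << 1) ^ x) & (N - 1)

def f(x):
    """The linear map above, corrupted at the single input x = 5."""
    return linear(x) ^ 1 if x == 5 else linear(x)

# Fraction of pairs (x, y) rejected by the test f(x) + f(y) = f(x + y).
gamma = sum((f(x) ^ f(y)) != f(x ^ y) for x in range(N) for y in range(N)) / N**2

g = majority_vote(f, n)
g_is_linear = all(g[x] ^ g[y] == g[x ^ y] for x in range(N) for y in range(N))
agreement = sum(f(x) == g[x] for x in range(N)) / N
```

Here the rejection probability γ is well below 1/8, the vote recovers the uncorrupted linear map everywhere (including at the corrupted input), and f agrees with g on at least a 1 − 2·γ fraction of the inputs, as Theorem 7.1 predicts.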

Proposition 2.1. Let f : D → S be a function, for finite sets D and S. If it holds that |f^{-1}(s)| ≥ t for every s ∈ S, then |S| ≤ |D|/t.
As is well known, by Stirling's approximation [6, Lemma 17.5.1] the probability that Q is equal to any particular integer is O(1/√(n − k)) = O(1/√n). The intuition for the rest of the proof of Theorem 6.1 is as follows. A typical P is within O(√k) of k/2, and by (6.1) such a P constricts Q to lie in a set of k − 2·P + 1 = O(√k) integers.