On Some Extensions of the FKN Theorem

Let S = a_1 r_1 + a_2 r_2 + · · · + a_n r_n be a weighted Rademacher sum. Friedgut, Kalai, and Naor have shown that if Var(|S|) is much smaller than Var(S), then the sum is largely determined by one of the summands. We provide a simple and elementary proof of this result, strengthen it, and extend it in various ways to a more general setting.


Introduction
Consider a family of independent random variables (X_i)_{i=1}^n. It is easy to prove that if the distribution of their sum is supported on a set of cardinality 2, then all the X_i's but one are constant almost surely. In our paper we investigate the stability of this phenomenon. Namely, we prove that if the distribution of the sum is concentrated around a two-point set, then there exists k ∈ {1, 2, . . ., n} such that ∑_{i: i≠k} X_i is concentrated around some point. We provide various strict quantitative variants of this heuristic statement. One of them is the following theorem.

Theorem 1.1. Let (X_i)_{i=1}^n be a sequence of independent square-integrable random variables. Assume that Var(X_i) ≤ τ · (E|X_i − EX_i|)^2 for 1 ≤ i ≤ n. Then for some k ∈ {1, 2, . . ., n} we have

    Var(∑_{i≤n: i≠k} X_i) ≤ K(τ) · Var(|∑_{i≤n} X_i|),

where K(τ) is a constant which depends only on τ.
Note that we always have Var(X_i) − (E|X_i − EX_i|)^2 = Var(|X_i − EX_i|) ≥ 0, with equality only when X_i is either constant almost surely, or uniformly distributed on a two-point set. For such X_i's, the comparison of moments assumption is thus satisfied with τ = 1. In a sense, the closer τ is to one, the more ∑_i X_i must resemble a (shifted) weighted Rademacher sum.
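As a quick numerical illustration (not part of the original argument; the distributions below are arbitrary examples), one can compute the ratio τ_X = Var(X)/(E|X − EX|)^2 for a few finite distributions and observe that it equals 1 exactly in the uniform two-point case:

```python
# Illustrative check: tau_X = Var(X) / (E|X - EX|)^2 equals 1 iff X is
# constant a.s. or uniform on a two-point set; otherwise tau_X > 1.

def moments(values, probs):
    """Return (Var(X), (E|X - EX|)^2) for a finite distribution."""
    m = sum(v * p for v, p in zip(values, probs))
    var = sum((v - m) ** 2 * p for v, p in zip(values, probs))
    mad_sq = sum(abs(v - m) * p for v, p in zip(values, probs)) ** 2
    return var, mad_sq

# Uniform two-point distribution: Var(|X - EX|) = 0, so tau = 1.
var, mad_sq = moments([-1.0, 1.0], [0.5, 0.5])
print(var / mad_sq)  # -> 1.0

# A skewed three-point distribution needs tau > 1.
var, mad_sq = moments([0.0, 1.0, 3.0], [0.5, 0.3, 0.2])
print(var / mad_sq)
```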
This result for weighted Rademacher sums (i.e., in the case when for each i the random variable X_i is symmetric and takes values ±a_i, where a_1, a_2, . . ., a_n are some real numbers) was proved in [4] by E. Friedgut, G. Kalai, and A. Naor, and was a part of the proof of their theorem on Boolean functions on the discrete cube with Fourier coefficients concentrated at the first two levels.
They gave two proofs of the result concerning Rademacher sums. One was a direct application of a theorem of König et al. [7]. The other used a more elementary approach (Chernoff's inequality), but contained an omission: it worked only under the additional assumption that we already know that Var(X_k) ≥ C Var(∑_i X_i) for some C close to 1. This minor gap is well known by now, as are some ways to fix it, for example by the use of the Berry-Esseen theorem; in fact, it has been fixed already by Kindler and Safra [6], whose proof also yielded better asymptotic estimates than [4]. (Although [6] was not formally published, as far as we know, it was widely circulated; the proof appeared also in Kindler's Ph.D. thesis [5].) The FKN theorem is a direct application of the above variance bound for Rademacher sums. It was originally devised for applications in discrete combinatorics and social science, but turned out to be useful also in theoretical computer science. In particular, the theorem is used in analyzing the Long Code Test in Irit Dinur's celebrated expander proof of the PCP theorem [3]. With that in mind, we hope that our easy, self-contained proof will simplify understanding of the PCP theorem's background. Hence we set out to give an elementary proof of Theorem 1.1 for weighted Rademacher sums which does not refer to intricate results such as the Berry-Esseen inequality, [7], [2] or [1] (a proof based on the Bonami-Beckner hypercontractive bounds was also known).
We think it is also interesting (although not very surprising) that the inequality still holds if we replace the Rademacher variables by variables satisfying a moment comparison condition. Note that, in contrast to the weighted Rademacher setting described above, in our results the sums do not need to be linear combinations of an i.i.d. sequence (however, in the discrete cube setting we actually prove a stronger, and essentially optimal, FKN type estimate in Theorem 5.3 by the use of yet another method; Ryan O'Donnell obtained the same bound independently by a slightly different approach; see [11, Theorem 5.33]).

THEORY OF COMPUTING, Volume 11 (18), 2015, pp. 445-469
We also provide the following analogous result for symmetric random variables with no additional assumption about moment comparison.
Theorem 1.3. Let (X_i)_{i=1}^n be a sequence of independent symmetric square-integrable random variables. Then for some k ∈ {1, 2, . . ., n} we have

    Var(∑_{i≤n: i≠k} X_i) ≤ C · Var(|∑_{i≤n} X_i|),

where C is a universal constant. The result holds true with C = (7 + √17)/2.
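The inequality of Theorem 1.3 can be checked exactly for small weighted Rademacher sums by exhaustive enumeration (the weights below are arbitrary illustrative choices, not taken from the paper):

```python
# Exact check of the Theorem 1.3 inequality for small weighted Rademacher
# sums X_i = a_i * r_i with independent uniform signs r_i.
from itertools import product
from math import sqrt

def variances(weights):
    """Return (Var(|S|), min_k Var(S - X_k)) by exhaustive enumeration."""
    n = len(weights)
    sums = [sum(a * s for a, s in zip(weights, signs))
            for signs in product([-1, 1], repeat=n)]
    p = 1.0 / len(sums)
    e_abs = sum(abs(s) for s in sums) * p
    e_sq = sum(s * s for s in sums) * p
    var_abs = e_sq - e_abs ** 2              # Var(|S|); note E S^2 = E |S|^2
    # For independent terms, Var(S - X_k) = sum_{i != k} a_i^2.
    total = sum(a * a for a in weights)
    min_rest = min(total - a * a for a in weights)
    return var_abs, min_rest

C = (7 + sqrt(17)) / 2                       # constant from Theorem 1.3
for w in [(1, 1, 1), (3, 1, 1, 1), (2, 2, 1, 1, 1)]:
    var_abs, min_rest = variances(w)
    print(w, min_rest <= C * var_abs)
```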
For the sake of clarity, we start by showing in Section 2 that if a sum of independent random vectors is concentrated around a two-point set, then by removing just one term we may make the sum of the remaining vectors concentrate around a single point. Then, in Section 3, we demonstrate how to use this observation in the real-valued case. However, our results and methods can be quite easily adapted to a Banach space setting, with concentration around a finite set of points, which leads to some nice geometric considerations. We present them in Section 6. Only very basic knowledge of Banach space theory is needed, which can be found, e. g., in [15].
Since we tried to make our proofs as transparent and "low-tech" as possible, in many cases our estimates can easily be improved by some natural optimization.
Readers interested only in the Rademacher case may find it useful to restrict their attention to Section 4, in which Theorem 1.3 is proved, and to the first two subsections of Section 5, in which the strengthening of the FKN Theorem is described.

Splitting of the sum
We begin by analyzing the concentration in terms of probability rather than variance. In what follows, we denote by µ_Z the distribution of a random variable Z. Readers not comfortable with the Banach space formulation may simply replace V by R.

Lemma 2.1. Let X, Y be independent random variables with values in a real separable Banach space V. Assume that for δ ≥ 0 and a, b ∈ V we have (2.1), where 0 ≤ ε < ‖b − a‖/6. Then there exists some vector c ∈ V such that

Proof. Define the sets A_y for y ∈ V. Then from (2.1) and the independence of the variables, by the Fubini theorem we get so in particular µ_X(A_y) ≥ 1 − δ for some y ∈ V, which means Similarly we prove that

Let B(x, r) denote the closed ball with center x and radius r. If α, γ, β, η > √δ, then αγ, βη > δ, and (2.1) would imply that Without loss of generality we may therefore assume that α

Lemma 2.2. Let X_1, X_2, . . ., X_n be independent random variables with values in V and let S = ∑_{i=1}^n X_i. Assume that for δ ≥ 0 and some vectors a, b ∈ V we have (2.2), where ε < ‖b − a‖/6. Then there exist k ∈ {1, . . ., n} and c ∈ V such that Additionally, if δ < 1/9, then

Proof. Let I be a minimal (in the sense of inclusion) subset of {1, . . ., n} such that (if there is no such I, there is nothing to prove). Of course I ≠ ∅. Let k ∈ I. We have S = ∑_{i∈I} X_i + ∑_{i∉I} X_i, and the two sums are obviously independent, so by our assumption about I, (2.2) and Lemma 2.1, for some c_1 ∈ V we get (2.3). But I was minimal, so for some c_2 ∈ V we get (2.4). Now (2.3), (2.4) and the triangle inequality yield The second assertion of the lemma follows easily upon recalling that S_k and X_k are independent and S = S_k + X_k: For δ ≥ 1/9 we have 1 − 2√δ − 2δ ≤ δ, which makes the arising probability bound trivial.
Remark 2.3. Both bounds are of optimal order for δ → 0+ even for V = R, as indicated by the following example. Fix a = 0, b = 1, ε = 1/7 and some n ≥ 2. Let X_i ∼ Pois(√δ/n), so that S ∼ Pois(√δ) and Hence the assumptions of Lemma 2.2 are satisfied. Since for every k ≤ n there is /n as δ → 0+, which shows that the O(√δ) bound cannot be improved. Also, for every k ≤ n we have as δ → 0+. Hence, for δ small enough, for every set A which is a union of two intervals of length 6/7 each, there is which proves the optimality of the O(δ) bound.
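The Poisson example of Remark 2.3 can be checked numerically: if S ∼ Pois(√δ), then the probability that S escapes an ε-neighbourhood of {0, 1} is P(S ≥ 2), which behaves like δ/2 as δ → 0+ (this is an illustrative computation, not part of the original proof):

```python
# Numerical check for Remark 2.3: with S ~ Pois(sqrt(delta)) and eps = 1/7,
# P(dist(S, {0, 1}) > eps) = P(S >= 2) ~ delta / 2 as delta -> 0+, so the
# hypothesis of Lemma 2.2 holds with this delta.
from math import exp, sqrt

def p_at_least_two(lam):
    """P(S >= 2) for S ~ Pois(lam), via the exact pmf of {0, 1}."""
    return 1.0 - exp(-lam) * (1.0 + lam)

for delta in [1e-2, 1e-4, 1e-6]:
    lam = sqrt(delta)
    tail = p_at_least_two(lam)
    print(delta, tail, tail / delta)   # the ratio tends to 1/2
```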

Proof of Theorem 1.1 (comparison of moments assumption)
We will now show how to use the facts from the previous section to give a proof of Theorem 1.1. Concentration bounds in terms of probability will be translated into statements about variances by the use of a Paley-Zygmund type inequality. We need a few simple and standard lemmas.

Lemma 3.1 (Khinchine inequality). Let r_1, r_2, . . . be independent symmetric ±1 random variables. There exists a universal constant κ such that for every a_1, a_2, . . ., a_m ∈ R there is

    E|∑_{i≤m} a_i r_i| ≥ κ^{-1} (∑_{i≤m} a_i^2)^{1/2}.

Proof. The estimate with the optimal constant κ = √2 was proved by Szarek [16] (see [8] for a simpler proof). For the reader's convenience we provide a well-known simple argument which yields κ = √3: writing S = ∑_{i≤m} a_i r_i, by Hölder's inequality we get ES^2 ≤ (E|S|)^{2/3} (ES^4)^{1/3}, and since ES^4 = 3(∑_i a_i^2)^2 − 2∑_i a_i^4 ≤ 3(ES^2)^2, it follows that E|S| ≥ 3^{-1/2} (ES^2)^{1/2}.

Lemma 3.2. Let Y_1, Y_2, . . ., Y_m be independent symmetric integrable random variables. Then

    E|∑_{i≤m} Y_i| ≥ κ^{-1} E(∑_{i≤m} Y_i^2)^{1/2},

where κ is the universal constant from Lemma 3.1.
Proof. Let (r_i)_{i=1}^m be a sequence of independent symmetric ±1 random variables, independent of (Y_i)_{i=1}^m. By symmetry, ∑_{i≤m} Y_i has the same distribution as ∑_{i≤m} r_i Y_i, so conditioning on the Y_i's we get E|∑_{i≤m} Y_i| = E|∑_{i≤m} r_i Y_i| ≥ κ^{-1} E(∑_{i≤m} Y_i^2)^{1/2}, where we have used Lemma 3.1.
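The Khinchine lower bound of Lemma 3.1 can be verified exhaustively for small m (the weight vectors below are arbitrary examples; note that a = (1, 1) attains equality for κ = √2):

```python
# Exhaustive check of E|sum a_i r_i| >= kappa^{-1} * (sum a_i^2)^{1/2}
# with Szarek's optimal constant kappa = sqrt(2).
from itertools import product
from math import sqrt

def first_abs_moment(a):
    """E|sum a_i r_i| for independent uniform signs r_i, by enumeration."""
    sums = (sum(x * s for x, s in zip(a, signs))
            for signs in product([-1, 1], repeat=len(a)))
    return sum(abs(s) for s in sums) / 2 ** len(a)

kappa = sqrt(2)
for a in [(1, 1), (1, 2, 3), (1, 1, 1, 1), (5, 1, 1)]:
    lhs = first_abs_moment(a)
    rhs = sqrt(sum(x * x for x in a)) / kappa
    print(a, lhs >= rhs - 1e-12)
```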
The following result may be traced back to the work of Marcinkiewicz and Zygmund [9] (see Théorème 2 therein).
Lemma 3.3. Let Y_1, Y_2, . . ., Y_m be independent symmetric integrable random variables. Then E|∑_{i≤m} Y_i| ≥ κ^{-1} (∑_{i≤m} (E|Y_i|)^2)^{1/2}, where κ is the universal constant from Lemma 3.1.
Proof. Indeed, by Lemma 3.2 we have E|∑_{i≤m} Y_i| ≥ κ^{-1} E(∑_{i≤m} Y_i^2)^{1/2} ≥ κ^{-1} (∑_{i≤m} (E|Y_i|)^2)^{1/2}, where we have used Jensen's inequality for the convex function y → ‖y‖_{ℓ_m^2} applied to the random vector (|Y_1|, . . ., |Y_m|).
Proof of Theorem 1.1. Obviously, by considering x + X_1 instead of X_1 we may reduce our task to proving that min_{k≤n} Var(∑_{i≤n: i≠k} X_i) ≤ K(τ) · Var(|S|). Let (X'_i)_{i=1}^n be an independent copy of (X_i)_{i=1}^n. Let S = ∑_{i=1}^n X_i and S' = ∑_{i=1}^n X'_i, so that S' is an independent copy of S. Note that the random variables Y_i = X_i − X'_i (i ≤ n) are independent and symmetric. By Jensen's inequality, , where κ is the universal constant from Lemma 3.1. Let δ = κ^{-4} τ^{-2}/324, a = E|S|, and b = −a = −E|S|. We consider two cases:

Case 1. Assume ε < |a − b|/6. By Chebyshev's inequality, where Obviously, Note that √δ. Now we may apply a Paley-Zygmund type estimate: because κ, τ ≥ 1. We have proved the theorem with K(τ) = (6κ)^6 τ^3.

Proof of Theorem 1.3 (symmetric variant)
Now we will prove an analogue of Theorem 1.1 for real symmetric random variables (but with no constraints on moments). It is possible to do it in a similar way as in the proof of Theorem 1.1, i. e., by using Lemma 2.2, but to get better estimates we will adopt another, more direct approach. We will need a lemma.
Hence We have used the fact that for any square-integrable random variables V and W there is By taking expectations of both sides we arrive at where we have bounded the L_1 norm by the L_2 norm and used the independence of |X| and |Y|. Therefore · Var(|S|).
Thus we have proved that min_{k≤n} Var(∑_{i≤n: i≠k} X_i) ≤ C · Var(|S|).

Remark 4.3. A simple example of n = 3 and X_1, X_2, X_3 i.i.d. symmetric ±1 random variables indicates that the constant C in Theorem 1.3 cannot be less than 8/3 ≈ 2.67 (it suffices to check it for x = 0).
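The computation behind Remark 4.3 is short enough to carry out exactly (this is an illustrative verification of the stated constant, not part of the original text):

```python
# Exact computation for Remark 4.3: for three i.i.d. symmetric ±1 variables,
# min_k Var(S - X_k) / Var(|S|) = 8/3, so C >= 8/3 in Theorem 1.3.
from itertools import product

sums = [s1 + s2 + s3 for s1, s2, s3 in product([-1, 1], repeat=3)]
p = 1.0 / len(sums)
e_abs = sum(abs(s) for s in sums) * p          # E|S| = 3/2
e_sq = sum(s * s for s in sums) * p            # E S^2 = 3
var_abs = e_sq - e_abs ** 2                    # Var(|S|) = 3/4
min_rest = 2.0                                 # Var(X_i + X_j) = 2 for i != j
print(min_rest / var_abs)                      # -> 8/3 = 2.666...
```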

Harmonic analysis on product spaces
Below, we introduce assumptions and notation which will be used throughout Section 5. This is a natural and convenient setting for harmonic analysis on product spaces, e. g., on the discrete cube with the uniform measure, in which case (π_i)_{i=1}^n is the standard Rademacher system. This language will allow us to state FKN type results on product spaces other than the discrete cube.

Assumptions and notation (A & N)
Let ξ_1, ξ_2, . . ., ξ_n be independent random variables satisfying Eξ_i = 0 and Eξ_i^2 = 1 for all i ≤ n. We consider the Hilbert space L_2 = L_2(R^n, µ), where µ is the joint distribution of the random vector ξ = (ξ_1, ξ_2, . . ., ξ_n). It will be convenient to set ξ_0 ≡ 1, so that (ξ_i)_{i=0}^n is an orthonormal system in L_2. Let A be the linear (finite-dimensional and thus closed) subspace of L_2 consisting of all affine real-valued functions on R^n. We define coordinate projection functions π_i : R^n → R by π_i(x) = x_i for 1 ≤ i ≤ n, and π_0 ≡ 1. Let A_π = {π_0, −π_0, π_1, −π_1, . . ., π_n, −π_n}. For a Boolean Borel (i.e., {−1, 1}-valued and such that the preimage of {1} is a Borel set) function f on R^n, by f_A : R^n → R we will denote its orthogonal projection in L_2 onto the subspace A: f_A = ∑_{i=0}^n a_i π_i, where a_i = ⟨f, π_i⟩_{L_2}. We may and will use the same notation for a Borel Boolean function f defined only on the support of µ, since obviously it may be extended to a Borel Boolean function F on the whole R^n, and F_A does not depend on the choice of the extension. Let us define the sign function in a slightly non-standard way as 1_{[0,∞)} − 1_{(−∞,0)}, to make the function Boolean (setting sign(0) = −1 would work as well). Let ρ = dist_{L_2}(f, A) and d = dist_{L_2}(f, A_π).

Symmetric case
We will start with a theorem which recovers and extends the main result of [4] with a quite good, explicit constant.
Proof. Since |f| ≡ 1, the triangle inequality yields a pointwise bound given by X_i = a_i ξ_i for 1 ≤ i ≤ n, and with X_0 being a symmetric ±a_0 random variable. The sum |∑_{i=0}^n X_i| has the same distribution (and thus the same variance) as |S|, so by using Theorem 1.3 Since by orthogonality we have this ends the proof of the first assertion.
The inequality ρ ≤ d follows immediately from A_π ⊆ A. Observe that The remaining assertion also follows easily because the first assertion implies |a_k| ≥ 1 − O(ρ^2), so that

Corollary 5.2. Under the assumptions of Theorem 5.1 (A & N, and ξ_1, ξ_2, . . ., ξ_n are symmetric) there is some k ∈ {0, 1, . . ., n} such that Note that for any s ∈ {−1, 1} and u ∈ R there is |s − u| ≥ |sign(u) − u| (and |s The assertion follows by the triangle inequality.

Now let us see how to strengthen the result of Friedgut, Kalai and Naor. For a function f defined on the discrete cube {−1, 1}^n we consider its standard Walsh-Fourier expansion ∑_A f̂(A) w_A, where w_A(x) = ∏_{i∈A} x_i.
Theorem 5.3. There exists a universal constant L > 0 with the following property. For f Then there exists some B ⊆ {1, 2, . . ., n} such that

Proof. Let ξ_1, ξ_2, . . ., ξ_n be independent symmetric ±1 random variables, so that the definition of ρ is consistent with the one from A & N. We also have , and a_0 = f̂(∅). Let us put Therefore The inequality , [1]). Now we have We finish the proof by observing that (5.1) and (5.2) yield uniformly, as ρ → 0+. We have used the d = O(ρ) bound of Theorem 5.1.
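The quantities appearing in the discrete-cube setting can be illustrated concretely. The sketch below (our own illustrative computation, with the 3-bit majority as an arbitrary example function) computes Walsh-Fourier coefficients by enumeration, the distance ρ from f to the affine functions, and the distance from f to the nearest dictator w_{i}:

```python
# Illustrative computation on the discrete cube {-1, 1}^3 with the uniform
# measure: Walsh-Fourier coefficients f^(A) = E[f(x) w_A(x)], the weight
# rho^2 = sum_{|A| >= 2} f^(A)^2 above level 1, and the distance to dictators.
from itertools import combinations, product
from math import prod

n = 3
cube = list(product([-1, 1], repeat=n))
f = {x: (1 if sum(x) > 0 else -1) for x in cube}     # majority, Boolean

def fourier_coeff(A):
    return sum(f[x] * prod(x[i] for i in A) for x in cube) / len(cube)

coeffs = {A: fourier_coeff(A)
          for k in range(n + 1) for A in combinations(range(n), k)}
rho_sq = sum(c * c for A, c in coeffs.items() if len(A) >= 2)
# Distance to the dictator w_{i}: ||f - w_{i}||^2 = 2 - 2 f^({i}).
dist_sq = min(2 - 2 * coeffs[(i,)] for i in range(n))
print(rho_sq, dist_sq)   # majority: rho^2 = 1/4, each f^({i}) = 1/2
```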

Absolute first moment assumption
Without the symmetry assumption we cannot hope to get the same assertion as in Theorem 5.1: if n = 1, P(ξ_1 = −2) = 1/5, P(ξ_1 = 1/2) = 4/5 and f(x) = (3 + 4x)/5, then f ∈ A \ A_π even though f is Boolean. However, under an additional moment assumption, we may still prove that any Boolean function which is close in L_2 to A must also be at a comparable L_2-distance from an affine function of a single coordinate.

Theorem 5.5. There exists a function C : (0, ∞) → (0, ∞) with the following property.
Proof. As in the proof of Theorem 5.1, we observe first that Var(|S|) ≤ ρ^2, where S = a_0 + ∑_{i=1}^n a_i ξ_i. By Theorem 1.1 there is some k ∈ which proves the first assertion with C(η) = 1 + K(η^{-2}). The second assertion follows from the first one in the same way in which Corollary 5.2 follows from Theorem 5.1.

General case
We will need two auxiliary lemmas.
Proof. Let (X', Y') be an independent copy of the pair (X, Y). For a square-integrable random variable .
By the assumptions of the lemma we have dist(X + Y, {−1, 1})_2 ≤ ρ. Since X + Y and X' + Y' have the same distribution, we obtain dist(X' + Y', In a similar way we prove that The pointwise bound Hence by (5.3), (5.4), and the triangle inequality we obtain dist φ so that α > 3ρ. In a similar way from (5.4) we get β > 3ρ, and therefore αβ > 9ρ^2, which contradicts (5.5).
Lemma 5.7. Let X_1, X_2, . . ., X_n be independent square-integrable random variables and let S = ∑_{i=1}^n X_i. Assume E(|S| − 1)^2 ≤ ρ^2 for some ρ ∈ [0, 1]. Then there exists some k ∈ [n] such that Var(S − X_k) ≤ 50ρ.

Proof. Lemma 5.7 follows from Lemma 5.6 in a way similar to that in which we have deduced Theorem 1.3 from Lemma 4.1: we look for a minimal I ⊆ [n] = {1, 2, . . ., n} such that Var(∑_{i∈I} X_i) > 25ρ, then we choose any k ∈ I and from Lemma 5.6 we infer that Var(∑_{i∈[n]\I} X_i) ≤ 25ρ. By the minimality of I there is also Var(∑_{i∈I\{k}} X_i) ≤ 25ρ, which ends the proof.

Now we may finally state a result which does not use any additional properties of the marginal distributions.
Theorem 5.8. Under A & N, there exists some k ∈ [n] such that

is a Boolean function on {−√(α/β), √(β/α)}^2 equipped with the measure µ_{ξ_1} ⊗ µ_{ξ_2}. A simple analysis shows that ρ = dist_{L_2}(f, A) = 2αβ, while the L_2-distance from f to any function of a single coordinate is not less than 2β^{3/2}α^{1/2} = Θ(ρ^{1/2}) as α → 0+ (and, consequently, β → 1− and ρ → 0+). Thus the O(√ρ) general bound of Theorem 5.8 cannot be improved even on the two-dimensional biased discrete cube. On the other hand, some FKN-type bounds were obtained in [6] and [5] for the biased discrete cube of arbitrary dimension, in terms of the bias parameter. The approach used in the proof of Theorem 5.3 can be effectively adapted to the case of the biased discrete cube if the Bonami-Beckner estimate is replaced by the hypercontractive bounds of [12]; see [10].

Boolean versus bounded
It is natural to look for an analogue of the FKN theorem for [−1, 1]-valued functions; however, there is no hope for estimates as good as in the Boolean case. Recall that the FKN theorem states that a Boolean function on the discrete cube (equipped with the uniform measure) which is close in L_2 to A must also be at a comparable L_2-distance from a function which is both Boolean and affine (i.e., some function from A_π).
Let ψ(t) = 1 for t > 1, ψ(t) = −1 for t < −1, and ψ(t) = t for t ∈ [−1, 1], and let us define functions f, g : ) for some positive parameter s. Note that as s → ∞, where G denotes the standard N(0, 1) Gaussian variable, while we have only The above example, demonstrating the gap between O(e^{−s^2/4}) and Θ(s^{−1}), is as bad as it gets: in [10] Nayar proved that for s > 0 and every function

Banach space setting
The main result of this section, Theorem 6.10, is an advanced extension of Lemma 2.2. To prove it, we will need to develop some new tools.
In what follows, (V, ‖·‖) denotes a separable real Banach space, its continuous dual space (of bounded linear functionals on V) is denoted by V*, and + stands for Minkowski addition. By dist we will mean the distance in the norm ‖·‖. Readers unfamiliar with Banach spaces may find it convenient to assume additionally that V is a finite-dimensional normed vector space, and to think about Euclidean spaces instead of Hilbert spaces.
For a finite A ⊂ V, positive ∆, ε, δ and a V-valued random vector X we will say that:

• A is ∆-separated if either |A| ≥ 2 and for any two distinct x, y ∈ A there is ‖x − y‖ ≥ ∆, or |A| = 1; we define the separation constant of A by ∆(A) = min{‖x − y‖ : x, y ∈ A and x ≠ y};

• X is (ε, δ)-close to A if P(dist(X, A) ≤ ε) ≥ 1 − δ;

• X is (ε, δ)-present around A if P(‖X − a‖ ≤ ε) ≥ δ for every a ∈ A.

Clearly, We need some simple lemmas. The first of them is obvious.
Lemma 6.1. Let X be a V-valued random vector which is (ε, δ)-close to some nonempty finite A ⊂ V with some ε > 0 and δ ∈ [0, 1]. Then there is some a ∈ A such that P(‖X − a‖ ≤ ε) ≥ (1 − δ)/|A|.

Lemma 6.2. Let A, B ⊂ V be finite and ∆-separated for some ∆ > 0, and assume that X is a V-valued random vector which is both (ε_1, δ)-close to A and (ε_2, δ)-present around B for some δ > 0 and ε_1, For a similar reason (B is also ∆-separated), the mapping b → a_b is injective. This ends the proof.

Lemma 6.3. Let X_1, X_2, . . ., X_m be independent V-valued random vectors and let S = ∑_{i=1}^m X_i. Assume that S is (ε, δ)-close to a nonempty finite A ⊂ V for some ε, δ > 0. Then there exist vectors v

Proof. This follows from the Fubini theorem (see the beginning of the proof of Lemma 2.1). Therefore, Var(Y) ≤ 800ρ^2 σ^{-2}, and the proof is finished.
Proof. Lemma 7.2 can be deduced from Lemma 7.1 in the same way in which Lemma 5.7 follows from Lemma 5.6.

Proof. Just as in the proof of Theorem 5.8, we arrive at ≤ ρ^2 + 1600ρ^2/Var(S) ≤ 1601ρ^2/Var(S), where we have used Lemma 7.2 and the fact that Var(S) = ∑_{i=1}^n a_i^2 ≤ 1. The second assertion of the corollary follows from the first one in the same way in which Corollary 5.2 follows from Theorem 5.1.
The above corollary should be compared to the main result of Rubinstein's MSc thesis [13, Corollary 10]. Note that the setting introduced in Subsection 5.1 is slightly incompatible with Rubinstein's "pair-wise disjoint subsets of the inputs": one would need to consider the product measure µ on a more abstract probability space than just R^n to recover [13, Corollary 10]

Lemma 6.4.
Let ∆ > 0 and let A and B be finite ∆-separated subsets of V with |A|, |B| ≥ 2. Then there exists some C ⊆ A + B = {a + b : a ∈ A, b ∈ B} with |C| > max(|A|, |B|) which is also ∆-separated.

Proof. Without loss of generality we may assume that |A| ≥ |B|. Let b and b' be arbitrary distinct elements of B. Let ϕ ∈ V* be such that ‖ϕ‖_{V*} = 1 and ϕ(b' − b) = ‖b' − b‖ (the existence of such a functional is guaranteed by the Hahn-Banach theorem). Finally, let â = arg max_{a∈A} ϕ(a) (any maximizer will do if there is more than one). Then C = (A + b) ∪ {â + b'} has more elements than A and is ∆-separated. These two facts follow since, for a ∈ A,

    ‖(â + b') − (a + b)‖ ≥ ϕ(â + b' − a − b) = ϕ(â) − ϕ(a) + ϕ(b' − b) ≥ ‖b' − b‖ ≥ ∆.

By an obvious induction we obtain the following corollary.

Corollary 6.5. Let ∆ > 0 and let A_1, . . ., A_m be ∆-separated finite subsets of V with |A_i| ≥ 2 for i ∈ [m]. Then there exists some C ⊆ A_1 + A_2 + · · · + A_m with |C| > m which is also ∆-separated.

The next corollary easily follows (we leave it as an exercise).

Corollary 7.3.
Now we are in a position to state the following refinement of Theorem 5.8. Note that in the case Var(f) ≤ ρ the bound of Theorem 5.8 is trivial since always ‖f − a_0 − a_k π_k‖_{L_2} ≤ ‖f − a_0‖_{L_2} = √Var(f), while for Var(f) ≥ ρ we have ρ/√Var(f) ≤ √ρ. Under A & N (see Subsection 5.1), there is some k ∈ [n] such that ‖f − (a_0 + a_k π_k)‖_{L_2} ≤ 41ρ/√Var(f), and ‖f − sign(a_0 + a_k π_k)‖_{L_2} ≤ 82ρ/√Var(f).
as a special case of Corollary 7.3. Still, [13, Corollary 10] may be easily deduced from Lemma 7.2 by an obvious modification of the proof of Corollary 7.3, in which the a_i π_i's should be replaced by Rubinstein's restrictions f_i's and a_0 turned into f̂(∅).