Hypercontractivity Via the Entropy Method

The Hypercontractive Inequality of Bonami (1968, 1970) and Gross (1975) is equivalent to the following statement: for every $q > 2$ and every function $f : \{-1,1\}^n \to \mathbb{R}$ of Fourier degree at most $m$, $\|f\|_q \le (q-1)^{m/2}\|f\|_2$. The original proof of this inequality is analytic. Friedgut and Rödl (2001) gave an alternative proof of the slightly weaker Hypercontractive Inequality $\|f\|_4 \le 28^{m/4}\|f\|_2$ by combining tools from information theory and combinatorics. Specifically, they recast the problem as a statement about multi-hypergraphs, generalized Shearer's Lemma, and used probabilistic arguments to obtain the inequality. We show that Shearer's Lemma and elementary arguments about the entropy of random variables are sufficient to recover the optimal Hypercontractive Inequality for all even integers $q$.

ACM Classification: G.2, G.3, F.1.3
AMS Classification: 68Q87


Introduction
The Hypercontractive Inequality plays a fundamental role in the analysis of Boolean functions. Discovered independently by Aline Bonami [3,4] and several years later by Leonard Gross [14], the inequality is concerned with the noise operator $T_\rho$ that acts on functions $f : \{-1,1\}^n \to \mathbb{R}$ via $T_\rho f(x) = \mathbf{E}_y[f(y)]$, where $y$ is a $\rho$-correlated copy of $x$ (i.e., $y$ is drawn from the product distribution where $\mathbf{E}[y_i x_i] = \rho$ for all $i \in [n]$). Intuitively, the noise operator smooths $f$ by replacing each $f(x)$ with the average of $f$'s values in a neighborhood around $x$; this effect is also evident when considering the Fourier expansion

$$T_\rho f = \sum_{S \subseteq [n]} \rho^{|S|} \hat{f}(S)\, \chi_S ,$$

where $\chi_S(x) = \prod_{i \in S} x_i$. Here we see that each Fourier coefficient $\hat{f}(S)$ is attenuated by a factor $\rho^{|S|}$ that shrinks as $|S|$ grows. The Hypercontractive Inequality quantifies this smoothing effect by showing that applying $T_\rho$ to $f$ allows one to bound its 2-norm by a smaller norm:

Hypercontractive Inequality. Let $f : \{-1,1\}^n \to \mathbb{R}$ and $0 \le \rho \le 1$. Then $\|T_\rho f\|_2 \le \|f\|_{1+\rho^2}$.

The Hypercontractive Inequality is equivalent (via duality, see, e.g., [22]) to the following inequality.
Theorem 1.1. Let $f : \{-1,1\}^n \to \mathbb{R}$ and let $f^{=m} = \sum_{|S| = m} \hat{f}(S)\, \chi_S$ be the projection of $f$ to its degree-$m$ part. Then for all $q > 2$,

$$\|f^{=m}\|_q \le (q-1)^{m/2}\, \|f\|_2 .$$

That is, the $2 \to q$ norm of the projection operator $P^{=m}$ that maps $f : \{-1,1\}^n \to \mathbb{R}$ to its degree-$m$ part is at most $(q-1)^{m/2}$. We denote this as $\|P^{=m}\|_{2 \to q} \le (q-1)^{m/2}$.

First introduced into theoretical computer science by the celebrated work of Kahn, Kalai, and Linial [15], the Hypercontractive Inequality has seen utility in a surprisingly wide variety of areas, spanning distributed computing, random graphs, k-SAT, social choice, inapproximability, learning theory, metric spaces, statistical physics, convex relaxation hierarchies, etc. [2,6,22,8,9,10,11,5,18,17,21,13,19,1]. For almost every one of these results, no alternative proof that avoids hypercontractivity is known. (See de Wolf's survey [23, Sec. 4] and O'Donnell's monograph [20, Ch. 9] for more details on the Hypercontractive Inequality and its applications.)

The well-known analytic proof of the Hypercontractive Inequality proceeds by induction on $n$. The crux of the inductive step (sometimes referred to as the tensoring property of the Hypercontractive Inequality) is Minkowski's inequality, the triangle inequality for $L_p$ spaces; the base case is a two-point inequality that reduces to standard analytic calculations but is nonetheless technically involved. Considering the ubiquity of the Hypercontractive Inequality in discrete settings, there has been significant interest in obtaining alternative proofs of the inequality.
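To make the noise operator and the bound $\|f^{=m}\|_q \le (q-1)^{m/2}\|f\|_2$ concrete, the following is a minimal numerical sketch on a small cube. It is an illustration only, not part of the proof: the helper names (`fourier_coefficients`, `noise_by_averaging`, `theorem_gap`) and the random test function are assumptions introduced here. It compares the averaging definition of $T_\rho$ with the Fourier-side attenuation by $\rho^{|S|}$, and evaluates the slack in the inequality for a few even $q$.

```python
import itertools
import math
import random

def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f given as a dict on the points of {-1,1}^n."""
    points = list(itertools.product([-1, 1], repeat=n))
    coeffs = {}
    for r in range(n + 1):
        for S in itertools.combinations(range(n), r):
            # f_hat(S) = E_x[f(x) * chi_S(x)], with chi_S(x) = prod_{i in S} x_i
            coeffs[S] = sum(f[x] * math.prod(x[i] for i in S) for x in points) / len(points)
    return coeffs

def evaluate(coeffs, x):
    """Evaluate the multilinear polynomial sum_S coeffs[S] * chi_S(x)."""
    return sum(c * math.prod(x[i] for i in S) for S, c in coeffs.items())

def norm(g, n, q):
    """q-norm of g under the uniform distribution on {-1,1}^n."""
    points = itertools.product([-1, 1], repeat=n)
    return (sum(abs(g(x)) ** q for x in points) / 2 ** n) ** (1 / q)

def noise_by_averaging(f, n, rho, x):
    """T_rho f(x) = E_y[f(y)], where y_i = x_i with probability (1 + rho) / 2."""
    total = 0.0
    for y in itertools.product([-1, 1], repeat=n):
        agree = sum(1 for a, b in zip(x, y) if a == b)
        prob = ((1 + rho) / 2) ** agree * ((1 - rho) / 2) ** (n - agree)
        total += prob * f[y]
    return total

# Hypothetical test function: random real values on the 4-dimensional cube.
n, rho = 4, 0.6
random.seed(0)
points = list(itertools.product([-1, 1], repeat=n))
f = {x: random.uniform(-1, 1) for x in points}
coeffs = fourier_coefficients(f, n)

# Fourier-side description of T_rho: multiply f_hat(S) by rho^{|S|}.
attenuated = {S: rho ** len(S) * c for S, c in coeffs.items()}
max_gap = max(abs(noise_by_averaging(f, n, rho, x) - evaluate(attenuated, x)) for x in points)

def theorem_gap(m, q):
    """Slack (q-1)^{m/2} * ||f||_2 - ||f^{=m}||_q; nonnegative if the bound holds."""
    degree_m = {S: c for S, c in coeffs.items() if len(S) == m}
    lhs = norm(lambda x: evaluate(degree_m, x), n, q)
    rhs = (q - 1) ** (m / 2) * norm(lambda x: f[x], n, 2)
    return rhs - lhs

print("max |averaging - Fourier|:", max_gap)
print("slack for m=2, q=4:", theorem_gap(2, 4))
```

A check of this kind only confirms the inequality on one sampled function; it says nothing about tightness or about general $n$.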
Ehud Friedgut and Vojtech Rödl [12] obtained one such alternative proof by exploiting a novel connection between the Hypercontractive Inequality and Shannon entropy. Specifically, their main result is an information-theoretic/combinatorial proof of the inequality

$$\|f^{=m}\|_4 \le 28^{m/4}\, \|f\|_2 .$$

This is a slightly weaker version of the $q = 4$ special case of Theorem 1.1, but it is nonetheless qualitatively sufficient for many applications of the Hypercontractive Inequality, with a slight loss in the corresponding bounds. Friedgut and Rödl obtain this result by recasting the inequality in terms of multi-hypergraphs, invoking a generalization of Shearer's entropy lemma for such hypergraphs, and applying probabilistic arguments to complete the proof.
Friedgut and Rödl's result raises two fundamental questions: Is the combinatorial/information-theoretic argument strong enough to recover the optimal value of $\|P^{=m}\|_{2 \to q}$? And can this result be obtained directly by elementary information-theoretic arguments? We give positive answers to both questions: we show that Friedgut and Rödl's argument can be simplified and sharpened by reasoning directly about the Fourier spectrum of $f$, without requiring the translation to hypergraphs or the generalization of Shearer's Lemma. With this direct approach, we obtain a simple proof of the optimal Hypercontractive Inequality $\|P^{=m}\|_{2 \to 2k} \le (2k-1)^{m/2}$ for all $k \in \mathbb{N}$ (i.e., Theorem 1.1 for all even integers $q$) using only elementary information-theoretic facts.
Let us briefly mention two other alternative proofs for special cases of the Hypercontractive Inequality. First, there is a short and elegant inductive proof of the $q = 4$ special case of Theorem 1.1 that requires only the Cauchy-Schwarz inequality. This proof first appeared in the literature in [19], although Bonami's original paper [4] contains a proof along similar lines. This proof, however, does not appear to generalize beyond the $q = 4$ special case. Second, in independent recent work, Kauers et al. [16] noted that the two-point inequality in the standard analytic proof of the Hypercontractive Inequality becomes particularly simple when $q$ is an even integer. However, this proof still uses the inductive step as a black box (i.e., the fact that the Hypercontractive Inequality tensorizes, which requires Minkowski's inequality), and remains very much an analytic proof.

Basics of Shannon entropy
Let $X$ be a (scalar or vector valued) random variable over the discrete sample space $\Omega$ and let $p : \Omega \to [0,1]$ be the probability mass function of $X$. The entropy of $X$ is

$$H(X) = -\sum_{x \in \Omega} p(x) \log p(x) .$$

Here and throughout this note, logarithms are taken to base 2. The conditional entropy of $X$ given $Y$ is

$$H(X \mid Y) = \mathbf{E}_y\big[ H(X \mid Y = y) \big] .$$

We use the following basic properties of entropy in our analysis.

Lemma 2.1 (Universal upper bound). For any random variable $X$ over the sample space $\Omega$,

$$H(X) \le \log |\Omega| .$$

The equality $H(X) = \log |\Omega|$ holds if and only if $X$ is uniformly distributed over $\Omega$.
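The definitions above translate directly into a few lines of code. The following is a minimal sketch with probability mass functions represented as Python dictionaries; the function names and the three example distributions are assumptions chosen for illustration. It evaluates the entropy of a uniform and a skewed distribution on four outcomes (illustrating the universal upper bound $\log 4 = 2$) and one conditional entropy.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def conditional_entropy(joint):
    """H(X | Y) for a joint pmf {(x, y): probability}, as E_y[H(X | Y = y)]."""
    by_y = defaultdict(dict)
    for (x, y), p in joint.items():
        if p > 0:
            by_y[y][x] = by_y[y].get(x, 0.0) + p
    total = 0.0
    for row in by_y.values():
        p_y = sum(row.values())
        # Entropy of the conditional distribution of X given this value of Y.
        total += p_y * entropy({x: p / p_y for x, p in row.items()})
    return total

# Illustrative distributions on a sample space of size 4.
uniform = {k: 0.25 for k in range(4)}
skewed = {0: 0.7, 1: 0.1, 2: 0.1, 3: 0.1}

# Illustrative joint distribution: X is determined when Y = 0, uniform when Y = 1.
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}

print(entropy(uniform))            # attains the bound log2(4)
print(entropy(skewed))             # strictly below the bound
print(conditional_entropy(joint))  # average of 0 bits and 1 bit
```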
THEORY OF COMPUTING, Volume 9 (29), 2013, pp. 889-896

Lemma 2.2 (Chain rule). The entropy of a sequence $X_1, \ldots, X_n$ of random variables satisfies

$$H(X_1, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_1, \ldots, X_{i-1}) .$$

Finally we recall Shearer's Lemma, a generalization of the subadditivity of entropy. (For a simple proof, see [7].) For a sequence of random variables $X = (X_1, \ldots, X_n)$ and a set $S \subseteq [n]$, we write $X_S$ to denote the projection of $X$ onto the coordinates in $S$, i.e., $X_S = (X_j)_{j \in S}$.
Lemma 2.3 (Shearer's Lemma). Let $X \in \Omega^n$ be a vector of random variables and let $S_1, \ldots, S_m \subseteq [n]$ be a collection of sets that cover each element in $[n]$ at least $t$ times. Then

$$H(X) \le \frac{1}{t} \sum_{j=1}^{m} H(X_{S_j}) .$$
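The chain rule and Shearer's Lemma can both be checked numerically on a small example. The sketch below is illustrative only: the random joint distribution on $\{0,1\}^3$, the cover $\{1,2\}, \{2,3\}, \{1,3\}$ (written with 0-based indices in the code), and all function names are assumptions introduced here. Each coordinate is covered $t = 2$ times.

```python
import itertools
import math
import random

def entropy(pmf):
    """Shannon entropy in bits of a pmf given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def marginal(pmf, coords):
    """Distribution of X_S, the projection of X onto the coordinates in coords."""
    out = {}
    for x, p in pmf.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + p
    return out

def conditional_entropy(pmf, target, given):
    """H(X_target | X_given), computed as the average over y of H(X_target | X_given = y)."""
    groups = {}
    for x, p in pmf.items():
        y = tuple(x[i] for i in given)
        t_val = tuple(x[i] for i in target)
        groups.setdefault(y, {})
        groups[y][t_val] = groups[y].get(t_val, 0.0) + p
    total = 0.0
    for row in groups.values():
        p_y = sum(row.values())
        total += p_y * entropy({k: p / p_y for k, p in row.items()})
    return total

# Hypothetical joint distribution of (X_1, X_2, X_3) on {0,1}^3.
random.seed(1)
n = 3
outcomes = list(itertools.product([0, 1], repeat=n))
weights = [random.random() for _ in outcomes]
pmf = {x: w / sum(weights) for x, w in zip(outcomes, weights)}

joint_entropy = entropy(pmf)

# Chain rule: H(X) = sum_i H(X_i | X_1, ..., X_{i-1}).
chain_sum = sum(conditional_entropy(pmf, [i], list(range(i))) for i in range(n))

# Shearer's Lemma with a cover in which every coordinate appears t = 2 times.
cover = [(0, 1), (1, 2), (0, 2)]
t = 2
shearer_bound = sum(entropy(marginal(pmf, S)) for S in cover) / t

print(joint_entropy, chain_sum, shearer_bound)
```

Taking the cover to be the singletons with $t = 1$ recovers ordinary subadditivity, which is the sense in which Shearer's Lemma generalizes it.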

Hypercontractivity via the entropy method
In this section we prove Theorem 1.1 for all even integers q using the entropy method.
Theorem 3.1 (Special case of Theorem 1.1). For any $f : \{-1,1\}^n \to \mathbb{R}$ and any even integer $q > 2$,

$$\|f^{=m}\|_q \le (q-1)^{m/2}\, \|f\|_2 .$$

Proof. By a limiting argument, it suffices to prove the inequality for $f : \{-1,1\}^n \to \mathbb{Q}$. By homogeneity, we may further assume that the Fourier coefficients of $f$ are integral (i.e., $\hat{f}(S) \in \mathbb{Z}$ for all $S \subseteq [n]$).
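For each $S \in \binom{[n]}{m}$, let $W_S$ denote a set of $|\hat{f}(S)|$ elements which we call witnesses for $S$. The witness sets for any two distinct sets $S \ne T$ are disjoint, and we write $W = \bigsqcup_S W_S$ to denote the disjoint union of all $\binom{n}{m}$ witness sets. We say that $q$ witnesses in $W$ are a legal $q$-tuple if their corresponding sets $S_1, \ldots, S_q \in \binom{[n]}{m}$ satisfy $S_1 \triangle \cdots \triangle S_q = \emptyset$. Let $w = (w_1, \ldots, w_q) \in W^q$ be a random variable drawn uniformly at random from the collection of ordered legal $q$-tuples in $W^q$ (where repetitions are allowed) and let $S = (S_1, \ldots, S_q) \in \binom{[n]}{m}^q$ be the sets corresponding to the $q$ witnesses in $w$. Since $w$ determines $S$ and $w$ is uniform over the legal $q$-tuples, the chain rule shows that the joint entropy of $w$ and $S$ satisfies

$$H(w, S) = H(w) + H(S \mid w) = H(w) = \log \sum_{S_1 \triangle \cdots \triangle S_q = \emptyset} \prod_{i=1}^{q} |\hat{f}(S_i)| \;\ge\; \log \sum_{S_1 \triangle \cdots \triangle S_q = \emptyset} \prod_{i=1}^{q} \hat{f}(S_i) = \log \mathbf{E}\big[ f^{=m}(x)^q \big] = \log \|f^{=m}\|_q^q , \tag{1}$$

where the final equality uses the fact that $q$ is even.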
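The counting behind the witness construction can be checked by brute force on a tiny instance. The sketch below is an illustration under assumptions: the integral degree-2 coefficients on $n = 4$ variables are made up, coordinates are 0-indexed, and sets with coefficient 0 are simply omitted since their witness sets are empty. It counts the legal $q$-tuples of witnesses as the sum, over tuples of sets with empty symmetric difference, of $\prod_i |\hat{f}(S_i)|$, and compares this with the $q$-th moment of the degree-$m$ polynomial.

```python
import itertools
import math

n, m, q = 4, 2, 4

# Hypothetical integral degree-m Fourier coefficients, chosen only for illustration.
coeffs = {(0, 1): 2, (1, 2): -1, (2, 3): 3, (0, 3): 1, (0, 2): -2}

def sym_diff_empty(sets):
    """True if the symmetric difference of all the given sets is empty."""
    acc = frozenset()
    for S in sets:
        acc = acc ^ frozenset(S)
    return not acc

# Each tuple of sets (S_1, ..., S_q) with empty symmetric difference contributes
# prod_i |f_hat(S_i)| legal tuples of witnesses.
legal_count = sum(
    math.prod(abs(coeffs[S]) for S in tup)
    for tup in itertools.product(coeffs, repeat=q)
    if sym_diff_empty(tup)
)

def poly(c, x):
    """Evaluate sum_S c[S] * chi_S(x)."""
    return sum(v * math.prod(x[i] for i in S) for S, v in c.items())

points = list(itertools.product([-1, 1], repeat=n))

# q-th moment of the degree-m polynomial; equals its q-norm to the q since q is even.
qth_moment = sum(poly(coeffs, x) ** q for x in points) / len(points)

# Same moment with every coefficient replaced by its absolute value.
abs_coeffs = {S: abs(v) for S, v in coeffs.items()}
abs_moment = sum(poly(abs_coeffs, x) ** q for x in points) / len(points)

# w is uniform over the legal tuples, so H(w) = log2(number of legal tuples).
entropy_of_w = math.log2(legal_count)

print(legal_count, qth_moment, abs_moment, entropy_of_w)
```

The count matches the moment exactly when all coefficients are nonnegative, and otherwise only bounds it from above, which is why the entropy chain contains an inequality.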
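Let $T = (T_{\{i,j\}})_{i < j}$ be a sequence of $\binom{q}{2}$ sets such that for every $i < j \in [q]$, $T_{\{i,j\}} \subseteq S_i \cap S_j$, constructed as follows. Recall that the symmetric difference of the sets in $S$ is empty, and so every $x \in [n]$ occurs in an even number of them. Therefore, if an element $x$ satisfies $\{i \in [q] : x \in S_i\} = \{i_1, \ldots, i_k\}$ where $i_1 < \cdots < i_k$, then $x$ is added to the sets $T_{\{i_1,i_2\}}, T_{\{i_3,i_4\}}, \ldots, T_{\{i_{k-1},i_k\}}$. Let

$$T_{\{i,*\}} := (T_{\{i,j\}})_{j \ne i} = (T_{\{1,i\}}, T_{\{2,i\}}, \ldots, T_{\{i-1,i\}}, T_{\{i,i+1\}}, \ldots, T_{\{i,q\}}) .$$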
This construction guarantees that for every $i \in [q]$, the sets in $T_{\{i,*\}}$ partition $S_i$ (since every $x \in S_i$ is in exactly one set $T_{\{i,j\}}$ for some $j \ne i$). In particular, $S$ determines $T$ and vice versa, so $H(S) = H(T)$.
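The pairing construction is easy to state in code. The following sketch assumes 0-based indices for the sets and uses hypothetical helper names (`pair_up`, `star`, `recover`). For each element it lists, in increasing order, the indices of the sets containing it and pairs them up consecutively. It then confirms on a small legal tuple that each $T_{\{i,*\}}$ partitions $S_i$ and that $S$ can be recovered from $T$.

```python
import itertools

def pair_up(sets):
    """Build T = (T_{i,j})_{i<j} from a legal tuple of sets.

    Every element must occur in an even number of the sets; its occurrences
    i_1 < i_2 < ... < i_k are paired as (i_1, i_2), (i_3, i_4), and so on.
    """
    q = len(sets)
    T = {pair: set() for pair in itertools.combinations(range(q), 2)}
    universe = set().union(*sets)
    for x in universe:
        occurrences = [i for i in range(q) if x in sets[i]]  # already increasing
        assert len(occurrences) % 2 == 0, "the tuple of sets is not legal"
        for a, b in zip(occurrences[0::2], occurrences[1::2]):
            T[(a, b)].add(x)
    return T

def star(T, i, q):
    """T_{i,*}: the sets T_{i,j} for all j != i, listed in order of j."""
    return [T[(min(i, j), max(i, j))] for j in range(q) if j != i]

def recover(T, q):
    """Recover S from T: S_i is the union of the sets in T_{i,*}."""
    return [set().union(*star(T, i, q)) for i in range(q)]

# Illustrative legal 4-tuple of 2-element sets: every element occurs exactly twice.
S = [{1, 2}, {2, 3}, {3, 4}, {1, 4}]
q = len(S)
T = pair_up(S)

for i in range(q):
    parts = star(T, i, q)
    # The parts are pairwise disjoint and cover S_i, so their sizes add up to |S_i|.
    print(i, parts, sum(len(part) for part in parts) == len(S[i]))

print(recover(T, q) == S)
```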
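With a different application of the chain rule to the joint entropy of $w$ and $S$, we obtain

$$H(w, S) = H(S) + H(w \mid S) = H(T) + H(w \mid S) . \tag{2}$$

The chain rule also yields

$$H(w \mid S) = \sum_{i=1}^{q} H(w_i \mid S, w_1, \ldots, w_{i-1}) .$$

Given $S_i$, the random variable $w_i$ is uniformly distributed among the witnesses for $S_i$ and, in particular, is independent of $w_j$ and $S_j$ for each $j \ne i$. So the conditional entropy of $w$ given $S$ satisfies

$$H(w \mid S) = \sum_{i=1}^{q} H(w_i \mid S_i) . \tag{3}$$

The random variables $T_{\{1,*\}}, \ldots, T_{\{q,*\}}$ cover each set in $\{T_{\{i,j\}}\}_{i \ne j}$ twice, so by Shearer's Lemma

$$H(T) \le \frac{1}{2} \sum_{i=1}^{q} H(T_{\{i,*\}}) . \tag{4}$$

Since $T_{\{i,*\}}$ is a partition of $S_i$, it determines $S_i$, and so $H(T_{\{i,*\}}) = H(S_i, T_{\{i,*\}}) = H(S_i) + H(T_{\{i,*\}} \mid S_i)$. Also, there are at most $(q-1)^m$ possible ordered partitions of the $m$ elements of $S_i$ into $q-1$ parts, so the universal upper bound yields

$$H(T_{\{i,*\}} \mid S_i) \le m \log(q-1) . \tag{5}$$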
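We can now combine (2)-(5) to obtain the upper bound

$$H(w, S) \le \frac{qm}{2} \log(q-1) + \frac{1}{2} \sum_{i=1}^{q} \big( H(S_i) + 2 H(w_i \mid S_i) \big) . \tag{6}$$

Finally, we observe that $H(S_i) + 2H(w_i \mid S_i) = H(w_i, w_i', S_i)$, where $w_i, S_i$ are as above and $w_i'$ is another witness of $S_i$ chosen independently and uniformly at random from $W_{S_i}$. The triple $(w_i, w_i', S_i)$ takes at most $\sum_{S} |W_S|^2 = \sum_{S} \hat{f}(S)^2$ values, so by the universal upper bound,

$$H(w_i, w_i', S_i) \le \log \sum_{S \in \binom{[n]}{m}} \hat{f}(S)^2 \le \log \|f\|_2^2 . \tag{7}$$

Combining (1), (6), and (7) and rearranging completes the proof.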
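For completeness, the final rearrangement can be written out as follows. This is a sketch that assumes the bounds take the forms named in the comments: (1) reads $q \log \|f^{=m}\|_q \le H(w, S)$, while (6) and (7) together bound $H(w, S)$ from above.

```latex
\begin{align*}
% (1): lower bound on the joint entropy, using log ||g||_q^q = q log ||g||_q
q \log \|f^{=m}\|_q
  &\le H(w, S) \\
% (6), then (7) applied to each of the q summands
  &\le \frac{qm}{2} \log(q-1) + \frac{1}{2} \sum_{i=1}^{q} \log \|f\|_2^2 \\
  &= \frac{qm}{2} \log(q-1) + q \log \|f\|_2 . \\
% Divide by q and exponentiate (logarithms are to base 2)
\|f^{=m}\|_q
  &\le (q-1)^{m/2} \, \|f\|_2 .
\end{align*}
```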