A Monotone Function Given By a Low-Depth Decision Tree That Is Not an Approximate Junta

We present a family of monotone functions $f_d : \{0,1\}^n \to \{0,1\}$ so that $f_d$ can be computed as a depth-$d$ decision tree and so that $f_d$ disagrees with any $k$-junta on a constant fraction of inputs for any $k = \exp(o(\sqrt{d}))$. This gives a negative answer to a problem


Introduction
In [3], O'Donnell and Servedio show that any monotone function given by a depth-$d$ decision tree can be learned to constant accuracy from random samples in $\mathrm{poly}(n, 2^d)$ time. The impact of this result is somewhat lessened by an apparent lack of interesting monotone functions given by low-depth decision trees. In particular, Elad Verbin, as well as Rocco Servedio and Li-Yang Tan, independently suggested in 2010 that all such functions may essentially depend on few variables [1, page 10].
Conjecture 1.1. For every $\varepsilon > 0$ and every monotone function $f : \{0,1\}^n \to \{0,1\}$ given by a depth-$d$ decision tree, there is a $k$-junta $g$, for $k = \mathrm{poly}_\varepsilon(d)$, so that $f$ and $g$ agree on all but an $\varepsilon$-fraction of inputs.
In this note, we disprove the above conjecture by providing an example of a monotone low-degree function that is not well approximated by any small junta. Specifically, we prove:
Theorem 1.2. There exists a constant $\varepsilon > 0$ so that for every positive integer $d$, there exist a $k = \exp(\Omega(\sqrt{d}))$ and a monotone function $f : \{0,1\}^n \to \{0,1\}$ given by a depth-$d$ decision tree, so that for every $k$-junta $g$, $f$ and $g$ disagree on at least an $\varepsilon$-fraction of inputs.
In fact, the bound on $k$ in Theorem 1.2 is known to be tight up to the constant in the exponent. In particular, it is shown in [3] that any monotone function given by a depth-$d$ decision tree has total influence $O(\sqrt{d})$. The main result of [2] says that any Boolean function $f$ can be $\varepsilon$-approximated by a $k$-junta for $k = \exp(O(I(f)/\varepsilon))$, where $I(f)$ denotes the total influence of $f$. Combining these results, we find:
Observation 1.3. If $f$ is a monotone function given by a depth-$d$ decision tree, and if $\varepsilon > 0$, then there is a $k$-junta $g$ with $k = \exp(O(\sqrt{d}/\varepsilon))$ that agrees with $f$ on all but an $\varepsilon$-fraction of inputs.
The function we construct to show Theorem 1.2 will combine ideas from two previous constructions, the monotone addressing function and Talagrand's function.
The monotone addressing function is an example of a monotone function given by a depth-$d$ decision tree that depends on exponentially many variables, and thus provides us with a good starting point. It fails to provide a counterexample to Conjecture 1.1, though, since it agrees with the majority function except on a set of measure $O(1/\sqrt{d})$. Given the bound on the total influence of a low-depth monotone function, we know that any $f$ satisfying the conditions of Theorem 1.2 must not only have near the maximum possible total influence for a low-depth monotone function, but also must not be approximable by a function with much lower total influence.
Because of this restriction, our construction will look somewhat similar to a construction of Talagrand in [4]. In particular, Talagrand constructs a monotone function $f$ on $\{0,1\}^d$ so that on a constant fraction of inputs, $f$ has sensitivity $\Omega(\sqrt{d})$ (where the sensitivity at an input is the number of coordinates such that changing the input at that coordinate would change the output of $f$). Since, as is easily seen, the average sensitivity over all inputs is equal to the total influence, this is as large as possible. Moreover, this condition tells us that $f$ retains large average sensitivity even after ignoring any $\varepsilon$-fraction of inputs, for a sufficiently small constant $\varepsilon$. Talagrand's function fails to provide a counterexample to Conjecture 1.1 on its own, because it is already a $d$-junta.
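To make the monotone addressing function discussed above concrete, here is a minimal Python sketch under one assumed form: the address bits $x$ determine the output by majority, except on the middle Hamming layer, where the output is a data bit $y_x$ indexed by the address. The exact indexing (an even number `n_addr` of address bits, one data bit per middle-layer address) and the names `make_addressing`, `is_monotone` and `middle_layer_measure` are illustrative assumptions, not taken from the text.

```python
from itertools import combinations, product
from math import comb

def make_addressing(n_addr):
    """Build an addressing function on n_addr address bits (n_addr even).

    Assumed form: output 1 above the middle layer, 0 below it, and the
    data bit addressed by x on the middle layer itself.
    """
    half = n_addr // 2
    # One data bit for each address of Hamming weight exactly n_addr / 2.
    index = {A: i for i, A in enumerate(combinations(range(n_addr), half))}

    def f(x, y):
        w = sum(x)
        if w > half:
            return 1
        if w < half:
            return 0
        support = tuple(j for j, b in enumerate(x) if b)
        return y[index[support]]

    return f, len(index)

def is_monotone(f, n_addr, num_y):
    """Brute-force check: raising any single 0 bit to 1 never lowers f."""
    for z in product((0, 1), repeat=n_addr + num_y):
        v = f(z[:n_addr], z[n_addr:])
        for j, b in enumerate(z):
            if b == 0:
                w = z[:j] + (1,) + z[j + 1:]
                if f(w[:n_addr], w[n_addr:]) < v:
                    return False
    return True

def middle_layer_measure(n_addr):
    """Fraction of addresses on which f can differ from majority."""
    return comb(n_addr, n_addr // 2) / 2 ** n_addr
```

In this sketch the function reads all address bits and then at most one data bit, and it can differ from majority only on the middle layer, whose measure shrinks like $1/\sqrt{d}$ as the number of address bits grows.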

The construction
In order to define the function $f$ with the properties specified by Theorem 1.2, we first introduce some notation. We let $d$, $t$ and $m$ be integers with $t = \Theta(\sqrt{d})$ and $m = \Theta(2^t)$. We furthermore assume that $2^{-t} m$ is sufficiently small given the value of $t/\sqrt{d}$. We let $S = (S_1, \ldots, S_m)$ be a random sequence of sets, where the $S_i$ are chosen independently and uniformly from the set of subsets of $\{1, 2, \ldots, d-1\}$ of size exactly $t$. Given this $S$, we define the function $T_S$ on $\{0,1\}^{d-1}$ as follows:
$$T_S(x_1, \ldots, x_{d-1}) = \{\, i : x_j = 1 \text{ for all } j \in S_i \,\}.$$
We will hereafter abbreviate $T_S$ by $T$, suppressing the explicit dependence on $S$, and abbreviate $(x_1, \ldots, x_{d-1})$ by $x$.
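We finally define $f = f_S : \{0,1\}^{d-1} \times \{0,1\}^m \to \{0,1\}$ as
$$f_S(x, y_1, \ldots, y_m) = \begin{cases} 0 & \text{if } |T(x)| = 0, \\ y_i & \text{if } T(x) = \{i\}, \\ 1 & \text{if } |T(x)| > 1. \end{cases}$$
Again, we will often suppress the dependence of $f$ on $S$. It is clear that $f$ is monotone. Furthermore, $f$ is given by a depth-$d$ decision tree, since after fixing the values of the $x_i$, the value of $f$ depends on at most one more coordinate. In the next section, we show that $f$ cannot be approximated by any $k$-junta for small $k$.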
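The construction can be sketched in Python as follows. This is a sketch under a stated assumption about the definitions: $T_S(x)$ is taken to be the set of indices $i$ for which every coordinate of $x$ in $S_i$ equals 1, and $f$ outputs 0, $y_i$, or 1 according to whether $T_S(x)$ is empty, equals $\{i\}$, or has at least two elements. The helper names (`sample_S`, `T`, `f`, `is_monotone`) and the 0-based coordinates are illustrative choices.

```python
import random
from itertools import product

def sample_S(d, t, m, rng):
    """Draw m independent uniform t-subsets of the d-1 coordinates (0-based)."""
    return [frozenset(rng.sample(range(d - 1), t)) for _ in range(m)]

def T(S, x):
    """Indices i such that x is 1 on every coordinate of S_i (assumed form)."""
    return [i for i, S_i in enumerate(S) if all(x[j] for j in S_i)]

def f(S, x, y):
    """Assumed form of f_S: 0 if T is empty, y_i if T = {i}, 1 otherwise."""
    hit = T(S, x)
    if not hit:
        return 0
    if len(hit) == 1:
        return y[hit[0]]  # the only y-coordinate the tree ever needs to read
    return 1

def is_monotone(S, d):
    """Brute-force check over all inputs; only feasible for tiny d and m."""
    n = d - 1 + len(S)
    for z in product((0, 1), repeat=n):
        v = f(S, z[:d - 1], z[d - 1:])
        for j in range(n):
            if z[j] == 0:
                w = z[:j] + (1,) + z[j + 1:]
                if f(S, w[:d - 1], w[d - 1:]) < v:
                    return False
    return True
```

For example, `is_monotone(sample_S(5, 2, 3, random.Random(1)), 5)` exhaustively checks a toy instance with four $x$-coordinates and three $y$-coordinates. The evaluation order in `f` mirrors the decision tree: read the $d-1$ coordinates of $x$, then at most one $y_i$.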
Note that Talagrand's function is given (for appropriately chosen $S$) by $x \mapsto \bigvee_{i=1}^{m} \bigwedge_{j \in S_i} x_j$.

Approximation bounds
Theorem 1.2 will follow from the following proposition:
Proposition 3.1. There exists an $\varepsilon > 0$ so that for $f_S$ defined as above, with constant probability over the choice of $S$, $f_S$ is not $\varepsilon$-approximated by any $k$-junta for $k = o(2^t)$.
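A key step in our proof will be to show that with constant probability $f$ actually depends on one of the $y_i$.
Lemma 3.2. With constant probability over the choice of $S$, $\Pr_x(|T(x)| = 1) = \Omega(1)$.
Proof. We will show the further claim that
$$\mathbb{E}_{S,x}\big[\, |T| \, (2 - |T|) \,\big] = \Omega(1).$$
Since the term in the expectation is positive only if $|T| = 1$, and is then equal to 1, this will complete our proof. We note that
$$\mathbb{E}[|T|] = \sum_{i=1}^{m} \Pr(i \in T) = m \, 2^{-t},$$
since $i \in T$ exactly when the $t$ coordinates of $x$ indexed by $S_i$ are all equal to 1. On the other hand, we have that
$$\mathbb{E}\big[|T|(|T| - 1)\big] = \sum_{i \neq j} \Pr(i, j \in T) = m(m-1) \, 2^{-t} \, \Pr(j \in T \mid i \in T)$$
for any fixed $i \neq j$. To compute this conditional probability we let $S_j = \{a_1, \ldots, a_t\}$, where the $a_r$ are picked randomly from $\{1, 2, \ldots, d-1\}$ without replacement. We compute it as the product
$$\Pr(j \in T \mid i \in T) = \prod_{r=1}^{t} \Pr\big(x_{a_r} = 1 \mid i \in T, \ x_{a_1} = \cdots = x_{a_{r-1}} = 1\big).$$
These probabilities are approximated by first fixing the values of $S_i$ and $a_1, \ldots, a_{r-1}$. After additionally fixing the value of $a_r$, the probability in question becomes 1 if $a_r \in S_i$ and $1/2$ otherwise. Thus the probability that $x_{a_r} = 1$ is
$$\frac{1}{2} + \frac{1}{2} \Pr(a_r \in S_i) = \frac{1}{2} + O(t/d).$$
Hence the probability that $j \in T_S(x)$ given that $i \in T_S(x)$ is
$$\Big(\frac{1}{2} + O(t/d)\Big)^{t} = 2^{-t} \exp\big(O(t^2/d)\big).$$
Therefore, we have that
$$\mathbb{E}\big[|T|(|T| - 1)\big] \leq m^2 \, 2^{-2t} \exp\big(O(t^2/d)\big),$$
and hence
$$\mathbb{E}\big[|T|(2 - |T|)\big] = \mathbb{E}[|T|] - \mathbb{E}\big[|T|(|T| - 1)\big] \geq m \, 2^{-t} \Big(1 - m \, 2^{-t} \exp\big(O(t^2/d)\big)\Big).$$
As long as $2^{-t} m$ is bounded below by a constant and above by $\exp(-O(t^2/d))/2$, this is $\Omega(1)$.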
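The moment computation in this proof can be sanity-checked numerically. The following Python sketch estimates $\mathbb{E}[|T|]$, $\mathbb{E}[|T|(|T|-1)]$ and $\Pr(|T| = 1)$ by sampling a fresh $S$ and a fresh uniform $x$ on each trial. It assumes that $i \in T$ exactly when $x$ is 1 on all of $S_i$. The function name `estimate_T_moments` and the parameter values suggested below are illustrative assumptions, not values from the text.

```python
import random

def estimate_T_moments(d, t, m, trials, seed=0):
    """Monte Carlo estimates of E|T|, E|T|(|T|-1) and Pr(|T| = 1).

    Each trial draws a fresh uniform x in {0,1}^(d-1) and m fresh uniform
    t-subsets S_i, so the averages are over both S and x.
    """
    rng = random.Random(seed)
    sum_T = 0
    sum_TT = 0
    count_one = 0
    for _ in range(trials):
        x = [rng.getrandbits(1) for _ in range(d - 1)]
        size = 0
        for _ in range(m):
            S_i = rng.sample(range(d - 1), t)
            # Assumed membership rule: i is in T iff x is 1 on all of S_i.
            if all(x[j] for j in S_i):
                size += 1
        sum_T += size
        sum_TT += size * (size - 1)
        count_one += 1 if size == 1 else 0
    return sum_T / trials, sum_TT / trials, count_one / trials
```

With, say, `d = 101`, `t = 5` and `m = 8`, the first estimate should be close to $m 2^{-t} = 0.25$. Because $|T|(2 - |T|)$ never exceeds the indicator of $|T| = 1$, the third estimate is always at least the first minus the second, for any sample.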
We are now ready to prove Proposition 3.1. By Lemma 3.2, with constant probability over $S$ we have $\Pr_x(|T(x)| = 1) = \Omega(1)$. For such $S$, we claim that $f$ has the desired property. In particular, we claim the following:
Lemma 3.3. If $f$ is as above and $g$ is a $k$-junta, then
$$\Pr_{x,y}\big(f(x,y) \neq g(x,y)\big) \geq \frac{\Pr_x(|T(x)| = 1) - k \, 2^{-t}}{2}.$$
Proof. This follows from the simple observation that if, after fixing the value of $x$, we have that $T = \{i\}$ where $g$ does not depend on $y_i$, then $\Pr_y(f(x,y) \neq g(x,y)) = 1/2$. This is because after further conditioning on the values of all $y_j$ for $j \neq i$, $g$ becomes a constant function (by assumption) and $f$ takes the values 0 and 1 each with probability $1/2$. Therefore we have that
$$\Pr\big(f(x,y) \neq g(x,y)\big) \geq \frac{\Pr\big(T(x) = \{i\} \text{ for some } i, \text{ and } g \text{ does not depend on } y_i\big)}{2}$$
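The observation behind this proof can be checked exactly on a toy instance. The Python sketch below enumerates all inputs and compares $\Pr(f \neq g)$ with half the probability that $T(x) = \{i\}$ for some $i$ on which $g$ does not depend. It assumes the form of $T$ and $f$ used in the earlier parts of this note ($i \in T$ iff $x$ is 1 on $S_i$; $f$ equals 0, $y_i$, or 1 according to $|T|$). The function `disagreement` and its `relevant_y` argument, which declares the $y$-indices that $g$ may read, are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

def T(S, x):
    """Indices i such that x is 1 on every coordinate of S_i (assumed form)."""
    return [i for i, S_i in enumerate(S) if all(x[j] for j in S_i)]

def f(S, x, y):
    """Assumed form of f: 0 if T is empty, y_i if T = {i}, 1 otherwise."""
    hit = T(S, x)
    if len(hit) == 1:
        return y[hit[0]]
    return 1 if hit else 0

def disagreement(S, n_x, g, relevant_y):
    """Return exact (Pr[f != g], Pr[T(x) = {i} with i not in relevant_y])."""
    m = len(S)
    differ = 0
    good = 0
    for x in product((0, 1), repeat=n_x):
        hit = T(S, x)
        if len(hit) == 1 and hit[0] not in relevant_y:
            good += 1
        for y in product((0, 1), repeat=m):
            if f(S, x, y) != g(x, y):
                differ += 1
    return Fraction(differ, 2 ** (n_x + m)), Fraction(good, 2 ** n_x)

# Toy instance: three overlapping pairs on four x-coordinates.
S = [frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})]

def or_of_ands(x, y):
    """A comparison function that ignores every y_i."""
    return 1 if T(S, x) else 0

p_diff, p_good = disagreement(S, 4, or_of_ands, relevant_y=set())
```

For a comparison function that ignores every $y_i$, such as `or_of_ands`, the two sides of the inequality coincide, since $f$ and $g$ differ only when $|T| = 1$ and the addressed bit is 0.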