www.theoryofcomputing.org Distribution-Free Testing for Monomials with a Sublinear Number of Queries

We consider the problem of distribution-free testing of the class of monotone monomials and the class of monomials over $n$ variables. While there are very efficient testers for a variety of classes of functions when the underlying distribution is uniform, designing distribution-free testers (which must work under an arbitrary and unknown distribution) tends to be more challenging. When the underlying distribution is uniform, Parnas et al. (SIAM J. Discr. Math., 2002) give a tester for (monotone) monomials whose query complexity does not depend on $n$, and whose dependence on the distance parameter is (inverse) linear. In contrast, Glasner and Servedio (Theory of Computing, 2009) prove that every distribution-free tester for monotone monomials as well as for general monomials must have query complexity $\tilde{\Omega}(n^{1/5})$ (for a constant distance parameter $\varepsilon$). In this paper we present distribution-free testers for these classes with query complexity $\tilde{O}(n^{1/2}/\varepsilon)$. We note that in contrast to previous results for distribution-free testing, our testers do not build on the testers that work under the uniform distribution. Rather, we define and exploit certain structural properties of monomials (and functions that differ from them on a non-negligible part of the input space), which were not used in previous work on property testing.


Introduction
Testers (for properties of functions) are algorithms that separate the functions with a prespecified property from the functions that are "far" from having the property with respect to some fixed distance measure. In most works on property testing, distance is measured with respect to the uniform distribution over the function domain. While in many contexts this distance is appropriate, as it corresponds to assigning equal "importance" or weight to each point in the domain, there are scenarios in which we may want to deal with an underlying weight distribution that is not uniform, and furthermore, is not known to the tester. We refer to the latter model as distribution-free property testing, while testing under the uniform distribution is considered to be the standard model. In the standard model the tester is given query access to the tested function; in the distribution-free model the tester is also given access to sample inputs distributed according to the unknown underlying distribution. By the complexity of the tester we mean its query complexity. (In the complexity of a tester we count both the queries on the points sampled according to the underlying distribution and the queries on points selected by the tester.) Indeed, the notion of distribution-free testing is inspired by the distribution-free (Probably Approximately Correct (PAC)) learning model [19], and understanding the relation between testing and learning is one of the issues of interest in the study of property testing. As observed in [9], the complexity of testing a function class $F$ (that is, testing the property of membership in $F$) is not higher than that of (proper) learning the class $F$ (under the same conditions, e.g., with respect to the uniform distribution or distribution-free). In view of this, a natural question is for what classes of functions the complexity of testing is strictly lower than that of learning. Note that, as opposed to learning, if we have a tester for (membership in) a class of functions $F$, this does not imply that we have a tester (with similar complexity) for all subclasses $F'$ of $F$.
There is quite a large variety of function classes for which the complexity of testing is strictly lower than that of learning when the underlying distribution is uniform (e.g., linear functions [1], low-degree polynomials [17], singletons, monomials [16] and small monotone DNF [16], monotone functions (e.g., [5,4]), small juntas [6], small decision lists, decision trees and (general) DNF [3], linear threshold functions [14], and more). In contrast, there are relatively few such positive results for distribution-free testing [10,11,12], and, in general, designing distribution-free testers tends to be more challenging.
One of the main positive results for distribution-free testing [10] is that every function class that has a standard tester and can be efficiently self-corrected [1] has a distribution-free tester whose complexity is similar to that of the standard tester. In particular this implies that there are efficient distribution-free testers for linear functions and, more generally, for low-degree polynomials [10]. However, there are function classes of interest (in particular from the point of view of learning theory), which have efficient standard testers, but for which self-correctors do not exist (or are not known to exist). Several such classes (of Boolean functions over $\{0,1\}^n$) were studied by Glasner and Servedio [7]. Specifically, they consider monotone monomials, general monomials, decision lists, and linear threshold functions. They prove that for these classes, in contrast to standard testing, where the query complexity does not depend on the number of variables $n$, every distribution-free tester must make $\Omega((n/\log n)^{1/5})$ queries (for a constant distance parameter $\varepsilon$). While these negative results establish that a strong dependence on $n$ is unavoidable for these function classes in the distribution-free case, they still leave open the question of whether some sublinear dependence on $n$ is possible (while distribution-free learning (with queries) requires at least linear query complexity [18]).

Our results
In this work we prove that both for monotone monomials and for general monomials, a sublinear dependence on $n$ can be achieved for distribution-free testing. Specifically, we describe distribution-free testers for these families whose query complexity is $O(\sqrt{n}\log n/\varepsilon)$. Thus we advance our knowledge concerning efficient distribution-free testing for two basic function classes. Furthermore, while previous distribution-free testers have been based on, and are similar to, the corresponding standard testers, this is not the case for our testers. Rather, we define and exploit certain structural properties of monomials (and functions that differ from them in a non-negligible manner), which were not used in previous work on property testing in the standard model. In what follows we give some intuition concerning the difficulty encountered when trying to extend standard testing of (monotone) monomials to distribution-free testing and then briefly discuss the ideas behind our testers.

Standard vs. distribution-free testing of monomials
The first simple observation concerning testing monomials under the uniform distribution is the following. If $f$ is a $k$-monomial (that is, a conjunction of $k$ literals), then $\Pr[f(x) = 1] = 2^{-k}$ (where the probability is over a uniformly selected $x$). This implies that we can effectively consider only relatively small monomials, that is, $k$-monomials for $k = \log(O(1/\varepsilon))$, and it allows the tester to have an exponential dependence on $k$ (since this translates to a linear dependence on $1/\varepsilon$). This is not in general the case when the underlying distribution is arbitrary. In particular, the functions considered in the lower bound proof of [7] (some of which are monomials, and some of which are far from being monomials) depend on $\Omega(n)$ variables. Thus, for these functions, considering uniformly selected points essentially gives no information (since the function assigns value 0 to all but a tiny fraction of the points). Furthermore, the support of the distribution $D$ defined in [7] is such that the following holds. If one takes a sample (distributed according to $D$) of size smaller than the square root of the support size of $D$ (where there are roughly $n^{2/5}$ points in the support), and performs queries on the sampled points, then it is not possible to distinguish between the monomials and the functions that are far from being monomials (with respect to $D$). Thus, by sampling according to $D$, we essentially get no information unless the size of the sample is above a (fairly high) threshold. On the other hand, if we perform queries outside the support of $D$, then intuitively (and this is formalized in [7]), violations (with respect to being a monomial) are hard to find.
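As a small sanity check of the observation that a $k$-monomial is satisfied by a $2^{-k}$ fraction of the uniform cube, the following Python sketch enumerates $\{0,1\}^n$ for a small $n$. The helper names `monomial` and `uniform_accept_probability`, the 0-based indexing, and the chosen example are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction
from itertools import product


def monomial(positive, negative=()):
    """Return f(x) = AND of x[j] for j in positive and (1 - x[j]) for j in negative.

    Indices are 0-based here (an illustrative choice; the text uses 1-based).
    """
    def f(x):
        return int(all(x[j] == 1 for j in positive)
                   and all(x[j] == 0 for j in negative))
    return f


def uniform_accept_probability(f, n):
    """Exact Pr[f(x) = 1] for x uniform over {0,1}^n, by full enumeration."""
    ones = sum(f(x) for x in product((0, 1), repeat=n))
    return Fraction(ones, 2 ** n)


n = 6
f = monomial(positive=(0, 3), negative=(5,))  # a 3-monomial over 6 variables
p = uniform_accept_probability(f, n)
print(p)  # prints 1/8, i.e. 2^(-3)
```

Enumeration is only feasible for tiny $n$; the point is merely that the satisfying fraction depends on $k$ and not on $n$.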
Before continuing with a high level description of our testers, we note that if we restrict the task of testing to distribution-free testing of (monotone) $k$-monomials, where $k$ is fixed, then there is a tester whose query complexity grows exponentially with $k$. This follows by combining two results: (1) the aforementioned result of Halevy and Kushilevitz [10] concerning the use of "self-correction" in transforming standard testers to distribution-free testers; (2) the result of Parnas et al. [16] for testing (monotone) monomials, which has a self-corrector (with complexity $2^k$) as a building block. This implies that for small $k$ (i.e., $k \le \log n - \omega(1)$), we have a tester with complexity that is sublinear in $n$. Hence, the question is what can be done when it is not assumed that $k$ is small.

Our testers: Ideas and techniques
In what follows we discuss the tester for monotone monomials over $\{0,1\}^n$. The tester for general monomials has the same high-level structure, and can be viewed as a generalization of the tester for monotone monomials.
Our tester tries to find evidence that the tested function $f$ is not a monotone monomial (where if $f$ is a monotone monomial then clearly the tester will not find such evidence). The tester looks for evidence in the form of a small subset of points $\{y^0, y^1, \dots, y^t\}$, each in $\{0,1\}^n$, where $f(y^0) = 0$ and $f(y^i) = 1$ for each $1 \le i \le t$, such that no monotone monomial agrees with $f$ on this subset.
Based on these subsets we introduce a notion that is central to our tester and its analysis: the violation hypergraph of $f$, which we denote by $H_f$. The vertex set of $H_f$ is $\{0,1\}^n$, and its edge set corresponds to subsets of the form described above. By the definition of $H_f$, if $f$ is a monotone monomial, then $H_f$ has no edges. On the other hand, we prove that if $f$ is far from being a monotone monomial (with respect to the underlying distribution, $D$), then every vertex cover of (the edges of) $H_f$ must have relatively large weight (with respect to $D$). We use this fact to show that if $f$ is far from being a monotone monomial, then our tester finds a (small) edge in $H_f$ (with high probability). We next give a high level description of the tester.
Our tester works in two stages, where in each stage it takes a sample of size $O(\sqrt{n}/\varepsilon)$, distributed according to $D$. In the first stage it only considers the points $y$ in the sample such that $y \in f^{-1}(0)$. If $f$ is a monotone monomial, then for each such point $y$ there must be at least one index $j$ such that $y_j = 0$ and $x_j$ is a variable in the monomial. Hence, for each point $y \in f^{-1}(0)$ that is selected in the first sample, the tester searches for such a representative index $j$ (as explained in detail in Subsection 3.2). If any search fails, then the tester rejects, since it has evidence that $f$ is not a monotone monomial. This evidence is in the form of an edge (of size 3) in $H_f$. Otherwise, the tester has a set $J$ of indices.
In the second stage, the tester considers only the sample points $y \in f^{-1}(1)$. For each such sample point $y$ it checks whether there exists an index $j \in J$ such that $y_j = 0$. If such an index exists, then an edge in $H_f$ is found and the tester rejects. The crux of the proof is showing that if the probability that the tester does not find evidence (in both stages) is not small, then it is possible to construct a small-weight vertex cover in $H_f$ (implying that $f$ is close to being a monotone monomial).
Other related work. In addition to the results mentioned previously, Halevy and Kushilevitz [10] study distribution-free testing of monotonicity for functions $f : \Sigma^n \to R$ (where $\Sigma$ and $R$ are fully ordered). Building on the (one-dimensional) standard tester in [5], they give a distribution-free tester whose query complexity is $O((2\log|\Sigma|)^n/\varepsilon)$. Thus, the dependence on the dimension, $n$, is exponential, in contrast to some of the standard testers for monotonicity [8,4] where the dependence on $n$ is linear. Halevy and Kushilevitz [10] prove that the exponential dependence on $n$ is unavoidable for distribution-free testing even in the case of Boolean functions over the Boolean hypercube (that is, $|\Sigma| = |R| = 2$). In [12], Halevy and Kushilevitz further the study of testing monotonicity over graph products.
Halevy and Kushilevitz [11] also study distribution-free testing of graph properties in sparse graphs, and give a distribution-free tester for connectivity, with similar complexity to the standard tester for this property.
We note that for some properties that have efficient standard testers, the testers can be extended to work under more general families of distributions such as product distributions (e.g., [6,3]). In recent work, Kopparty and Saraf [13] consider tolerant testing [15] of linearity under non-uniform distributions (that have certain properties).
Organization. We start by introducing some notation and definitions in Section 2. In Section 3 we describe and analyze the distribution-free tester for monotone monomials, and in Section 4 we explain how to extend it to general monomials.
Further research. Perhaps the first question that comes to mind is what is the exact complexity of distribution-free testing of (monotone) monomials, given the gap between our upper bound and the lower bound of [7]. It will also be interesting to design sublinear testers for the other function classes studied in [7]. Another direction is to study testing of monomials and other basic function classes under known distributions (other than the uniform distribution).

Preliminaries
For an integer $k$ we let $[k] \stackrel{\rm def}{=} \{1, \dots, k\}$. In all that follows we consider Boolean functions $f$ whose domain is $\{0,1\}^n$.

Definition 2.1 (Monomials). A function $f : \{0,1\}^n \to \{0,1\}$ is a monomial if it is a conjunction ("and") of a subset of the literals $\{x_1, \bar{x}_1, \dots, x_n, \bar{x}_n\}$. It is a monotone monomial if it is a conjunction only of variables (and no negations of variables). We denote the class of monomials by $\mathrm{MON}$ and the class of monotone monomials by $\mathrm{MON}_M$.

We note that we allow the special case that the subset of literals (variables) is empty, in which case $f$ is the all-1 function. In Subsections 3.4 and 4.4 we discuss how to augment our tests so that they work for the case that the subset of literals (variables) must be non-empty.

Definition 2.2 (Distance). Let $D$ be a distribution over $\{0,1\}^n$. For two functions $f, g : \{0,1\}^n \to \{0,1\}$, we let $\mathrm{dist}_D(f,g) = \Pr_{x \sim D}[f(x) \ne g(x)]$ denote the distance between $f$ and $g$ with respect to $D$. For a function $f : \{0,1\}^n \to \{0,1\}$ and a set of Boolean functions $F_n$ over $\{0,1\}^n$, we let $\mathrm{dist}_D(f, F_n) = \min_{g \in F_n} \mathrm{dist}_D(f,g)$ denote the distance between $f$ and the set $F_n$.
Note that the distance between $f$ and $F_n$ with respect to $D$ can be 0 while $f \notin F_n$.
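To make the last remark concrete, the following Python sketch computes the distance to the class of monotone monomials by brute force for $n = 2$. It is only an illustration under stated assumptions: the distribution is represented as a dictionary from points to probabilities, and the names `dist`, `monotone_monomials` and `dist_to_monotone_monomials` are hypothetical helpers, not notation from the paper.

```python
from fractions import Fraction
from itertools import combinations


def dist(D, f, g):
    """dist_D(f, g): total weight under D of the points where f and g differ.

    D is a dict mapping points (tuples of 0/1) to probabilities.
    """
    return sum((w for x, w in D.items() if f(x) != g(x)), Fraction(0))


def monotone_monomials(n):
    """Yield every monotone monomial over n variables (including the all-1 function)."""
    for k in range(n + 1):
        for S in combinations(range(n), k):
            yield lambda x, S=S: int(all(x[j] == 1 for j in S))


def dist_to_monotone_monomials(D, f, n):
    """Distance from f to the monotone monomials, by brute force (tiny n only)."""
    return min(dist(D, f, g) for g in monotone_monomials(n))


# f is the XNOR of two bits, which is not a monotone monomial.
f = lambda x: int(x[0] == x[1])

# D puts all its weight on two points where f agrees with the monomial x_1.
D = {(1, 1): Fraction(1, 2), (0, 1): Fraction(1, 2)}
d_support = dist_to_monotone_monomials(D, f, 2)

# Under the uniform distribution the same f is at positive distance.
U = {(a, b): Fraction(1, 4) for a in (0, 1) for b in (0, 1)}
d_uniform = dist_to_monotone_monomials(U, f, 2)
```

Here `d_support` is 0 even though `f` is not a monotone monomial, because the disagreements all lie outside the support of `D`.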
Definition 2.3 (Distribution-Free Testing). Let $F = \{F_n\}$ be a class of Boolean functions, where each $F_n$ is a set of Boolean functions over $\{0,1\}^n$. A distribution-free tester for (membership in) $F$ is given the value of $n$, access to examples that are distributed according to an unknown distribution $D$ over $\{0,1\}^n$, and query access to an unknown function $f : \{0,1\}^n \to \{0,1\}$. The tester is also given a distance parameter $0 < \varepsilon < 1$, and is required to behave as follows.
• If $f \in F_n$, then the tester should accept with probability at least 2/3.
• If $\mathrm{dist}_D(f, F_n) > \varepsilon$, then the tester should reject with probability at least 2/3.
If the tester accepts every $f \in F_n$ with probability 1, then it is a one-sided error tester.
In all that follows f always denotes the (unknown) tested function, and D denotes the (unknown) underlying distribution with respect to which the tester should work.For a point y ∈ {0, 1} n let D(y) denote the probability assigned to y by D, and for a subset S ⊆ {0, 1} n let D(S) = ∑ y∈S D(y) denote the weight that D assigns to the subset S. For the sake of simplicity, with a slight abuse of notation, we may write As noted in the introduction, by the complexity of a tester we mean its query complexity (which includes both queries on examples that are generated according to the unknown underlying distribution D as well as queries on points selected by the tester).
We assume without loss of generality that $\varepsilon \ge 2^{-n}$ since otherwise, by performing a number of queries that is linear in $1/\varepsilon$ (that is, querying $f$ on all domain elements), it is possible to determine whether or not $f$ is a monotone monomial.

Distribution-free testing of monotone monomials
We start by introducing the notion of a violation hypergraph of a function and establishing its relation to (the distance to) monotone monomials.

The violation hypergraph
Before defining the violation hypergraph, we introduce some notation. For each point $y \in \{0,1\}^n$ let $Z(y) = \{j : y_j = 0\}$ and let $O(y) = \{j : y_j = 1\}$. We use $1^n$ to denote the all-1 vector (point).
Let $g$ be a Boolean function over $\{0,1\}^n$ and let $\{y^0, y^1, \dots, y^t\} \subseteq \{0,1\}^n$ be a subset of points such that $g(y^0) = 0$ and $g(y^i) = 1$ for all $1 \le i \le t$. A simple but useful observation is that if $g$ is a monotone monomial, then $Z(y^0)$ must include at least one index $j$ such that $j \in \bigcap_{i=1}^t O(y^i)$. This is true because if $g$ is a monotone monomial, then the subset of indices that correspond to the variables that $g$ is a conjunction of must be a subset of $\bigcap_{i=1}^t O(y^i)$. But then there must be at least one index $j \in \bigcap_{i=1}^t O(y^i)$ for which $y^0_j = 0$, or else $g(y^0)$ would be 1. In other words, if $Z(y^0)$ does not include any index $j$ such that $j \in \bigcap_{i=1}^t O(y^i)$, which is equivalent to $\bigcap_{i=1}^t O(y^i) \subseteq O(y^0)$, then $g$ cannot be a monotone monomial. This observation motivates the next definition.
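The observation above can be sketched as a direct check on a candidate set of points. In the following Python sketch, points are written as 0/1 strings and indices are 1-based; the function names `ones` and `is_violation` and the small truth table are illustrative assumptions.

```python
def ones(y):
    """O(y): the set of (1-based) indices j with y_j = 1; y is a 0/1 string."""
    return {j for j, bit in enumerate(y, start=1) if bit == "1"}


def is_violation(f, y0, positives):
    """Check whether {y0} together with positives rules out every monotone monomial.

    Requires f(y0) = 0, f(y) = 1 for every y in positives (a non-empty list),
    and that the intersection of O(y) over positives is contained in O(y0).
    """
    if f(y0) != 0 or any(f(y) != 1 for y in positives):
        return False
    common = set.intersection(*(ones(y) for y in positives))
    return common <= ones(y0)


# A partial truth table on three points of {0,1}^4 (hypothetical values).
table = {"0011": 0, "0110": 1, "1011": 1}
found = is_violation(table.get, "0011", ["0110", "1011"])

# For a genuine monotone monomial (here x_3 AND x_4) the check fails.
g = lambda y: int(y[2] == "1" and y[3] == "1")
not_found = is_violation(g, "0110", ["0011", "1011"])
```

In the first call the common 1-coordinates of the two positive points are $\{3\}$, which is contained in $O(0011) = \{3, 4\}$, so no monotone monomial can be consistent with the three values.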
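Definition 3.1 (Violation hypergraph). Let $H_f = (V(H_f), E(H_f))$ be the hypergraph whose vertex set, $V(H_f)$, is $\{0,1\}^n$, and whose edge set, $E(H_f)$, contains all subsets $\{y^0, y^1, \dots, y^t\} \subseteq \{0,1\}^n$, $t \ge 1$, of the following form:
• $f(y^0) = 0$ and $f(y^i) = 1$ for all $1 \le i \le t$.
• $\bigcap_{i=1}^t O(y^i) \subseteq O(y^0)$.
If $f(1^n) = 0$, then $\{1^n\} \in E(H_f)$ as well.

For example, suppose that $f(0011) = 0$, $f(0110) = 1$ and $f(1011) = 1$. Then $O(0011) = \{3, 4\}$, $O(0110) = \{2, 3\}$, and $O(1011) = \{1, 3, 4\}$, so that $O(0110) \cap O(1011) = \{3\}$, which is a subset of $O(0011)$, implying that $\{0011, 0110, 1011\}$ is an edge in $H_f$. Also note that the edge set of the hypergraph may be exponentially large in $n$, and edges may have large size (e.g., $\Omega(n)$). Finally, observe that the second condition in the definition of the edges of $H_f$, that is, $\bigcap_{i=1}^t O(y^i) \subseteq O(y^0)$, is equivalent to $Z(y^0) \subseteq \bigcup_{i=1}^t Z(y^i)$.

By the observation preceding Definition 3.1, if $f$ is a monotone monomial, then $E(H_f) = \emptyset$. We next claim that the reverse implication holds as well, so that we obtain a characterization of monotone monomials that is based on $H_f$.

Lemma 3.2. If $E(H_f) = \emptyset$, then $f$ is a monotone monomial.

Lemma 3.2 follows directly from the next claim (setting $R = \{0,1\}^n$). The claim will also serve us in proving an additional lemma.

Claim 3.3. Let $R \subseteq \{0,1\}^n$ and let $H_f(R)$ be the sub-hypergraph of $H_f$ whose vertex set is $R$ and whose edge set, $E(H_f(R))$, consists of all edges of $H_f$ that are contained in $R$. If $E(H_f(R)) = \emptyset$, then there exists a monotone monomial $h$ such that $h(x) = f(x)$ for every $x \in R$.

Proof. Consider first the case that $f^{-1}(1) \cap R = \emptyset$. In this case we let $h$ be the monomial that is the conjunction of all variables $x_j$ (so that it has value 1 only on $1^n$ and 0 elsewhere). Since $f^{-1}(1) \cap R = \emptyset$, this monomial is consistent with all points in $R$.
Otherwise, let $M = \bigcap_{y \in f^{-1}(1) \cap R} O(y)$. Note that since $E(H_f(R)) = \emptyset$, necessarily $f(1^n) = 1$, and so $f^{-1}(1) \ne \emptyset$. Let $h$ be the monotone monomial that is the conjunction of all $x_j$ such that $j \in M$ (where if $M$ is empty, then $h$ is the all-1 function). That is, $h(x) = \bigwedge_{j \in M} x_j$. We next show that $f(y) = h(y)$ for all $y \in \{0,1\}^n \cap R$.
By the definition of $h$, for all $y \in f^{-1}(1) \cap R$, $h(y) = 1$. To establish that $h(y) = 0$ for all $y \in f^{-1}(0) \cap R$, assume in contradiction that there is a point $y^0 \in f^{-1}(0) \cap R$ such that $h(y^0) = 1$. Since $h(y^0) = 1$ we have (by the definition of $h$) that $y^0_j = 1$ for all $j \in M$. That is, $\bigcap_{y \in f^{-1}(1) \cap R} O(y) \subseteq O(y^0)$. But this means that $\{y^0\} \cup (f^{-1}(1) \cap R)$ is an edge in $H_f(R)$ and we have reached a contradiction.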
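The construction in this proof can be tried out on small explicit sets $R$. The Python sketch below builds the index set $M$ as the common 1-coordinates of the positive points in $R$ and reports the points of $R$ on which the resulting monomial disagrees with $f$; any reported point, together with the positive points, forms an edge of the hypergraph restricted to $R$. The names `ones` and `candidate_monomial` are hypothetical helpers.

```python
from itertools import product


def ones(y):
    """O(y): the set of (1-based) indices j with y_j = 1; y is a tuple of bits."""
    return {j for j, bit in enumerate(y, start=1) if bit == 1}


def candidate_monomial(f, R, n):
    """Return (M, bad) for the monomial h that is the AND of x_j over j in M.

    M is the intersection of O(y) over the points y in R with f(y) = 1
    (or all of [n] if there are none); bad lists the points of R with
    f(y) = 0 but h(y) = 1, i.e. the points where h fails to agree with f.
    """
    positives = [y for y in R if f(y) == 1]
    if positives:
        M = set.intersection(*(ones(y) for y in positives))
    else:
        M = set(range(1, n + 1))
    h = lambda x: int(all(x[j - 1] == 1 for j in M))
    bad = [y for y in R if f(y) == 0 and h(y) == 1]
    return M, bad


n = 3
cube = list(product((0, 1), repeat=n))

f_and = lambda x: int(x[0] == 1 and x[1] == 1)   # x_1 AND x_2
M_and, bad_and = candidate_monomial(f_and, cube, n)

f_or = lambda x: int(x[0] == 1 or x[1] == 1)     # x_1 OR x_2, not a monomial
M_or, bad_or = candidate_monomial(f_or, cube, n)
```

For `f_and` the construction recovers $M = \{1, 2\}$ with no disagreements, while for `f_or` the set $M$ is empty and every point with value 0 witnesses an edge.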
Recall that a vertex cover of a hypergraph is a subset of the vertices that intersects every edge in the hypergraph. We next establish that if $f$ is far from being a monotone monomial (with respect to $D$), then every vertex cover of $H_f$ must have large weight (with respect to $D$).

Lemma 3.4. If $\mathrm{dist}_D(f, \mathrm{MON}_M) > \varepsilon$, then for every vertex cover $C$ of $H_f$ we have $D(C) > \varepsilon$.

This lemma strengthens Lemma 3.2 in the following sense. Lemma 3.2 is equivalent to saying that if $f$ is not a monotone monomial, then $E(H_f) \ne \emptyset$. In particular this implies that if $f$ is not a monotone monomial, then every vertex cover of $H_f$ is non-empty. Lemma 3.4 can be viewed as quantifying this statement (and taking into account the underlying distribution $D$).

Proof. Assume, contrary to the claim, that there exists a vertex cover $C$ of $H_f$ such that $D(C) \le \varepsilon$. Let $R = \{0,1\}^n \setminus C$ be the vertices that do not belong to the vertex cover. Since $C$ is a vertex cover, we have that $E(H_f(R)) = \emptyset$. By Claim 3.3 there is a monotone monomial $h$ that is consistent with $f$ on $R$. Since $D(C) \le \varepsilon$ this implies that $\mathrm{dist}_D(h, f) \le \varepsilon$, in contradiction to the premise of the lemma.

By Lemmas 3.2 and 3.4, if $f$ is a monotone monomial, then $E(H_f) = \emptyset$, so that trivially every minimum vertex cover of $H_f$ is empty, while if $\mathrm{dist}_D(f, \mathrm{MON}_M) > \varepsilon$, then every vertex cover of $H_f$ has weight greater than $\varepsilon$ with respect to $D$. We would like to show that this implies that if $\mathrm{dist}_D(f, \mathrm{MON}_M) > \varepsilon$, then we can actually find (with high probability) an edge in $H_f$, which provides evidence to the fact that $f$ is not a monotone monomial.

The tester
We first introduce some more notation. Let $\bar{e}_j = 1^{j-1}01^{n-j}$. For any subset $Z \subseteq [n]$, let $y(Z)$ be the point in $\{0,1\}^n$ such that for every $j \in Z$ its $j$th coordinate is 0, and for every $j \notin Z$ its $j$th coordinate is 1. For any subset $S \subseteq \{0,1\}^n$, let $S_{f,0} = \{y \in S : f(y) = 0\}$ and $S_{f,1} = \{y \in S : f(y) = 1\}$.
The first observation on which our tester is based is that for every point $y \in f^{-1}(0)$, there must be at least one index $j \in Z(y)$ for which $f(\bar{e}_j) = 0$, or else we have evidence that $f$ is not a monotone monomial. In fact, we do not need to verify that $f(\bar{e}_j) = 0$ for every $j \in Z(y)$ in order to obtain evidence that $f$ is not a monotone monomial. Rather, if we search for such an index (in a manner described momentarily), and this search fails, then we already have evidence that $f$ is not a monotone monomial.
The search procedure (which performs a binary search) receives as input a point $y \in f^{-1}(0)$ and searches for an index $j \in Z(y)$ such that $f(\bar{e}_j) = 0$. This is done by repeatedly partitioning a set of indices, $Z$, starting with $Z = Z(y)$, into two parts $Z_1$ and $Z_2$ of (almost) equal size, and continuing the search with a part $Z_i$, $i \in \{1, 2\}$, for which $f(y(Z_i)) = 0$. (If both parts satisfy the condition, then we continue with $Z_1$.) Note that if both $f(y(Z_1)) = 1$ and $f(y(Z_2)) = 1$, then we have evidence that $f$ is not a monotone monomial because $f(y(Z_1 \cup Z_2)) = 0$ (so that $\{y(Z_1 \cup Z_2), y(Z_1), y(Z_2)\}$ is an edge in $H_f$). The search also fails (from the start) if $Z(y) = \emptyset$ (that is, $y = 1^n$). For the precise pseudo-code of the procedure, see Figure 1.
The tester starts by obtaining a sample of $\Theta(\sqrt{n}/\varepsilon)$ points, where each point is generated independently according to $D$. (Since the points are generated independently, repetitions may occur.) For each point in the sample that belongs to $f^{-1}(0)$, the tester calls the binary search procedure. If any search fails, then the tester rejects $f$ (recall that in such a case the tester has evidence that $f$ is not a monotone monomial). Otherwise, the tester has a collection of indices $J$ such that $f(\bar{e}_j) = 0$ for every $j \in J$. The tester then takes an additional sample, also of size $\Theta(\sqrt{n}/\varepsilon)$, and checks whether there exists a point $y$ in the sample such that $f(y) = 1$ and $Z(y)$ contains some $j \in J$. In such a case the tester has evidence that $f$ is not a monotone monomial (specifically, $\{\bar{e}_j, y\}$ is an edge in $H_f$), and it rejects. For the precise pseudo-code of the tester, see Figure 2.

Algorithm 1. Binary Search (Input: $y \in \{0,1\}^n$)
1. $Z \leftarrow Z(y)$.
2. If $|Z| = 0$, then output fail and halt.
3. While $|Z| \ge 2$ do:
(a) Let $(Z_1, Z_2)$ be a fixed partition of $Z$ into two parts of (almost) equal size.
(b) • If $f(y(Z_1)) = 0$, then $Z \leftarrow Z_1$;
• else if $f(y(Z_2)) = 0$, then $Z \leftarrow Z_2$;
• else output fail and halt.
4. Output the single index that remains in $Z$.
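The binary search procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions: points are tuples of bits, indices are 1-based, the partition simply splits the sorted index list in the middle, and failure is signalled by returning `None`. The names `point_with_zeros` and `binary_search` are illustrative, and the caller is assumed to pass a point with $f(y) = 0$.

```python
def point_with_zeros(Z, n):
    """y(Z): the point whose coordinates are 0 exactly on the (1-based) indices in Z."""
    return tuple(0 if j in Z else 1 for j in range(1, n + 1))


def binary_search(f, y):
    """Search Z(y) for an index j with f(e_bar_j) = 0; return None on failure.

    Assumes f(y) = 0, so that f(y(Z)) = 0 holds for the current Z throughout.
    """
    n = len(y)
    Z = [j for j in range(1, n + 1) if y[j - 1] == 0]
    if not Z:
        return None                       # y is the all-1 point
    while len(Z) >= 2:
        half = (len(Z) + 1) // 2
        Z1, Z2 = Z[:half], Z[half:]       # parts of (almost) equal size
        if f(point_with_zeros(set(Z1), n)) == 0:
            Z = Z1
        elif f(point_with_zeros(set(Z2), n)) == 0:
            Z = Z2
        else:
            return None                   # both parts map to 1: evidence found
    return Z[0]


# Example with the monotone monomial x_2 AND x_4 over 5 variables.
f = lambda x: int(x[1] == 1 and x[3] == 1)
j = binary_search(f, (0, 0, 1, 0, 1))

# Example with a function that is 1 iff the input has at most one zero.
f_bad = lambda x: int(sum(x) >= len(x) - 1)
failed = binary_search(f_bad, (0, 0, 1))
```

In the first example the search narrows $Z(y) = \{1, 2, 4\}$ to $\{1, 2\}$ and then to $\{2\}$, so `j` is 2. In the second example both halves of $\{1, 2\}$ yield value 1, so the search fails.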
We shall use Lemma 3.4 to show that if $\mathrm{dist}_D(f, \mathrm{MON}_M)$ is relatively large, then either the first sample will contain a point on which the binary search procedure fails (with high probability over the choice of the first sample), or the second sample will contain a point $y$ such that $f(y) = 1$ and $Z(y) \cap J \ne \emptyset$ (with high probability over the choice of both samples).

Algorithm 2. Monotone Monomials Test
1. Obtain a sample $T$ of $\Theta(\sqrt{n}/\varepsilon)$ points, each generated independently according to $D$.
2. For each point $y \in T_{f,0}$ run the binary search procedure (Algorithm 1) on $y$.
3. If the binary search fails for any of the points, then output reject and halt. Otherwise, for each $y \in T_{f,0}$ let $j(y)$ be the index returned for $y$, and let $J(T_{f,0}) = \bigcup_{y \in T_{f,0}} \{j(y)\}$.
4. Obtain another sample $T'$ of size $\Theta(\sqrt{n}/\varepsilon)$ (generated independently according to $D$).
5. If there is a point $y \in T'_{f,1}$ such that $Z(y) \cap J(T_{f,0}) \ne \emptyset$, then output reject, otherwise output accept.
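The two stages above can be sketched end to end in Python. This is an illustrative sketch, not the authors' implementation: the distribution is modelled by a hypothetical zero-argument callable `sample`, the constant `c = 8` in the sample size is an assumed placeholder for the constant hidden in the $\Theta(\cdot)$ notation, and the function names are invented for the example.

```python
import math
import random
from itertools import combinations


def binary_search(f, y):
    """Return an index j in Z(y) with f(e_bar_j) = 0, or None if the search fails."""
    n = len(y)
    zeros_at = lambda Z: tuple(0 if j in Z else 1 for j in range(1, n + 1))
    Z = [j for j in range(1, n + 1) if y[j - 1] == 0]
    if not Z:
        return None
    while len(Z) >= 2:
        half = (len(Z) + 1) // 2
        Z1, Z2 = Z[:half], Z[half:]
        if f(zeros_at(set(Z1))) == 0:
            Z = Z1
        elif f(zeros_at(set(Z2))) == 0:
            Z = Z2
        else:
            return None
    return Z[0]


def monotone_monomial_test(f, sample, n, eps, c=8):
    """Return True for accept and False for reject (c is an assumed constant)."""
    s = math.ceil(c * math.sqrt(n) / eps)
    J = set()
    for _ in range(s):                    # first stage: points with f(y) = 0
        y = sample()
        if f(y) == 0:
            j = binary_search(f, y)
            if j is None:
                return False
            J.add(j)
    for _ in range(s):                    # second stage: points with f(y) = 1
        y = sample()
        if f(y) == 1 and any(y[j - 1] == 0 for j in J):
            return False
    return True


rng = random.Random(0)
n = 6

# A monotone monomial is never rejected, whatever the samples are.
f_mono = lambda x: int(x[0] == 1 and x[2] == 1)
uniform = lambda: tuple(rng.randint(0, 1) for _ in range(n))
accepted = monotone_monomial_test(f_mono, uniform, n, eps=0.25)

# A non-monomial on which the very first binary search must fail:
# f is 1 iff the input has at most one zero, and every sample has two zeros.
f_not_monomial = lambda x: int(sum(x) >= n - 1)
two_zeros = [tuple(0 if j in pair else 1 for j in range(n))
             for pair in combinations(range(n), 2)]
skewed = lambda: rng.choice(two_zeros)
rejected = not monotone_monomial_test(f_not_monomial, skewed, n, eps=0.25)
```

The acceptance in the first run does not depend on the random seed, since the sketch rejects only when it has found an edge of the violation hypergraph.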

The analysis of the tester for monotone monomials
The next definition will serve us in the analysis of the tester.

THEORY OF COMPUTING, Volume 7 (2011), pp. 155-176

Definition 3.5 (Empty points and representative indices). For a point $y \in f^{-1}(0)$, we say that $y$ is empty (with respect to $f$) if the binary search procedure (Algorithm 1) fails on $y$. We denote the set of empty points (with respect to $f$) by $Y_\emptyset(f)$. If $y$ is not empty, then we let $j(y) \in Z(y)$ denote the index that the binary search procedure returns. We refer to this index as the representative index for $y$. If $y \in Y_\emptyset(f)$, then $j(y)$ is defined to be 0.

Note that since the binary search procedure is deterministic, the index $j(y)$ is uniquely defined for each $y \notin Y_\emptyset(f)$. As in Algorithm 2, for a sample $T$ and $T_{f,0} = T \cap f^{-1}(0)$, we let $J(T_{f,0}) = \{j(y) : y \in T_{f,0} \setminus Y_\emptyset(f)\}$ denote the set of representative indices for the sample. For any subset $J \subseteq [n]$, let $Y_{f,1}(J)$ denote the set of all points $y \in f^{-1}(1)$ for which $Z(y) \cap J \ne \emptyset$. In particular, if we set $J = J(T_{f,0})$, then each point $y \in Y_{f,1}(J)$, together with any index $j$ in the intersection of $Z(y)$ with $J$, provide evidence that $f$ is not a monotone monomial (i.e., $\{\bar{e}_j, y\} \in E(H_f)$). We next state our main lemma.

Lemma 3.6. Suppose that $\mathrm{dist}_D(f, \mathrm{MON}_M) > \varepsilon$ and consider a sample $T$ of $c_1\sqrt{n}/\varepsilon$ points generated independently according to $D$. For a sufficiently large constant $c_1$, with probability at least 5/6 over the choice of $T$, either $T_{f,0}$ contains an empty point (with respect to $f$) or $D(Y_{f,1}(J(T_{f,0}))) \ge \varepsilon/(4\sqrt{n})$.

As we show subsequently, the correctness of the tester follows quite easily from Lemma 3.6. Before proving Lemma 3.6 in detail, we give the high level idea of the proof. Lemma 3.6 is established by proving the contrapositive statement. Namely, if the probability (over the choice of $T$) that $T_{f,0}$ does not contain an empty point (with respect to $f$) and $D(Y_{f,1}(J(T_{f,0}))) < \varepsilon/(4\sqrt{n})$ is at least 1/6, then $\mathrm{dist}_D(f, \mathrm{MON}_M) \le \varepsilon$. This is proved by applying a probabilistic argument to construct a vertex cover $C$ of $H_f$ such that $D(C) \le \varepsilon$ (assuming the counter-assumption holds), and then applying (the contrapositive of) Lemma 3.4.
Specifically, we first put in the cover $C$ all empty points. This takes care of all edges that contain empty points, where by (the first part of) the counter-assumption, the total weight of all empty points is very small. In the next stage we work in $O(\sqrt{n})$ iterations. Each iteration $\ell$ (except, possibly, for the last iteration) is associated with a subset $J_\ell$ of new representative indices (i.e., indices that were not associated with previous iterations). We prove that in each iteration (except, possibly, the last) there exists such a subset of indices having size $\Omega(\sqrt{n})$. The subset of points (all from $f^{-1}(1)$) that are added to $C$ in iteration $\ell$ covers all edges $\{y^0, y^1, \dots, y^t\}$ such that $j(y^0) \in J_\ell$. The second part of the counter-assumption is used to ensure that the weight of each subset of points that is added to the cover is $O(\varepsilon/\sqrt{n})$ (so that we get a total weight of $O(\varepsilon)$ over all iterations). In the last stage we add to $C$ all points in $f^{-1}(0)$ that reside in edges that are not yet covered, where we can show that the total weight of these points is $O(\varepsilon)$ as well.
The above discussion hints at the reason why the sizes of the samples taken by the tester grow like $\sqrt{n}$. Roughly speaking, if the first sample, $T$, is significantly smaller than $\sqrt{n}$, then the second sample, $T'$, has to be significantly larger. In other words, if we want to decrease the size of the sample $T$ in Lemma 3.6, then we also need to decrease the lower bound on $D(Y_{f,1}(J(T_{f,0})))$. The reason is that as the size of $T$ decreases, the sizes of the subsets $J_\ell$ (defined in the proof of Lemma 3.6) decrease as well, and the number of iterations in the construction of the vertex cover $C$ increases. But then, in order to obtain a vertex cover with small total weight, the weights of the subsets of points that are added in each iteration must be smaller.
More formally, the proof of Lemma 3.6 builds on Claim 3.7, stated next, which in turn uses the following notation: For a subset $J \subseteq [n]$, we let $Y_{f,0}(J)$ denote the set of points $y \in f^{-1}(0)$ for which $j(y) \in J$.

Claim 3.7. Let $I$ be any fixed subset of $[n]$, and consider a sample $T$ of $s = c_1\sqrt{n}/\varepsilon$ points generated independently according to $D$. For a sufficiently large constant $c_1$, with probability at least 9/10 over the choice of $T$, either $|J(T_{f,0}) \setminus I| \ge \sqrt{n}$ or $D(Y_{f,0}([n] \setminus (J(T_{f,0}) \cup I))) \le \varepsilon/2$.

To make Claim 3.7 more concrete, consider the special case in which $I = \emptyset$. The claim simply says that with probability at least 9/10 (over the choice of $T$) either the subset of indices $J(T_{f,0})$ is relatively large, or the weight of the set of points $y$ in $f^{-1}(0)$ for which $j(y)$ is not contained in $J(T_{f,0})$ is relatively small.
2. If χ = 1, then necessarily χ = 1. Therefore, for any threshold t we have that It therefore suffices to analyze the behavior of χ1 , . . ., χs . Letting $c_1 = 64$ (so that $s = 64\sqrt{n}/\varepsilon$), by the foregoing discussion and by applying a multiplicative Chernoff bound [2], we have that This means that with probability at least 9/10, either D(Y ) < ε/2 for some , in which case we have

Proof of Lemma 3.6. Assume, contrary to the claim, that with probability at least 1/6 over the choice of $T$ (which consists of $c_1\sqrt{n}/\varepsilon$ points selected independently according to $D$), there is no point in $T$ that is empty with respect to $f$ and $D(Y_{f,1}(J(T_{f,0}))) < \varepsilon/(4\sqrt{n})$. We will show how, using this assumption, it is possible to prove that there exists a vertex cover $C$ of $H_f$ with $D(C) \le \varepsilon$. By Lemma 3.4, this contradicts the fact that $\mathrm{dist}_D(f, \mathrm{MON}_M) > \varepsilon$.

THE CONSTRUCTION OF C. We show how to construct a vertex cover $C$ of $H_f$ in three stages. In the first stage we put in $C$ all empty points, that is, all points in $Y_\emptyset(f)$. By the counter-assumption, $D(Y_\emptyset(f)) \le 2\varepsilon/(c_1\sqrt{n})$. This is true because otherwise, the probability that the sample $T$ does not contain an empty point is at most $(1 - 2\varepsilon/(c_1\sqrt{n}))^{c_1\sqrt{n}/\varepsilon} < e^{-2} < 1/6$.

In the second stage we work in iterations. In each iteration $\ell$ we add to the cover a subset of points $Y^\ell \subseteq f^{-1}(1)$, which is determined by a subset $T^\ell \subseteq \{0,1\}^n$. These subsets are determined as follows. Let $I_1 = \emptyset$, and for $\ell > 1$ let $I_\ell = I_{\ell-1} \cup J(T^{\ell-1}_{f,0})$. We shall think of $I_\ell$ as the subset of representative indices that have "already been covered" in a sense that will become clear momentarily. Suppose we apply Claim 3.7 with $I = I_\ell$. The claim says that with probability at least 9/10 over the choice of $T$, either $|J(T_{f,0}) \setminus I_\ell| \ge \sqrt{n}$, or $D(Y_{f,0}([n] \setminus (J(T_{f,0}) \cup I_\ell))) \le \varepsilon/2$ (that is, the total weight of the points whose representative index is not already in $I_\ell$ or $J(T_{f,0})$ is small). Thus, the probability that neither of the two holds is at most 1/10.
Combining this with our counter-assumption (and taking a union bound), we have that with probability at least $1/6 - 1/10 > 0$ over the choice of $T$, the following all hold: $T$ contains no empty point, $D(Y_{f,1}(J(T_{f,0}))) < \varepsilon/(4\sqrt{n})$, and one of the two alternatives of Claim 3.7 (with $I = I_\ell$) holds. Since this (combined) event occurs with probability greater than 0, there must exist at least one set $T$ for which it holds. Denote this set by $T^\ell$, and let $Y^\ell = Y_{f,1}(J(T^\ell_{f,0}))$ (so that all points in $Y^\ell$ are added to the cover). If $D(Y_{f,0}([n] \setminus (J(T^\ell_{f,0}) \cup I_\ell))) \le \varepsilon/2$, then this (second) stage ends, and in the third stage we add to the cover $C$ all points in $Y_{f,0}([n] \setminus (J(T^\ell_{f,0}) \cup I_\ell))$. Otherwise, we set $I_{\ell+1} = I_\ell \cup J(T^\ell_{f,0})$ and continue with the next iteration.

ESTABLISHING THAT C IS A VERTEX COVER OF $H_f$. Consider any point $y \in f^{-1}(0)$ that is contained in at least one edge in $H_f$. We shall show that either $y \in C$, or, for every edge $B \in H_f$ that contains $y$, there is at least one point $y' \in B \cap f^{-1}(1)$ that belongs to $C$. Since each edge in $H_f$ contains some point $y \in f^{-1}(0)$, this implies that $C$ is a vertex cover. Details follow.
If $y \in Y_{\emptyset}(f)$, then it is added to the cover in the first stage. In particular, if $f(1^n) = 0$ (so that $H_f$ contains the edge $\{1^n\}$), then $1^n \in Y_{\emptyset}(f)$, implying that the edge $\{1^n\}$ is covered. Otherwise ($y \notin Y_{\emptyset}(f)$), we have that $y$ has a representative index $j(y) \in Z(y)$. Consider the iterations in the second stage of the construction of $C$. If for some iteration $\ell$ we have that $j(y) \in J(T^{\ell}_{f,0})$ (where we consider the first iteration in which this occurs), then the cover contains all points in $Y_{f,1}(\{j(y)\}) \subseteq Y_{f,1}(J(T^{\ell}_{f,0})) = Y^{\ell}$. Since, by the definition of $H_f$, each edge in $H_f$ that contains $y$ must contain some $y' \in f^{-1}(1)$ such that $j(y) \in Z(y')$, all edges containing $y$ are covered after iteration $\ell$.

On the other hand, if $j(y) \notin J(T^{\ell}_{f,0})$ for every iteration $\ell$, then let $\ell^*$ denote the index of the last iteration and consider the third stage. In this stage we add to $C$ all points in $Y_{f,0}([n] \setminus (J(T^{\ell^*}_{f,0}) \cup I^{\ell^*}))$. But this set consists of those points whose representative index is not in $\bigcup_{\ell \le \ell^*} J(T^{\ell}_{f,0})$, and in particular the set contains $y$, so that $y$ is added to the cover in the third stage.
BOUNDING THE WEIGHT OF C. The main observation is that in each iteration $\ell$ of the second stage except, possibly, for the last one ($\ell^*$), we have that

where the different subsets are disjoint. This implies that $\ell^* \le \sqrt{n} + 1$. By the construction of $C$,

where the last inequality holds for $c_1 \ge 8$ (assuming $n \ge 4$).
Proof. Consider first the case that $f$ is a monotone monomial. Observe that the tester rejects only if it finds evidence that $f$ is not a monotone monomial. This evidence is either in the form of two (disjoint) subsets of indices, $Z_1$ and $Z_2$, such that $f(y(Z_1)) = f(y(Z_2)) = 1$ while $f(y(Z_1 \cup Z_2)) = 0$ (found by the binary search procedure), or in the form of an index $j$ and a point $y \in f^{-1}(1)$ such that $f(\bar{e}_j) = 0$ and $j \in Z(y)$. Therefore, the tester never rejects a monotone monomial.

Consider next the case that $\mathrm{dist}_D(f, \mathrm{MON}_M) > \varepsilon$. By Lemma 3.6, for a sufficiently large constant $c_1$ in the $\Theta(\cdot)$ notation for $T$ (the first sample), with probability at least 5/6 over the choice of $T$, either there is an empty point in $T_{f,0}$ or $D(Y_{f,1}(J(T_{f,0}))) \ge \varepsilon/(4\sqrt{n})$. If there is an empty point in $T_{f,0}$, then the binary search will fail on that point and the tester will reject. On the other hand, if $D(Y_{f,1}(J(T_{f,0}))) \ge \varepsilon/(4\sqrt{n})$, then the probability that the second sample contains no point from $Y_{f,1}(J(T_{f,0}))$ is upper bounded by 1/6 for $c_1 \ge 8$. But if such a point is selected, then the tester rejects.⁵ Therefore, the probability that the tester rejects a function $f$ for which $\mathrm{dist}_D(f, \mathrm{MON}_M) > \varepsilon$ is at least 2/3.

Finally, the number of points sampled is $O(\sqrt{n}/\varepsilon)$ since the tester obtains two samples of this size. Since for each point in the first sample that belongs to $f^{-1}(0)$ the tester performs a binary search, the query complexity of the tester is $O(\sqrt{n}\log n/\varepsilon)$.

Disallowing the all-1 function
Our tester as described in Subsection 3.2 works for the class of monotone monomials, $\mathrm{MON}_M$, when it is assumed to contain the "empty" monomial, that is, the all-1 function. Let $\mathbf{1}$ denote the all-1 function (over $\{0,1\}^n$) and let $\mathrm{MON}'_M$ denote the class of monotone monomials that are a conjunction of at least one variable. Thus, $\mathrm{MON}'_M = \mathrm{MON}_M \setminus \{\mathbf{1}\}$. We next show how to augment our tester for $\mathrm{MON}_M$ (Algorithm 2) so that it works for $\mathrm{MON}'_M$. First we run Algorithm 2 with the distance parameter $\varepsilon$ set to $\varepsilon/2$ and with slightly larger constants in the sizes of the samples, so that its failure probability is at most 1/6 rather than 1/3. If the tester rejects, then we reject as well. Otherwise, it is possible that the tester accepted $f$ because $\mathrm{dist}_D(f, \mathbf{1}) \le \varepsilon/2$ (while $\mathrm{dist}_D(f, \mathrm{MON}'_M) > \varepsilon$). We are interested in detecting this (with high probability) and rejecting $f$ in such a case.
To this end we take an additional sample $R$ of $\Theta(\log n/\varepsilon)$ points, each generated independently according to $D$. We then check whether there exists at least one index $j \in [n]$ such that $y_j = 1$ for all points $y \in R \cap f^{-1}(1)$. If there is no such index, then we reject; otherwise we accept. We next show that if $\mathrm{dist}_D(f, \mathbf{1}) \le \varepsilon/2$ and $\mathrm{dist}_D(f, \mathrm{MON}'_M) > \varepsilon$, then this simple procedure rejects $f$ with high constant probability.
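The simple observation is that if $\mathrm{dist}_D(f, \mathbf{1}) \le \varepsilon/2$ and $\mathrm{dist}_D(f, \mathrm{MON}'_M) > \varepsilon$, then $\Pr_{y \sim D}[f(y) = 1 \text{ and } y_j = 0] > \varepsilon/2$ for every $j \in [n]$. Indeed, the single-variable monomial $x_j$ belongs to $\mathrm{MON}'_M$, so that $\Pr_{y \sim D}[f(y) \ne y_j] > \varepsilon$, while $\Pr_{y \sim D}[f(y) = 0 \text{ and } y_j = 1] \le \mathrm{dist}_D(f, \mathbf{1}) \le \varepsilon/2$. By taking a union bound over all $j \in [n]$ we get the following. With high constant probability over the choice of a sample of size $\Theta(\log n/\varepsilon)$, for each $j \in [n]$, the sample will contain a point $y$ such that $f(y) = 1$ and $y_j = 0$.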
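The additional check described above can be sketched in Python as follows. This is only an illustrative sketch: the function names, the representation of points as 0/1 tuples, the `draw` callback standing in for sampling from D, and the specific constant obtained from the bound (1 - x)^m ≤ exp(-xm) are all assumptions made here, not part of the paper.

```python
import math

def sample_size(n, eps, delta=0.1):
    """A sample size m (hypothetical constant) with n * (1 - eps/2)**m <= delta.

    Uses (1 - x)**m <= exp(-x * m) together with a union bound over n indices.
    """
    return math.ceil((2.0 / eps) * math.log(n / delta))

def has_common_one_index(f, sample, n):
    """True iff some index j satisfies y[j] == 1 for every sampled y with f(y) == 1."""
    positives = [y for y in sample if f(y) == 1]
    return any(all(y[j] == 1 for y in positives) for j in range(n))

def reject_near_all_one(f, draw, n, eps):
    """Draw the extra sample R and reject (return True) iff no common index exists.

    `draw` is an assumed callback returning one point distributed according to D.
    """
    sample = [draw() for _ in range(sample_size(n, eps))]
    return not has_common_one_index(f, sample, n)
```

If the sample contains no point on which f is 1, every index qualifies vacuously and the check accepts, which matches the condition as stated in the text.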
4 Distribution-free testing of (general) monomials

The high-level structure of the tester for general monomials is similar to that of the tester for monotone monomials, but several modifications have to be made (and hence the tester and the notions it is based on are seemingly more complex). In this section we explain what the modifications are and how they affect the analysis.
Observe that the difference between Definition 3.1 and Definition 4.1 is in the additional requirement that $Z(\{y^1, \ldots, y^t\}) \subseteq Z(y^0)$ (as well as the fact that

Similarly to Lemma 3.2 here we have the next lemma.

Proof. Based on the values that $f$ assigns to points in $R$, we next define a monomial $h$ and show that it is consistent with $f$ on $R$. If $f^{-1}(1) \cap R = \emptyset$, then we let $h$ be the monomial that is the conjunction of $x_1$ and $\bar{x}_1$, so that $h$ is the all-0 function. Since $f^{-1}(1) \cap R = \emptyset$, this monomial is consistent with $f$ on all points in $R$.

Otherwise, let $R_{f,1} = f^{-1}(1) \cap R$ and $R_{f,0} = f^{-1}(0) \cap R$, and let $h$ be the monomial that is the conjunction of all variables $x_j$ such that $j \in O(R_{f,1})$ and all negations of variables $x_j$ such that $j \in Z(R_{f,1})$ (where if $O(R_{f,1})$ and $Z(R_{f,1})$ are empty, then $h$ is the all-1 function). We next show that by the premise of the claim, $f(y) = h(y)$ for all $y \in \{0,1\}^n \cap R$.

By the definition of $h$, for all $y \in R_{f,1}$, $h(y) = 1$. To establish that $h(y) = 0$ for all $y \in R_{f,0}$, assume for the sake of contradiction that there is a point $y^0 \in R_{f,0}$ such that $h(y^0) = 1$. Since $h(y^0) = 1$ we have (by the definition of $h$) that $y^0_j = 1$ for all $j \in O(R_{f,1})$ and $y^0_j = 0$ for all $j \in Z(R_{f,1})$. That is, $Z(R_{f,1}) \subseteq Z(y^0)$ and $O(R_{f,1}) \subseteq O(y^0)$. But this means, by the definition of $H_f(R)$, that $\{y^0\} \cup R_{f,1}$ is an edge in $H_f(R)$ and we have reached a contradiction.
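The construction of h in the proof above can be illustrated with a short Python sketch. It assumes that, for a set of points P, O(P) and Z(P) denote the indices on which all points of P are 1 and 0, respectively (this is the reading under which h(y) = 1 for every y in R_{f,1}); the function names and the tuple representation of points are hypothetical.

```python
def ones(points, n):
    """O(P): indices that equal 1 in every point of P (assumed reading of O)."""
    return {j for j in range(n) if all(y[j] == 1 for y in points)}

def zeros(points, n):
    """Z(P): indices that equal 0 in every point of P (assumed reading of Z)."""
    return {j for j in range(n) if all(y[j] == 0 for y in points)}

def candidate_monomial(f, R, n):
    """Build the monomial h from the points of R on which f is 1."""
    positives = [y for y in R if f(y) == 1]
    if not positives:
        # Conjunction of x_1 and its negation: the all-0 function.
        return lambda y: 0
    pos_lits = ones(positives, n)   # variables that appear unnegated in h
    neg_lits = zeros(positives, n)  # variables that appear negated in h

    def h(y):
        return int(all(y[j] == 1 for j in pos_lits)
                   and all(y[j] == 0 for j in neg_lits))
    return h

def violating_points(f, R, n):
    """Points y0 in R with f(y0) = 0 but h(y0) = 1; each witnesses an edge."""
    h = candidate_monomial(f, R, n)
    return [y for y in R if f(y) == 0 and h(y) == 1]
```

By construction h agrees with f on all 1-points of R, so an empty result from `violating_points` means h is consistent with f on all of R; a non-empty result corresponds to the contradiction case in the proof.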
The next lemma is analogous to Lemma 3.4. Its proof is the same as the proof of Lemma 3.4 except that the application of Claim 3.3 is replaced by an application of Claim 4.3.

The tester for general monomials
For a vector $y \in \{0,1\}^n$ and an index $j \in [n]$, let $y^{\neg j}$ be the same as $y$ except that the $j$th coordinate of $y$ is flipped. That is, $y^{\neg j}_{\ell} = y_{\ell}$ for all $\ell \ne j$ and $y^{\neg j}_j = \bar{y}_j$. For a subset $I \subseteq [n]$ let $y^{\neg I}$ be the vector $y$ with each coordinate $j \in I$ flipped. That is, $y^{\neg I}_{\ell} = y_{\ell}$ for all $\ell \notin I$ and $y^{\neg I}_{\ell} = \bar{y}_{\ell}$ for all $\ell \in I$. Let $\Delta(y, w) \subseteq [n]$ be the subset of indices $j$ such that $y_j \ne w_j$, and note that $y = w^{\neg \Delta}$ for $\Delta = \Delta(y, w)$.

(c) else if $f(w^{\neg \Delta_2}) = 0$, then $\Delta \leftarrow \Delta_2$;
(d) else output fail and halt.
3. Output the single index $j \in \Delta$.

We start by describing the binary search procedure (for general monomials). Its pseudo-code is given in Algorithm 3 (see Figure 3). The procedure receives as input two points $w, y \in \{0,1\}^n$ such that $f(w) = 1$ and $f(y) = 0$ and outputs an index $j \in [n]$ such that $y_j \ne w_j$ and such that $f(w^{\neg j}) = 0$. If $f$ is a monomial, then at least one such index must exist. Note that if $w = 1^n$, then the output of the search is as specified by the binary search procedure for monotone monomials (Algorithm 1). In fact, Algorithm 1 itself (and not only its output specification) is essentially the same as Algorithm 3 for the special case of $w = 1^n$. (Since $f(1^n)$ must equal 1 if $f$ is a monotone monomial, we can think of the binary search procedure for monotone monomials as implicitly working under this assumption.)

The search is performed by repeatedly partitioning a set of indices $\Delta$, starting with $\Delta = \Delta(y, w)$, into two parts $\Delta_1$ and $\Delta_2$ of (almost) equal size, and querying $f$ on the two points $w^{\neg \Delta_1}$ and $w^{\neg \Delta_2}$. If $f$ returns 1 for both, then the search fails. Otherwise, the search continues with $\Delta_i$ for which $f(w^{\neg \Delta_i}) = 0$, unless $|\Delta_i| = 1$, in which case the desired index is found. If the search fails, then we have evidence that $f$ is not a monomial. Namely, we have three points, $w^{\neg \Delta_1}$, $w^{\neg \Delta_2}$ and $w^{\neg \Delta}$, where $\Delta = \Delta_1 \cup \Delta_2$, such that $f(w^{\neg \Delta}) = 0$ and $f(w^{\neg \Delta_1}) = f(w^{\neg \Delta_2}) = 1$. Since $w^{\neg \Delta_1}$ and $w^{\neg \Delta_2}$ disagree on all coordinates in $\Delta$, and all three points agree on all coordinates in $[n] \setminus \Delta$, we have that $Z(\{w^{\neg \Delta_1}, w^{\neg \Delta_2}\}) \subseteq Z(w^{\neg \Delta})$ and $O(\{w^{\neg \Delta_1}, w^{\neg \Delta_2}\}) \subseteq O(w^{\neg \Delta})$, so that the three points constitute an edge in $H_f$.

Algorithm 4. General Monomials Test
1. Obtain a sample $S$ of $\Theta(1/\varepsilon)$ points, each generated independently according to $D$.
3. Obtain a sample $T$ of $\Theta(\sqrt{n}/\varepsilon)$ points, each generated independently according to $D$.
4. For each point $y \in T_{f,0}$ run the binary search procedure (Algorithm 3) on $w, y$.
5. If the binary search fails for any of the points, then output reject and halt. Otherwise, for each $y \in T_{f,0}$ let $j_w(y)$ be the index returned for $y$, and let $J_w(T_{f,0}) = \{ j_w(y) : y \in T_{f,0} \}$.
7. If there is a point $y \in T_{f,1}$ and an index $j \in J_w(T_{f,0})$ such that $y_j \ne w_j$, then output reject, otherwise output accept.

The tester for general monomials starts by obtaining a sample of $\Theta(1/\varepsilon)$ points, each generated independently according to $D$. The tester arbitrarily selects a point $w$ in this sample that belongs to $f^{-1}(1)$. If no such point exists, then the tester simply accepts $f$ (and halts). Otherwise, this point serves as a kind of reference point. As in the case of the binary search procedure, the tester for monotone monomials (Algorithm 2) is essentially the same as the tester for general monomials (Algorithm 4) with $w$ (implicitly) set to be $1^n$.
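The binary search procedure and the tester built on it, as described in the prose above, can be sketched in Python as follows. This is a hedged illustration, not the paper's pseudo-code: the names `flip`, `binary_search`, `general_monomial_test` and the `draw` callback (standing in for sampling from D) are hypothetical, the sample sizes are passed in explicitly because the constants hidden in the Θ(·) notation are not specified here, and the final check is assumed to run over the 1-points of the sample T, as step 7 reads.

```python
def flip(w, idx):
    """Return w with every coordinate in idx flipped (the point w^{not idx})."""
    return tuple(1 - b if j in idx else b for j, b in enumerate(w))

def binary_search(f, w, y):
    """Given f(w) = 1 and f(y) = 0, find j with y[j] != w[j] and f(flip(w, {j})) = 0.

    Returns None when the search fails, i.e. flipping either half of the
    current index set leaves f at 1 (evidence that f is not a monomial).
    """
    delta = [j for j in range(len(w)) if w[j] != y[j]]
    while len(delta) > 1:
        half = len(delta) // 2
        d1, d2 = delta[:half], delta[half:]
        if f(flip(w, set(d1))) == 0:
            delta = d1
        elif f(flip(w, set(d2))) == 0:
            delta = d2
        else:
            return None
    return delta[0]

def general_monomial_test(f, draw, s_size, t_size):
    """Return True for accept and False for reject."""
    S = [draw() for _ in range(s_size)]
    positives = [x for x in S if f(x) == 1]
    if not positives:
        return True                      # no reference point: accept
    w = positives[0]                     # arbitrary reference point with f(w) = 1
    T = [draw() for _ in range(t_size)]
    J = set()
    for y in T:
        if f(y) == 0:
            j = binary_search(f, w, y)
            if j is None:
                return False             # failed search: reject
            J.add(j)
    # ASSUMPTION: the last check uses the 1-points of T itself.
    for y in T:
        if f(y) == 1 and any(y[j] != w[j] for j in J):
            return False
    return True
```

Each call to `binary_search` halves the candidate index set, so it makes at most about 2 log n queries; when f is a monomial the search never fails and the final check never fires, so the sketch accepts every monomial.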
Next, the tester obtains a sample of $\Theta(\sqrt{n}/\varepsilon)$ points (each generated independently according to $D$). For each point $y$ in the sample that belongs to $f^{-1}(0)$, the tester performs a binary search on the pair $w, y$. If any search fails, then the tester rejects (recall that in such a case it has evidence that $f$ is not a monomial). Otherwise, for each point $y$ in the sample that belongs to $f^{-1}(0)$, the tester has an index, $j_w(y) \in \Delta(y, w)$, such that $f(w^{\neg j_w(y)}) = 0$. Let the subset of all these indices be denoted by $J$. Note that by the construction of $J$, if $f$ is a monomial, then for every $j \in J$, if $w_j = 1$, then the variable $x_j$ must

Lemma 3.4. If $\mathrm{dist}_D(f, \mathrm{MON}_M) > \varepsilon$, then for every vertex cover $C$ of $H_f$ we have $D(C) > \varepsilon$.

Figure 1: The binary search procedure for monotone monomials.

Figure 2: The distribution-free tester for monotone monomials.

Lemma 4.2. If $E(H_f) = \emptyset$ then $f$ is a monomial.

Lemma 4.2 follows from the next claim, which is similar to Claim 3.3 (by setting $R = \{0,1\}^n$).

Claim 4.3. Let R be a subset of {0, 1} n and let H f

Lemma 4.4. If $\mathrm{dist}_D(f, \mathrm{MON}) > \varepsilon$, then for every vertex cover $C$ of $H_f$ we have $D(C) > \varepsilon$.

Figure 3: The binary search procedure for general monomials.

Figure 4: The tester for general monomials.