Pseudorandomness for Width 2 Branching Programs

Recently Bogdanov and Viola (FOCS 2007) and Lovett (ECCC-07) constructed pseudorandom generators that fool degree $k$ polynomials over $\mathbb{F}_2$ for an arbitrary constant $k$. We show that such generators can also be used to fool branching programs of width 2 and polynomial length that read $k$ bits of input at a time. This model generalizes polynomials of degree $k$ over $\mathbb{F}_2$ and includes some other interesting classes of functions, for instance $k$-DNF. The constructions of Bogdanov and Viola and Lovett consist of adding a constant number of independent copies of a generator that fools linear functions (an $\epsilon$-biased set). It is natural to ask, in light of our first result, whether such generators can fool branching programs of width larger than 2. Our second result is a lower bound showing that a sum of $o(\sqrt{n}/\log n)$ independent copies of any $n^{-O(1)}$-biased set does not fool branching programs of width 5. To the best of our knowledge this is the first lower bound for such constructions.


Introduction
Reingold's proof [Rei05] that SL = L has brought renewed interest to derandomizing space bounded computations. Some attempts were made to apply these new techniques towards resolving the L versus RL question [RTV06, CRV07], but so far without success. The best known deterministic simulation of randomized logspace is by Saks and Zhou [SZ99], who show that $\mathrm{RSPACE}(\log n) \subseteq \mathrm{DSPACE}(O(\log^{3/2} n))$. This construction is based on Nisan's generator [Nis92], which is to this date the best pseudorandom source (both in terms of time and space efficiency) for randomized space bounded computations.
Logarithmic space is one of the simplest models of computation that we know of, yet progress on improving the use of randomness in this model has been stuck for over a decade now. One line of attack is to try to derandomize even simpler models of space bounded computation. In the uniform setting, restricting the model to use less than logarithmic space is not particularly natural, but there is a good way to specialize the definitions in the nonuniform setting.
A nonuniform model for a computation that uses space $s(n)$ and runs in time $t(n)$ is a branching program of width $2^{s(n)}$ and length $t(n)$. This device can be described by a layered directed acyclic graph, where there are $t(n)$ layers and each layer contains $2^{s(n)}$ nodes, except for the last layer, which consists of only two nodes, "accept" (1) and "reject" (0). Each layer $j$ is associated with a bit $x|_j$ of the input $x$. Each node in layer $j$ has 2 outgoing edges labelled by the possible values of the bit $x|_j$. On input $x$, the computation starts in the first node in the first layer, then follows the edge labelled by $x|_1$ onto the second layer, and so on until a node in the last layer is reached. The identity of this last node is the outcome of the computation.
When $s(n)$ is very small (say constant), it is interesting to generalize the above definition so that the branching program is allowed to read $k > s(n)$ bits of the input at each step. Now $x|_j$ denotes a $k$-bit substring of the input $x$ (not necessarily consecutive bits) and each node in layer $j$ has $2^k$ outgoing edges labelled by all possible values of $x|_j$. The computation is done in the same way as before. One could think of such a branching program as having a 'global' space of $s(n)$ bits and a larger 'local' space of $k$ bits. For the rest of this paper we consider this generalized formulation of branching programs.
The nodes in layer $j$ represent the possible states of the randomized space-bounded computation at time $j$, and the outgoing edges represent the possible transitions depending on the contents of the random tape $x$. The block $x|_j$ is the part of the random tape "viewed" by the machine at time $j$. In randomized space-bounded computation, we usually restrict the machine to have one-way access to the random tape. In the branching program setting, this imposes the requirement that $x$ is the concatenation of all the blocks $x|_j$ in order, namely $x = x|_1 \ldots x|_t$. That is, at each step we read the 'next' $k$ bits of the input, without being able to go back and look at bits we have already read. We call such a branching program read-once. General branching programs are much more powerful than read-once branching programs. For instance, the inner product function $\sum_{i=1}^{n/2} x_i x_{n/2+i} \bmod 2$, $n$ even, can be computed by a branching program of width 2 that reads 2 bits at a time, but not by any read-once branching program of width $o(n)$ that reads $o(n)$ bits at a time (note that in this example the order of the variables is important).
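As an illustration of the model, the following Python sketch evaluates a layered branching program that reads $k$ bits per step and builds the width 2 program for the inner product function mentioned above. The encoding of a layer as a pair (positions, transition table) and the names `evaluate_bp` and `inner_product_bp` are assumptions made here for illustration; they are not notation from the paper. The final state plays the role of the "accept"/"reject" node.

```python
from itertools import product

def evaluate_bp(layers, x, start=0):
    """Evaluate a layered branching program on input x (a sequence of 0/1 bits).

    Each layer is a pair (positions, table):
      positions: tuple of the k input indices read at this step (0-based)
      table:     dict mapping (state, bits_read) -> next state
    The state reached after the last layer is the output (1 = accept, 0 = reject).
    """
    state = start
    for positions, table in layers:
        bits = tuple(x[i] for i in positions)
        state = table[(state, bits)]
    return state

def inner_product_bp(n):
    """Width 2 program reading 2 bits per step that computes
    sum_i x_i * x_{n/2+i} mod 2 for even n. The state is the running parity."""
    half = n // 2
    layers = []
    for i in range(half):
        table = {(s, (a, b)): s ^ (a & b)
                 for s in (0, 1) for a in (0, 1) for b in (0, 1)}
        layers.append(((i, half + i), table))
    return layers

# Exhaustive check against the direct definition for a small n.
n = 6
bp = inner_product_bp(n)
for x in product((0, 1), repeat=n):
    expected = sum(x[i] * x[n // 2 + i] for i in range(n // 2)) % 2
    assert evaluate_bp(bp, x) == expected
```

Note that this program is not read-once in the sense above: layer $i$ reads bits $i$ and $n/2+i$, so the blocks are not consecutive pieces of the input.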

Pseudorandom Generators
We start by giving a formal definition of a pseudorandom generator against a class of functions.
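Definition 1.1. We say a distribution $D$ on $\{0,1\}^n$ is $\epsilon$-pseudorandom against a class $C$ of functions from $\{0,1\}^n$ to $\{0,1\}$ if for every $f \in C$, $\left| E_{x \sim D}[f(x)] - E_{x \sim \{0,1\}^n}[f(x)] \right| \le \epsilon$. A function $G : \{0,1\}^m \to \{0,1\}^n$ is an $\epsilon$-PRG against $C$ if the output of $G$ on a uniformly random seed in $\{0,1\}^m$ is $\epsilon$-pseudorandom against $C$. We call $m$ the seed length of the generator.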
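For small $n$ the quantity in this definition can be computed exactly by enumeration. The sketch below is a toy illustration only; the helper names `advantage` and `is_pseudorandom` and the representation of a distribution as a dictionary of probabilities are assumptions made here, not part of the paper.

```python
from fractions import Fraction
from itertools import product

def advantage(f, dist, n):
    """Return |E_{x~dist}[f(x)] - E_{x~uniform}[f(x)]| exactly.

    f:    function from n-bit tuples to {0, 1}
    dist: dict mapping n-bit tuples to probabilities (Fractions summing to 1)
    """
    e_dist = sum(p * f(x) for x, p in dist.items())
    e_unif = Fraction(sum(f(x) for x in product((0, 1), repeat=n)), 2 ** n)
    return abs(e_dist - e_unif)

def is_pseudorandom(dist, n, function_class, eps):
    """True iff dist is eps-pseudorandom against every f in function_class."""
    return all(advantage(f, dist, n) <= eps for f in function_class)

# Toy example: the uniform distribution over even-parity strings of length 3.
n = 3
even = [x for x in product((0, 1), repeat=n) if sum(x) % 2 == 0]
dist = {x: Fraction(1, len(even)) for x in even}

def parity(x):
    return sum(x) % 2

def first_bit(x):
    return x[0]

print(advantage(parity, dist, n))     # parity distinguishes: 1/2
print(advantage(first_bit, dist, n))  # a single bit does not: 0
```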
We use $(k, t, n)$-2BP to denote width 2 branching programs of length $t$ that read $k$ bits of input at a time and compute a function from $\{0,1\}^n$ to $\{0,1\}$. Our first main result is a positive one, showing that a PRG for degree $k$ polynomials over $\mathbb{F}_2$ is also a PRG for the class of functions computed by a $(k, t, n)$-2BP.
Theorem 1.2. Let $G$ be an $\epsilon$-PRG against degree $k$ polynomials in $n$ variables over $\mathbb{F}_2$. Then $G$ is an $\epsilon'$-PRG against the class of functions computed by a $(k, t, n)$-2BP, with $\epsilon' = t \cdot \epsilon$.
Recently Lovett [Lov07] (following work by Bogdanov and Viola [BV07]) constructed an $\epsilon$-PRG against degree $k$ polynomials in $n$ variables over $\mathbb{F}_2$ with seed length $2^{O(k)} \cdot \log(n/\epsilon)$ by summing together $2^k$ independent copies of a generator against linear functions with bias $\epsilon^{2^{O(k)}}$. Using Theorem 1.2, this automatically yields generators for $(k, t, n)$-2BP's with seed length $2^{O(k)} \cdot \log(n \cdot t/\epsilon)$. Observe that a width 2 branching program that reads $k$ bits at a time can in particular compute every polynomial of degree $k$ (with $t = O(n^k)$). However, such programs are strictly stronger than degree $k$ polynomials; e.g., they can compute any $k$-DNF.
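To illustrate the last remark, a $k$-DNF with $t$ terms can be evaluated by a width 2 program with one layer per term: the state records whether some earlier term was already satisfied. The Python sketch below uses a hypothetical encoding of terms as lists of (index, wanted bit) literals and hypothetical helper names; it is meant as an illustration under these assumptions, not as the construction used in the proofs.

```python
from itertools import product

def dnf_layers(terms):
    """Turn a DNF into the layers of a width 2 branching program.

    terms: list of terms; each term is a list of (index, wanted_bit) literals,
    so a k-DNF has at most k literals per term. One layer is produced per term.
    """
    layers = []
    for term in terms:
        positions = tuple(i for i, _ in term)
        pattern = tuple(b for _, b in term)
        layers.append((positions, pattern))
    return layers

def run_width2(layers, x):
    """State 0: no term satisfied so far. State 1: some term satisfied (absorbing)."""
    state = 0
    for positions, pattern in layers:
        bits = tuple(x[i] for i in positions)
        if state == 0 and bits == pattern:
            state = 1
    return state

# Hypothetical 2-DNF on 4 variables: (x0 AND NOT x1) OR (x2 AND x3).
terms = [[(0, 1), (1, 0)], [(2, 1), (3, 1)]]
layers = dnf_layers(terms)
for x in product((0, 1), repeat=4):
    direct = int(any(all(x[i] == b for i, b in term) for term in terms))
    assert run_width2(layers, x) == direct
```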

Lower Bounds
It is tempting to test the same pseudorandom generator against branching programs of larger width. In general this is impossible:
Theorem 1.3. For every $n > 1$ and $k$, there exists a distribution $D$ on $\{0,1\}^n$ that is $\exp(-\Omega(n/4^k))$-pseudorandom against degree $k$ polynomials but is not 0.66-pseudorandom against read-once width 3 branching programs of length $n$ that read one bit at a time.
This theorem is a consequence of a recent correlation bound of Viola and Wigderson [VW07]. However, we do not find this lower bound completely satisfying: even though it rules out general pseudorandom generators for polynomials as a means of fooling small width branching programs, it says nothing about the constructions in [BV07, Lov07], which are sums of independent copies of generators against linear functions. Our second main result is the following theorem, which shows the limitations of these constructions.
Theorem 1.4. For every $n$, $\epsilon$, and $k$ such that $k \log(1/\epsilon) < n/2 - 1$, there exists a distribution $D$ such that $D$ is $\epsilon$-pseudorandom against linear functions over $\{0,1\}^n$, but the sum of $k$ independent copies of $D$ is not $1/3$-pseudorandom against width 5 branching programs of length $2^{O(\log^2(k \log(1/\epsilon)))}$ that read one bit at a time.
It is known [MRRW77, FT05] that the seed length of an $\epsilon$-biased generator must be at least $\Omega(\log n + \log(1/\epsilon))$. Therefore, if we want the generator to be efficient, we are restricted to using $\epsilon = 1/\mathrm{poly}(n)$. For this setting of parameters, Theorem 1.4 tells us that for any constant $k$ and sufficiently large $n$, a branching program of width 5 and length $n$ will not be fooled by a sum of $k$ independent $\epsilon$-biased generators.
The branching programs that realize this lower bound are not read-once, so it does not rule out the possibility of using sums of independent generators against linear functions to fool randomized space bounded computations even of width poly(n).We leave it as an intriguing open question whether Lovett's generator helps against width 3 and width 4 branching programs (read-once or not).Such devices are fairly powerful: width 3 branching programs of length t, for instance, can compute all DNF of size t (even for k = 1); this is a class of functions that has resisted the construction of polynomial size pseudorandom sets for some time.Width 4 can compute any sparse polynomial with at most t/n terms.

Proof Technique
It has been known for some time that read-once width 2 branching programs that read one bit at a time can be fooled by linear generators. One way to argue this is to think of the computation of the branching program $B$ as a boolean function over $\mathbb{F}_2^n$ and show inductively over the layers of $B$ that the sum of the absolute values of the Fourier coefficients of $B$ is bounded from above by $t$. It is easy to see that linear generators of bias $\epsilon$ are $\epsilon L$-pseudorandom against any boolean function whose sum of absolute values of Fourier coefficients is at most $L$, and the correctness follows from there.
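The "easy to see" step can be spelled out as follows. The notation is an assumption made here: $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$ denotes the character indexed by $S \subseteq [n]$, $\hat{B}(S)$ the corresponding Fourier coefficient, and a distribution $D$ is taken to have bias $\epsilon$ if $|E_{x \sim D}[\chi_S(x)]| \le \epsilon$ for every nonempty $S$. Under these conventions, using $E_{x \sim \{0,1\}^n}[\chi_S(x)] = 0$ for $S \ne \emptyset$:

```latex
\begin{align*}
B(x) &= \sum_{S \subseteq [n]} \hat{B}(S)\,\chi_S(x), \\
\Bigl| E_{x \sim D}[B(x)] - E_{x \sim \{0,1\}^n}[B(x)] \Bigr|
  &= \Bigl| \sum_{S \neq \emptyset} \hat{B}(S)\, E_{x \sim D}[\chi_S(x)] \Bigr| \\
  &\le \epsilon \sum_{S \neq \emptyset} \bigl|\hat{B}(S)\bigr|
   \;\le\; \epsilon L .
\end{align*}
```

The term $S = \emptyset$ cancels because $\chi_\emptyset \equiv 1$ has the same expectation under both distributions.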
For branching programs that read more than one bit at a time this argument cannot work, as there exist width 2 branching programs that read 2 bits at a time and that are not fooled by any small bias linear generator. One such branching program computes the inner product function $IP(x_1, \ldots, x_n)$. Nevertheless, we argue along the same lines. Instead of using the Fourier transform of the branching program, we resort to "higher-order" representations of functions using low-degree polynomials. We show that every branching program $B$ with length $t$ and width 2 that reads $k$ bits at a time admits a "representation of length $t$" in terms of degree $k$ polynomials. By "representation of length $t$" we mean that $B$ can be written as a sum over the reals of the form $\sum_p \alpha_p \cdot (-1)^{p(x)}$, where $p$ ranges over all degree $k$ polynomials over $\mathbb{F}_2$, and $\alpha_p$ are real coefficients such that $\sum_p |\alpha_p| \le t$. Unlike the Fourier transform, for degree 2 and larger this representation is not unique. Once this representation is obtained, we argue that a pseudorandom generator for degree $k$ polynomials is also pseudorandom for $B$ by linearity of expectation.
While our proof is not technically difficult we find the application of "higher-order" Fourier type analysis conceptually interesting and potentially relevant for other computer science applications.

Organization
In Section 2 we prove our main positive result, Theorem 1.2. Then, in Section 3 we show limitations of existing PRG's and prove Theorem 1.3 and Theorem 1.4.

Fooling Width 2 Branching Programs
Recall that we use $(k, t, n)$-2BP to denote width 2 branching programs of length $t$ that read $k$ bits of input at a time and compute a function from $\{0,1\}^n$ to $\{0,1\}$. For a function $f : \{0,1\}^n \to \{0,1\}$, we write $(-1)^f$ for the map $x \mapsto (-1)^{f(x)}$ from $\{0,1\}^n$ to $\{1, -1\}$. Define $\deg(f)$ to be the degree of $f$ when viewed as a multilinear polynomial in $\mathbb{F}_2[x_1, \ldots, x_n]$.

Width 2 Branching Programs as Sum of Polynomials
The following theorem is the basis for the proof of Theorem 1.2. It shows that width 2 branching programs have a "short representation by polynomials of small degree".

For all
We defer the proof of Theorem 2.1 to Section 2.2 and proceed by showing how it implies our main result.

3.
Thus, for all $x \in \{0,1\}^n$ We complete the proof by renaming the polynomials and the coefficients in the above sum. For $j = 1, \ldots, s$, set and for $j = s+1, \ldots, 2s$, set (where $\oplus$ denotes summation in $\mathbb{F}_2$). Set $\beta_{2s+1} = \beta_{2s+2} = 1/2$, set $h_{2s+1}(x) = p_0(x|_{t-1})$, and set $h_{2s+2}(x) = p_1(x|_{t-1})$. Finally, set $s' = 2s + 2$. Thus, for all $x \in \{0,1\}^n$. In addition, every $h_j$ is of degree at most $k$ (since addition in $\mathbb{F}_2$ does not increase the degree), and $\sum_{j=1}^{s'}$

Limitations of Existing PRG's
In this section we explore the limitations of pseudorandom generators of two kinds. First, we show that pseudorandom generators for degree $k$ polynomials fail for read-once width 3 branching programs (Theorem 1.3). Second, we show that a sum of several copies of pseudorandom generators for linear functions fails for width 5 branching programs (Theorem 1.4).

Proof of Theorem 1.3
We derive Theorem 1.3 from a special case of a correlation bound of Viola and Wigderson.
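Let $\omega = e^{2\pi i/3}$ be a primitive cube root of unity and let $\mathrm{mod}_3 : \{0,1\}^n \to \mathbb{C}$ denote the function $\mathrm{mod}_3(x) = \omega^{x_1 + \cdots + x_n}$, where the summation $x_1 + \cdots + x_n$ is evaluated over the integers.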

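Theorem 3.1 (Viola and Wigderson [VW07]). There is a constant $\alpha > 0$ such that for every $n$ and every polynomial $p : \mathbb{F}_2^n \to \mathbb{F}_2$ of degree $k$, $\left| E_{x \sim \{0,1\}^n}\left[(-1)^{p(x)} \cdot \mathrm{mod}_3(x)\right] \right| \le \exp(-\alpha n/4^k)$.
Assume $n > 100$. Let $D$ be the uniform distribution on the set of all $x \in \mathbb{F}_2^n$ such that $\mathrm{mod}_3(x) = 1$.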
We will first show that $D$ is $\exp(-\Omega(n/4^k))$-pseudorandom against degree $k$ polynomials. For every polynomial $p$ : Therefore, by Theorem 3.1, so $D$ is $\exp(-\alpha n/4^k)$-pseudorandom against all degree $k$ polynomials.
We will now show that $D$ is not 0.66-pseudorandom against read-once width 3 branching programs of length $n$ that read one bit at a time. Let $f : \{0,1\}^n \to \{0,1\}$ be the function $f(x) = \begin{cases} 1, & \text{if } \mathrm{mod}_3(x) = 1 \\ 0, & \text{otherwise.} \end{cases}$
Then $E_{x \sim D}[f(x)] = 1$ while and for $a \in \{1, 2\}$ so that $E_{x \sim \{0,1\}^n}[f(x)] \le 1/3 + 2^{-n+1}/3 < 0.34$ and $D$ is not 0.66-pseudorandom against $f$. Since $f$ can be computed by a read-once width 3 branching program that reads one bit at a time, the proof is complete.
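The two facts used in this argument can be checked numerically for concrete $n$. The sketch below assumes that $\mathrm{mod}_3(x) = 1$ exactly when $x_1 + \cdots + x_n$ is divisible by 3; the function names are hypothetical. It implements $f$ as a read-once counter with three states and verifies the bound on the acceptance probability under the uniform distribution with exact rational arithmetic.

```python
from fractions import Fraction
from math import comb

def mod3_counter_bp(x):
    """Read-once width 3 program reading one bit at a time.

    After j bits the state is (x_1 + ... + x_j) mod 3. The program accepts
    iff the final state is 0, i.e. iff the bit sum is divisible by 3.
    """
    state = 0
    for bit in x:
        state = (state + bit) % 3
    return 1 if state == 0 else 0

def uniform_acceptance(n):
    """Pr over uniform x in {0,1}^n that the sum of the bits is divisible by 3."""
    return Fraction(sum(comb(n, w) for w in range(0, n + 1, 3)), 2 ** n)

# Check E_uniform[f] <= 1/3 + 2^(-n+1)/3 < 0.34 for a few values of n > 100.
for n in (101, 200, 1001):
    p = uniform_acceptance(n)
    assert p <= Fraction(1, 3) + Fraction(2, 3 * 2 ** n)
    assert p < Fraction(34, 100)
```

Under $D$ every sample has bit sum divisible by 3, so the counter accepts with probability 1, and the gap to the uniform distribution exceeds 0.66.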
This suggests the following test for $X$: Arrange the first $2m$ blocks of $X$ as rows in an $m \times 2m$ matrix $M$ and compute the rank of $M$ over $\mathbb{F}_2$. (By our choice of parameters, $2m^2 \le n$, so this is always possible.) If the matrix has full rank, output one; otherwise output zero. If $X$ is chosen from $D^k$, then all the rows of $M$ are chosen from the same subspace of dimension $m - 1$, so $M$ will never have full rank. If $X$ is chosen from the uniform distribution, then $M$ is a random $m \times 2m$ matrix and, by a union bound, the probability that it does not have full rank is at most $2^{-m} < 1/3$.
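A direct (sequential) version of this test is sketched below in Python. It assumes that each block has $m$ bits and is encoded as an integer bitmask, and that "full rank" means rank $m$; these encoding choices and the function names are assumptions made for illustration. The sketch uses plain Gaussian elimination over $\mathbb{F}_2$, not the low-depth circuit needed for the branching program.

```python
import random

def rank_gf2(rows):
    """Rank over F_2 of a matrix whose rows are given as Python ints (bitmasks).

    Maintains a list of reduced rows with distinct leading bits. The step
    min(r, r ^ b) XORs b into r exactly when r has b's leading bit set.
    """
    basis = []
    for r in rows:
        for b in basis:
            r = min(r, r ^ b)
        if r:
            basis.append(r)
    return len(basis)

def full_rank_test(blocks, m):
    """Output 1 iff the given m-bit blocks span all of F_2^m, else 0."""
    return 1 if rank_gf2(blocks) == m else 0

m = 8
rng = random.Random(0)

# Rows confined to a fixed subspace of dimension m - 1 (here: lowest bit 0)
# can never reach rank m, however many rows are drawn.
subspace_rows = [rng.getrandbits(m - 1) << 1 for _ in range(2 * m)]
assert full_rank_test(subspace_rows, m) == 0

# The standard basis vectors, in contrast, have full rank.
assert full_rank_test([1 << i for i in range(m)], m) == 1
```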
It remains to observe that the above test, which is essentially a rank computation, can be implemented by a circuit of depth $O((\log m)^2)$ via Cook's theorem [Coo85].