Lower Bounds for Non-Commutative Skew Circuits

Abstract: Nisan (STOC 1991) exhibited a polynomial which is computable by linear-size non-commutative circuits but requires exponential-size non-commutative algebraic branching programs. Nisan’s hard polynomial is in fact computable by linear-size “skew circuits.” Skew circuits are circuits where every multiplication gate has the property that all but one of its children is an input variable or a scalar. Such multiplication gates are called skew gates. We prove that any non-commutative skew circuit which computes the square of the polynomial defined by Nisan must have exponential size. A simple extension of this result then yields an exponential lower bound on the size of non-commutative circuits where each multiplication gate has an argument of degree at most one-fifth of the total degree. We also extend our techniques to prove an exponential lower bound for a class of circuits which is a restriction of general non-commutative circuits and a generalization of non-commutative skew circuits. We define the non-skew depth of a circuit to be the maximum number of non-skew gates on any path from an input gate to the output gate. We prove lower bounds for non-commutative circuits of small non-skew depth. More precisely, we show that for any k < d, there is an explicit polynomial of degree d over n variables that has non-commutative circuits of polynomial size but such that any circuit with non-skew depth k must have size at least n^{Ω(d/k)}. It is not hard to see that any


Introduction

Non-commutative arithmetic circuits
If we want to design an efficient algorithm for a computational problem that is naturally stated as a polynomial (such as the determinant or the permanent, matrix multiplication, Fast Fourier Transform, etc.), then arithmetic circuits capture most natural candidate algorithms that we might consider. An arithmetic circuit is an algorithm that starts with the input variables and possibly some constants in the underlying field, and iteratively applies addition and multiplication operations until it computes the desired polynomial. There has been a large body of work proving upper and lower bounds on the arithmetic circuit complexity of various polynomials (see, e. g., the surveys [22,6]). In particular, proving explicit superpolynomial lower bounds for general arithmetic circuits is a celebrated open question in complexity theory and one of the possible approaches to the P versus NP question (see, e. g., [4]). However, despite more than three decades of intensive study, it has seen little tangible progress (in the sense of concrete lower bounds for general circuits).
In this paper, we concentrate on non-commutative arithmetic circuits, which compute polynomials in the non-commutative polynomial ring F⟨X⟩. Here, variables do not commute upon multiplication; that is, xy and yx (for distinct x, y ∈ X) are distinct monomials. There are two reasons for looking at such circuits. The first is that such circuits yield algorithms for polynomial functions over non-commutative algebras, which arise naturally and can even have applications to commutative computations (see [7,2], in particular the use of non-commutative determinants to approximate the commutative permanent). The second reason is that proving explicit lower bounds for non-commutative arithmetic circuits is formally an easier problem than that of proving lower bounds for (commutative) arithmetic circuits described in the previous paragraph, and it is hoped that techniques discovered in the course of proving non-commutative lower bounds will be useful in the commutative setting as well.
The results of Hyafil [11] and Nisan [15] were among the first to motivate the study of arithmetic circuits from this latter point of view. In a breakthrough, Nisan [15] showed exponential lower bounds for non-commutative arithmetic formulas (a restriction of general non-commutative arithmetic circuits) and more generally for non-commutative algebraic branching programs (ABPs). (The formal definition of an ABP is given in Definition 3.1.) This might have led one to think that a superpolynomial lower bound for general (non-commutative) arithmetic circuits was also close at hand. However, Nisan also showed using the same techniques that general arithmetic circuits are exponentially more powerful than arithmetic formulas and ABPs, thus suggesting that his techniques may not be sufficient to prove lower bounds for general arithmetic circuits. Indeed, there is no known lower bound for general non-commutative arithmetic circuits that is stronger than those that we already have for general commutative arithmetic circuits.
In a more recent result, Hrubeš, Wigderson, and Yehudayoff [9] suggested a new line of attack on the general arithmetic circuit lower bound question. Their result introduces a "product lemma" for general arithmetic circuits that generalizes a decomposition of ABPs due to Nisan [15]. Using this lemma, they are able to show that superpolynomial lower bounds for general arithmetic circuits would follow from a strong enough lower bound for the classical Sum-of-squares problem. However, as of now, this approach has not yielded superpolynomial arithmetic circuit lower bounds. Therefore, the strongest known computational model for which we have superpolynomial lower bounds remains the ABPs from the paper of Nisan [15].

Our results
In this paper, we prove exponential lower bounds for skew circuits. Skew circuits are arithmetic circuits where every multiplication involves at least one argument that is either an input variable or a field element. More formally, we prove the following theorem.
Theorem 1.1. For infinitely many d ∈ N and any n ∈ N, there exists an explicit polynomial on n variables of degree d such that any skew circuit computing it must have size Ω(n^{d/4}), where the Ω(·) hides poly(d) factors.
Skew circuits are a well-studied model of computation [23,14,1,13], especially in the commutative setting, where they are equivalent in power to ABPs and to the evaluation of the determinant polynomial. However, the picture seems more complicated in the non-commutative setting. Nisan [15] has shown that skew circuits are exponentially more powerful than ABPs. Thus, our lower bound for this model can be seen as one step towards the goal of superpolynomial lower bounds for general non-commutative circuits.
Note that a superpolynomial lower bound for non-commutative skew circuits was claimed by Allender et al. [1], but, unfortunately, the proof of this particular result in the paper (Theorem 7.12) seems to fail because it did not take into account possible cancellations. Indeed, they argue that considering a non-commutative skew circuit and switching multiplication gates so that it is now left-skew yields a polynomial which is weakly equivalent to the original one, i. e., which has exactly the same monomials with possibly different coefficients. But this is not true, as there might have been cancellations of monomials in the original skew circuit which do not happen anymore in the resulting left-skew one, because of differing variable orders, thus leaving extraneous monomials.
Theorem 1.1 also clarifies the relative power of skew circuits vis-à-vis general arithmetic circuits. In fact, our lower bound shows that skew circuits are exponentially less powerful than circuits with just one non-skew gate (that is, a gate neither of whose arguments is an input variable or field element). This is because the explicit polynomial for which we prove a lower bound is just the square of a polynomial considered by Nisan, and this polynomial in turn has skew circuits of linear size.
We also consider the problem of extending our techniques to more powerful classes of circuits. We obtain a first simple generalization of our lower bound to circuits where every multiplication gate has an argument of degree at most δ, which we call δ-unbalanced circuits. For instance, this yields an exponential lower bound for the same polynomial as above, when computed by circuits where each multiplication gate has an argument of degree at most one fifth of the total degree.
Another natural way to extend our results (and one that is analogous to a large body of work in the Boolean circuit setting; see, e. g., [3,5,12]) is to augment a circuit for which we do have lower bounds with a few "powerful" gates and see if one can still prove a lower bound. We therefore consider the problem of proving lower bounds for skew circuits with a "few" non-skew multiplication gates.
We say that the non-skew depth of a non-commutative circuit is the maximum number of non-skew gates on a path from a variable to the output gate in the DAG underlying the circuit. We prove the following result for such circuits.
Theorem 1.2. For infinitely many d ∈ N and any k, n ∈ N, there exists a polynomial of degree d on n variables which is computable by a polynomial-size non-commutative circuit of non-skew depth O(k) but requires size n^{Ω(d/k)} for any non-commutative circuit of non-skew depth k.
In particular, the above theorem implies that there exists a polynomial of degree d which is computable by a polynomial-size non-commutative circuit of non-skew depth d, but requires superpolynomial size for any non-commutative circuit of non-skew depth k(d) = o(d). It is not hard to see that any polynomial of degree d that can be computed by a polynomial-size arithmetic circuit can also be computed by a polynomial-size arithmetic circuit of non-skew depth d. Hence, strengthening our lower bound substantially would prove lower bounds for general non-commutative circuits.
We also show that the determinant polynomial can simulate our hard polynomial, thus completing the picture in the non-commutative setting by showing that skew circuits are exponentially less powerful than the determinant polynomial. Finally, we show that to prove superpolynomial lower bounds for general non-commutative circuits, our complexity measure (to be defined formally in the upcoming section) will need to be further refined. Slightly more precisely, we show that there is a polynomial that has polynomial-size non-commutative circuits, but for which our complexity measure is as large as possible.
Organization. The rest of the paper is organized as follows. We start with a proof outline in Section 2. We then present some definitions in Section 3 and preliminaries in Section 4. The proof of Theorem 1.1 is presented in Section 5, with the extension to unbalanced circuits, and the proof of Theorem 1.2 is presented in Section 6. We also prove lower bounds for the permanent and determinant polynomials in Section 7. Finally, we show the limits of these complexity measures in Section 8.

Proof outline

A lower bound for ABPs
Our overall proof strategy is similar to that of Nisan [15] for non-commutative formulas and algebraic branching programs (ABPs). (The formal definition of an ABP is given in Definition 3.1.) In his result, Nisan considered the partial derivative matrix corresponding to a homogeneous polynomial f ∈ F⟨X⟩ of degree d, originally introduced by Hyafil [11], which is defined to be an n^{d/2} × n^{d/2} matrix M[f] where the rows and columns are labelled by monomials in X of degree d/2. The (m_1, m_2)-th entry of the matrix M[f] is defined to be the coefficient of the monomial m_1 m_2 in f. Nisan observed that if f has a formula or ABP of small size, then f can be decomposed as a small sum of polynomials of the form g · h where g and h are homogeneous polynomials of degree d/2. Crucially, it may be seen that for any such g, h the matrix M[g · h] has rank 1 and hence, by subadditivity of rank, M[f] has small rank. Thus, choosing an f such that rank(M[f]) is large gives us a lower bound.
Intuitively speaking, the rank of the matrix M[f] is a measure of how "correlated" the first half of a monomial appearing in f is with its second half: M[f] being full rank would mean that they are perfectly correlated, whereas M[f] being low rank would mean that they are not very correlated at all. Nisan's argument shows that small ABPs have "information bottlenecks" at degree d/2 (and indeed at any degree d' ≤ d), and hence the amount of correlation is small.
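The rank argument above can be checked on toy inputs. The following is a minimal sketch, not part of the original argument: it assumes a polynomial is stored as a dictionary from tuples of variable indices to coefficients, and the helper names (matrix_rank, nisan_matrix, multiply, add) are hypothetical. It builds M[f] and confirms that a single product g · h gives rank 1 while a sum of two such products gives rank at most 2.

```python
from fractions import Fraction
from itertools import product


def matrix_rank(rows):
    """Exact rank of a rational matrix via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a pivot at or below the current rank row.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def nisan_matrix(f, n, d):
    """M[f]: entry (m1, m2) is the coefficient of the monomial m1*m2 in f."""
    halves = list(product(range(n), repeat=d // 2))
    return [[f.get(m1 + m2, 0) for m2 in halves] for m1 in halves]


def multiply(g, h):
    """Non-commutative product: monomials are concatenated in order."""
    out = {}
    for mg, cg in g.items():
        for mh, ch in h.items():
            out[mg + mh] = out.get(mg + mh, 0) + cg * ch
    return out


def add(p, q):
    """Sum of two polynomials given as coefficient dictionaries."""
    out = dict(p)
    for mono, c in q.items():
        out[mono] = out.get(mono, 0) + c
    return out


n, d = 2, 4
g1 = {(0, 0): 1, (0, 1): 2, (1, 1): -1}
h1 = {(0, 1): 3, (1, 0): 1}
g2 = {(1, 0): 1}
h2 = {(0, 0): 1}

rank_single = matrix_rank(nisan_matrix(multiply(g1, h1), n, d))
rank_sum = matrix_rank(
    nisan_matrix(add(multiply(g1, h1), multiply(g2, h2)), n, d))
print(rank_single, rank_sum)
```

Exact rational arithmetic is used so that the rank is not affected by floating-point rounding; for larger instances a numerical library would be the more practical choice.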

Nisan's measure applied to skew circuits
A natural question to ask is if this argument can give a lower bound for non-commutative skew circuits as well. Unfortunately, the answer is no, as is already implicit in Nisan's paper. Consider the Palindrome polynomial PAL_{d/2}(X), which is the sum of all monomials of degree d that are palindromes when viewed as strings of length d over the alphabet X. Nisan observed that PAL_{d/2}(X) has a skew circuit of linear size but at the same time M[PAL_{d/2}(X)] has full rank. In fact, M[PAL_{d/2}(X)] is a permutation matrix since the first half of a palindrome uniquely determines the second half (thus, the first and second halves of monomials appearing in PAL_{d/2}(X) are perfectly correlated). Hence, the partial derivative matrix of polynomials with small skew circuits can have as large a rank as possible. This means that in our lower bound argument for skew circuits, we need to use a different measure of complexity.
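The permutation-matrix claim can be verified directly for small parameters. The sketch below is illustrative only; it assumes monomials are tuples of variable indices, and the function names are hypothetical.

```python
from itertools import product


def palindrome_poly(n, half):
    """PAL_half(X): sum of w * reverse(w) over all words w of length half."""
    return {w + w[::-1]: 1 for w in product(range(n), repeat=half)}


def nisan_matrix(f, n, d):
    """M[f]: entry (m1, m2) is the coefficient of the monomial m1*m2 in f."""
    halves = list(product(range(n), repeat=d // 2))
    return [[f.get(m1 + m2, 0) for m2 in halves] for m1 in halves]


def is_permutation_matrix(m):
    """True if m is a 0-1 matrix with exactly one 1 per row and column."""
    entries_ok = all(x in (0, 1) for row in m for x in row)
    rows_ok = all(sum(row) == 1 for row in m)
    cols_ok = all(sum(col) == 1 for col in zip(*m))
    return entries_ok and rows_ok and cols_ok


n, d = 3, 4
M = nisan_matrix(palindrome_poly(n, d // 2), n, d)
print(len(M), is_permutation_matrix(M))
```

A permutation matrix of order n^{d/2} has rank n^{d/2}, which is why no rank computation is needed here.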

A new measure for skew circuits
The measure that we use is a modified version of the partial derivative matrix, defined as follows. Let f ∈ F⟨X⟩ be a homogeneous polynomial of degree d over n variables. Given an ordered partition Π = (Y, Z) of [d] into two parts, we define M[f, Π] to be the matrix whose rows and columns are indexed by monomials in X of degree |Y| and |Z|, respectively. The (m_1, m_2)-th entry of M[f, Π] is defined to be the coefficient of the unique monomial m of degree d which equals m_1 if we keep only the variables indexed by locations in Y and delete the others, and equals m_2 if we only keep the variables indexed by locations in Z. As above, the rank of M[f, Π] measures the correlation between the restriction of a monomial to the locations in Y and the locations in Z. We are usually interested in Π where |Y| ≤ |Z|, since in this case we know that the maximum possible rank is min{n^{|Y|}, n^{|Z|}} = n^{|Y|}.
In this notation, the measure of complexity used by Nisan is rank(M[f, Π]) for Π = ([d/2], [d/2 + 1, d]), and we have seen above that this measure is as large as it can be for, say, the Palindrome polynomial PAL_{d/2}(X), which has a small skew circuit. However, it is an easy observation that if one considers the partition Π_0 = (Y_0, Z_0) with Y_0 = [d/4] ∪ [3d/4 + 1, d] and Z_0 = [d/4 + 1, 3d/4], then M[PAL_{d/2}(X), Π_0] has rank 1. Thus, we might hope that for every polynomial f that has a small skew circuit, we could find a Π such that M[f, Π] has low rank. We are in fact able to show something much stronger: we can show in general that if f has a small skew circuit, then rank(M[f, Π_0]) is "small" for the particular Π_0 defined above. (Here, "small" means that the rank is much smaller than full rank.) In terms of correlation, this statement could be interpreted as saying that though skew circuits can compute polynomials that are perfectly correlated w. r. t. Nisan's partition ([d/2], [d/2 + 1, d]), they can only do so by correlating the initial few indices in the monomial with the final few indices, as in the Palindrome polynomial. Consequently, these "extreme" indices end up uncorrelated with those in the middle. This is the weakness of skew circuits that we exploit in our lower bound.
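To make the contrast between the two partitions concrete, here is a small sketch. It rests on an assumption that should be checked against the formal definitions: Π_0 is taken to put the first and last quarters of the positions in Y and the middle half in Z. Positions are 0-indexed and all function names are hypothetical.

```python
from itertools import product


def palindrome_poly(n, half):
    """PAL_half(X): sum of w * reverse(w) over all words w of length half."""
    return {w + w[::-1]: 1 for w in product(range(n), repeat=half)}


def partition_matrix(f, n, d, Y):
    """M[f, (Y, Z)] with Z the complement of Y in {0, ..., d-1}."""
    Y = sorted(Y)
    Z = [i for i in range(d) if i not in Y]
    rows = {m: r for r, m in enumerate(product(range(n), repeat=len(Y)))}
    cols = {m: c for c, m in enumerate(product(range(n), repeat=len(Z)))}
    M = [[0] * len(cols) for _ in rows]
    for mono, coeff in f.items():
        m1 = tuple(mono[i] for i in Y)  # letters at the Y positions
        m2 = tuple(mono[i] for i in Z)  # letters at the Z positions
        M[rows[m1]][cols[m2]] += coeff
    return M


def is_rank_one_01(M):
    """True if the 0-1 matrix M is a nonzero outer product u * v^T."""
    u = [int(any(row)) for row in M]
    v = [int(any(col)) for col in zip(*M)]
    return any(u) and all(M[r][c] == u[r] * v[c]
                          for r in range(len(u)) for c in range(len(v)))


n, d = 2, 8
pal = palindrome_poly(n, d // 2)
# Assumed Pi_0: Y is the first and last quarter, Z is the middle half.
Y0 = list(range(d // 4)) + list(range(3 * d // 4, d))
M0 = partition_matrix(pal, n, d, Y0)
# Nisan's partition: Y is the first half.
MN = partition_matrix(pal, n, d, range(d // 2))
print(is_rank_one_01(M0), is_rank_one_01(MN))
```

Under the assumed Π_0, an entry of M0 is 1 exactly when both the Y-part and the Z-part are themselves palindromes, which is why the matrix factors as an outer product.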

Decomposition lemma for skew circuits
The proof of this fact rests on a decomposition of skew circuits that is motivated by the similar ABP decomposition mentioned above. Like in the ABP decomposition, we can show that given any homogeneous polynomial f of degree d that has a small skew circuit and any degree parameter d' ∈ [d], we can decompose f as a small sum of polynomials of the form g ×_j h where g and h are polynomials of degrees d' and d − d', respectively (we refer the reader to Section 3 for the definition of ×_j, but it intuitively means that the polynomial g is multiplied on the left by the sum of the prefixes of the monomials of h of degree j and on the right by the sum of the suffixes of degree d − d' − j). The proof of this lemma is obtained by specializing the proof of a lemma of Hrubeš, Wigderson and Yehudayoff [9] regarding general non-commutative arithmetic circuits to the case of skew circuits, where it yields a stronger conclusion.
Given this decomposition lemma, we prove the lower bound as follows. We apply the lemma with d' being a large number close to d; for concreteness, say d' = 3d/4. In other words, we decompose f as a small sum of polynomials g ×_j h where g and h are homogeneous polynomials of degrees 3d/4 and d/4, respectively. In each such polynomial, a set I_g ⊆ [d] of 3d/4 indices corresponds to g and a set I_h = [d] \ I_g corresponds to the polynomial h as shown in Figure 1 below. As we mentioned above, we will consider the rank of the matrix M[g ×_j h, Π_0], which equals rank(M[g, Π_g]) · rank(M[h, Π_h]), where the partitions Π_g = (Y_g, Z_g) and Π_h = (Y_h, Z_h) are the natural restrictions of Π_0 to I_g and I_h, respectively. Note that if rank(M[g ×_j h, Π_0]) is to be close to full, i. e., n^{|Y_0|}, then we need both rank(M[g, Π_g]) and rank(M[h, Π_h]) to be close to n^{|Y_g|} and n^{|Y_h|}, respectively. However, it is easily seen that, irrespective of the value of j, the matrix M[h, Π_h] is always a rank 1 matrix (this happens since Y_h occupies all of I_h and thus Z_h = ∅) and hence rank(M[g ×_j h, Π_0]) falls exponentially short of its maximum possible value. Since f is a small sum of such polynomials, the same is true of rank(M[f, Π_0]) as well. More generally, the same strategy shows that rank(M[f, Π]) is small as long as Π = (Y, Z) has the "left-right monochromatic" form (LRM partitions for short) shown in Figure 2 (for d_1, d_2 large enough).
The above argument implies a strong exponential lower bound on the size of a skew circuit computing any homogeneous polynomial F of degree d such that M[F, Π_0] is full rank. It is easy to find explicit examples of such polynomials. For example, we could take F to be the square of PAL_{d/4}(X) or the Lifted Identity polynomial of Hrubeš et al. [9]. In either of these cases, it can be checked that M[F, Π_0] is again a permutation matrix and hence full rank. Since (PAL_{d/4}(X))^2 can be computed by a small circuit with just a single non-skew gate, this also gives an exponential separation between skew circuits and circuits with one non-skew gate. However, this also implies that if we want to extend our lower bound to non-commutative circuits of small non-skew depth, then we need to modify our measure further.
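As a worked instance of the permutation-matrix claim for the squared palindrome, take d = 8. The computation below is a sketch under the assumption that Π_0 puts the first and last quarters of the positions in Y_0 and the middle half in Z_0; the strings a and b are illustrative names.

```latex
% Assumptions: d = 8, a = (a_1, a_2) and b = (b_1, b_2) range over [n]^2,
% Y_0 = \{1, 2, 7, 8\}, Z_0 = \{3, 4, 5, 6\}.
\begin{align*}
m &= x_{a_1} x_{a_2} x_{a_2} x_{a_1}\, x_{b_1} x_{b_2} x_{b_2} x_{b_1}
   && \text{(a monomial of } (\mathrm{PAL}_2(X))^2\text{)} \\
m_{Y_0} &= x_{a_1} x_{a_2}\, x_{b_2} x_{b_1} && \text{(positions 1, 2, 7, 8)} \\
m_{Z_0} &= x_{a_2} x_{a_1}\, x_{b_1} x_{b_2} && \text{(positions 3, 4, 5, 6)}
\end{align*}
```

Every choice of m_{Y_0} determines (a, b) and hence m_{Z_0}, and conversely, so each row and each column of the n^4 × n^4 matrix contains exactly one nonzero entry.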

New measure and decomposition lemma for circuits with small non-skew depth
We prove our lower bound for circuits of small non-skew depth by induction on the non-skew depth k of the circuit. As in the skew case, we choose a partition Π_k of [d] such that no small non-skew depth k circuit can compute a polynomial that has large rank w. r. t. the partition Π_k. The inductive argument is based on showing that if a non-skew depth k circuit C computes a polynomial of large rank w. r. t. Π_k, then it must contain a depth k − 1 circuit that computes a polynomial of large rank w. r. t. Π_{k−1} (or an even "harder" partition). We then apply the inductive hypothesis to prove the lower bound.
Let us consider the problem of constructing such a partition in the case k = 1 (i. e., non-skew depth 1). Ideally, we would like to construct a partition Π_1 such that if C is a circuit of non-skew depth 1 that is high rank w. r. t. Π_1, then a sub-circuit of C is high rank w. r. t. an LRM partition as in Figure 2 (with perhaps a slightly smaller degree). However, it can be checked that we cannot choose such a partition even if we know beforehand that C is just a product of two skew circuits. That is, for any candidate partition Π_1, there are skew circuits of degree d' ≤ d and d − d' computing polynomials g_1 and g_2 such that neither the partition restricted to g_1, nor the partition restricted to g_2, is LRM.
THEORY OF COMPUTING, Volume 12 (12), 2016, pp. 1-38

Hence, we are first led to the problem of enlarging the family of partitions that are hard for skew circuits. Building on the techniques outlined for skew circuits above, we can also show that small skew circuits cannot compute high rank polynomials w. r. t. the larger family of "extended LRM" (XLRM) partitions, illustrated in Figure 3, which are obtained by extending an LRM partition on the left and right sides with segments of length ℓ that are contained in Y and Z, respectively. Intuitively, a skew circuit that computes a large rank polynomial w. r. t. such a partition would try to pairwise correlate indices in the segments (of length ℓ) on the two extremes. However, after having done this, it is still left with the task of computing a high rank polynomial w. r. t. an LRM partition, which we know to be a hard problem.
We are now ready to tackle the problem of proving lower bounds for circuits of non-skew depth k. We choose our hard partition Π_k = (Y_k, Z_k) to have the form shown in Figure 4. That is, starting from the left, our partition assigns an initial segment of length roughly d/4 to Y_k. The remaining indices are assigned to Y_k and Z_k in k' pairs of segments of lengths roughly d/4k' and d/2k', respectively, for k' = O(k), so that overall we have |Y_k| = |Z_k| = d/2. Note that Π_k is in particular an XLRM partition, and hence is clearly hard for skew circuits. We show that any small circuit C of non-skew depth at most k cannot compute a polynomial of large rank w. r. t. Π_k.
To get an idea of the proof, consider first the easier case when the output of C is a non-skew homogeneous multiplication gate and hence C is a product of two homogeneous polynomials g_1 and g_2 that have small circuits of non-skew depth at most k − 1. In this case, the indices in [d] are distributed between g_1 and g_2 as shown in Figure 4. Now, as we have argued previously, if the polynomial f computed by C is to have rank nearly n^{|Y_k|} w. r. t. Π_k, then each g_i must have rank nearly n^{|Y_{k,i}|} w. r. t. Π_{k,i}, where Π_{k,i} = (Y_{k,i}, Z_{k,i}) is the natural restriction of Π_k to the indices corresponding to g_i for i ∈ [2]. For this to occur, however, we must have |Y_{k,i}| ≈ |Z_{k,i}| for each i, since otherwise for some i, we will have |Z_{k,i}| much smaller than |Y_{k,i}|, and then rank(M[g_i, Π_{k,i}]) ≤ n^{|Z_{k,i}|} falls far short of n^{|Y_{k,i}|}. But if |Y_{k,i}| ≈ |Z_{k,i}| for each i, then the only possibility is that one of g_1 or g_2, say g_1 for concreteness, has very small degree and the other "occupies" almost all the indices in [d] and is hence already computing a polynomial of large rank w. r. t. Π_k. Since g_2 has a small circuit of non-skew depth at most k − 1, this allows us to induct on g_2.
The general case puts together a couple of arguments we have already outlined. Using a decomposition lemma that is similar in spirit to the skew circuit decomposition lemma described above, we can show that any homogeneous polynomial f of degree d computed by a small circuit of non-skew depth at most k can be written as a small sum of polynomials of the form (g_1 · g_2) ×_j h, where g_1 and g_2 are homogeneous polynomials computed by small circuits of non-skew depth at most k − 1 and h has a small skew circuit. In the easy case above, we have already handled the case when deg(h) = 0, and so now we try to see how h can help produce a polynomial of large rank w. r. t. the partition Π_k. As in the proof of the hardness of XLRM partitions, one would guess that the worst that h could do is to match up the d/2k indices in Y and Z on either extreme. In this case, we can argue as in the easier case above that one of g_1 or g_2 occupies all that is remaining, which corresponds to a partition that is hard for non-skew depth at most k − 1, as desired.
As might be expected, the actual proof is not quite as neat, since we need to handle some other cases that we have not described above. It turns out, however, that these cases are easy, even if somewhat tedious, to handle.

Definitions
Throughout the paper, we refer to a fixed set X = {x_1, ..., x_n} of non-commuting variables. We work with the ring F⟨X⟩ of polynomials in our non-commuting variables. We start by recalling the definition of an algebraic branching program.
Definition 3.1 (ABPs [15]). An Algebraic Branching Program (ABP) is a directed acyclic graph with one vertex of in-degree zero, called the source, and one vertex of out-degree zero, called the sink. The vertices of the graph are partitioned into levels numbered 0, 1, ..., d. Edges may only go from level i to level i + 1 for i ∈ {0, ..., d − 1}. The source is the only vertex at level 0 and the sink is the only vertex at level d. Each edge is labeled with a homogeneous linear form in the input variables. The size of the ABP is the number of vertices.
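The polynomial computed by an ABP is the sum, over all source-to-sink paths, of the ordered product of the edge labels. The sketch below evaluates a layered ABP level by level; the data layout (one dictionary of labelled edges per level, linear forms as dictionaries from variable index to coefficient) and the function name are assumptions made for illustration.

```python
def abp_polynomial(layers, source, sink):
    """Polynomial computed by a layered ABP.

    layers[i] maps an edge (u, v) from level i to level i + 1 to a linear
    form, given as a dict {variable_index: coefficient}.
    """
    # polys[v] is the sum, over all source-to-v paths, of the ordered
    # product of the edge labels along the path.
    polys = {source: {(): 1}}
    for layer in layers:
        nxt = {}
        for (u, v), form in layer.items():
            if u not in polys:
                continue
            target = nxt.setdefault(v, {})
            for mono, c in polys[u].items():
                for var, a in form.items():
                    key = mono + (var,)
                    target[key] = target.get(key, 0) + c * a
        polys = nxt
    result = polys.get(sink, {})
    return {m: c for m, c in result.items() if c != 0}


# A size-4 ABP computing x0*(x0 + x1) + x1*(x0 - x1).
layers = [
    {("s", "a"): {0: 1}, ("s", "b"): {1: 1}},
    {("a", "t"): {0: 1, 1: 1}, ("b", "t"): {0: 1, 1: -1}},
]
f = abp_polynomial(layers, "s", "t")
print(f)
```

Because every source-to-sink path crosses each level exactly once, the number of vertices at level d/2 bounds the number of product terms in Nisan's decomposition.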
For i, j ∈ N, we define [i, j] to be the set {i, i + 1, ..., j}. (This set is empty if i > j.) We also use the standard notation [i] to denote the set [1, i].
For d ∈ N, we use M_d(X) to denote the set of monomials of degree exactly d over the variables in X.
Definition 3.2 (j-products). Given homogeneous polynomials g, h ∈ F⟨X⟩ of degrees d_g and d_h, respectively, and an integer j ∈ [0, d_h], we define the j-product of g and h, denoted g ×_j h, as follows.
• When g and h are monomials, then we can factor h uniquely as a product of two monomials h_1 h_2 such that deg(h_1) = j and deg(h_2) = d_h − j. In this case, we define g ×_j h to be h_1 · g · h_2.
• The map is extended bilinearly to general homogeneous polynomials g, h. Formally, let g, h be general homogeneous polynomials, where g = ∑_ℓ g_ℓ, h = ∑_i h_i and g_ℓ, h_i are monomials of g, h respectively. For j ∈ [0, d_h], each h_i can be factored uniquely into h_{i,1} h_{i,2} with deg(h_{i,1}) = j and deg(h_{i,2}) = d_h − j, and we define g ×_j h = ∑_ℓ ∑_i h_{i,1} · g_ℓ · h_{i,2}.
Note that g ×_0 h and g ×_{d_h} h are just the products g · h and h · g, respectively.
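The j-product is easy to express on a dictionary representation of polynomials. The following is a minimal sketch, assuming monomials are tuples of variable names and that h is homogeneous; the function name j_product is hypothetical.

```python
def j_product(g, h, j):
    """g x_j h: insert each monomial of g after the first j letters of
    each monomial of h, multiplying coefficients."""
    out = {}
    for mh, ch in h.items():
        if not 0 <= j <= len(mh):
            raise ValueError("j must lie between 0 and deg(h)")
        for mg, cg in g.items():
            mono = mh[:j] + mg + mh[j:]
            out[mono] = out.get(mono, 0) + cg * ch
    return out


g = {("x", "y"): 2}
h = {("a", "b", "c"): 3, ("b", "b", "a"): 1}

middle = j_product(g, h, 1)    # prefix of degree 1, suffix of degree 2
as_g_h = j_product(g, h, 0)    # ordinary product g * h
as_h_g = j_product(g, h, 3)    # ordinary product h * g
print(middle)
```

The two boundary values of j recover the ordinary products in either order, matching the remark that closes the definition.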
The following easily verifiable facts about j-products will be useful.
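1. The operator ×_j is bilinear, i. e., (αg_1 + βg_2) ×_j h = α(g_1 ×_j h) + β(g_2 ×_j h) and g ×_j (αh_1 + βh_2) = α(g ×_j h_1) + β(g ×_j h_2) for all scalars α, β ∈ F and homogeneous polynomials g, g_1, g_2, h, h_1, h_2 such that all the above expressions are well defined.
2. Assume g and h are such that g ×_j h is defined and let f be a homogeneous polynomial of degree d. Then (g ×_j h) · f = g ×_j (h · f) and f · (g ×_j h) = g ×_{j+d} (f · h).
3. Assume g and h are as above and further that g […]
Given a monomial m = x_{i_1} x_{i_2} ⋯ x_{i_d} of degree d and a set S ⊆ [d], we denote by m_S the product of all the variables in the locations indexed by S, i. e., m_S = ∏_{j∈S} x_{i_j} where the product is taken in increasing order of j. Let Π denote a partition of [d] given by an ordered pair (Y, Z). For a homogeneous polynomial f of degree d, the matrix M[f, Π] has rows labelled by M_{|Y|}(X) and columns labelled by M_{|Z|}(X), and its (m_1, m_2)-th entry is the coefficient in f of the unique monomial m such that m_Y = m_1 and m_Z = m_2.
We will use the rank of the matrix M[f, Π] (for a suitably defined Π = (Y, Z)) as a measure of the complexity of f. Note that since the rank of the matrix is at most the number of rows, we have for any Π = (Y, Z) that rank(M[f, Π]) ≤ n^{|Y|}. As in many papers on multilinear formulas and circuits [16,17,18,19,20,8], we will be interested in how close the rank of M[f, Π] can be to this trivial upper bound. Accordingly, we define the relative rank of f w. r. t. Π to be rel-rank(f, Π) = rank(M[f, Π]) / n^{|Y|}.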
Clearly, rel-rank(f, Π) ∈ [0, 1] for any f and Y as above. Furthermore, note that since rank(M[f, Π]) is also bounded by n^{|Z|}, the number of columns in the matrix, when |Y| > d − |Y|, this measure cannot approach 1 for any choice of f.
The non-skew depth of a non-commutative circuit C is the maximum number of non-skew gates on a path from a variable to the output gate in the DAG underlying C.

Preliminaries
We need the following lemmas that are straightforward adaptations of previous work.
Lemma 4.1 (Homogenization Lemma [9]). Let f ∈ F⟨X⟩ be a homogeneous polynomial of degree d computed by a non-commutative circuit C of size s. Then there is a homogeneous non-commutative circuit C' of size at most O(sd^2) computing f. Moreover, if C has non-skew depth at most k, then so does C'. In particular, if C is a skew circuit, then so is C'.
Lemma 4.2 (Tensor Lemma). Let g, h ∈ F⟨X⟩ be homogeneous polynomials of degrees d_g and d_h, respectively, and let f = g ×_j h for j ∈ [0, d_h]. Let Π be a partition of [d_g + d_h]. Then rank(M[f, Π]) = rank(M[g, Π_g]) · rank(M[h, Π_h]), where Π_g, Π_h are as defined in Section 3.
Proof. We observe that under a suitable labelling of the rows and columns of the matrices, the matrix M[f, Π] equals M[g, Π_g] ⊗ M[h, Π_h], where ⊗ represents the standard tensor (or Kronecker) product of matrices. This will prove the lemma.
By the bilinearity of both the ⊗ and ×_j maps, it suffices to do this when g and h are both monomials. In this case, M[g, Π_g] is a 0-1 matrix with a 1 only in the (g_{Y_g}, g_{Z_g})-th entry and similarly for M[h, Π_h]. Since f is also a monomial, the matrix M[f, Π] is also a 0-1 matrix with a 1 only in the (f_Y, f_Z)-th entry according to the original labelling. Under our alternate labelling of M[f, Π], this corresponds to the ((g_{Y_g}, h_{Y_h}), (g_{Z_g}, h_{Z_h}))-th entry, which is exactly the position of the unique 1 in M[g, Π_g] ⊗ M[h, Π_h]. This completes the proof of the lemma.
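The multiplicativity of rank under j-products can be sanity-checked numerically. The sketch below is not the paper's proof; it assumes that the "natural restriction" of Π to the positions occupied by g (or h) simply re-indexes those positions from 0, and all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def matrix_rank(rows):
    """Exact rank of a rational matrix via Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def j_product(g, h, j):
    """g x_j h on coefficient dictionaries keyed by index tuples."""
    out = {}
    for mg, cg in g.items():
        for mh, ch in h.items():
            mono = mh[:j] + mg + mh[j:]
            out[mono] = out.get(mono, 0) + cg * ch
    return out


def partition_matrix(f, n, d, Y):
    """M[f, (Y, Z)] with Z the complement of Y in {0, ..., d-1}."""
    Y = sorted(Y)
    Z = [i for i in range(d) if i not in Y]
    rows = {m: r for r, m in enumerate(product(range(n), repeat=len(Y)))}
    cols = {m: c for c, m in enumerate(product(range(n), repeat=len(Z)))}
    M = [[0] * len(cols) for _ in rows]
    for mono, coeff in f.items():
        M[rows[tuple(mono[i] for i in Y)]][cols[tuple(mono[i] for i in Z)]] += coeff
    return M


def restrict(Y, positions):
    """Assumed natural restriction: re-index Y within `positions` from 0."""
    return [k for k, p in enumerate(positions) if p in Y]


n, d_g, d_h, j = 2, 2, 2, 1
d = d_g + d_h
I_g = list(range(j, j + d_g))                # positions filled by g
I_h = [p for p in range(d) if p not in I_g]  # positions filled by h
Y = [0, 1]

g = {(0, 0): 1, (1, 1): 1}
h_full = {(0, 1): 1, (1, 0): 1}
h_flat = {(a, b): 1 for a in range(n) for b in range(n)}


def both_sides(g, h):
    """Return (rank of M[g x_j h, Pi], product of the restricted ranks)."""
    lhs = matrix_rank(partition_matrix(j_product(g, h, j), n, d, Y))
    rhs = (matrix_rank(partition_matrix(g, n, d_g, restrict(Y, I_g)))
           * matrix_rank(partition_matrix(h, n, d_h, restrict(Y, I_h))))
    return lhs, rhs


print(both_sides(g, h_full), both_sides(g, h_flat))
```

The two choices of h differ only in the rank of their restricted matrix, so the check exercises both a full-rank and a rank-1 factor.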

Hard polynomials
Let w = (w_1, w_2, ..., w_d) be a string in [n]^d and let w^R = (w_d, w_{d−1}, ..., w_1) denote the reverse of the string. Let x_w denote the monomial x_{w_1} x_{w_2} ⋯ x_{w_d} over the variable set X = {x_1, x_2, ..., x_n}. We consider the n-variable palindrome polynomial, defined below.
PAL_d(X) = ∑_{w ∈ [n]^d} x_w · x_{w^R}
Nisan [15] studied the palindrome polynomial for n = 2. We denote by PAL_d^2(X) the squared palindrome polynomial.
Lower bound for skew circuits

In this section, we prove an exponential lower bound for skew circuits. We start by giving a decomposition lemma for such circuits. A similar decomposition was given by Nisan [15] for non-commutative ABPs. More recently Hrubeš et al. [9] proved a decomposition lemma for general non-commutative circuits. Our result can be thought of as an interpolation between the decomposition for ABPs and that for general non-commutative circuits.
We then formally define left-right monochromatic (LRM) partitions and prove that any skew circuit of "small" size has "small" relative rank with respect to LRM partitions.Finally, we give an explicit polynomial which has full relative rank with respect to a suitably chosen LRM partition.This gives a lower bound for skew circuits.
Let us now give a decomposition lemma for skew circuits.We will prove two other decomposition lemmas and, though they take slightly different forms, they can all be presented with similar arguments, as ways of grouping the monomials computed by a circuit.We will use the notion of parse trees from [14] to describe how a circuit computes monomials.
Definition 5.1. The set of parse trees of a circuit C is defined by induction on its size.
• If C is of size 1 it has only one parse tree: itself;
• if the output gate of C is a +-gate whose arguments are the gates α and β, the parse trees of C are obtained by taking either a parse tree of the subcircuit rooted at α and the arc from α to the output or a parse tree of the subcircuit rooted at β and the arc from β to the output;
• if the output gate of C is a ×-gate whose arguments are the gates α and β, the parse trees of C are obtained by taking a parse tree of the subcircuit rooted at α, a parse tree of a disjoint copy of the subcircuit rooted at β, and the arcs from α and β to the output.
A parse tree T computes a polynomial val(T ) in a natural way: this is the monomial equal to the product of the variables labeling the leaves of T (from left to right).So parse trees are in one-to-one correspondence with the monomials computed by the circuit (before regrouping), and summing the values of the parse trees thus yields the computed polynomial.
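The inductive definition translates directly into a recursive enumeration. The sketch below lists val(T) for every parse tree T of a small circuit; it assumes gates are nested tuples, that all leaves are variables (constants are not modelled), and the function name is hypothetical.

```python
from collections import Counter


def parse_tree_values(gate):
    """List val(T) for every parse tree T of the circuit rooted at `gate`.

    A gate is ('var', name), ('+', left, right) or ('*', left, right).
    """
    kind = gate[0]
    if kind == "var":
        return [(gate[1],)]
    left = parse_tree_values(gate[1])
    right = parse_tree_values(gate[2])
    if kind == "+":
        # A parse tree follows exactly one argument of an addition gate.
        return left + right
    if kind == "*":
        # A parse tree follows both arguments of a multiplication gate,
        # and the leaves are read from left to right.
        return [a + b for a in left for b in right]
    raise ValueError(f"unknown gate kind: {kind}")


x, y = ("var", "x"), ("var", "y")
s = ("+", x, y)
circuit = ("+", ("*", s, s), ("*", x, y))  # (x + y)(x + y) + xy

values = parse_tree_values(circuit)
polynomial = Counter(values)  # regroup equal monomials
print(len(values), dict(polynomial))
```

Two different parse trees yield the monomial xy here, which illustrates why the correspondence with monomials holds only before regrouping.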
Proof. The polynomial f is the sum of the values of all the parse trees of C. Parse trees are obtained by starting at the root and following along exactly one argument when encountering an addition gate and along both arguments when encountering a multiplication gate. In a skew circuit at any multiplication gate one argument will be an input gate, so the degree decreases by at most 1 and the parse tree looks like a path with dangling input gates on the left or on the right (we will call such a parse tree path-like). Therefore any parse tree will reach a unique gate of degree d' (to get uniqueness if d' = 1, at the last multiplication gate we choose the left argument). We will stop building our parse tree once such a gate is found and consider the resulting partial parse tree. Let α_1, ..., α_t be the gates of C of degree d'. Consider a partial parse tree T stopping at α_i. It is possible that different partial parse trees will "stop" at α_i, and each will compute a monomial of the form L · g_i · R, where L (respectively R) is the monomial obtained by the multiplications by input gates on the left side (respectively right side) in the parse tree. We can thus partition the set of parse trees depending on which gate of degree d' it stopped at. We can further partition it with regard to the degree of L. Grouping monomials according to this partition we get the desired decomposition.
Lemma 5.4 (Main Lemma: Relative rank of skew circuits). Let f ∈ F⟨X⟩ be a homogeneous polynomial of degree d ∈ N computed by a homogeneous skew circuit C of size s. For any […] rel-rank(g_i ×_j h_{i,j}, Π). (5.1)
Fix any (i, j) and consider rel-rank(g_i ×_j h_{i,j}, Π). By Corollary 4.3, we have […] where […] we see rel-rank(g_i ×_j h_{i,j}, Π) ≤ n^{−D} and hence by (5.1), we have the claimed upper bound on rel-rank(f, Π).
A similar theorem can be proved for the Lifted Identity polynomial of Hrubeš et al. [9]: […] where, for (e_1, ..., e_{2r}) ∈ {0, 1}^{2r}, z_e = z_{e_1} ⋯ z_{e_{2r}}. For the partition Π defined above, M[LID_r, Π] is a square permutation matrix, since choosing the prefix and suffix of degree r defines a unique monomial appearing in LID_r, and its relative rank is therefore 1.
A natural generalization of skew circuits is the class of circuits in which each multiplication gate has a bound on the degree of one of its arguments. We call such circuits δ-unbalanced. Formally, δ-unbalanced circuits can be defined as follows.
Definition 5.6. A circuit is called δ-unbalanced if every multiplication gate has an argument of degree at most δ.
In the following corollary we observe that our exponential lower bound on skew circuits can also be extended to δ-unbalanced circuits. For instance, it yields an exponential lower bound for the computation of PAL²_{d/4}(X) by circuits where every multiplication gate has an input of degree at most d/5.
Proof sketch. The corollary follows from the observation that any δ-unbalanced circuit can be converted into a skew circuit with an O(n^δ) loss in size. Let g = g_1 × g_2 be a multiplication gate where (without loss of generality) the degree of g_1 is at most δ. Then one can write g_1 as a sum of monomials g_1 = ∑_{i=1}^{t} m_i, where the degree of each m_i is at most δ and t = O(n^δ). As g = ∑_{i=1}^{t} m_i × g_2, it can be computed as a sum of t terms of the form m × g_2, where m is a monomial of degree at most δ. It is easy to see that each m × g_2 can be computed by a skew circuit (with an additional loss of O(δ)).
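The conversion in the proof sketch can be mimicked on explicit polynomials. The Python sketch below is only an illustration under assumptions made here: polynomials are stored as dictionaries from monomials (tuples of variable names) to coefficients, and the helper names multiply, skew_left_multiply, and unbalanced_product_via_skew are hypothetical. It expands g_1 × g_2 monomial by monomial, using one multiplication by a single variable per step.

```python
from collections import defaultdict


def multiply(p, q):
    """Non-commutative product of polynomials stored as {monomial tuple: coefficient}."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[m1 + m2] += c1 * c2
    return dict(out)


def skew_left_multiply(monomial, q):
    """Compute monomial * q with one skew multiplication per variable of the monomial."""
    result, skew_gates = q, 0
    for var in reversed(monomial):
        # One argument of this multiplication is a single input variable.
        result = multiply({(var,): 1}, result)
        skew_gates += 1
    return result, skew_gates


def unbalanced_product_via_skew(g1, g2):
    """Expand g1 * g2 as the sum over monomials m of g1 of coeff(m) * (m * g2)."""
    total, gates = defaultdict(int), 0
    for m, c in g1.items():
        term, used = skew_left_multiply(m, g2)
        gates += used
        for mono, coeff in term.items():
            total[mono] += c * coeff
    return dict(total), gates
```

The number of skew multiplications used is at most δ times the number of monomials of g_1, in line with the O(n^δ) and O(δ) factors in the proof sketch.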

Lower bounds for circuits with small non-skew depth
Recall that the non-skew depth of a non-commutative circuit is the maximum number of non-skew gates on a path from a variable to the output gate in the DAG underlying the circuit. We call a gate v in C top-most if there is a path from v to the output gate in C that does not pass through any non-skew gates other than possibly v itself.
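To make the definition concrete, the following Python sketch computes the non-skew depth of a small fan-in-2 circuit. The encoding of gates and the names CIRCUIT, SKEW_ONLY, and non_skew_depth are assumptions made for this illustration; a multiplication gate is treated as skew when at least one of its two arguments is an input gate.

```python
from functools import lru_cache

# Hypothetical encoding: ("var",) for inputs, ("add", a, b) and ("mul", a, b) otherwise.
CIRCUIT = {
    "x": ("var",),
    "y": ("var",),
    "a": ("add", "x", "y"),
    "b": ("mul", "a", "a"),    # non-skew: neither argument is an input gate
    "c": ("mul", "b", "x"),    # skew: the right argument is an input gate
    "d": ("add", "c", "b"),
    "out": ("mul", "d", "b"),  # non-skew
}

SKEW_ONLY = {
    "x": ("var",),
    "p": ("mul", "x", "x"),
    "q": ("mul", "p", "x"),
}


def non_skew_depth(circuit, output):
    """Maximum number of non-skew multiplication gates on a path from an input to `output`."""

    def is_input(gate):
        return circuit[gate][0] == "var"

    @lru_cache(maxsize=None)
    def depth(gate):
        if is_input(gate):
            return 0
        kind, *args = circuit[gate]
        below = max(depth(arg) for arg in args)
        non_skew = kind == "mul" and not any(is_input(arg) for arg in args)
        return below + (1 if non_skew else 0)

    return depth(output)
```

In CIRCUIT the gates b and out are the only non-skew gates and they lie on a common path, so the non-skew depth is 2, while a circuit with only skew gates has non-skew depth 0.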
6.1 A decomposition lemma for circuits of non-skew depth k
Lemma 6.1 (Decomposition Lemma for non-skew circuits). Let f ∈ F⟨X⟩ be a homogeneous polynomial of degree d computed by a non-skew homogeneous circuit C of size s. Let g_1, …, g_t (t ≤ s) be the polynomials computed by the top-most non-skew gates in C and let d_i = deg(g_i) for i ∈ [t]. Then there exist homogeneous polynomials h_{i,j} (i ∈ [t], j ∈ [0, d − d_i]) and h_0 such that
f = ∑_{i∈[t]} ∑_{j∈[0,d−d_i]} g_i ×_j h_{i,j} + h_0.
Furthermore, each h_{i,j} and h_0 can be computed by a homogeneous skew circuit of size at most sd.
Proof. For the decomposition, we give a proof sketch in the spirit of the proof of Lemma 5.2. Any given parse tree of C is path-like until either it reaches exactly one of the top-most non-skew gates or it ends with a multiplication of two input gates. Collecting the values of the parse trees in the latter case yields the polynomial h_0, while, as before, we can partition the remaining parse trees according to the top-most gate reached and the degree of the monomial multiplied on the left.
Let us now show that the resulting polynomials h_{i,j} and h_0 can each be computed by a homogeneous skew circuit of size at most sd. Let α_1, …, α_t be the top-most non-skew gates. When a parse tree does not stop at one of these gates, it must end at a multiplication of two input gates. We call β_1, …, β_u the set of these multiplication gates. We start by replacing α_1, …, α_t by input gates labelled with new variables y_1, …, y_t. Setting all these variables to 0 yields a circuit for h_0.
We set all the y_1, …, y_t to 0 except y_i, set the gates β_1, …, β_u to 0, and delete gates taking the value 0. This yields a skew circuit C′ computing ∑_{j∈[0,d−d_i]} y_i ×_j h_{i,j}: since a parse tree cannot contain both an α gate and a β gate, setting the β gates to 0 does not modify the rest of the computation. Note that, apart from input gates which are arguments of skew multiplications, all the gates in this circuit belong to a path from α_i to the output. In particular, the arguments of any addition gate belong to paths from α_i to the output.
We now build a new circuit by replacing each gate γ on a path from α_i to the output, i.e., each gate which is not an input gate argument of a skew multiplication, by a set of "component" gates. More precisely, we replace each such gate γ by gates γ_0, …, γ_{d−d_i}, and we think of the gate γ_k as representing the sum of the monomials computed by γ in which the degree to the left of y_i is k. Thus α_i is replaced by a first gate labelled y_i and d − d_i gates labelled 0, since the polynomial y_i has one monomial with degree 0 on the left of y_i and no monomials with another degree on the left. The circuit C′ is then modified by induction to compute the desired values.
Let γ be a multiplication gate with left argument δ and right (skew) argument an input gate labelled x. Then gate γ_k of the new circuit computes δ_k × x.
Let γ be a multiplication gate with right argument δ and left (skew) argument an input gate labelled with a constant c. Then gate γ_k of the new circuit computes c × δ_k.
Let γ be a multiplication gate with right argument δ and left (skew) argument an input gate labelled with a variable x. Then gate γ_k of the new circuit computes x × δ_{k−1}.
Finally, addition gates are handled component-wise. Replacing y_i by 1, the j-th component of the output gate computes h_{i,j}. The size of the original circuit, s, has been multiplied by at most d − d_i + 1, for a total size of at most sd.
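The rules for the component gates can be sanity-checked on explicit polynomials. In the Python sketch below (an illustration under assumptions made here: polynomials are dictionaries from monomial tuples to coefficients, the placeholder variable is the string "y", and the names multiply and components are hypothetical), a polynomial whose monomials each contain the placeholder exactly once is split according to the degree to the left of the placeholder.

```python
from collections import defaultdict


def multiply(p, q):
    """Non-commutative product of polynomials stored as {monomial tuple: coefficient}."""
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[m1 + m2] += c1 * c2
    return dict(out)


def components(p, marker="y"):
    """Split p by the degree to the left of `marker` (assumed to occur once per monomial)."""
    parts = defaultdict(dict)
    for mono, coeff in p.items():
        # tuple.index gives the number of variables to the left of the marker.
        parts[mono.index(marker)][mono] = coeff
    return dict(parts)


# A polynomial computed at some gate delta on the path from the marked gate to the output.
DELTA = {("x", "y"): 1, ("y", "z"): 2}
X = {("x",): 1}
```

Multiplying on the right by a variable leaves the component index unchanged, while multiplying on the left by a variable shifts it by one, which is exactly the behaviour of the gates γ_k described above.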

More partitions with respect to which small skew circuits are low rank
For any n ∈ N⁺ and θ ∈ R, we use exp_n(θ) to denote n^θ.
Definition 6.2. We say that a partition
where D denotes min{d_1, d_2}.
We will only apply the above lemma when d_2 = Θ(d_1) and ℓ_2 = O(d_1), in which case the upper bound on rel-rank(f, Π) is
The idea of the proof is simple. When ℓ_2 = 0, we have a (d_1, d_2)-LRM partition and we are done. Otherwise, we use induction on ℓ_2. We first apply Lemma 5.2 to decompose f as a sum of a small number of polynomials of the form g ×_j h where g has degree roughly d − (D/2): if the partition corresponding to h takes (roughly) as large a chunk out of the initial segment of length ℓ_1 as it takes out of the final segment of length ℓ_2, we can use the induction hypothesis and we are done; otherwise, the partition corresponding to h has many more elements of Y than of Z, and we are done since the relative rank of h w.r.t. this partition is small.
Proof. We start by defining some parameters. Let ∆ = D/2 and r = min Also, let δ = (∆ − r)/2. Note that ∆/4 ≤ δ ≤ ∆/2. We prove a more general statement that is amenable to induction. For any integer i ≥ 0, we show that if d_1, d_2, ℓ_1, ℓ_2 are as in the statement of the lemma and additionally satisfy ℓ_2 ≤ iδ and i ≤ d_1/2r, then the maximum possible relative rank of f w.r.t. Π, which we denote by ρ(ℓ_1, ℓ_2, d_1, d_2), can be bounded by
We will prove the above by induction on i. First, we note that it implies the lemma. For i = ⌈4ℓ_2/∆⌉, we have both ℓ_2 ≤ iδ (using δ ≥ ∆/4) and also i ≤ d_1/2r by the choice of r. Hence, (6.1) implies the statement of the lemma.
The base case is i = 0, which corresponds to ℓ_2 = 0 and follows directly from Lemma 5.4 since the partition Π is (d_1, d_2)-LRM.
For the inductive case, consider any i ≥ 1. We apply Lemma 5.2 to the circuit C with d′ = d − ∆. For some t ≤ s, we obtain where g_1, …, g_t are the intermediate polynomials of degree d − ∆ computed by C (and hence are themselves computed by skew circuits of size at most s).
We have rel-rank(f, Π) ≤ ∑_{i,j} rel-rank(g_i ×_j h_{i,j}, Π) by the subadditivity of relative rank, and hence it suffices to bound each rel-rank(g_i ×_j h_{i,j}, Π). We analyze this term in two ways depending on j.
The easier case is when j ≥ ∆ − j + r. In this case, it can be seen that the partition r and hence by Corollary 4.3, we have rel-rank(g_i ×_j h_{i,j}, Π) ≤ rel-rank(h_{i,j}, Π_h) ≤ n^{−r} for each such j.
Now we consider the case when j ≤ ∆ − j + r. Note that in this case, we have j ≤ (∆ + r)/2 and hence ∆ − j ≥ (∆ − r)/2 = δ. For each such j, we see that the partition Π_g corresponding to g_i (i.e., )-XLRM by "moving" some of the degree from the "d_1 part" to the "ℓ_1 part." As noted above, we have ∆ − j ≥ δ and hence ℓ_2 − (∆ − j) ≤ iδ − δ = (i − 1)δ. Also, as j ≥ (∆ − j) + r, we have and the induction hypothesis along with Corollary 4.3 can be applied to yield rel-rank(g_i ×_j h_{i,j}, Π) ≤ rel-rank(g_i, Π_g) ≤ (sd)^{1+(i−1)} · n^{−r}.

The candidate hard partition for circuits of non-skew depth at most k
Throughout, let d 0 ∈ N + be a fixed parameter.
Let d ∈ N. Given an (ordered) partition Π = (Y, Z) of [d], we define the signature of Π to be the sequence sgn(Π) = σ = (i_1, i_2, …, i_p) of non-negative integers such that the first i_1 elements of [d] belong to Y, the next i_2 elements belong to Z, the next i_3 again to Y, and so on. Formally,
We denote by |σ| the quantity ∑_{q≤p} i_q = d and use |σ|_0 to denote p.
Given two signatures σ_1 ∈ N^n and σ_2 ∈ N^m, we use σ_1 • σ_2 ∈ N^{m+n} to denote their concatenation. We also use σ_1^r to denote the r-fold repetition of σ_1. Given a signature σ = (i_1, …, i_p), we say that a signature τ is a prefix of σ if τ = (i′_1, …, i′_q) for q ≤ p, where i′_j = i_j for j < q and i′_q ≤ i_q.
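The operations on signatures are easy to experiment with. The Python sketch below is an illustration under assumptions made here: signatures are tuples of non-negative integers, positions of [d] are 1-indexed, and the function names partition_from_signature, concat, repeat, and is_prefix are hypothetical.

```python
def partition_from_signature(sig):
    """Return (Y, Z): blocks of lengths i_1, i_2, ... alternate between Y and Z."""
    Y, Z, position = set(), set(), 0
    for index, length in enumerate(sig):
        block = range(position + 1, position + length + 1)
        (Y if index % 2 == 0 else Z).update(block)
        position += length
    return Y, Z


def concat(sig1, sig2):
    """Concatenation of two signatures as sequences."""
    return tuple(sig1) + tuple(sig2)


def repeat(sig, r):
    """The r-fold repetition of a signature."""
    return tuple(sig) * r


def is_prefix(tau, sigma):
    """tau agrees with sigma on its first q - 1 entries and its last entry is no larger."""
    q = len(tau)
    if q == 0:
        return True
    if q > len(sigma):
        return False
    return tuple(tau[:q - 1]) == tuple(sigma[:q - 1]) and tau[q - 1] <= sigma[q - 1]
```

For instance, the signature (2, 1) • (1, 1)^2 describes a partition of [7] in which positions 1, 2, 4, 6 belong to Y and positions 3, 5, 7 belong to Z.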
Clearly, we may define a partition Π of [d] using its signature.For any k ∈ N, we now define a partition (for suitable d) such that small circuits of non-skew depth at most k computing a homogeneous polynomial of degree d have low rank w. r. t.Π k .
Fix any k ∈ N and let D_k = 8d_0 + 12d_0 k. We define the partition
Figure 9 illustrates the partition Π_0 and also the relation between the partitions Π_k and Π_{k−1}, which will be important in our lower bound. We will later show that small circuits of non-skew depth at most k computing a homogeneous polynomial of degree D_k cannot compute a polynomial that has high relative rank w.r.t. Π_k. In the remainder of this section, we show that there are small circuits of non-skew depth O(k) (in fact, circuits using only O(k) many non-skew gates) that can compute a homogeneous polynomial f_k of degree D_k that has full rank w.r.t. Π_k. The basic "gadget" in this construction is the palindrome polynomial, and the construction of f_k involves "wrapping" a copy of PAL_{D_k/4}(X) around O(k) copies of PAL_{d_0}(X).
Proof. We define the polynomials f_k inductively. For k = 0, we define
In the notation of Section 4.1, we can write f_0 as
Figure 10 illustrates the positioning of the segments of the monomial corresponding to w_1, w_2, w_3, and w_4 w.r.t. the partition Π_0.
Figure 10: The construction of polynomials f_0 (above) and f_k from f_{k−1} (below).
We observe that f 0 can be computed by a homogeneous non-commutative arithmetic circuit of size O(nD 0 ) = O(nd 0 ) with exactly one non-skew gate.To see this, note that g 0 := (PAL 2d 0 (X) • PAL d 0 (X)) can be computed by first computing each of the terms of the product using homogeneous skew circuits of size O(nd 0 ) and then multiplying them using exactly one non-skew gate.We can then compute f 0 by using g 0 and only homogeneous skew multiplication gates by using the following inductive definitions.
The polynomial g_0 requires only O(n) additional gates. Thus, the size of the circuit computing f_0 is O(nd_0).
For k > 0, we define the polynomial f k inductively as follows.The construction is illustrated in Figure 10.
It can be easily checked that the matrix
We need to check that f_k defined as above has a small non-commutative circuit with O(k) many non-skew gates. For k ≥ 1, we define
The circuit for h_k is obtained from the circuit for f_{k−1} in a manner similar to the construction of the circuit for f_0, and similarly, we can obtain a circuit for g_k and then a circuit for f_k. We omit the details. It is easy to check that only 3 additional non-skew multiplication gates are used by the above procedure and hence the number of non-skew gates used overall is O(k).

The lower bound for circuits of non-skew depth k
In this section, we show that small non-commutative circuits of non-skew depth k computing a homogeneous polynomial of degree D k cannot compute a polynomial that has high relative rank w. r. t.Π k .Throughout, let d 0 ∈ N be a fixed parameter.
For ∈ N + , we say that a pair (g, , c) where • a ≥ 3( + 1)d 0 , r ≥ 0, and Intuitively, the (g, Π) being -good means that D ≥ D and Π "contains" a copy of Π as a sub-segment and Π is furthermore similarly contained in Π for some ≥ .See Figure 11, where the top partition corresponds to the case c = 0 and the bottom one to the case b = d 0 as mentioned above.
The main lemma is the following.
The basic idea of the proof is to repeatedly use Lemma 6.1 to decompose the polynomial f as a sum of polynomials computed by circuits with smaller non-skew depth. When we apply Lemma 6.1, we repeatedly obtain polynomials of the form g ×_j h where g and h are homogeneous polynomials of degrees d_g and D_k − d_g, respectively, and
and ℓ ∈ [0, k], we say that the pair (g, j) is ℓ-admissible if the pair (g, Π_g) is ℓ-good, where
Proof. First let us introduce some notation. Let the non-skew depth of a node v of C be the maximum number of non-skew gates on any path from a leaf to v. For ℓ ∈ [k], let G_ℓ (resp. G_{=ℓ}) be the set of all polynomials computed by gates in the circuit that have non-skew depth at most ℓ (resp. exactly ℓ); note that |G_{=ℓ}| ≤ |G_ℓ| ≤ s. We also denote by A_ℓ the set {(g, j) | g ∈ G_ℓ and (g, j) is ℓ-admissible}. Finally, we define V_ℓ by
Note that V_ℓ ⊆ F⟨X⟩ is a vector space over F. Our proof proceeds in two steps.
1. We first show that for each ℓ ∈ [0, k], the polynomial f can be decomposed as f = p_ℓ + e_ℓ where p_ℓ ∈ V_ℓ and e_ℓ is such that rel-rank(e_ℓ, Π_k) is small. The proof is by downward induction on ℓ.
2. We then show that rel-rank(p_0, Π_k) is small for each p_0 ∈ V_0. Along with the above decomposition, this will finish the proof.
We start with 1. above. Formally, we prove that there are absolute constants α, β > 0 such that for each ℓ ∈ [0, k], the polynomial f can be written as
where p_ℓ ∈ V_ℓ and e_ℓ ∈ F⟨X⟩ is homogeneous of degree D_k and satisfies
The proof is by downward induction on ℓ. We will choose α, β so that they satisfy some constraints that come up during the course of the proof. The base case ℓ = k is trivial, since we can choose p_k = f ∈ V_k and e_k to be the zero polynomial. Both (6.3) and (6.4) are thus satisfied for any choice of α, β.
Now for the induction case. Say that ℓ ∈ [0, k − 1]. By the induction hypothesis we have f = p_{ℓ+1} + e_{ℓ+1}, where
By the definition of V_{ℓ+1}, we know that
where (Here, we have used the fact that if (g, j) is (ℓ + 1)-admissible and g ∈ G_ℓ, then (g, j) is also ℓ-admissible.)
As noted above, the terms corresponding to (g, j) ∈ A_ℓ already sum to a polynomial p′_{ℓ+1} ∈ V_ℓ. To prove the induction statement (6.3), it therefore suffices to decompose each polynomial g ×_j H_{g,j} where (g, j) ∈ A′_{ℓ+1}. To do this, we need the following claim, whose proof is deferred.
Claim 6.6. Fix any ℓ ∈ [k]. Also fix any g ∈ G_{=ℓ} of degree d_g ∈ [D_ℓ, D_k], any homogeneous polynomial H ∈ F⟨X⟩ of degree D_k − d_g, and j such that (g, j) is ℓ-admissible. Then the polynomial g ×_j H can be decomposed as g ×_j H = p + e where p ∈ V_{ℓ−1} and e ∈ F⟨X⟩ is homogeneous of degree D_k and satisfies
Applying the above claim (with ℓ replaced by ℓ + 1) to each pair (g, j) ∈ A′_{ℓ+1} from the right-hand side of (6.5), we obtain for each such (g, j) that g ×_j H_{g,j} = p_{g,j} + e_{g,j} where p_{g,j} ∈ V_ℓ and rel-rank(e_{g,j}, Π_k) ≤ (sD_k)^{α_1} · n^{−β_1 d_0} for suitably large α_1 > 0 and small β_1 > 0.
Substituting in (6.5), we get
Note that p_ℓ ∈ V_ℓ (since V_ℓ is a vector space). Also, as
Setting p_ℓ as above and e_ℓ = e_{ℓ+1} + e′_ℓ, we have the required decomposition. The inequality (6.4) follows since rel-rank(e_ℓ, Π_k) ≤ rel-rank(e_{ℓ+1}, Π_k) + rel-rank(e′_ℓ, Π_k). This finishes the proof of the induction.
Thus, for ℓ = 0, we have f = p_0 + e_0 for some p_0 ∈ V_0 and rel-rank(e_0, Π_0
we only need to bound rel-rank(p_0, Π_k). Since p_0 ∈ V_0, we have
To analyze rel-rank(p_0, Π_k), we will need the following claim, the proof of which is also deferred.
Fix (g, j) ∈ A 0 and consider the polynomial g × j H g j in the right hand side of (6.6).By Claim 6.7 and using the fact that g is computable by a skew circuit of size at most s, we know that rel-rank(g which finishes the proof of the lemma.
It remains to prove the two claims used in the proof of Lemma 6.5.We prove Claim 6.7 first and then Claim 6.6.
Proof of Claim 6.7.We first prove Part (a) of the claim.Since (h, Π h ) is 0-good, we have sgn(Π h ) = (a, 2d 0 ) • (d 0 , 2d 0 ) 1+r • (b, c), for a ≥ 3d 0 , r ≥ 0 and b, c such that either c = 0 and b ∈ We need to show that rel-rank(h, We divide the analysis into the following cases (see also Figure 13).
Part (b) of the claim follows from Part (a) as follows.Let Since (h, j) is 0-admissible, we know that (h, Π h ) is 0-good.By Corollary 4.3, we have where the last inequality follows from Part (a).Proof of Claim 6.6.
To do this, consider the subcircuit C_g of C that computes g. Since g is at non-skew depth ℓ, we may assume that C_g also has non-skew depth ℓ by removing gates at larger non-skew depths. Recall that C, and hence C_g, has size at most s.
By applying Lemma 6.1 to the polynomial g, we can see that
where g_1, …, g_t are the polynomials computed by the top-most non-skew gates in C_g and
Further, each of the h_{i,m} and h_0 has a skew circuit of size at most s·d_g ≤ sD_k. Thus, we have
We argue that each polynomial on the right-hand side of (6.8) either belongs to V_{ℓ−1} or has relative rank at most (sD_k)^{O(1)} · n^{−Ω(d_0)} w.r.t. Π_k. Since V_{ℓ−1} is a vector space and rel-rank(•, Π_k) is subadditive, this will complete the proof.
First we consider the polynomial h_0 ×_j H. Note that (h_0, j) is ℓ-admissible (since (g, j) is) and hence it is also 0-admissible. Moreover, h_0 is computable by a skew circuit of size at most sD_k. Hence, by Claim 6.7, we have rel-rank
which completes the analysis of this term.
Now consider any polynomial q_{i,m} := (g_i ×_m h_{i,m}) ×_j H appearing in (6.8). For notational simplicity, we let d′_g := d_i = deg(g_i) and d′_h := deg(h_{i,m}) = d_g − d_i. We will show that either q_{i,m} ∈ V_{ℓ−1} or rel-rank(q_{i,m}, Π_k) is small; to prove the latter, we will use the following inequalities, which follow from Lemma 4.2 and Corollary 4.3:
where Π′_g = (Y′_g, Z′_g) and Π′_h = (Y′_h, Z′_h) are the natural restrictions of Π_g to g_i and h_{i,m}, respectively. That is,
The upper bound on rel-rank(q_{i,m}, Π_k) is based on a case analysis. We refer the reader to the accompanying figures for an intuitive description of each case.
where a g ≥ (3 + 1/2)d 0 and σ is some signature: in particular, d g ≥ D −1 + d 0 /2.In what follows, we will argue that either g i has low relative rank w. r. t.Π g or q i,m ∈ V −1 .
Since g i is computed by a top-most non-skew gate in the circuit C g , we can write g i = g i,1 • g i,2 where g i,1 and g i,2 are homogeneous polynomials computed by homogeneous circuits of size at most s and non-skew depth at most − 1.Let e 1 and e 2 = d g − e 1 denote the degrees of g i,1 and g i,2 , respectively.Let Π g,1 = (Y g,1 , Z g,1 ) and Π g,2 = (Y g,2 , Z g,2 ) be the induced partitions on g i,1 and g i,2 , respectively, i. e., Our analysis is further divided into two cases depending on e 1 .
Hence, (g i,2 , Π g,2 ) is ( − 1)-good.Thus, the polynomial q i,m (which by Fact 3.3 can be written as g i,2 × j 2 H 2 for some homogeneous polynomial H 2 of degree D k − d g,2 and some j 2 ) belongs to V −1 and hence we are done.
for some signature σ , then as in the previous case, we have q i,m = g i,1 × j 1 H 1 for some suitable H 1 and j 1 , and hence q i,m ∈ V −1 .
Π −1 . . .(ii) d g < d 0 /2: In this case, it can be checked that d h ≥ D −1 and (h, sgn(Π h )) is ( − 1, d h )-good and hence also (0, d h )-good.Thus, we have where the second equality uses Fact 3.3, the first inequality uses two applications of Corollary 4.3, and the last inequality follows from Part 1 of Claim 6.7.
Instead of going through the explicit case analysis, we refer the reader to Figure 17 for the various cases that can occur.It can be checked that in each of these, the resulting partition     by noting that irrespective of the placing of g i , the partition 5d 0 and some 2 ≤ (4 + (1/2))d 0 ≤ 1 and using Lemma 6.3.See Figure 18.
The main lower bound for non-commutative circuits of small non-skew depth follows.
Let C be any non-commutative circuit of non-skew depth at most k computing f and let s denote the size of C. By Lemma 4.1, we know that there is also a homogeneous circuit C′ of non-skew depth at most k and size at most s·d^{O(1)} computing f. Thus, Lemma 6.5 implies that
As rel-rank(f, Π_k) = 1, we have the required lower bound on s.
Remark 6.8. The divisibility constraints on the degree in the statement of Theorem 1.2 can easily be removed at the expense of additional constant factors in the exponent in the lower bound. For example, if the degree d does not have the required form, then we can find the largest d_1 ≤ d of the required form and consider the polynomial F = f_k · z^{d−d_1}, where z is a new variable and f_k is the hard polynomial of degree d_1 as defined above. If F has a circuit of non-skew depth k of size s, then so does f_k, which yields s ≥ n^{Ω(d_1/k)}. Since d_1 = Ω(d), this yields an n^{Ω(d/k)} bound.

Lower bound for the determinant and permanent
Nisan's lower bounds from [15] held not only for the palindrome polynomial seen above, but also for the permanent and the determinant polynomials, because it is easy to see that their partial derivative matrices have high rank.In our case, we could also try to study the rank of the permanent or the determinant, using our version of the partial derivative matrix.However it is simpler to use the fact that the permanent and determinant can easily express the palindrome polynomial.
Recall that the non-commutative (Cayley) determinant and permanent of an n × n matrix of variables X = (X_{i,j})_{i,j∈[n]} are defined as follows.
det(X) = ∑_{σ∈S_n} sgn(σ) · X_{1,σ(1)} X_{2,σ(2)} ⋯ X_{n,σ(n)},   per(X) = ∑_{σ∈S_n} X_{1,σ(1)} X_{2,σ(2)} ⋯ X_{n,σ(n)}.
That is, we just take the commutative determinant and permanent and make them non-commutative by ordering the variables in each monomial in increasing order of the rows in which they appear.
Lemma 7.1. Let P_d be the 2d × 2d matrix with x_0 on the diagonal, x_1 on the anti-diagonal, and 0 everywhere else. Let D_d be the 2d × 2d matrix with x_0 on the diagonal, x_1 on the first d positions of the anti-diagonal and −x_1 on the last d positions of the anti-diagonal. Then PAL_d(x_0, x_1) = per P_d = det D_d.
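Lemma 7.1 can be verified by brute force for small d. The Python sketch below is an illustration under assumptions made here: matrix entries are stored as (coefficient, variable) pairs or None for 0, the palindrome polynomial is taken to be the sum of w · w^R over all words w of length d, and the names sign, cayley, palindrome, matrix_P, and matrix_D are hypothetical. It expands the Cayley permanent and determinant over all permutations, reading the rows in increasing order.

```python
from collections import defaultdict
from itertools import permutations, product


def sign(perm):
    """Sign of a permutation given as a tuple, via its number of inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def cayley(matrix, signed):
    """Cayley determinant (signed=True) or permanent (signed=False) as {monomial: coeff}."""
    poly = defaultdict(int)
    for perm in permutations(range(len(matrix))):
        coeff, mono = (sign(perm) if signed else 1), []
        for row, col in enumerate(perm):
            entry = matrix[row][col]
            if entry is None:
                coeff = 0
                break
            coeff *= entry[0]
            mono.append(entry[1])
        if coeff != 0:
            poly[tuple(mono)] += coeff
    return {m: c for m, c in poly.items() if c != 0}


def palindrome(d):
    """PAL_d(x0, x1), assumed here to be the sum of w * reverse(w) over words w of length d."""
    return {w + w[::-1]: 1 for w in product(("x0", "x1"), repeat=d)}


def matrix_P(d):
    size = 2 * d
    return [[(1, "x0") if c == r else (1, "x1") if c == size - 1 - r else None
             for c in range(size)] for r in range(size)]


def matrix_D(d):
    size = 2 * d
    return [[(1, "x0") if c == r
             else ((1 if r < d else -1), "x1") if c == size - 1 - r else None
             for c in range(size)] for r in range(size)]
```

The brute-force expansion visits (2d)! permutations, so it is only practical for very small d, but it makes the cancellation of signs in the determinant easy to observe.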
Proof. The permanent of P_d can be obtained by choosing in each row of P_d a column index, while ensuring that each column index is taken only once; multiplying the values obtained; and then adding the results for all possible choices. Since there are only two non-zero values per row, for the row i (with 1 ≤ i ≤ d), we can either choose the index i with value x_0 or the index 2d + 1 − i with value x_1. In the first case, the column of index i is now forbidden and therefore for the row 2d + 1 − i the only available non-zero value is x_0 with the column index 2d + 1 − i. In the second case, the column of index 2d + 1 − i is now forbidden and therefore for the row 2d + 1 − i the only available non-zero value is x_1 with column index i.
For the determinant, note that the above reasoning shows that a permutation yielding a non-zero value is a combination of fixed points (when choosing the value x_0 at row i in column i one must then choose value x_0 at row 2d + 1 − i in column 2d + 1 − i) and transpositions (when choosing the value x_1 at row 2d + 1 − i in column i one must then choose value x_1 at row i in column 2d + 1 − i). Therefore adding a minus sign to the last d values x_1 cancels out the sign of the permutation in the determinant.
Proof. Let us show the corollary for the permanent only, since the case of the determinant is similar. We will show that there exists a matrix P_k such that the permanent of P_k is f′_k, where f′_k is f_k but built with the 2-variable palindrome polynomial (n = 2). We will follow the construction of f_k from the proof of Lemma 6.4. Lemma 7.1 shows that there exists a matrix of order 2d_0 whose permanent is PAL_{d_0}(x_0, x_1). To get f′_0 from this polynomial, or to go from f′_{k−1} to f′_k, we basically need two types of steps.
1. Computing the product of two previously obtained polynomials.If we have already built two matrices M and N whose permanents are f and g, respectively, then clearly f • g is the permanent of the block diagonal matrix with M and N on the diagonal.The order of the block matrix is the sum of the orders of M and N.
2. Computing a j-product of a previously computed polynomial with a palindrome polynomial.If we have already built a matrix M whose permanent is the polynomial f , then we can build a matrix whose permanent is f × d 0 PAL d 0 (x 0 , x 1 ) by considering the block matrix where D is the order-d 0 matrix with x 0 on the diagonal and A is the order-d 0 matrix with x 1 on the anti-diagonal (the reasoning is similar to the one in the proof of Lemma 7.1).The order of this matrix is the order of M plus 2d 0 .
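The first type of step can be checked directly on a small instance. The Python sketch below is an illustration under assumptions made here: matrix entries are variable names or None for 0, and the names cayley_permanent, block_diagonal, and multiply are hypothetical. It compares the Cayley permanent of a block diagonal matrix with the non-commutative product of the permanents of its blocks.

```python
from collections import Counter
from itertools import permutations


def cayley_permanent(matrix):
    """Cayley permanent of a matrix whose entries are variable names or None (zero)."""
    poly = Counter()
    for perm in permutations(range(len(matrix))):
        entries = [matrix[row][col] for row, col in enumerate(perm)]
        if all(entry is not None for entry in entries):
            # Variables are multiplied in increasing order of the rows.
            poly[tuple(entries)] += 1
    return poly


def block_diagonal(m, n):
    """Block diagonal matrix with m in the top-left corner and n in the bottom-right corner."""
    a, b = len(m), len(n)
    top = [list(row) + [None] * b for row in m]
    bottom = [[None] * a + list(row) for row in n]
    return top + bottom


def multiply(p, q):
    """Non-commutative product of polynomials stored as Counters over monomial tuples."""
    out = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[m1 + m2] += c1 * c2
    return out


M = [["x0", "x1"], ["x1", "x0"]]
N = [["a", "b"], ["c", "d"]]
```

Since the rows of M come before the rows of N, every monomial of the block permanent is a monomial of per M followed by a monomial of per N, and the order of the block matrix is the sum of the two orders.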
Thus f′_0 is the permanent of a matrix of order 8d_0 and going from f′_{k−1} to f′_k increases the order of the matrix by 12d_0 (refer once again to the proof of Lemma 6.4). The order of the matrix P_k whose permanent is f′_k is thus d := D_k = (8 + 12k)d_0. By Theorem 1.2, any circuit of non-skew depth k for the permanent must have size 2^{Ω(d_0)} = 2^{Ω(d/k)}.
Remark 7.3.We note that a result similar to the one for the permanent proved above can be deduced from the VNP-completeness of the permanent, which also holds in the non-commutative setting as shown by Hrubeš, Wigderson, and Yehudayoff [10].However, by making the reduction explicit we gain slightly in terms of parameters and additionally, a very similar proof works also for the determinant.
8 Full rank with respect to all partitions
Our lower bound proofs have been based on showing that any arithmetic circuit of non-skew depth at most k cannot compute a polynomial that has large rank w.r.t. some fixed partition Π_k. We can ask if this strategy can yield lower bounds for general non-commutative arithmetic circuits (i.e., with no restrictions on non-skew depth) as well. Our aim in this section is to show that the answer to this question is possibly no: we show that over any sufficiently large field F and any set X of n variables, there is a polynomial p ∈ F⟨X⟩ that has non-commutative arithmetic circuits of polynomial size, but which furthermore satisfies the property that for all partitions Π = (Y, Z) with |Y| ≤ |Z|, rel-rank(p, Π) = 1. This shows that we cannot even hope to prove that for any polynomial p computed by a polynomial-size non-commutative circuit, there exists some partition with respect to which p has small rank. The proof closely follows a very similar construction due to Raz and Yehudayoff [19] in the context of commutative multilinear circuits.
Notation. We first introduce some notation. Given a finite set S of even cardinality, we define an S-matching to be an unordered partition of S into sets of size two, i.e., M is an S-matching if M ⊆ (S choose 2) and the sets in M partition S.
Fix any degree parameter d ∈ N that is even. For any i, j ∈ [d] with i < j and |[i, j]| = j − i + 1 even, we define a set M_{i,j} of [i, j]-matchings as follows. The set M_{i,j} is defined by induction on |[i, j]|. The base case is when j = i + 1 and in this case, we set M_{i,j} = {{i, i + 1}}. In the case that j − i + 1 = 2ℓ for ℓ > 1, we define the set M_{i,j} as follows.
2 , we denote by λ_M the product ∏_{e∈M} λ_e. Finally, we define the polynomial p^λ (where λ denotes the tuple (λ_{1,2}, …,
where p_M is defined as follows.
We will show that for any choice of λ_e (e ∈ ([d] choose 2)), the polynomial p^λ has a non-commutative circuit of size poly(n, d). On the other hand, if the field F is large enough, then there exists a choice of λ_e (e ∈ ([d] choose 2)) such that for any partition Π = (Y, Z) with |Y| ≤ |Z|, rank(M[p^λ, Π]) = n^{|Y|} (i.e., rel-rank(p^λ, Π) = 1).
The first lemma gives us the circuit upper bound.
Lemma 8.1. Fix any field F and d, n ∈ N such that d is even. For any choice of field elements λ_e ∈ F (e ∈ ([d] choose 2)), the polynomial p^λ has a non-commutative arithmetic circuit of size poly(n, d).
Proof. We first define several intermediate polynomials that are computed in the course of computing the polynomial p^λ. For any i, j ∈ [d] such that i < j and ℓ := j − i + 1 is even, define the polynomial p^λ_{i,j} to be p^λ_{i,j}(X) = ∑_{M∈M_{i,j}} λ_M · p_M, where p_M, for M ∈ M_{i,j}, is defined as
Note that p^λ is the same as p^λ_{1,d}. Our circuit for p^λ computes p^λ_{i,j} for each i, j ∈ [d]. The construction proceeds in increasing order of the parameter ℓ. When ℓ = 2 (the smallest value possible), the polynomial is simply p^λ_{i,i+1} = λ_{{i,i+1}} ∑_{x∈X} xx, which can be computed by a circuit of size O(n).
Now say we have a circuit C of size S that computes p^λ_{s,t} when t − s + 1 < ℓ. To compute p^λ_{i,j} where j − i + 1 = ℓ, we use the following simple identity, which follows from the definition of M_{i,j}:
p^λ_{i,j} = ∑_{j′∈{i+1,i+3,…,j−2}} p^λ_{i,j′} · p^λ_{j′+1,j} + λ_{{i,j}} ∑_{x∈X} x · p^λ_{i+1,j−1} · x.
Since each of the polynomials p^λ_{i,j′}, p^λ_{j′+1,j}, and p^λ_{i+1,j−1} has already been computed by the circuit C, the additional size required to compute p^λ_{i,j} is O(d + n). We continue this way until we have computed all the p^λ_{i,j}. The total number of pairs i, j is O(d²) and hence the size of the circuit thus constructed is O(d²(d + n)) = poly(n, d).
Lemma 8.3. Let d ∈ N be even and F be any field such that F is either infinite or |F| > d·2^{2d}. Then there is a choice of field elements λ_e ∈ F (e ∈ ([d] choose 2)) such that for any balanced partition Π, we have rel-rank(p^λ, Π) = 1.
Proof. We fix any finite subset F′ ⊆ F of size at least d·2^{2d} + 1 and choose each λ_e (e ∈ ([d] choose 2)) independently and uniformly at random from F′. We will show that p^λ(X) has the required property with non-zero probability over the choice of the λ_e.
Fix any balanced partition Π = (Y, Z).We say that a [d]-matching M is good for Π if, for each i ∈ Y , there is a j ∈ Z such that {i, j} ∈ M.
We use the following simple fact about the set of matchings M 1,d .
Fact 8.4.For any balanced partition Π = (Y, Z), there is a matching M ∈ M 1,d that is good for Π.
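Fact 8.4 can be confirmed exhaustively for small d. The Python sketch below is an illustration under assumptions made here: the family M_{i,j} is rebuilt from the recursive identity used in the proof of Lemma 8.1 (either concatenate matchings of two consecutive sub-intervals, or pair the endpoints i and j and match the interior [i + 1, j − 1]), a balanced partition is taken to mean |Y| = |Z|, and the names matchings, is_good, and fact_8_4_holds are hypothetical.

```python
from functools import lru_cache
from itertools import combinations


@lru_cache(maxsize=None)
def matchings(i, j):
    """The family M_{i,j} of matchings of [i, j], each a frozenset of pairs (a, b), a < b."""
    if j == i + 1:
        return frozenset({frozenset({(i, j)})})
    family = set()
    # Concatenate a matching of [i, split] with a matching of [split + 1, j].
    for split in range(i + 1, j - 1, 2):
        for left in matchings(i, split):
            for right in matchings(split + 1, j):
                family.add(left | right)
    # Or pair the two endpoints and match the interior interval.
    for inner in matchings(i + 1, j - 1):
        family.add(inner | {(i, j)})
    return frozenset(family)


def is_good(matching, Y):
    """Every element of Y is matched to an element outside Y."""
    partner = {}
    for a, b in matching:
        partner[a], partner[b] = b, a
    return all(partner[i] not in Y for i in Y)


def fact_8_4_holds(d):
    """Check that every balanced partition of [d] has a good matching in M_{1,d}."""
    family = matchings(1, d)
    return all(
        any(is_good(m, set(Y)) for m in family)
        for Y in combinations(range(1, d + 1), d // 2)
    )
```

Under these assumptions the family consists of the non-crossing perfect matchings of [i, j], so its size grows like the Catalan numbers, far below the total number of perfect matchings.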
By Fact 8.4, there is a matching M_0 ∈ M_{1,d} such that M_0 is good for Π. It follows then from the definition of p_{M_0} above that the matrix M[p_{M_0}, Π] is a permutation matrix and hence rank(M[p_{M_0}, Π]) = n^{d/2}. We argue that, with high probability over the choice of λ, this is true of the polynomial p^λ as well.
In order to do this, we consider det(M[p^λ, Π]). By the definition of p^λ, we have
M[p^λ, Π] = ∑_{N∈M_{1,d}} λ_N · M[p_N, Π].
Since M[p_N, Π] is a 0-1 matrix for each N, we see that det(M[p^λ, Π]) is a polynomial in λ_e (e ∈ ([d] choose 2)) of degree at most d·2^d. We claim that this polynomial is in fact non-zero. To see this, note that if we substitute λ_e = 1 for e ∈ M_0 and 0 for e ∉ M_0 in the above expression for M[p^λ, Π], we obtain the matrix M[p_{M_0}, Π]; hence, under this substitution, the polynomial det(M[p^λ, Π]) takes the value det(M[p_{M_0}, Π]), which is non-zero since M[p_{M_0}, Π] is a permutation matrix. We have thus shown that det(M[p^λ, Π]) is a non-zero polynomial in λ_e (e ∈ ([d] choose 2)). Since the degree of this polynomial is at most d·2^d, for λ_e chosen uniformly at random from the finite set F′ fixed above, we have by the Schwartz-Zippel lemma [21,24]
Pr_λ[det(M[p^λ, Π]) = 0] ≤ d·2^d / |F′| < 2^{−d},
since |F′| > d·2^{2d}. Union bounding over the (d choose d/2) ≤ 2^d choices for Π, we see that with probability greater than 0 over the choice of λ, we have det(M[p^λ, Π]) ≠ 0 for each balanced partition Π and hence rel-rank(p^λ, Π) = 1 for every balanced partition Π.
Theorem 8.5. Let d ∈ N be even and F be any field such that F is either infinite or |F| > d·2^{2d}. Let X be any set of n variables. Then there is a homogeneous polynomial p ∈ F⟨X⟩ of degree d such that p has a circuit of size poly(n, d) but given any partition Π = (Y, Z) such that |Y| ≤ |Z|, we have rel-rank(p, Π) = 1.

Figure 1: The partition $\Pi_0$ and the set $I_g$.


Figure 2: Left-right monochromatic (LRM) partitions, where segments on both the left and right ends are contained in $Y$.


Figure 5: $\times_j$ product for monomials $g$, $h$.
$[d]$ given by an ordered pair $(Y, Z)$, where $Y \subseteq [d]$ and $Z = [d] \setminus Y$. In what follows we only use partitions of sets into two parts.

Definition 3.4 (Partial derivative matrix). Let $f \in \mathbb{F}\langle X \rangle$ be a homogeneous polynomial of degree $d$. Given a partition $\Pi = (Y, Z)$ of $[d]$, we define an $n^{|Y|} \times n^{|Z|}$ matrix $M[f, \Pi]$ with entries from $\mathbb{F}$ as follows. The rows of $M[f, \Pi]$ are labelled by monomials from $\mathcal{M}_{|Y|}(X)$ and the columns by elements of $\mathcal{M}_{|Z|}(X)$. Let $m' \in \mathcal{M}_{|Y|}(X)$ and $m'' \in \mathcal{M}_{|Z|}(X)$; the $(m', m'')$-th entry of $M[f, \Pi]$ is the coefficient in the polynomial $f$ of the unique monomial $m$ such that $m_Y = m'$ and $m_Z = m''$.
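As a small illustration of Definition 3.4, the following Python sketch builds $M[f, \Pi]$ for a polynomial given by its coefficients. The representation (a dictionary from tuples of variable names to coefficients) and the function name `partial_derivative_matrix` are assumptions made for this example only; rows and columns are listed in lexicographic order of monomials.

```python
from itertools import product

def partial_derivative_matrix(f, variables, d, Y):
    """Return M[f, (Y, Z)] as a list of rows, where Z is the complement of Y in [d].

    f maps degree-d monomials (tuples of variable names) to coefficients;
    monomials that are absent have coefficient 0. Positions are 1-indexed.
    """
    Y = sorted(Y)
    Z = [i for i in range(1, d + 1) if i not in Y]
    rows = list(product(variables, repeat=len(Y)))  # monomials m' of degree |Y|
    cols = list(product(variables, repeat=len(Z)))  # monomials m'' of degree |Z|
    matrix = []
    for m1 in rows:
        row = []
        for m2 in cols:
            # Assemble the unique monomial m with m_Y = m' and m_Z = m''.
            m = [None] * d
            for pos, x in zip(Y, m1):
                m[pos - 1] = x
            for pos, x in zip(Z, m2):
                m[pos - 1] = x
            row.append(f.get(tuple(m), 0))
        matrix.append(row)
    return matrix

# f = x1 x2 + 2 x2 x1 in two non-commuting variables, d = 2.
f = {("x1", "x2"): 1, ("x2", "x1"): 2}
M_first = partial_derivative_matrix(f, ["x1", "x2"], 2, {1})   # Y = {1}, Z = {2}
M_second = partial_derivative_matrix(f, ["x1", "x2"], 2, {2})  # Y = {2}, Z = {1}
```

Swapping the roles of the two positions transposes the matrix, which is why only the partition, and not the order of the variables inside a monomial, determines the rank.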
we have labellings from the definitions of these matrices, i.e., the rows and columns of $M[f, \Pi_f]$ are labelled by elements of $\mathcal{M}_{|Y_f|}$ and $\mathcal{M}_{|Z_f|}$, respectively; and similarly for $M[g, \Pi_g]$ and $M[h, \Pi_h]$. For $M[f, \Pi]$, we note that each monomial $m \in \mathcal{M}_{|Y|}$ can be identified with a pair of monomials $(m', m'')$ of degree $|Y_g|$ and $|Y_h|$, respectively, using the map $m \mapsto (m_{Y \cap I}, m_{Y \setminus I})$; this map is a bijection and hence, we also have an alternate labelling of the rows of $M[f, \Pi]$ by $\mathcal{M}_{|Y_g|} \times \mathcal{M}_{|Y_h|}$; similarly, we also obtain a labelling of the columns of M

Corollary 4.3. Assume that $f, Y, d_g, d_h$ are as in the statement of Lemma 4.2. Then

Lemma 5.2 (Decomposition Lemma for skew circuits). Let $f \in \mathbb{F}\langle X \rangle$ be a homogeneous polynomial of degree $d \in \mathbb{N}$ computed by a homogeneous skew circuit $C$ of size $s$. Fix any $d' \in [d]$. Let $g_1, \ldots, g_t$ ($t \le s$) be the intermediate polynomials of degree $d'$ computed by $C$. Then there exist homogeneous polynomials

Proof. Assume that $D = \min\{d_1, d_2\}$. Apply the Decomposition Lemma for skew circuits (Lemma 5.2) to $C$ with $d' = d - D$ to get polynomials $g_i$ and $h_{i,j}$ for $(i, j) \in [t] \times [0, D]$ as in the statement of the lemma. By the subadditivity of rank, we have $\text{rel-rank}(f, \Pi) \le \sum_{(i,j) \in [t] \times [0,D]}$

Figure 7: For fixed $d_1, d_2$, a generic positioning of $g_i$ of degree $d'$ in $g_i \times_j h_{i,j}$.

Theorem 1.1 [Precise version]. Any skew circuit for $\mathrm{PAL}^2_{d/4}(X)$ must have size $\Omega(n^{d/4})$, where the $\Omega(\cdot)$ hides $\mathrm{poly}(d)$ factors.

Proof. Let $C$ be any skew circuit computing $\mathrm{PAL}^2_{d/4}(X)$ and let $s$ denote its size. By Lemma 4.1, we know that there is a homogeneous circuit $C'$ of size $s' = O(sd^2)$ computing the same polynomial. Let $Y = [d/4] \cup [3d/4 + 1, d]$, $Z = [d] \setminus Y$, $\Pi = (Y, Z)$. Note that $\Pi$ is a $(d/4, d/4)$-LRM partition of $[d]$. Apply Lemma 5.4 to the circuit $C'$ with $d_1 = d_2 = d/4$. The lemma implies that $\text{rel-rank}(\mathrm{PAL}^2_{d/4}(X), \Pi) \le (s'd) \cdot n^{-d/4}$. On the other hand, it is easy to verify that $M[\mathrm{PAL}^2_{d/4}(X), \Pi]$ is a square permutation matrix and hence $\text{rel-rank}(\mathrm{PAL}^2_{d/4}(X), \Pi) = 1$. Combining the two bounds gives $s' \ge n^{d/4}/d$, which implies the claimed lower bound on $s$.

Remark 5.5. It is not hard to see that the lower bound of Theorem 1.1 is close to tight, since $\mathrm{PAL}^2_{d/4}(X)$ does have a skew circuit of size $O(n^{d/4})$.
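The permutation-matrix claim in the proof can be checked by brute force for small parameters. The sketch below assumes that $\mathrm{PAL}_k(X) = \sum_{w} w \cdot w^R$, the sum over all words $w$ of length $k$ of $w$ followed by its reverse (the definition of the polynomial lies outside this passage, so this is an assumption of the example). The helper names `pal_squared`, `partial_derivative_matrix` and `is_permutation_matrix` are hypothetical and not from the paper.

```python
from itertools import product

def pal_squared(variables, k):
    """Coefficient map of PAL_k(X)^2, assuming PAL_k(X) = sum over words w of w * reverse(w)."""
    f = {}
    for w in product(variables, repeat=k):
        for u in product(variables, repeat=k):
            m = w + w[::-1] + u + u[::-1]
            f[m] = f.get(m, 0) + 1
    return f

def partial_derivative_matrix(f, variables, d, Y):
    """M[f, (Y, Z)] with Z the complement of Y in [d]; positions are 1-indexed."""
    Y = sorted(Y)
    Z = [i for i in range(1, d + 1) if i not in Y]
    matrix = []
    for m1 in product(variables, repeat=len(Y)):
        row = []
        for m2 in product(variables, repeat=len(Z)):
            m = [None] * d
            for pos, x in zip(Y, m1):
                m[pos - 1] = x
            for pos, x in zip(Z, m2):
                m[pos - 1] = x
            row.append(f.get(tuple(m), 0))
        matrix.append(row)
    return matrix

def is_permutation_matrix(M):
    """True if M is square, 0-1 valued, with exactly one 1 in every row and column."""
    size = len(M)
    if any(len(row) != size for row in M):
        return False
    if any(entry not in (0, 1) for row in M for entry in row):
        return False
    return all(sum(row) == 1 for row in M) and all(sum(col) == 1 for col in zip(*M))

variables = ["x1", "x2"]  # n = 2
d = 8                     # so PAL_{d/4} = PAL_2 has degree 4
Y = set(range(1, d // 4 + 1)) | set(range(3 * d // 4 + 1, d + 1))  # [d/4] and [3d/4+1, d]
M = partial_derivative_matrix(pal_squared(variables, d // 4), variables, d, Y)
result = is_permutation_matrix(M)
```

For these parameters the matrix has $n^{d/2} = 16$ rows and columns, and the check succeeds because the positions in $Y$ read off $w$ and $u^R$ while the positions in $Z$ read off $w^R$ and $u$, so each row label determines exactly one column label.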

Figure 15: The subcases in Case 1: The first figure represents Case 1(i), and the second and third represent Case 1(ii).

Figure 17: The subcases that can occur in Case 3.

Corollary 7.2. Let $k, d \in \mathbb{N}$ be any parameters such that $(64(8 + 12k)) \mid d$. Any circuit of non-skew depth $k$ for the permanent or the determinant of a $d \times d$ matrix must have size $2^{\Omega(d/k)}$.

$O(d^2(d + n)) = \mathrm{poly}(n, d)$.

The second lemma tells us that it suffices to consider only balanced partitions $(Y, Z)$, i.e., partitions such that $|Y| = |Z| = d/2$.

Lemma 8.2. Let $d \in \mathbb{N}$ be even. Let $f \in \mathbb{F}\langle X \rangle$ be any homogeneous polynomial of degree $d$. If there is a partition $\Pi = (Y, Z)$ with $|Y| \le |Z|$ such that $\text{rel-rank}(f, \Pi) < 1$, then for any balanced partition $\Pi' = (Y', Z')$ such that $Y' \supseteq Y$, we have $\text{rel-rank}(f, \Pi') < 1$.

Proof. Consider the matrix $M[f, \Pi']$. Each row is labelled by a monomial $m$ of degree $|Y'|$, which can be identified with a pair $(m', m'')$ where $m'$ is the natural restriction of $m$ to the locations in $Y$ and $m''$ is the restriction to the locations in $Y' \setminus Y$. Fix any $m''$ and consider the rows of all the monomials $m$ that give rise to this particular $m''$. The resulting matrix has exactly $n^{|Y|}$ rows and $n^{|Z'|}$ columns. Each column is labelled by a monomial $m'''$ of degree $|Z'|$ and each row by a monomial $m'$ of degree $|Y|$. The $(m', m''')$-th entry of the matrix is the coefficient, in the polynomial $f$, of the monomial which equals $m'$ when restricted to $Y$, equals $m''$ when restricted to $Y' \setminus Y$, and equals $m'''$ when restricted to $Z'$. It is not hard to check that this matrix is a submatrix of the matrix $M[f, \Pi]$ (obtained by removing some columns). Since $\text{rel-rank}(f, \Pi) < 1$, we have $\mathrm{rank}(M[f, \Pi]) < n^{|Y|}$. Thus, for any fixed $m''$, the submatrix obtained as above has rank less than $n^{|Y|}$. Since there are $n^{|Y'| - |Y|}$ such matrices, the rank of $M[f, \Pi']$ is strictly less than $n^{|Y'| - |Y|} \cdot n^{|Y|} = n^{|Y'|}$. Hence, we have $\text{rel-rank}(f, \Pi') < 1$.
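The counting step at the end of the proof can be made explicit as follows. Here $M_{m''}$ is notation introduced only for this display (it is not from the paper) and denotes the block of rows of $M[f, \Pi']$ whose labels restrict to $m''$ on $Y' \setminus Y$. The first inequality is the subadditivity of rank over this row decomposition.

```latex
\[
\mathrm{rank}\bigl(M[f,\Pi']\bigr)
\;\le\; \sum_{m'' \in \mathcal{M}_{|Y'|-|Y|}(X)} \mathrm{rank}\bigl(M_{m''}\bigr)
\;\le\; n^{|Y'|-|Y|}\cdot\bigl(n^{|Y|}-1\bigr)
\;<\; n^{|Y'|} .
\]
```

The middle bound uses that each $M_{m''}$ is a submatrix of $M[f, \Pi]$, whose rank is an integer strictly below $n^{|Y|}$.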
, 2016, pp. 1-38 LOWER BOUNDS FOR NON-COMMUTATIVE SKEW CIRCUITS

which in particular is $(d_1/2, d_2/2)$-LRM. In this case, by Corollary 4.3 and Lemma 5.4, we immediately get rel-rank(g 1,

8.2, and 8.3.

GUILLAUME MALOD received his Ph.D. from Université Claude Bernard Lyon 1 in 2003; his advisor was Bruno Poizat, whose writing style he admires. His thesis focused on arithmetic circuits and coefficient functions and his scientific interests have not changed much since. He lives in Paris with his wife Asako and children Naoto and Miyuki, except when they enjoy Japan's hot and humid summer. He likes standing motionless or moving very slowly and cooking. He likes to fall asleep while listening to Anima.

SRIKANTH SRINIVASAN got his undergraduate degree from the Indian Institute of Technology Madras, where his interest in the theory side of CS was piqued under the tutelage of N. S. Narayanswamy. Subsequently, he obtained his Ph.D. from The Institute of Mathematical Sciences in 2011; his advisor was V. Arvind. His research interests span all of TCS (in theory), but in practice are limited to circuit complexity, derandomization, and related areas of mathematics. He enjoys running and pretending to play badminton.