ON THE CLASSICAL HARDNESS OF SPOOFING LINEAR CROSS-ENTROPY BENCHMARKING

Recently, Google announced the first demonstration of quantum computational supremacy with a programmable superconducting processor (Arute et al. [2019]). Their demonstration is based on collecting samples from the output distribution of a noisy random quantum circuit, then applying a statistical test to those samples called Linear Cross-Entropy Benchmarking (Linear XEB). This raises a theoretical question: How hard is it for a classical computer to spoof the results of the Linear XEB test? In this short note, we adapt an analysis of Aaronson and Chen [2017] to prove a conditional hardness result for Linear XEB spoofing. Specifically, we show that the problem is classically hard, assuming that there is no efficient classical algorithm that, given a random $n$-qubit quantum circuit $C$, estimates the probability of $C$ outputting a specific output string, say $0^n$, with variance even slightly better than that of the trivial estimator that always estimates $1/2^n$. Our result automatically encompasses the case of noisy circuits.


Introduction
Quantum computational supremacy refers to the solution of a well-defined computational task by a programmable quantum computer in significantly less time than is required by the best known algorithms running on existing classical computers, for reasons of asymptotic scaling. It is a prerequisite for useful quantum computation, and is therefore seen as a major milestone in the field. The task of sampling from random quantum circuits (called RCS) is one proposal for achieving quantum supremacy [4,5,2]. Unlike other proposals such as Boson Sampling [1] and Commuting Hamiltonians [6], RCS involves a universal quantum computer, one theoretically capable of applying any unitary transformation. Furthermore, RCS currently appears to be the easiest proposal to implement at a large enough scale to demonstrate quantum supremacy.
A research team based at Google has announced a demonstration of quantum computational supremacy, by sampling the output distributions of random quantum circuits [3]. To verify that their circuits were working correctly, they tested their samples using Linear Cross-Entropy Benchmarking (Linear XEB). This test simply checks that the observed samples tend to concentrate on the outputs that have higher probabilities under the ideal distribution for the given quantum circuit. More formally, given samples $z_1, \ldots, z_k \in \{0, 1\}^n$, Linear XEB entails checking that $\mathbb{E}_i[P(z_i)]$ is greater than some threshold $b/2^n$, where $P(z)$ is the probability of observing $z$ under the ideal distribution. In the regime of 40-50 qubits, these probabilities can be calculated by a classical supercomputer with enough time.
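The test described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the ideal distribution is passed in as an explicit list of $2^n$ probabilities (feasible only for small $n$), outputs are encoded as integers in $[0, 2^n)$, and the function names `linear_xeb` and `passes_linear_xeb` are hypothetical.

```python
def linear_xeb(ideal_probs, samples):
    """Return 2^n * mean_i P(z_i), the empirical value of b.

    ideal_probs: list of length 2^n with the ideal output probabilities P(z).
    samples: observed outputs z_1, ..., z_k, encoded as integers in [0, 2^n).
    """
    dim = len(ideal_probs)  # dim = 2^n
    mean_prob = sum(ideal_probs[z] for z in samples) / len(samples)
    return dim * mean_prob


def passes_linear_xeb(ideal_probs, samples, b):
    """Check E_i[P(z_i)] > b / 2^n, written in rescaled form."""
    return linear_xeb(ideal_probs, samples) > b


# Toy 2-qubit distribution (illustrative numbers only).
toy_probs = [0.4, 0.3, 0.2, 0.1]
heavy_samples = [0, 0, 1, 2]    # concentrated on likely outputs
uniform_samples = [0, 1, 2, 3]  # one copy of each output
```

Samples that each output appears in exactly once score exactly 1 on this rescaled statistic, which is why the interesting thresholds are those with $b > 1$.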
While there is some support for the conjecture that no classical algorithm can efficiently sample from the output distribution of a random quantum circuit [5], less is known about the hardness of directly spoofing a test like Linear XEB. Results about the hardness of sampling are not quite results about the hardness of spoofing Linear XEB; a device could score well on Linear XEB while being far from correct in total variation distance by, for example, always outputting the items with the k highest probabilities.
Under the assumption that the noise in the device is purely "depolarizing" (that is, that a sample from the circuit was sampled correctly with probability $b - 1$ and otherwise sampled uniformly at random), there is stronger evidence that it is difficult to spoof Linear XEB. Namely, if there is a classical algorithm for sampling from a quantum circuit with perfectly depolarizing noise in time $T$, then with the help of an all-powerful but untrusted prover, one can calculate a good estimate for output probabilities in time $10T/(b - 1)$ with high probability over circuits. Together with results of [9], it follows that under the Strong Exponential Time Hypothesis there is a quantum circuit from which one cannot classically sample with depolarizing noise in time $(b - 1)2^{(1-o(1))n}$ [3]. We are unaware of any evidence that does not depend on such a strong assumption about the noise.
However, Aaronson and Chen were able to prove the hardness of a different, related verification procedure from a strong hardness assumption they called the Quantum Threshold Assumption (QUATH) [2]. Informally, QUATH states that it is impossible for a polynomial-time classical algorithm to guess whether a specific output string like $0^n$ has greater-than-median probability of being observed as the output of a given $n$-qubit quantum circuit, with success probability $1/2 + \Omega(1/2^n)$. They went on to investigate algorithms for breaking QUATH by estimating the output amplitudes of quantum circuits. For certain classes of circuits, output amplitudes can be efficiently calculated, but in general even efficiently sampling from the output distribution is impossible unless the polynomial hierarchy collapses [1,6]. Aaronson and Chen found an algorithm for calculating amplitudes of arbitrary circuits that runs in time $d^{O(n)}$ and $\mathrm{poly}(n, d)$ space, where $d$ is the circuit depth. This is now used in some state-of-the-art simulations, but is still too slow and of the wrong form to violate QUATH for larger circuits, as there is no way to trade the accuracy for polynomial-time efficiency.
THEORY OF COMPUTING, Volume 16 (11), 2020, pp. 1-8
Here, we formulate a slightly different assumption that we call XQUATH and show that it implies the hardness of spoofing Linear XEB. Like QUATH, the new assumption is quite strong, but makes no reference to sampling. In particular, while we don't know a reduction, refuting XQUATH seems essentially as hard as refuting QUATH. Note that our result says nothing, one way or the other, about the possibility of improvements to algorithms for calculating amplitudes. It just says that there's nothing particular to spoofing Linear XEB that makes it easier than nontrivially estimating amplitudes.
Indeed, since the news of the Google group's success broke, at least four results have potentially improved on the classical simulation efficiency, beyond what Google had considered. First, Gray and Kourtis were able to optimize tensor network contraction methods to obtain a faster classical amplitude estimator, though it is not competitive for calculating millions of amplitudes at once [7]. Second, Pednault et al. argued that, by using secondary storage, the largest existing classical supercomputers should be able to simulate the experiments done at Google in a few days [11]. Third, Napp et al. produced an efficient algorithm for approximately simulating average-case quantum circuits from a certain distribution of constant depth circuits, which is impossible to efficiently exactly simulate classically in the worst-case unless the polynomial hierarchy collapses [10]. This algorithm is not efficient for circuits as deep as those used by the Google team. Fourth, Zhou et al. used tensor network algorithms to simulate circuits as large as in the Google experiment, but with different 2-qubit gates that were easier to simulate [12]. Our result provides some explanation for why these improvements had to target the general problem of amplitude estimation, rather than doing anything specific to the problem of spoofing Linear XEB.

Preliminaries
Throughout this note we will refer to random quantum circuits. Our results apply to circuits chosen from any distribution $D$ over circuits on $n$ qubits that is unaffected by appending NOT gates to any subset of the qubits at the end of the circuit. For every such distribution there is a corresponding version of XQUATH. For instance, we could consider a distribution where $d$ alternating layers of random single- and neighboring two-qubit gates are applied to a square lattice of $n$ qubits, similar to the Google experiment. Note that the actual distribution in the Google experiment might have been affected by appending NOT gates, but they could have applied random NOT gates to the end of their circuits classically and achieved the same fidelity. If circuits from $D$ include randomly-chosen NOT gates in the final layer, then $D$ obviously satisfies our condition.
Our assumption XQUATH states that no efficient classical algorithm can estimate the probability of such a random circuit $C$ outputting $0^n$, with mean squared error even slightly lower than the trivial algorithm that always estimates $1/2^n$.
Definition 2.1 (XQUATH, or Linear Cross-Entropy Quantum Threshold Assumption). There is no polynomial-time classical algorithm that takes as input a quantum circuit $C \leftarrow D$ and produces an estimate $p$ of $p_0 = \Pr[C \text{ outputs } 0^n]$ such that
$$\mathbb{E}[(p_0 - p)^2] = \mathbb{E}[(p_0 - 2^{-n})^2] - \Omega(2^{-3n}),$$
where the expectations are taken over circuits $C$ as well as the algorithm's internal randomness.
The simplest way to attempt to refute XQUATH might be to hope that $C$ is near to a circuit that is classically simulable, e.g., if $C$ contains only near-Clifford gates. However, the fraction of such circuits will decay exponentially with the number of gates in the circuit, rather than the number of qubits. Alternatively, one might try $k$ random Feynman paths of the circuit, all of which terminate at $0^n$, and take the empirical mean over their contributions to the amplitude. This approach will similarly only yield an improvement in mean squared error over the trivial algorithm that decays exponentially with the number of gates. When the number of gates is much larger than $3n$, as in the Google experiment, it is clear that these approaches cannot violate XQUATH. Even the best existing quantum simulation algorithms do not appear to significantly help in refuting XQUATH for reasonable circuit distributions.
The problem XHOG is to generate outputs of a given quantum circuit that have high expected squared-magnitude amplitudes. These outputs are required to be distinct for reasons that will become clear in the proof of Theorem 3.1.

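Problem 2.2 (XHOG, or Linear Cross-Entropy Heavy Output Generation). Given a circuit $C$, generate $k$ distinct samples $z_1, \ldots, z_k$ such that $\mathbb{E}_i[|\langle z_i|C|0^n\rangle|^2] \ge b/2^n$.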
The interesting case is when $b > 1$, and we will generally think of $b$ as a constant. Without fault-tolerance, $b - 1$ will quickly become very small for circuits larger than the experiment can handle. This is a difficulty of applying complexity theory to finite experiments, which fail when the problem instance is too large.
When the depth is large enough, the output probabilities $p$ of almost all circuits are empirically observed to be accurately described by the distribution $2^n e^{-2^n p}$, although this has only been rigorously proven in some special cases [3,4,8]. Under this assumption, for observed outputs $z$ from ideal circuits $C \leftarrow D$ we have
$$\mathbb{E}[|\langle z|C|0^n\rangle|^2] \approx \int_0^\infty p \cdot 2^n p \cdot 2^n e^{-2^n p}\,dp = \frac{2}{2^n}.$$
So we expect an ideal circuit to solve XHOG with $b \approx 2$, and a noisy circuit to solve XHOG with $b$ slightly larger than 1. Theorem 3.1 says that, assuming XQUATH, solving XHOG with $b > 1$ is hard to do classically with many samples and high probability. For completeness, we show in the Appendix that with Google's number of samples and estimated circuit fidelity, they would be expected to solve XHOG with sufficiently high probability.
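The expectation $b \approx 2$ for ideal circuits, and $b$ slightly above 1 for noisy ones, can be checked numerically. The sketch below does not simulate any circuit. It assumes the output probabilities are modeled as independent exponential weights normalized to sum to 1 (a stand-in for the distribution $2^n e^{-2^n p}$), and it assumes the purely depolarizing noise model in which a sample is ideal with probability `fidelity` and uniform otherwise. All function names are hypothetical.

```python
import random


def porter_thomas_probs(n, rng):
    """Model 2^n output probabilities as normalized exponential weights."""
    weights = [rng.expovariate(1.0) for _ in range(2 ** n)]
    total = sum(weights)
    return [w / total for w in weights]


def ideal_b(probs):
    """b for ideal sampling: 2^n * E_{z~P}[P(z)] = 2^n * sum_z P(z)^2."""
    return len(probs) * sum(p * p for p in probs)


def noisy_b(probs, fidelity):
    """b when each sample is ideal w.p. `fidelity`, else uniform.

    A uniform sample has expected probability exactly 1/2^n, so the
    rescaled score is the mixture fidelity * ideal_b + (1 - fidelity).
    """
    return fidelity * ideal_b(probs) + (1.0 - fidelity)


# Fixed seed so the illustration is reproducible.
demo_probs = porter_thomas_probs(14, random.Random(0))
demo_ideal = ideal_b(demo_probs)
demo_noisy = noisy_b(demo_probs, 0.01)
```

Under this model `demo_ideal` comes out close to 2 and `demo_noisy` close to 1 plus the fidelity, which matches the reading of $b - 1$ as the probability that a sample was drawn correctly.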

The Reduction
We now provide a reduction from the problem in XQUATH to XHOG. Since we only call the XHOG algorithm once in the reduction, and all other steps are efficient, solving XHOG actually requires as many computational steps as solving the problem in XQUATH, minus $O(k)$.
Theorem 3.1. Assuming XQUATH, no polynomial-time classical algorithm can solve XHOG with probability $s > \frac{1}{2} + \frac{1}{2b}$, and
$$k \ge \frac{1}{((2s - 1)b - 1)(b - 1)}.$$
With $b = 1 + \delta$ and $s = \frac{1}{2} + \frac{1}{2b} + \varepsilon$, the right-hand side is approximately $1/(2\varepsilon\delta)$.
Proof. Suppose that $A$ is a classical algorithm solving XHOG with the parameters above. Given a quantum circuit $C \leftarrow D$, first draw a uniformly random $z \in \{0, 1\}^n$, and apply NOT gates at the end of $C$ on qubits $i$ where $z_i = 1$ to get a circuit $C'$. According to our assumption on $D$, $C'$ is distributed exactly the same as $C$, even conditioned on a particular $z$. Also, $\langle 0^n|C|0^n\rangle = \langle z|C'|0^n\rangle$, so $\Pr[C \text{ outputs } 0^n] = \Pr[C' \text{ outputs } z]$. Call this probability $p_0$.
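Run $A$ on input $C'$ to get $z_1, \ldots, z_k$ with $\mathbb{E}_i[|\langle z_i|C'|0^n\rangle|^2] \ge b 2^{-n}$ (when $A$ succeeds). If $z \in \{z_i\}$, then our algorithm outputs $p = b 2^{-n}$; otherwise it outputs $p = 2^{-n}$. Let
$$X := (p_0 - 2^{-n})^2 - (p_0 - p)^2 = 2(p - 2^{-n})p_0 + 2^{-2n} - p^2.$$
Since $\mathbb{E}[X \mid z \notin \{z_i\}] = 0$, and since $z$ is uniformly random even conditioned on the output of $A$ and its success or failure (so that $\Pr[z \in \{z_i\}] = k 2^{-n}$ because the $z_i$ are distinct, and $\mathbb{E}[p_0 \mid z \in \{z_i\}] \ge s b 2^{-n}$),
$$\mathbb{E}[X] = \Pr[z \in \{z_i\}] \cdot \mathbb{E}[X \mid z \in \{z_i\}] \ge \frac{k}{2^n}\left(2(b - 1)2^{-n} \cdot s b 2^{-n} + (1 - b^2)2^{-2n}\right) = k\,((2s - 1)b - 1)(b - 1)\,2^{-3n},$$
which is $\Omega(2^{-3n})$ as long as $k \ge 1/(((2s - 1)b - 1)(b - 1))$. This completes the proof.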
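The estimator built in the proof can be exercised on a toy instance. The sketch below makes strong simplifying assumptions that are not part of the argument: a "circuit" is represented only by its list of $2^n$ output probabilities, appending NOT gates is modeled as XOR-relabeling of outputs, the circuit distribution is the family of all relabelings of one fixed probability vector (which is invariant under appended NOT gates), and the XHOG solver `top_k_solver` is a hypothetical stand-in that cheats by reading the ideal probabilities. All names are illustrative.

```python
import random


def relabel(probs, mask):
    """Output distribution after appending NOT gates on the bits set in mask."""
    return [probs[y ^ mask] for y in range(len(probs))]


def top_k_solver(probs, k):
    """Toy XHOG solver: return the k distinct heaviest outputs (succeeds always)."""
    order = sorted(range(len(probs)), key=lambda y: probs[y], reverse=True)
    return order[:k]


def reduction_estimate(circuit_probs, z, solver, k, b):
    """Estimate p_0: output b/2^n if z is among the solver's samples on C', else 1/2^n."""
    dim = len(circuit_probs)
    primed = relabel(circuit_probs, z)  # C' = C followed by NOT gates given by z
    samples = solver(primed, k)
    return b / dim if z in samples else 1.0 / dim


def mean_improvement(base_probs, solver, k, b):
    """Average of X = (p0 - 2^-n)^2 - (p0 - p)^2 over all circuits and all z."""
    dim = len(base_probs)
    total = 0.0
    for w in range(dim):                    # circuit drawn from the relabeling family
        circuit = relabel(base_probs, w)
        p0 = circuit[0]                     # Pr[circuit outputs 0^n]
        for z in range(dim):                # uniformly random z
            p = reduction_estimate(circuit, z, solver, k, b)
            total += (p0 - 1.0 / dim) ** 2 - (p0 - p) ** 2
    return total / (dim * dim)


# Toy instance with n = 4 qubits and k = 3 samples.
_rng = random.Random(1)
_weights = [_rng.expovariate(1.0) for _ in range(16)]
base = [w / sum(_weights) for w in _weights]
k = 3
b = 16 * sum(sorted(base, reverse=True)[:k]) / k  # largest b the solver achieves
gain = mean_improvement(base, top_k_solver, k, b)
```

Because this solver always succeeds ($s = 1$) and meets the threshold with equality, the average improvement `gain` should equal $k(b - 1)^2 2^{-3n}$, the $s = 1$ case of the bound in the proof.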
One simple instance of the theorem is to take $s = \frac{3}{4} + \frac{1}{4b}$ and $k = 2(b - 1)^{-2}$. Note that even with $s = 1$, we need $k \ge (b - 1)^{-2}$ samples for the proof to work.
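For concrete parameters, the sample requirement $k \ge 1/(((2s - 1)b - 1)(b - 1))$ from the proof is easy to evaluate. The helper below is a small illustrative sketch; the name `min_samples` is hypothetical, and it returns the real-valued bound without rounding up.

```python
def min_samples(b, s):
    """Lower bound on k from the proof: 1 / (((2s - 1) b - 1) (b - 1)).

    Requires b > 1 and s > 1/2 + 1/(2b), so that both factors are positive.
    """
    gap = (2.0 * s - 1.0) * b - 1.0
    if b <= 1.0 or gap <= 0.0:
        raise ValueError("need b > 1 and s > 1/2 + 1/(2b)")
    return 1.0 / (gap * (b - 1.0))


# The simple instance from the text: s = 3/4 + 1/(4b) gives k = 2 / (b - 1)^2.
b_demo = 1.5
k_simple = min_samples(b_demo, 0.75 + 1.0 / (4.0 * b_demo))
# With certain success (s = 1) the bound becomes 1 / (b - 1)^2.
k_certain = min_samples(b_demo, 1.0)
```

The bound grows quadratically as $b$ approaches 1, which is the regime relevant to low-fidelity experiments.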
In fact, if the number of samples $k$ is much smaller than $(b - 1)^{-2}$, then even sampling uniformly at random would pass XHOG with non-negligible probability. This can be seen using the Kullback-Leibler (KL) divergence: for a single sample, it is not hard to calculate that the Taylor expansion of the KL divergence around $b = 1$ is $(b - 1)^2/2$ to leading order. By additivity, the KL divergence for $k$ samples is approximately $k(b - 1)^2/2$. By Pinsker's inequality, the total variation distance is at most $\sqrt{k(b - 1)^2/4}$. Therefore, in order to have total variation distance independent of $b$, one needs $k \approx (b - 1)^{-2}$.
Finally, we would like to be confident that one is solving XHOG with sufficiently high probability $s$ for Theorem 3.1 to apply, without having to perform the experiment enough times to verify this directly. This is easy to address under mild assumptions. Let $Y = |\langle z|C|0^n\rangle|^2$, where $z$ is sampled from our XHOG device which was given $C \leftarrow D$. Assuming $\mathbb{E}[Y] \ge (2b - 1)/2^n$, Chebyshev's inequality shows that
$$\Pr\left[\bar{Y}_k \le \frac{b}{2^n}\right] \le \frac{\sigma^2}{k\,((b - 1)/2^n)^2}$$
when $Y$ has standard deviation bounded by $\sigma$ and $\bar{Y}_k$ is the empirical mean of $k$ samples of $Y$. So, as long as $\sigma = O(2^{-n})$, one only needs $\Omega((b - 1)^{-2})$ samples, a condition we already used to prove Theorem 3.1.
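Both sample-count arguments above reduce to simple formulas, sketched here in Python. The first function assumes the approximate $k$-sample KL divergence $k(b - 1)^2/2$ and applies Pinsker's inequality in its standard form, total variation distance at most $\sqrt{\mathrm{KL}/2}$. The second applies Chebyshev's inequality to the empirical mean of $k$ independent samples, assuming $\mathbb{E}[Y] \ge (2b - 1)/2^n$ so that the mean sits at least $(b - 1)/2^n$ above the threshold. Function names are hypothetical.

```python
import math


def tv_upper_bound(k, b):
    """Pinsker bound on total variation distance for k samples.

    Uses the approximate KL divergence k (b - 1)^2 / 2, valid for b near 1.
    """
    kl = k * (b - 1.0) ** 2 / 2.0
    return min(1.0, math.sqrt(kl / 2.0))


def chebyshev_failure_bound(k, b, sigma, n):
    """Bound on Pr[empirical mean of Y <= b / 2^n] for k independent samples.

    Assumes E[Y] >= (2b - 1) / 2^n and that Y has standard deviation <= sigma.
    """
    margin = (b - 1.0) / 2 ** n   # distance from E[Y] down to the threshold
    return min(1.0, sigma ** 2 / (k * margin ** 2))


# Illustrative numbers only.
tv_demo = tv_upper_bound(4, 1.1)
fail_demo = chebyshev_failure_bound(100, 1.5, 2.0 ** -10, 10)
```

With $\sigma$ of order $2^{-n}$, the second bound depends on $n$ only through the ratio $\sigma 2^n$, so the number of samples needed to reach a fixed failure probability scales as $(b - 1)^{-2}$.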

Open Problems
We conclude with two open problems related to our reduction.
• Can the classical hardness of spoofing Linear XEB be based on a more secure assumption? Is there a similar assumption to XQUATH that is equivalent to the classical hardness of XHOG? •