%% ToC v033/a002 (ToC#252), by U. Feige and E. Ofek.
%% (acr) first ToC-typesetting pass complete on 1/26/07.
%% LB slight revision 2007-01-23: front data, bio; moved address to end;
%%    added key words and phrases
%% authors revised 2007-01-23

\documentclass{toc}

\tocdetails{%
   volume=3, number=2, year=2007, firstpage=25,
   received={May 2, 2006},
   published={February 9, 2007}
}

\newcommand{\bits}{\{ 0,1 \}}
\newcommand{\e}{\epsilon}
\newcommand{\norm}[2]{\| #2 \|_{_{#1}}}
\newcommand{\equivalent}{\stackrel{\rm [1]}{\approx}}
\newcommand{\eqSup}[1]{\stackrel{#1}{=}}
\newcommand{\eqdef}{\triangleq}
\newcommand{\Ex}[2]{\underset{#1}{\mathbb{E}} \left[#2\right] }
\DeclareMathOperator{\Exp}{\mathbb{E}}

\newcommand{\stackTwo}[2]{\substack{\text{#1}\\ \text{#2} }}
\newcommand{\stackFour}[4]{\substack{\text{#1}\\ \text{#2} \\ \text{#3} \\ \text{#4} }}
\newcommand{\stackThree}[3]{\substack{\text{#1}\\ \text{#2} \\ \text{#3}}}

\newcommand{\Extract}{\textrm{Extract}}
\newcommand{\Refute}{\textrm{Refute}}
\DeclareMathOperator{\Var}{\textrm{Var}}

\runningauthor{Uriel Feige and Eran Ofek}
\runningtitle{Easily refutable subformulas of large random 3CNF
formulas}

\copyrightauthor{Uriel Feige and Eran Ofek}

\bibliographystyle{tocplain}

\begin{document}
\begin{frontmatter}[classification=text]

\title{Easily refutable subformulas of large random 3CNF formulas}
\tocpdftitle{Easily refutable subformulas of large random 3CNF formulas}
\tocpdfauthor{Uriel Feige and Eran Ofek}


\author[Uri]{Uriel Feige}
\author[Eran]{Eran Ofek}

\tockeywords{proof complexity, average case analysis,
 Boolean formula, 3CNF, refutation, spectral methods}


\begin{abstract}
A simple nonconstructive argument shows that most $3$CNF formulas
with $cn$ clauses (where $c$ is a sufficiently large constant) are
not satisfiable. It is an open question whether there is an
efficient refutation algorithm that for most formulas with $cn$
clauses proves that they are not satisfiable. We present a
polynomial time algorithm that for most $3$CNF formulas with
$cn^{3/2}$ clauses (where $c$ is a sufficiently large constant)
finds a subformula with $\Theta(c^2n)$ clauses and then 
uses spectral techniques to prove that
this subformula is not satisfiable (and hence that the original
formula is not satisfiable). Previously, it was only known how to
efficiently certify the unsatisfiability of random $3$CNF formulas
with at least $\text{poly}(\log(n)) \cdot n^{3/2}$ clauses. Our
algorithm is simple enough to run in practice. We present some
experimental results.
\end{abstract}

\tocams{68Q17,68Q25} \tocacm{F.2.2}

\end{frontmatter}

\section{Introduction}

A $3$CNF formula $\phi$ over $n$ variables is a set of $m$ clauses,
each one contains exactly $3$ literals of three different variables.
A formula $\phi$ is satisfiable if there exists an assignment to its
$n$ variables such that in each clause there is at least one literal
whose value is true. The problem of deciding whether an input $3$CNF
formula $\phi$ is satisfiable is NP-hard, but this does not rule out
the possibility of designing a good heuristic for it. A heuristic
for satisfiability may try to find a satisfying assignment for an
input formula $\phi$, in case one exists. A refutation heuristic may
try to prove that no satisfying assignment exists. In this paper we
present an algorithm which tries to refute an input formula $\phi$.
The algorithm has one sided error, in the sense that it will never
say ``unsatisfiable'' on a satisfiable formula, but for some
unsatisfiable formulas it will fail to output ``unsatisfiable''. It
then follows that for a formula $\phi$ on which the algorithm
outputs ``unsatisfiable,'' its execution on $\phi$ is a witness for
the unsatisfiability of $\phi$.


%But how should a
%witness for unsatisfiability look like? A trivial witness for
%example would be the appearance of all $8$ possible clauses over a
%fixed $3$ variables.

How does one measure the quality of a refutation heuristic? A possible
test may be to check how good the heuristic is on a random input. But
then, how do we generate a random unsatisfiable formula? To answer
this question we review some known properties of random $3$CNF
formulas. The satisfiability property has the following interesting
threshold behavior. Let $\phi$ be a random $3$CNF formula with $n$
variables and $cn$ clauses (each new clause is chosen independently
and uniformly from the set of all possible clauses). As the parameter
$c$ is increased, it becomes less likely that $\phi$ is satisfiable, as
there are more constraints to satisfy. In~\cite{FriedgutBo99} it is
shown that there exists $c_n$ such that for $c < c_n(1-\e)$ almost
surely $\phi$ is satisfiable, and for $c > c_n(1+\e)$, $\phi$ is almost
surely unsatisfiable (for some $\e$ which tends to zero as $n$
increases). It is also known that $3.52 < c_n <
4.596$~\cite{KaporisKiLa03,HajiaghayiSo03,JansonStVa00}. We will use
random formulas with $cn$ clauses (for $c > c_n(1 + \e)$) to measure
the performance of a refutation heuristic.  Specifically, the
refutation heuristic is considered good if for some $c > (1+\e)c_n$ it
almost surely proves that a random formula with $cn$ clauses is
unsatisfiable.


Notice that for any $n$, as $c$ is increased (for $c>c_n(1+\e)$),
the algorithmic problem of refutation becomes less difficult since
we can always ignore a fixed fraction of the clauses. The
following question is still open: how small can $c$ be so that
there is still a polynomial time algorithm which almost surely
refutes random $3$CNF formulas with $cn$ clauses ($c$ may also be
an increasing function of $n$).

A possible approach for refuting a formula $\phi$ is to find a resolution
proof for the unsatisfiability of $\phi$. In this approach one derives
new clauses implied by $\phi$ by combining pairs of clauses in which one
clause contains a variable and the other clause contains the negation
of this variable. Any satisfying assignment must satisfy at least one
of the remaining literals contained in the two clauses, and hence the
collection of these literals is a CNF clause implied by $\phi$. A
sequence of iterations of this resolution step that eventually
generates the empty clause is a proof that $\phi$ is not satisfiable.
Chv\'{a}tal and Szemer\'{e}di~\cite{ChvatalSz88} proved that a
resolution proof of a random $3$CNF formula with linear number of
clauses is almost surely of exponential size. A result of a similar
flavor for denser formulas was given by Ben-Sasson and
Wigderson~\cite{Ben-SassonWi01} who showed that a random formula with
$n^{3/2 -\e}$ clauses almost surely requires a resolution proof of
size $2^{\Omega(n^{\e /(1-\e)})}$. These lower bounds imply that finding a
resolution proof for a random formula is computationally inefficient.

A simple refutation algorithm can be used to refute random formula
$\phi$ with $n^2$ clauses. This is done by considering only those
clauses that contain a particular variable $x$. Fixing $x$ to be
true leaves about half of the selected clauses as a random $2$CNF
formula with roughly $3n/2$ clauses. A 2CNF formula with this
number of clauses is almost surely not satisfiable. Moreover, any
polynomial time algorithm for 2SAT can be used to certify that
this particular sub-formula is not satisfiable. The same can be
done when fixing $x$ to be false, implying that no matter how $x$
is assigned, the formula $\phi$ cannot be satisfied.

A new approach, introduced by Goerdt and Krivelevich
in~\cite{GoerdtKr01}, gave a significant improvement and reduced the
bound to $\log^7 n \cdot n^{k}$ clauses for efficient refutation of
$2k$CNF formulas. This approach was later extended
in~\cite{FriedmanGoKr05} and~\cite{GoerdtLa03} to handle also random
$3$CNF formulas with $n^{3/2+\e}$ and $\text{poly}(\log n) \cdot n^{3/2}$
clauses, respectively. In~\cite{CojaGoLaSc03,FeigeOf05} it is shown
how to efficiently refute a random $2k$CNF instances with at least
$cn^{k}$ clauses.

The difficulty of finding refutation algorithms for formulas with
linearly many clauses may lead one to assume that no such
algorithm exists. It is shown in~\cite{Feige02} that this
assumption (that there is no polynomial time refutation heuristic
that works for most $3$CNF formulas with $cn$ clauses, where $c$
is an arbitrarily large constant) implies that certain
combinatorial optimization problems (like minimum graph bisection,
the $2$-catalog segmentation problem, and others) have no
polynomial time approximation schemes. It is an open question
whether it is NP-hard to approximate these problems arbitrarily
well, though further evidence that these problems are indeed hard
to approximate is given in~\cite{Khot04a}.


Our refutation algorithm is based on techniques similar to those used
in earlier work, such as~\cite{FriedmanGoKr05,CojaGoLaSc03,Feige02}.
We use these techniques in a different way, resulting in an algorithm
that is easier to implement, easier to analyze, and works at lower
densities than previous algorithms. For example, both the algorithms
in~\cite{FriedmanGoKr05,GoerdtLa03} and our algorithm perform
eigenvalue computations on some random matrices derived from the
random input formula $\phi$. However, our matrices are much smaller (of
order $n$ rather than $n^2$), making the computational task easier.
Moreover, the structure of our matrices is simpler, making the
analysis of our algorithm simpler, and easier to apply also to
formulas with fewer clauses than those
in~\cite{FriedmanGoKr05,GoerdtLa03}. As a result of this simplicity,
we can show that our algorithm refutes formulas with $cn^{3/2}$
clauses, whereas the algorithms given
in~\cite{FriedmanGoKr05} and~\cite{GoerdtLa03} are claimed only to refute
formulas with $\Omega(n^{3/2 + \epsilon})$ and $\Omega(\text{poly}(\log n) \cdot n^{3/2})$
clauses, respectively. An implementation of our algorithm refuted a
random formula with $n = 50000$ variables and $27335932 < 3n^{3/2}$
clauses (see details in \expref{Section}{sec: practice
  considerations}).

In some other respects, our algorithm is more limited then the
algorithms in~\cite{FriedmanGoKr05,GoerdtLa03}. An algorithm is said
to provide \emph{strong refutation} if it shows not only that the
input 3CNF formula is not satisfiable, but also that every assignment
to the variables fails to satisfy a constant fraction of the clauses.
Our refutation algorithm does not provide a strong refutation. The
problem of strong refutation was addressed in~\cite{CojaGoLa07}, where
it was shown that variations of the algorithms
of~\cite{FriedmanGoKr05,GoerdtLa03} can strongly refute random 3CNF
formulas with at least $\log^6 (n) \cdot n^{3/2}$ clauses.  The ability to
perform strong refutation is an important issue, and its relation to
approximability is discussed in~\cite{Feige02}.

\section{Preliminaries}

\subsection{The random model}

\begin{definition}
A clause is a $3$-tuple of literals that belong to three different
variables. The set $C_n$ is the set of all possible clauses over $n$
fixed variables (there are $2n \cdot  2(n-1) \cdot 2(n-2)$ such clauses).
\end{definition}

We use the following model for generating the random formula $\phi$.
\begin{definition}
A $3$CNF formula $\phi$ is generated by choosing $m$ clauses from
$C_n$ independently at random with repetitions. Such a random
formula is denoted by $\phi \in_R C_n^m$.
\end{definition}

Although we concentrate on a specific random model for generating
random formulas, our algorithm succeeds also on other related random
models. For example the formula can be a random set of $cn^{3/2}$
distinct clauses.

\subsection{Notation}
We write $a(n) \sim b(n)$ if $\lim_{n \to \infty}
\frac{a(n)}{b(n)} \to 1$. We use the term w.h.p. (with high
probability) to denote a sequence of probabilities that converges to
$1$ as $n$ increases.

\subsection{Efficient certification of a property}
\label{sec:maxcut}

An important concept which will be frequently used is the concept of
\textit{efficient certification}. Let P be some property of
graphs/formulas or any other combinatorial object. An algorithm $A$
\textit{certifies} the property P if the following holds:
\begin{enumerate}
\item On any input instance $\phi$ the algorithm returns either
``has P'' or ``don't know.''
\item \textit{Soundness:} The algorithm
never outputs ``has P'' on an instance $\phi$ which does not have
the property P. The algorithm may output ``don't know'' on an
instance $\phi$ which has the property P (the algorithm has one
sided error).
\end{enumerate}
We will use certification algorithms on random instances of
formulas/graphs taken from some probability space $\mathcal{C}$. We
shall consider properties that are almost surely true for the random
object taken from $\mathcal{C}$. A certification algorithm is
\textit{complete} with respect to the probability space
$\mathcal{C}$ and a property P if it almost surely outputs ``has P''
on an input taken from $\mathcal{C}$.

The computationally heavy part of our algorithm is certifying that
two different graphs derived from the random formula $\phi$ do not
have a large cut. One of these two graphs is random, and the other
is a multigraph that is the union of 6 graphs, where each of these
graphs by itself is essentially random, but there are correlations
among the graphs. A cut in a graph is a partition of its vertices
into two sets. The size of the cut is the number of edges with one
endpoint in each part. A certification algorithm for verifying that
an input graph with $m$ edges has no cut significantly larger than
$m/2$ is implicit in~\cite{Zwick99}. This algorithm is based on
semi-definite programming; if the maximum cut in the input graph is
of size at most $m(1/2 +\e)$, then the algorithm outputs a
certificate that the maximum cut in $G$ is bounded by
$m(1/2 + \delta(\e))$, where $\delta(\e) \to 0$ as
$\e \to 0$. A computationally simpler algorithm can be
applied if the graph is random. In~\cite{FeigeOf05} it is shown how
to certify that in a random graph taken from $G_{n,d/n}$ the size of
the maximum cut is bounded by $dn\bigl( 1/4 +
\theta(1/{\sqrt{d}})\bigr)$, thus bounding the maximum cut by
$m(1/2 + \e)$ when $d$ is large enough. This is done by
removing the vertices of highest degree from $G$, and then computing
the most negative eigenvalue of the adjacency matrix of the
resulting graph.
\subsection{An overview of our refutation algorithm}
\label{sec:overview} Our algorithm builds on ideas taken from earlier
work
(\cite{GoerdtKr01,FriedmanGoKr05,Feige02,FeigeOf05,CojaGoLaSc03,GoerdtLa03}).
This section gives an informal overview of the algorithm at a fairly
detailed level. Other sections of this manuscript fill in the formal
details.

The input is an arbitrary 3CNF formula $\phi$ with $n$ variables
and $m = cn^{3/2}$ clauses, where $c$ is a large enough constant.
Below we describe the expected behavior of our algorithm when the
input formula is random. The algorithm first greedily extracts
from $\phi$ a subformula $\phi'$. This is done as follows. We say
that two clauses {\em match} if they differ in their first
literal, but agree on their second literal and on their third
literal. For example, the clauses $(x_1 \lor \bar{x}_2 \lor x_3)$
and $(x_4 \lor \bar{x}_2 \lor x_3)$ match. The subformula $\phi'$
is constructed by greedily putting into $\phi'$ pairs of clauses
that match, until no further matches are found in $\phi$ (we allow
each clause of $\phi$ to participate in at most one matched pair
of $\phi'$). Let $m'$ be the number of clauses in $\phi'$. A
simple probabilistic argument shows that we can expect $m' =
\Theta(m^2/n^2) = \Theta(c^2n)$. Moreover, $\phi'$ is essentially
a union of two random (but correlated) formulas $\phi_1$ and
$\phi_2$ (each containing one clause from every pair of clauses
that are matched in $\phi'$). Our algorithm will now ignore the
rest of $\phi$, and refute $\phi'$. As explained, $\phi'$ is a
union of two random formulas. Here we use an observation that is
made in~\cite{Feige02}, that we shall call the 3XOR principle.

\medskip
\noindent
\textbf{The 3XOR principle:} In order to show that a random 3CNF
formula is not satisfiable, it suffices to strongly refute it as a
3XOR formula.
\medskip

Let us explain the terms used in the 3XOR principle. A clause in a
3XOR formula is satisfied if either one or three of its literals
are satisfied. A strong refutation algorithm is one that shows
that every assignment to the variables leaves at least a constant
fraction of the clauses not satisfied (as 3XOR clauses, in our
case).

A proof of the 3XOR principle is given in~\cite{Feige02}. We
sketch it here, and give it in more details in \expref{Section}{sec:
refutation algorithm}. Observe that in a random formula every
literal is expected to appear the same number of times, and if the
number of clauses is large enough, then things behave pretty much
like their expectation. As a consequence, every assignment to the
variables sets roughly half the occurrences of literals to true,
and roughly half to false. Hence every assignment satisfies on
average 3/2 literals per clause. Moreover, this property is easily
certifiable, by summing up the number of occurrences of the $n$
most popular literals.

Given that every assignment satisfies on average ${3}/{2}$
literals per clause, let us consider properties of satisfying
assignments (if such assignments exist). The good option is that
they satisfy one literal in roughly ${3}/{4}$ of the clauses,
three literals in roughly ${1}/{4}$ of the clauses, and 2
literals in a negligible fraction of the clauses. This keeps the
average roughly at ${3}/{2}$, and indeed nearly satisfies the
formula also as a 3XOR formula, as postulated by the 3XOR
principle. The bad option (which also keeps the average at
${3}/{2}$) is that the fraction of clauses that are satisfied
three times drops significantly below ${1}/{4}$, implying
that significantly more than ${3}/{4}$ of the clauses are
satisfied either once or twice, or in other words, as a 3NAE
(3-``not all equal'' SAT) formula. But here, let us combine two
facts. One is that for a random large enough formula, every
assignment satisfies roughly ${3}/{4}$ of the clauses as a
3NAE formula. The other (to be explained below) is that there are
known efficient algorithms for certifying that no assignment
satisfies more than ${3}/{4} + \epsilon$ fraction of the
clauses of a 3NAE formula. Hence for a random 3CNF formula, one
can efficiently certify that the bad option mentioned above does
not occur.

Having established the 3XOR principle, the next step of our
algorithm makes one round of Gaussian elimination. That is, under
the assumption that we are looking for near satisfiability as 3XOR
(which is simply a linear equation modulo 2), we can add clauses
modulo 2. Adding (modulo 2) two matched clauses, the common literals
drop out, and we get a clause with only two literals whose XOR is
expected to be 0, namely, a 2EQ clause (EQ for equality). For
example, from the clauses $(x_1 \oplus \bar{x}_2 \oplus x_3)$ and
$(x_4 \oplus \bar{x}_2 \oplus x_3)$ one gets the clause $(x_1 =
x_4)$. Doing this for all pairs of matched clauses in $\phi'$, we
get a random 2EQ formula $\phi_{2eq}$. If $\phi'$ was nearly
satisfiable as 3XOR, then $\phi_{2eq}$ must be nearly satisfiable as
2EQ. But if $\phi'$ is random, then $\phi_{2eq}$ is essentially a
random 2EQ formula. For such formulas, every assignment satisfies
roughly half the clauses. Moreover, there are known algorithms that
certify this (to be explained shortly). Hence we can strongly refute
$\phi_{2eq}$ as 2EQ, implying strong refutation of $\phi'$ as 3XOR,
implying strong refutation of $\phi'$ as 3SAT, implying refutation
(though not strong refutation) of $\phi$ as 3SAT.

Let us briefly explain here the major part that we skipped over in
the description of our algorithm, namely, how to certify that a
random 2EQ formula is not ${1}/{2} + \epsilon$ satisfiable, and
how to certify that a random 3NAE formula is not ${3}/{4} +
\epsilon$ satisfiable. In both cases, we reduce the certification
problem to certifying that certain random graphs do not have large
cuts, and then use the certification algorithms mentioned in
\expref{Section}{sec:maxcut}. (The principle of refuting random formulas
by reduction to random graph problems was introduced in~\cite{GoerdtKr01}.)


To strongly refute random 2EQ formulas, we negate the first
literal in every clause, getting a 2XOR formula. Now we construct
a graph whose vertices are the literals, and whose edges are the
clauses. A nearly satisfying 2XOR assignment naturally partitions
the vertices into two sides (those literals set to true by the
assignment versus those that are set to false), giving a cut
containing nearly all the edges. On the other hand, if the
original 2EQ formula was random, then so is the graph, and it does
not have any large cut. As explained in \expref{Section}{sec:maxcut},
we can efficiently certify that the graph does not have a large
cut, thus strongly refuting the 2XOR formula, and hence also
strongly refuting the original 2EQ formula.

To strongly refute random 3NAE formulas, we again consider a max-cut
problem on a graph (in fact, a multigraph, as there may be parallel
edges) whose vertices are the literals. From each 3NAE clause we
derive three edges, one for every pair of literals. For example,
from the 3NAE clause $(x_1, \bar{x}_2, x_3)$ we get the edges $(x_1,
\bar{x}_2)$, $(\bar{x}_2, x_3)$ and $(x_3,x_1)$. It is not hard to
see that if a ${3}/{4} + \epsilon$ fraction of the 3NAE clauses
are satisfied as 3NAE, than a $\frac{2}{3}(\tfrac{3}{4} + \epsilon)
= \tfrac{1}{2} + \frac{2 \epsilon}{3}$ fraction of the edges of the
graph are cut by the partition induced by the corresponding
assignment. Note that in our case (of $\phi'$ that is the union of
random $\phi_1$ and random $\phi_2$) this graph is essentially a
union of 6 random graphs: 3 graphs derived from the clauses of
$\phi_1$ (one with edges derived from the first two literals in
every clause, one with edges derived from the last two literals, and
one from the first and third literal), and 3 graphs derived from
$\phi_2$. Hence it is not expected to have a cut containing
significantly more than half the edges. One may certify that this is
indeed the case either by using the algorithm of~\cite{Zwick99} on
the whole graph, or by using the algorithm of~\cite{FeigeOf05} on
each of the 6 random graphs separately.


Summarizing, our refutation algorithm extracts from $\phi$ a
subformula $\phi'$ (composed of matched pairs of clauses), checks
that in $\phi'$ almost all literals appear roughly the same number
of times, derives from $\phi'$ certain graphs on $2n$ vertices and
certifies that they do not have large cuts (\eg, by computing the
most negative eigenvalue of their adjacency matrices). The
combination of all this evidence forms a proof that $\phi$ is not
satisfiable. If $\phi$ is random and large enough ($cn^{3/2}$
clauses), then almost surely the algorithm will indeed manage to
collect all the desired evidence.



\section{The refutation algorithm}\label{sec: refutation
algorithm} The input formula $\phi$ is taken from
$C_{n}^{cn^{3/2}}$. We will use $(?,w,\ell)$ to denote a clause in
which the second and the third literals are $w$ and $\ell$, respectively,
and the first literal can be
any literal. The following algorithm is used to extract $\phi'$ from $\phi$.\\

\begin{center}
  \fbox{
    \begin{minipage}{.8\textwidth}
      \vspace{0.2cm} \textbf{Algorithm  $\Extract (\phi)$}\\
      \hspace*{1em}Set $\phi_1=\phi_2 =\emptyset$.\\
      \hspace*{1em}For every ordered pair of literals $(w,\ell)$:
      \begin{enumerate}
      \item Count the number of
        clauses in $\phi$ of the form $(?,w,\ell)$ and store it in $N_{(w,\ell)}$.
      \item If $N_{(w,\ell)} \geq 2$ add to $\phi_1$ the first
        appearance of a $(?,w,\ell)$ clause and add to $\phi_2$ the second
        appearance of a $(?,w,\ell)$ clause.
      \end{enumerate}
      $~$Return $\phi' = (\phi_1,\phi_2)$. \vspace{0.1cm}
    \end{minipage}
  }
\end{center}
\medskip

Each of $\phi_1,\phi_2$ is a random formula, though clauses in
$\phi_1$ (and $\phi_2$) are not completely independent of each
other: if a clause $(x,y,z)$ appears, then the clause $(t,y,z)$
cannot appear. From now on we will concentrate on refuting
$\phi'$, ignoring the rest of $\phi$. The number of matched pairs
in $\phi'$ is denoted by $m$ (in \expref{Section}{sec:overview} we
used $m$ to denote the number of clauses in $\phi$; from here on,
$m$ will denote the number of matched pairs in $\phi'$).

\begin{lemma}\label{lemma: number_of_pairs}
Let $\phi'$ be the formula returned by $\Extract(\phi)$, where $\phi
\in_R C_n^{cn^{3/2}}$. W.h.p. the number of matched pairs in $\phi'$
is $\sim \frac{c^2n}{8}$.
\end{lemma}
The proof of \expref{Lemma}{lemma: number_of_pairs} is deferred to
\expref{Section}{sec: technical lemmas}. Before specifying the
algorithm, we introduce additional notation and definitions which
will ease the description of the algorithm.

\begin{definition}\label{def: graph induced by formula}
Let $\phi$ be any $3$CNF formula over $n$ variables. The \emph{graph
induced by $\phi$} has $2n$ vertices (corresponding to all possible
literals) and the following edges. Each clause of $\phi$ induces
three edges by taking all (unordered) pairs of literals from it
(\eg, the clause $(x,y,\bar{z})$ induces the edges $(x,y),
(x,\bar{z}),(y,\bar{z})$). We denote the (multi) graph induced by
$\phi$ by $G_{\phi}$.
\end{definition}

\begin{definition}\label{def: phi' graphs}
Let $\phi' = (\phi_1,\phi_2$) be a 3CNF formula with $m$ pairs of
matched clauses. The graph $G_{\phi'}$ is the graph induced by
$\phi'$ (as in \expref{Definition}{def: graph induced by formula}). The
graph $G_{\phi'}^{2eq}$ is a graph with $2n$ vertices (corresponding
to all literals). Its edges are as follows: each matched pair from
$\phi_1,\phi_2$, say $(x,w,\ell), (y,w,\ell)$, induces
exactly one edge $(\bar{x},y)$. Note that we negate the literal
which corresponds to the clause of $\phi_1$.
\end{definition}


\begin{definition}
Let $\phi$ be a formula with $n$ variables and $m$ clauses. The
\emph{imbalance} of a variable $i$ (denoted by $Im_i$) is the
difference in absolute value between the number of times it appears
with positive polarity and the number of times it appears with
negative polarity. The \emph{total imbalance} of $\phi$ is
$\sum_{i=1}^n Im_i$ and the \emph{normalized imbalance} of $\phi$ is
$(1/3m) \sum_{i=1}^n Im_i$.
\end{definition}
If the normalized imbalance of $\phi$ is bounded by $\delta$, then
$\phi$ is \emph{$\delta$-balanced}.

\begin{definition}
A 3CNF formula $\phi$ has the $(1-\e)~3$XOR property if for every
assignment $A$, if $A$ satisfies $\phi$ as 3CNF, then at least $1
-\e$ fraction of the clauses are satisfied as $3$XOR.
\end{definition}

\begin{definition}
A graph is said to have a $\delta$-cut if there is a partition of
its vertices into two disjoint sets such that at least a
$\delta$-fraction of the edges cross this partition.
\end{definition}

\begin{center}
\fbox{
\begin{minipage}[h]{.8\textwidth}
\vspace{0.2cm} \textbf{Algorithm  $\Refute(\phi')$}
\begin{enumerate}
\item Certify that $\phi'$ has the $(1- \gamma) ~3$XOR property.
Specifically:\\
(a) Find the normalized imbalance of $\phi'$ and denote it by $\delta'$.\\
(b) Certify that $G_{\phi'}$ has no $(\tfrac{1}{2} + \e')$-cut
($\e'$ is returned by a subroutine).\\
Set $\gamma = \frac{3}{2}(\delta' + 2 \e')$.

\item Certify that $G_{\phi'}^{2eq}$ has no $(\frac{1}{2}+\e)$-cut ($\e$
is returned by a subroutine). \item If $\e + 2 \gamma < \frac{1}{2}$
return ``unsatisfiable,'' otherwise return ``don't know.''
\end{enumerate}
\vspace{0.1cm}
\end{minipage}
}\\
\end{center}

In steps 1(b) and 2 of $\Refute(\phi')$ we use as a blackbox a subroutine for
bounding the maximum cuts in $G_{\phi'}$ and $G_{\phi'}^{2eq}$. This
subroutine is explained in \expref{Theorem}{thm: cut certification
  zwick}.

We first show that $\Refute(\phi')$ is sound, \ie, whenever it returns
``unsatisfiable'' it holds that $\phi'$ can not be satisfied. This
follows from \expref{Lemma}{lemma: 3xor lemma} and \expref{Theorem}{thm:
  soundness}.

\begin{theorem}[Soundness]\label{thm: soundness}
Let $\phi' =(\phi_1,\phi_2)$ be a 3CNF formula composed of pairs of
matched clauses. Denote by $G_{\phi'}^{2eq}$ the graph induced by
$\phi'$ as described in \expref{Definition}{def: phi' graphs}. The
formula $\phi'$ is not satisfiable if all the following conditions
hold:
\begin{enumerate}
\item $\phi'$ has the $(1-\gamma)3$XOR property,

\item $G_{\phi'}^{2eq}$ has no $(\tfrac{1}{2} + \epsilon)$-cut,

\item $\e + 2 \gamma < \tfrac{1}{2}$.
\end{enumerate}
\end{theorem}

\begin{proof}
We shall show that if $\phi'$ is satisfiable and has the
$(1-\gamma)3$XOR property, then $G_{\phi'}^{2eq}$ has a $(1
-2\gamma)$-cut. Combined with the fact that $G_{\phi'}^{2eq}$ has
no ${1}/{2} + \e$ cut we derive a contradiction (since $\e +
2 \gamma < {1}/{2}$). Consider the cut induced on the
vertices of $G_{\phi'}^{2eq}$ by a satisfying assignment $A$
(where in one side there are all the literals whose value is true
and in the other side there are all the literals whose value is
false). $A(x)$ denotes the value of the literal $x$ induced by the
assignment $A$. By the $3$XOR property of $\phi'$, all but
$\gamma$ fraction of the clauses of $\phi'$ are satisfied as
$3$XOR clauses. Hence at least a $(1 - 2\gamma)$ fraction of the
pairs of matched clauses have both clauses in the pair satisfied
by $A$ as 3XOR. Let $(x,w,\ell),(y,w,\ell)$ be a pair such that both
$(x,w,\ell)$ and $(y,w,\ell)$ are satisfied as $3$XOR. It holds that:
$A(x)+A(y)+2(A(w)+A(\ell))= 0 \bmod 2$. Thus exactly one of the
literals $\bar{x},y$ is true and the other is false (under the
assignment $A$), and the edge $(\bar{x},y)$ induced by this pair
of clauses crosses the cut. It then follows that at least $1-
2\gamma$ fraction of the edges in $G_{\phi'}^{2eq}$ are cut edges.
\end{proof}



\begin{lemma}[The $3$XOR lemma]\label{lemma: 3xor lemma}
Let $\phi$ be a 3CNF formula. If the following hold:
\begin{enumerate}
\item $\phi$ is $\delta$-balanced, and

\item the graph (induced by $\phi$) $G_{\phi}$ has no $(\tfrac{1}{2} +
\varepsilon)$-cut,
\end{enumerate}
then $\phi$ has the $(1 -\tfrac{3}{2}(\delta + 2\varepsilon))3$XOR
property.
\end{lemma}

\begin{proof}

The proof of this lemma appears in~\cite{Feige02}; also a similar
version of this lemma (for denser random $2k$CNF formulas) appears
in~\cite{CojaGoLaSc03}. We repeat the proof for the sake of
self-containment.


Denote by $m$ the number of clauses in $\phi$ and let $A$ be a
satisfying assignment. We bound from above the number of satisfied
appearances of literals. The assignment which maximizes the number
of satisfied appearances of literals is the 'majority vote': a
variable $x$ is assigned 'true' iff it appears more times with
positive polarity than with negative polarity. Using this assignment
the total number of satisfied appearances is $\frac{1}{2}\bigl(3m +
\sum_{i=1}^n Im_i\bigr)$ (where $Im_i$ denotes the imbalance of
variable $i$). It follows that on average each clause is satisfied
at most $\frac{3}{2} + \frac{1}{2m} \sum_{i=1}^n Im_i = \frac{3}{2}
(1+ \delta)$ times.

A clause is satisfied as $3$AND by $A$ if all its literal are
satisfied by $A$. We next show that the fraction of clauses
satisfied by $A$ as $3$AND is at least ${1}/{4} -
3\varepsilon/2$. Equivalently it is enough to show that the
fraction of clauses satisfied as $3$NAE, denoted by $\beta$, is at
most ${3}/{4} + {3\epsilon}/{2}$ (since $A$ is a
satisfying assignment). Consider the graph $G_{\phi}$ induced by
$\phi$. We remind the reader that each clause of $\phi$ contributes
a ``triangle'' of $3$ edges to $G_{\phi}$ (\eg, the clause
$(x,\bar{y},z)$ contributes the edges
$(x,\bar{y}),(\bar{y},z),(x,z)$). Consider the cut induced by the
satisfying assignment $A$. Each clause satisfied as $3$NAE
contributes exactly $2$ edges to the cut, thus this cut has at least
$2\beta m$ edges. But $G_{\phi}$ has no $({1}/{2} +
\varepsilon)$-cut. It follows that $2 \beta m \leq ({1}/{2} +
\varepsilon)3m$, giving $\beta \leq {3}/{4} +
3\varepsilon/2 $ as needed.


It remains to show that all but a small fraction of the clauses are
satisfied as $3$XOR by $A$. Denote by $\alpha_1,\alpha_2,\alpha_3$
the fraction of clauses which are satisfied exactly once, exactly
twice and exactly $3$ times respectively ($\sum_{i=1}^3
\alpha_i=1$). We already know that $\alpha_3 \geq {1}/{4}
- {3\epsilon}/{2}$ and that each clause is satisfied at most
${3}/{2}+ \delta $ times on average, thus
\[
\frac{3(1 +
\delta )}{2} \geq 3 \cdot \alpha_3 + 2 \cdot \alpha_2 + 1 \cdot \alpha_1\enspace.
\]
Substituting $\alpha_1 = (1-\alpha_3-\alpha2)$ and $\alpha_3$
with $\frac{1}{4} - \frac{3}{2}\epsilon$ yields that $\alpha_2
\leq \frac{3}{2}(\delta + 2\varepsilon)$.
\end{proof}


The following two Theorems show that our refutation algorithm
refutes most formulas $\phi \in_R C_n^{cn^{3/2}}$.

\begin{theorem}\label{thm: random graphs small MaxCut}
For any $\e >0$ there is a constant $c = c(\e)$ such that w.h.p. over
the choice of $\phi \in_R C_n^{cn^{3/2}}$:
\begin{enumerate}
\item[(a)] the subformula $\phi'$ is $\e$-balanced, and
\item[(b)] each of the graphs $G_{\phi'}$ and $G_{\phi'}^{2eq}$ has no
$(\tfrac{1}{2} + \e)$-cut.
\end{enumerate}
\end{theorem}
It is well known and follows from standard probabilistic calculations
that a random graph (with large enough average degree) has no
$({1}/{2} + \e)$-cut, and that a random formula is $\e$-balanced.
The distributions of $\phi',G_{\phi'}$ are close enough to the standard
models of random formulas/graphs respectively, so that the proof
techniques used for the random cases can be applied also in our case.
The proof of \expref{Theorem}{thm: random graphs small MaxCut} is
deferred to \expref{Section}{app: random graphs max cut}.


\begin{theorem} \label{thm: cut certification zwick}
There is a polynomial time algorithm that finds the imbalance of a
3CNF formula. There is a polynomial time algorithm that for every
graph $G$ that has no $(\tfrac{1}{2} + \epsilon)$-cut, certifies
that $G$ has no $(\tfrac{1}{2} + \delta)$-cut, where $\delta(\e)
\to 0$ as $\e \to 0$.
\end{theorem}

\begin{proof}
Finding the normalized imbalance of a formula $\phi$ is done by
counting positive and negative appearances for every variable,
computing its imbalance and averaging.

An algorithm for certifying a bound on the maximum cut of a graph is
given in~\cite{Zwick99}. Given a graph with $m$ edges whose maximum
cut is bounded by $m({1}/{2} + \e)$ this algorithm produces a
proof that the input graph has no cut of cardinality $m({1}/{2}
+ \delta)$. This algorithm has the property that $\delta \to
0$ as $\e \to 0$.  \end{proof}

\begin{corollary}[Almost-completeness]\label{cor: completeness}
For sufficiently large constant $c$, the refutation algorithm w.h.p.
returns ``unsatisfiable'' for a random formula $\phi \in_R
C_{n}^{cn^{3/2}}$.
\end{corollary}

\begin{proof}
Using Theorems~\ref{thm: random graphs small MaxCut},~\ref{thm: cut
certification zwick} we will show that by taking $c$ to be a
sufficiently large constant, the terms $\e, \gamma$ from
$\Refute(\phi')$ can be made arbitrary small (and thus $\e + 2\gamma
< {1}/{2}$). The term $\gamma$ equals $({3}/{2})(\delta' + 2
\e')$ where $\delta'$ is the imbalance of $\phi'$ and $({1}/{2}
+\e')$ is the bound returned by the algorithm of~\cite{Zwick99} when
applied to $G_{\phi'}$. By \expref{Theorem}{thm: random graphs small
MaxCut}, $\delta' \to 0$  as $c$ increases. Furthermore, the
value of the maximum cut in both $G_{\phi'}, G_{\phi'}^{2eq}$
approaches ${1}/{2}$ as $c$ increases. It then follows, using
\expref{Theorem}{thm: cut certification zwick}, that the bounds
${1}/{2}+\e'$ and ${1}/{2}+ \e$ returned by the certification
algorithm (of~\cite{Zwick99}) applied to $G_{\phi'}$ and $G_{\phi'}^{2eq}$, respectively,
can be made arbitrary close to ${1}/{2}$ by setting $c$ to be a
sufficiently large constant.

\end{proof}



\section{Proofs of \expref{Lemma}{lemma: number_of_pairs} and \expref{Theorem}{thm: random graphs small MaxCut}}\label{sec: technical lemmas}

\subsection{Proof of \expref{Lemma}{lemma: number_of_pairs}}


\begin{proof}[Proof of \expref{Lemma}{lemma: number_of_pairs}]
\label{app: number of pairs} For two random clauses the probability
that they induce a matched pair of clauses is 
\[
p \eqdef \frac{1}{2n}
\frac{1}{2n-2} \sim \frac{1}{4n^2}\enspace.
\]
Thus, the expected number of pairs that match is
\begin{align}
\mu = \binom{cn^{3/2}}{2} p \sim \frac{c^2n^3}{2} \frac{1}{4n^2}
\sim \frac{c^2 n}{8}\enspace.
\end{align}
Let $X$ denote the number of pairs of clauses in $\phi$ that match.
We use the second moment to show that w.h.p. $X \sim \mu$. By
Chebyshev's inequality
\begin{align}
\Pr[|X - \mu| > \e \mu] < \frac{\Var(X)}{\e^2\mu^2}\enspace.\label{eqn:
chebishev}
\end{align}
For every two clause locations in $\phi$ (e.g., first clause and
third clause) we set an indicator random variable $X_i$
($i=1,2,...,\binom{cn^{3/2}}{2}$) to be $1$ if the respective two
clauses match and otherwise $0$. For $i \neq j$ we say that $i
\sim j$ if the pair $i$ and pair $j$ share one clause location,
and otherwise $i \nsim j$. (Note that if pair $i$ and pair $j$
share two clause locations, then $i=j$. Note also that $i \nsim j$
might share the same clause without sharing a clause location, if
a certain clause happened to appear twice in $\phi$.) For any
fixed $i$ we let
\begin{align}
\Delta^*  = \sum_{j:~ j \sim i} \Pr[X_j~|~X_i]\enspace.
\end{align}
>From symmetry, $\Delta^*$ does not depend on $i$.
\begin{align}
 \Var(X)  & = \Exp\left[X^2\right] - \mu^2  = \Exp\bigl[(\sum_{i} X_i)^2\bigr] - \mu^2
 = \Exp\Bigl[\sum_{i} X_i^2 + \sum_{i \neq j} X_i X_j\Bigr] - \mu^2 \nonumber \\
 & = \mu - \mu^2 + \sum_{i \neq j} \Pr[X_i X_j] =  \mu -\mu^2 +
\sum_{i} \Pr[X_i] \left( \sum_{j: ~j \nsim i} \Pr[X_j] + \sum_{j:~j
\sim i} \Pr[X_j~|~X_i]\right) \nonumber \\
& \leq \mu - \mu^2 + \underbrace{\sum_{i} \Pr[X_i] \sum_{j}
\Pr[X_j]}_{\mu^2} + \sum_{i} \Pr[X_i] \underbrace{\sum_{j:~j \sim
i}\Pr[X_j ~|~X_i]}_{\Delta^*} \nonumber \\
& = \mu + \sum_{i}\Pr[X_i] \Delta^* = \mu(1+ \Delta^*)\enspace.
\end{align}
Substituting $\Var(X)$ with $\mu(1+ \Delta^*)$ in inequality
\eqref{eqn: chebishev} we derive
\begin{align}
\Pr[|X -\mu| > \e \mu] < \frac{\mu(1+ \Delta^*)}{\e^2 \mu^2} \leq
\frac{1+ \Delta^*}{\e^2 \mu}\enspace.
\end{align}
Thus, since $\mu = \omega(1)$, it suffices to show that $\Delta^* =
o(\mu)$. It holds that
\begin{align}
\Delta^*   \leq 2(cn^{3/2}-2) p = o(\mu)\enspace.
\end{align}
So far we showed that w.h.p. $X \sim \mu$. Note that $X$ may
over-count the number of matched pairs in $\phi'$. The reason is
that in $\phi$ there are expected to be sets of three or more
clauses in which any two clauses match. From each such set,
$\Extract(\phi)$ takes to $\phi'$ only the first two clauses of the
set.

For $i \geq 3$, we call a set of $i$ clauses \emph{bad} if each two clauses
of the set match. Let $Y_i$  be the number of bad sets of size $i$
in $\phi$. The number of matched clauses in $\phi'$ is at least $X -
\sum_{i \geq 3} \binom{i}{2} Y_i$. Thus, in order to show that the
number of matched pairs in $\phi'$ is $\sim \mu$ it is enough to
prove that 
\[
\Exp\left[\sum_{i \geq 3} \binom{i}{2} Y_i\right] = o(\mu)\enspace.
\]
Then, using Markov's inequality we derive that w.h.p. $\sum_{i \geq
3} \binom{i}{2} Y_i = o(\mu)$. It holds
\begin{align}
\Exp\left[ \binom{i}{2} Y_i\right] = \binom{i}{2} \binom{cn^{3/2}}{i}
p^{i-1} \leq \frac{i^2}{2} \left( \frac{cn^{3/2} e}{i}\right)^i
\left( \frac{1}{4n^2(1-1/n)} \right)^{i-1}. \label{eqn: surplus}
\end{align}
Thus, the sum $\sum_{i \geq 3} \Exp\bigl[\binom{i}{2} Y_i\bigr]$ is bounded
by the sum of a geometric sequence whose first term ($i=3$) is
$o(\mu)$. It follows that $\Exp\bigl[\sum_{i \geq 3} \binom{i}{2} Y_i\bigr]
= o(\mu)$.

\end{proof}

\subsection{Proof of \expref{Theorem}{thm: random graphs small MaxCut}
}\label{app: random graphs max cut}

Let $P \subset C_n \times C_n$ be the set of all possible matched
pairs of clauses, and let $P^m$ be the set of all $m$-tuples of
matched pairs of clauses. For a pair of matched clauses $(c_1,c_2)
\in P$, the \emph{inducing pair} is the pair formed by the second
and third literals in each of the two matched clauses $c_1,c_2$.
Let $m$ denote the number of matched pairs of clauses in $\phi'$.
Note that $\phi' \in P^m$, however not all the elements of $P^m$
are in the support of $\phi'$. We denote by $\binom{P}{m} \subset
P^m$ the support of $\phi'$, \ie, the collection of all (ordered)
$m$ matched pairs for which every matched pair of clauses has a
distinct inducing pair. We claim that $\phi'$ is a random element
of $\binom{P}{m}$.

\begin{lemma}\label{lemma: pairs distribution}
Given that $\phi'$ has $m$ matched clauses, the set of inducing
pairs is a random set of $m$ different ordered pairs of literals
(each pair has two distinct literals).
\end{lemma}

\begin{proof}
Let $C$ denote the set of clauses that have a matching clause
(from the original formula $\phi$). Denote by $L$ the set of
inducing pairs, \ie, pairs that participate as the second and
third literal in one of the clauses of $C$. The set $L$ may be any
set of distinct ordered pairs of size $m$. By symmetry, any such
set is equally likely to be $L$. The explanation is as follows.
Assume we expose the indices of the clauses in $C$ and also the
partition of $C$ into equivalent classes (each equivalent class is
a maximal set of clauses that have the same second and third
literals). Given this information, for each choice of $L$, the
probability that $L$ is the set of inducing pairs is the same (for
any $L$ the number of ways to match the pairs of $L$ with the
equivalent classes is the same; additionally, the probability for
all other clauses not in $C$ to avoid all the pairs of $L$ in the
second and third literals is the same).
\end{proof}

\begin{proof}[Proof of \expref{Theorem}{thm: random graphs small MaxCut}]

>From here on we will assume that $m$ is a fixed number and that $m
\sim {c^2n}/{8}$ (this is justified by \expref{Lemma}{lemma:
number_of_pairs}). The formula $\phi'$ is a random element of 
$\binom{P}{m}$. Let $\phi'' \in_R P^m$ (\ie, $\phi''$ is composed of
$m$ random and independent samples from $P$). Denote by $\Omega'$
the event that every matched pair of clauses in $\phi''$ has a
distinct inducing pair. Conditioned on $\Omega'$, $\phi''$ has the
distribution of $\phi'$. As \expref{Lemma}{lemma: density} shows, the
event $\Omega'$ is not too rare (the proof of \expref{Lemma}{lemma:
density} is deferred to the end of this section).
\begin{lemma}\label{lemma: density}
Let $m = \frac{c^2n}{8}$. For $\phi'' \in_R P^m$  it holds that
$\Pr[\Omega'] \geq e^{-c^4/128}$
\end{lemma}
\noindent Furthermore, as \expref{Lemma}{lemma: G phi'' cut} states,
$G_{\phi''}$ is unlikely to have a large cut.
\begin{lemma}\label{lemma: G phi'' cut}
Let $m \sim \frac{c^2n}{8}$. For $\phi'' \in_R P^m$ with probability
$1-o(1)$ it holds that $G_{\phi''}$ has no $(\frac{1}{2} +
\frac{4}{c})$-cut.
\end{lemma}
\noindent Combining Lemmas~\ref{lemma: density},~\ref{lemma: G phi''
cut} we now show that $G_{\phi'}$ is unlikely to have a large cut.
The reasoning is as follows:
\begin{align}
\Pr_{\phi' \in_R \binom{P}{m}}[G_{\phi'} \text{ has $(\tfrac{1}{2}
+ \tfrac{4}{c})$ cut}] = \Pr_{\phi'' \in_R P^m}[G_{\phi''} \text{
has $(\tfrac{1}{2} + \tfrac{4}{c})$ cut} \;|\;
\Omega']\\
\leq \frac{\Pr_{\phi''}[G_{\phi''} \text{ has $(\tfrac{1}{2} +
\tfrac{4}{c})$ cut}]}{\Pr_{\phi''}[\Omega']} \leq
\frac{o(1)}{e^{-c^2/128}} =o(1)\enspace,
\end{align}
where the last equality is because $c$ is a fixed constant.

A similar argument shows that if w.h.p. $G_{\phi''}^{2eq}$ has no
$({1}/{2} + \e)$-cut and $\phi''$ is $\e$-balanced, then w.h.p.
$G_{\phi'}^{2eq}$ has no $({1}/{2} + \e)$-cut and $\phi'$ is
$\e$-balanced. Hence, we only need to prove the following lemmas.
\begin{lemma}\label{lemma: G phi'' 2eq cut}
Let $m \sim \frac{c^2n}{8}$. For $\phi'' \in_R P^m$ with probability
$1-o(1)$ it holds that $G_{\phi''}^{2eq}$ has no $(\frac{1}{2} +
\frac{5}{c})$-cut.
\end{lemma}

\begin{lemma}\label{lemma: phi'' imbalance}
Let $m \sim \frac{c^2n}{8}$. For $\phi'' \in_R P^m$, with
probability $1-o(1)$ it holds that $\phi''$ is
$\frac{3}{c}$-balanced.
\end{lemma}
\end{proof}
To complete the proof of \expref{Theorem}{thm: random graphs small
MaxCut} we now give the proofs of Lemmas~\ref{lemma: G phi'' cut},
\ref{lemma: G phi'' 2eq cut}, \ref{lemma: phi'' imbalance},
and \ref{lemma: density}.

%\begin{lemma}
%Let $m \sim \frac{c^2n}{8}$. For $\phi'' \in_R P^m$ with probability
%$1-o(1)$ it holds that $G_{\phi''}$ has no $(\frac{1}{2} +
%\frac{4}{c})$-cut.
%\end{lemma}
\begin{proof}[Proof of \expref{Lemma}{lemma: G phi'' cut}]
Fix a partition $(V_1,V_2)$ of the vertices of $G_{\phi''}$. We
denote by $W_{\phi''}(V_1,V_2)$ the number of edges crossing the
cut $(V_1,V_2)$ in $G_{\phi''}$. For $\phi'' \in_R P^m$ the
expectation of $W_{\phi''}(V_1,V_2)$ is at most $6m({1}/{2} +
{1}/{n})$. For any formula $\phi'' \in P^{m}$ we let
$f(\phi'')$ be equal to $W_{\phi''}(V_1,V_2)$.  Let $X_i$ be the
expected value of $f$ after exposing the first $i$ pairs of
$\phi''$ (for $i=0,1, \ldots ,m$). The sequence $X_0,X_1,\ldots
,X_m$ is a martingale. The following two facts:

\begin{enumerate}
\item for any $\phi'' \in P^m$,
changing one matched pair (an element of $P$) can change the value
of $f$ by at most $4$ (each clause forms a triangle that contributes
at most $2$ edges to the cut), and

\item $\phi''$ is taken from a product measure $P^m$,
\end{enumerate}
imply that for every $i$ it holds that $|X_i - X_{i+1}| \leq 4$
(see Theorem 7.4.1 from~\cite{AlonSp02}). Azuma's inequality
implies that for any $\lambda >0$
\[
\Pr[f(\phi'') - 6m(\tfrac{1}{2} + \tfrac{1}{n} )
> \lambda] < e^{-\frac{\lambda^2}{2m 4^2}}\enspace.
\]
Setting $\lambda = \frac{17m}{c}$ and using $m \sim \frac{c^2n}{8}$
we derive
\[
\Pr_{\phi'' \in_{R} P^m}[W_{\phi''}(V_1,V_2) > 6m(\tfrac{1}{2} +
\tfrac{4}{c})] < 2^{-1.1n}\enspace.
\]
Using the union bound over all possible cuts we derive that w.h.p.
(for $\phi'' \in_R P^m$) the graph $G_{\phi''}$ has no
$(\tfrac{1}{2} + \tfrac{4}{c})$-cut.

\end{proof}


%\begin{lemma}\label{lemma: G phi'' 2eq cut}
%Let $m \sim \frac{c^2n}{8}$. For $\phi'' \in_R P^m$ with probability
%$1-o(1)$ it holds that $G_{\phi''}^{2eq}$ has no $(\frac{1}{2} +
%\frac{5}{c})$-cut.
%\end{lemma}

The proof of \expref{Lemma}{lemma: G phi'' 2eq cut} is very similar to
the proof of \expref{Lemma}{lemma: G phi'' cut}, details are omitted.




\begin{proof}[Proof of \expref{Lemma}{lemma: phi'' imbalance}]
We first bound the expected imbalance of $\phi''$. The total
imbalance of $\phi''$ is bounded by the sum of the imbalances of
$\phi_1,\phi_2$ (where $\phi_1$/$\phi_2$ are formed by taking the
first/second clause from each matched pair of $\phi''$). Since
$\phi_2$ has the same distribution of $\phi_1$, it is enough to
bound the expected total imbalance of $\phi_1$ (and then multiply by
two). The total imbalance of $\phi_1$ is the sum of the imbalances
of all variables in $\phi_1$, \ie, $\sum_{i=1}^n Im_i$ (the
imbalance of $x_i$ in $\phi_1$ is denoted by $Im_i$).

For any variable $x_i$ we denote by $d_i$ the number of
appearances of $x$ in $\phi_1$. For $\phi'' \in_R P^m$ it holds
that $\sum_{i=1}^n \Exp[d_i] = 3m$. By symmetry, for every $i$ it
holds that $\Exp[d_i] = {3m}/{n}$. We denote $d \eqdef
{3m}/{n}$. Given that $d_i=k$ the polarities of the
appearances of $x_i$ are still random. Given that $d_i=k$, the
imbalance of $x_i$ is the absolute value of the sum of $k$
independent random variables, where each random variable has
probability $1/2$ of being $1$ and probability $1/2$ of being
$-1$. Hence $\Exp[Im_i^2 \;|\; d_i=k] = k$. It then follows that
\begin{align}
\Exp[Im_i^2] = \sum_{k} \Pr[d_i =k] \cdot \Exp[Im_i^2 \;|\;d_i=k] =
\sum_{k} k\Pr[d_i =k] = \Exp[d_i] = d\enspace,
\end{align}
where all probabilities and expectations are taken over $\phi''
\in_R P^m$. Using the convexity of the square function
\begin{align}
\Exp[Im_i] \leq \sqrt{\Exp[Im_i^2]} \leq \sqrt{d}\enspace.
\end{align}
So far we showed that $\Exp_{\phi'' \in_R P^m}\bigl[\sum_{i=1}^n Im_i\bigr]
\leq n \sqrt{d}$. Thus, for $\phi'' \in_R P^m$ the total imbalance
of $\phi''$ is expected to be less than $2n\sqrt{d}$. We will now show
that the total imbalance of $\phi''$ is not likely to be too large
relative to $2n\sqrt{d}$. For any $\phi'' \in P^m$ we let
$f(\phi'')$ be the total imbalance of $\phi''$. Changing one
matched pair of clauses in $\phi''$ changes the total imbalance of
$\phi''$ by at most $12$. Azuma's inequality implies that for any
$\lambda >0$
\[
\Pr_{\phi'' \in_R P^m} \bigl[f(\phi'') - 2n\sqrt{d} > \lambda\bigr] <
e^{-\frac{\lambda^2}{2m 12^2}}\enspace.
\]
Setting $\lambda = n \sqrt{d}$ and using $d= {3m}/{n}$ we
derive
\[
\Pr_{\phi'' \in_R P^m}[\text{the total imbalance of $\phi''$}
> 3n\sqrt{d}] < e^{-\frac{n}{96}}\enspace.
\]
It then follows that the normalized imbalance of $\phi''$ is w.h.p.
bounded by ${3n\sqrt{d}}/{6m} \leq {3}/{c}$ (using $m \sim
{c^2n}/{8}$ and $d= {3m}/{n}$).
\end{proof}

\begin{proof}[Proof of \expref{Lemma}{lemma: density}]
We generate $\phi''$ iteratively by choosing each time
(independently) a random element of $P$. For each new random element
of $P$, the probability for it to have an inducing pair which is
different from all previous inducing pairs is $\geq 1 - {m}/{N}$,
where $N=2n\cdot (2n-2)$ is the number of possible inducing pairs. It then
follows that with probability of at least
\[
\left(1 - \frac{m}{2n(2n-2)} \right)^m \stackrel{(1)}{\geq} \exp
\left(-m\Bigl(\tfrac{m}{2n^2(1-1/n)}\Bigr)\right) \stackrel{(2)}{\geq}
e^{-c^4/128}\enspace,
\]
each of the matched pairs in $\phi''$ has a distinct inducing
pair. Inequality (1) is because $1-x \geq e^{-2x}$ holds for every
$x \in [0,{1}/{2}]$. The constant in the exponent following
Inequality (2) is derived by taking $m = {c^2n}/{8}$.
\end{proof}

\section{Practical considerations for the refutation
algorithm}\label{sec: practice considerations}


Recall that our refutation algorithm extracts from $\phi$ a
subformula $\phi'$ that contains matched pairs of clauses, and
then refutes $\phi'$. The longer $\phi'$ is, the easier it is to
refute it. For simplicity, we matched a pair of clauses only if
they agreed on their last two literals. Moreover, every clause of
$\phi$ participated in at most one pair of matched clauses in
$\phi'$, even though a clause may be eligible to participate in
more than one matched pair. In practical implementations, it is
advantageous not to have these restrictions, and thus get a longer
formula $\phi'$. In particular, we may allow the same clause to
participate in several pairs of matched clauses, by duplicating
it. More importantly, we may match any two clauses that share two
variables (regardless of the polarity of the variables, and of
their location within the clauses). For example, the two clauses
$(x,w,\ell)$ and $(\bar{w},\ell,y)$ can be matched. If $\phi'$ is
satisfied by an assignment $A$ that has the $3$XOR property, then
one step of Gaussian elimination gives in this case
$A(x)+A(w)+A(\ell)+A(y)+A(\bar{w})+ A(\ell) =0 \bmod 2$, thus $A(x) +
A(y) = 1 \bmod 2$. Hence, we will associate the edge $(x,y)$ with
this pair of matched clauses so that the edge induced by this pair
in $G_{\phi'}^{2eq}$ will cross the cut which corresponds to the
assignment $A$. Using the principles above, the number of pairs of
matched clauses that one expects to extract from a random formula
of length $cn^{3/2}$ is roughly $\binom{3cn^{3/2}}{2}/\binom{n}{2} \simeq 9c^2n$.

We used the principles above to implement the algorithm in practice.
In the current implementation, the problem of refuting $\phi'$ is
reduced to strong refutation of two 2XOR formulas. We performed $2$
eigenvalue computations on matrices of size $n \times n$, whereas
the original refutation algorithm performed eigenvalue computations
on matrices of size $2n \times 2n$. Our implementation uses the
conditions of \expref{Theorem}{thm: practical} to refute $\phi'$. Before
stating \expref{Theorem}{thm: practical} we need the following
definition.

\begin{definition}
Let $\phi$ be a 2XOR formula with $m$ clauses and $n$ variables.
$A_{\phi}$ is the following $n \times n$ symmetric matrix
associated with $\phi$. Initially $A_{\phi}$ is the zero matrix.
For each clause of the forms $(x,\bar{y})$ or $(\bar{x},y)$ we add
$+1$ to positions $A(x,y)$ and $A(y,x)$.
%% ToC edit: added ``and'' above. Authors please check. Similarly below.
 For each clause of the forms
$(x,y)$ or $(\bar{x},\bar{y})$ we add $-1$ to positions
$A(x,y)$ and $A(y,x)$.
\end{definition}

A similar matrix can be defined for a 2EQ formula, just by reducing
the 2EQ formula into a 2XOR formula.

\begin{theorem}\label{thm: practical}
Let $\phi'$ be a 3CNF formula with $m$ pairs of matched clauses
and $n$ variables. Let $\phi_{2xor}$ be the 2XOR formula with $6m$
clauses induced by replacing each 3CNF clause of $\phi'$ by three
2XOR clauses (one for every two literals). Let $\phi_{2eq}$ be the
2EQ formula with $m$ clauses induced by adding pairs of matched
clauses modulo $2$. If the
following hold then $\phi'$ is not satisfiable:
\begin{enumerate}
\item[(1)] $\phi'$ is $\delta$-balanced.
\item[(2)] Let $\lambda_{2xor}, \lambda_{2eq}$ denote the largest
eigenvalues of $A_{\phi_{2xor}}, A_{\phi_{2eq}}$ respectively; then
$$
3\delta + \frac{n}{4m}(\lambda_{2xor} + \lambda_{2eq}) <
\tfrac{1}{2}\enspace.
$$
\end{enumerate}
\end{theorem}

The proof of \expref{Theorem}{thm: practical} will follow shortly.


\begin{lemma}[2XOR strong refutation]\label{lemma: 2xor refutation}
Let $\phi$ be a 2XOR formula with $m$ clauses. If $\lambda$ is the
maximum eigenvalue of $A_{\phi}$ then $\phi$ is at most
$(\tfrac{1}{2} + \e)$ satisfiable, for $\e = \tfrac{ \lambda
n}{4m}$.
\end{lemma}

\begin{proof}
Let T be an assignment satisfying the most clauses as 2XOR. Let
$x$ be the $\pm 1$ vector which corresponds to T: $x_i$ equals
$+1$ if $T(i) =$ true, otherwise $x_i =-1$. It holds that
$x^tA_{\phi}x = A_{\phi} \bullet xx^t$ (where for any two matrices
$A,B$ of the same dimensions, $A \bullet B = \sum_{i,j}A(i,j)
B(i,j)$). Every 2XOR clause satisfied by T contributes $2$ to
$A_{\phi} \bullet xx^t$ whereas every unsatisfied clause
contributes $-2$. If T satisfies exactly $({1}/{2}  +\e)m$
clauses, then
\begin{align}
\frac{x^tA_{\phi}x}{x^tx} = \frac{(\tfrac{1}{2} +\e)m(+2) +
(\tfrac{1}{2} -\e)m(-2) }{n} = 4 \e m/n\enspace.
\end{align}
Since $x^tA_{\phi}x/x^tx \leq \lambda$ we derive $\e \leq \lambda
n/4m$.

\end{proof}

\begin{lemma}[3XOR property certification]\label{lemma: 3xor property certification}
Let $\phi$ be a 3CNF formula which is $\delta$-balanced with $m$
clauses and $n$ variables. Assume that the 2XOR formula induced by
replacing each 3CNF clause by $3$ 2XOR clauses is at most
$(\tfrac{1}{2} + \gamma)$ satisfiable. Then $\phi$ has the $(1 -
\tfrac{3}{2}(\delta + 2\gamma))$3XOR property.
\end{lemma}

\begin{proof}
Assume that the assignment T satisfies $\phi$ as 3CNF. Denote by
$\alpha_i$ the number of clauses satisfied exactly once, twice,
three times respectively ($\sum_{i=1}^3 \alpha_i=1$). We need to
upper bound $\alpha_2$. It can not be that $\alpha_3$ is too small,
as in such case there are many satisfied clauses in the induced 2XOR
formula: if a clause of $\phi$ is satisfied exactly once or exactly
twice, then out of the three 2XOR clauses induced by it, two are
satisfied as 2XOR. Since the number of satisfied 2XOR clauses (in
the induced 2XOR formula) is bounded by $3m({1}/{2} + \gamma)$,
we derive $2m(\alpha_1 + \alpha_2) \leq 3m({1}/{2} + \gamma)$,
or equivalently $\alpha_3 \geq {1}/{4} - {3\gamma}/{2}$.
The imbalance of $\phi$ is bounded by $\delta$, thus the number of
satisfied literals in $\phi$ is at most $\tfrac{3}{2}(1+ \delta)m$.
This implies
\[
\frac{3(1 +\delta)}{2} \geq \alpha_1 +
2\alpha_2 + 3\alpha_3.
\]
Using the facts: $\alpha_1 = 1 - \alpha_2 - \alpha_3$ and $\alpha_3 \geq \tfrac{1}{4} -
\tfrac{3}{2} \gamma$ we deduce
\begin{align}
\tfrac{3}{2}(1 + \delta) \geq \alpha_2 + 1 + 2(\tfrac{1}{4}
-\tfrac{3}{2} \gamma),
\end{align}
and thus $\tfrac{3}{2}(\delta + 2 \gamma) \geq \alpha_2$.
\end{proof}


\begin{proof}[Proof of \expref{Theorem}{thm: practical}]
Assume that $\phi'$ is satisfiable as 3CNF. We show that property
(1) contradicts property (2). Set
\[
\e_{2xor} \eqdef  \frac{\lambda_{2xor}n}{4 \cdot 6m},\qquad\text{and}\qquad \e_{2eq}
\eqdef \frac{\lambda_{2eq}n}{4m }. \label{eqn: e bounds}
\]
By \expref{Lemma}{lemma: 2xor refutation} $\phi_{2xor} $ and $\phi_{2eq}$
are at most $({1}/{2} + \e_{2xor})$ and $({1}/{2} + \e_{2eq})$
satisfied, respectively . It then follows, using \expref{Lemma}{lemma:
  3xor property certification}, that $\phi'$ is $(1 - ({3}/{2})(\delta + 2
\e_{2xor}))$ satisfied as 3XOR. Each pair of matched clauses of $\phi'$
for which both clauses are satisfied as 3XOR yields a satisfied 2EQ
clause of $\phi_{2eq}$. As $\phi_{2eq}$ is at most $({1}/{2} +
\e_{2eq})$ satisfied, we conclude that
\begin{align}
1 - 3(\delta + 2 \e_{2xor}) & \leq \tfrac{1}{2} + \e_{2eq},
~~\text{or equivalently}\\
3(\delta + 2 \e_{2xor}) + \e_{2eq} & \geq \tfrac{1}{2}\enspace.
\end{align}
Substituting $\e_{2xor}$ and $\e_{2xor}$ according to the definitions in
\eqref{eqn: e bounds} we derive 
\[
3 \delta +
\frac{n}{4m}(\lambda_{2xor} + \lambda_{2eq}) \geq \tfrac{1}{2}\enspace,
\]
which contradicts property (2).
\end{proof}




\noindent We generated several random formulas with $n = 5 \cdot 10^4$
variables and $27335932 = \lceil 2.445 \cdot n^{3/2} \rceil$ clauses. Our algorithm
refuted all of them (our current implementation fails to refute
formulas of significantly lower clause density). We give more detailed
results for one specific (though typical) run. Our algorithm extracted
a subformula $\phi'$ with $m = 2689832$ pairs of matched clauses.
\expref{Table}{table} summarizes the values computed by the
algorithm along with a heuristic estimation of what we could have
expected them to be.
\begin{table}[ht]
\begin{center}
\begin{tabular}{|c|c|c|c|c|c|}
\hline & $m $ & $ \lambda _{2eq}$ & $ \lambda _{2xor} $ & $ \delta
$&
Bound\\
\hline \hline Algorithm & 2689832 & 20.8961& 54.6503& 0.048662&
$0.49706 < \tfrac{1}{2}$\\
\hline Heuristic bound: & 2690112 & 20.7454 & 50.8157& 0.05565&
$0.49946 < \tfrac{1}{2}$
\\
(using formula) & $\approx 9c^{2}n$& $2 \sqrt{18c^2}$& $2
\sqrt{108c^2}$  & $\sqrt{\frac{n}{6m}}$& $ \frac{n}{4m}(
\lambda_{2eq} + \lambda_{2xor}) + 3 \delta $
\\
\hline
\end{tabular}\\
\caption{Results for a random formula with $5 \cdot 10^4$ variables
and $27335932$ clauses.}
\label{table}
\end{center}
\end{table}

Let us explain the heuristic bounds used in the table. To estimate
the largest eigenvalue of a symmetric matrix we use the formula
$2\sqrt{d}$ where $d$ is the average $\ell_1$ norm of each of the rows.
This bound is known to be true for various random graph models, but
apparently is too optimistic for $A_{\phi_{2xor}}$. To estimate the
imbalance $\delta$ we assume that each variable appears exactly
$6m/n$ times, each time with random polarity. The difference between
the number of positive and negative appearances behaves like the
distance from $0$ when performing a random walk of length $6m/n$ on
$\mathbb{Z}$ (starting from $0$). The expected square of the
distance is $6m/n$, and $\sqrt{6m/n}$ is an upper bound on the
expected distance. We estimated the expected normalized imbalance as
$n \sqrt{6m/n}/6m = \sqrt{{n}/{6m}}$.

A few words about the implementation of our algorithm. The part of
extracting the subformula $\phi'$ was implemented in C. The other
parts (computing the imbalance and the eigenvalues) were implemented
in Matlab. To save memory we used Matlab's sparse matrix objects.
The heavy part of the algorithm was computing the largest
eigenvalues of the two matrices $A_{\phi_{2xor}},A_{\phi_{2eq}}$.
This part took $63$ minutes on an Intel Xeon CPU 1700MHz with 256K
cache and 2Gbyte memory (running the Linux operating system).

It may be interesting to see if other refutation algorithms (which
may use various optimized versions of resolution, OBDDs,
backtracking, to name a few of the common algorithmic principles
used for refutation) can handle random formulas with as many
variables as those handled by our algorithm. We have not made a
serious attempt to check this.



\subsection*{Acknowledgements}
This research was supported by a grant from the G.I.F., the
German-Israeli Foundation for Scientific Research and Development.
We thank Amin Coja-Oghlan for useful discussions.


%% bibliography{tocabbrev,a002}

\bibliography{a002}

\begin{tocauthors}
  \begin{tocinfo}[Uri]
    Uriel Feige \tocabout\\
    Microsoft Research\\
    One Microsoft Way\\
    Redmond, WA 98052-6399\\
    \hspace*{1cm}and\\
    Weizmann Institute of Science\\
    Rehovot 76100, Israel\\
    urifeige\tocat{}microsoft\tocdot{}com \\
    uriel\tocdot{}feige\tocat{}weizmann\tocdot{}ac\tocdot{}il \\
   \url{http://www.wisdom.weizmann.ac.il/~feige}
  \end{tocinfo}
  \begin{tocinfo}[Eran]
    Eran Ofek \tocabout\\
    Department of Computer Science \\
    and Applied Mathematics\\
    Weizmann Institute of Science\\
    Rehovot 76100, Israel\\
    eran\tocdot{}ofek\tocat{}weizmann\tocdot{}ac\tocdot{}il \\
    \url{http://www.wisdom.weizmann.ac.il/~erano}
  \end{tocinfo}
\end{tocauthors}

\begin{tocaboutauthors}
\begin{tocabout}[Uri]
\textsc{Uriel Feige} is a member of the
\href{http://research.microsoft.com/theory/}{theory group} at
Microsoft Research, currently on leave from the
\href{http://www.wisdom.weizmann.ac.il/}{Weizmann Institute}. His
main research interests involve studying the borderline between P
and NP as it manifests itself in approximation algorithms,
heuristics, and exact algorithms that are not necessarily
polynomial time. Other activities he enjoys include playing the
piano, dancing with his wife, and helping his kids with their
homework.
\end{tocabout}
\begin{tocabout}[Eran]
\textsc{Eran Ofek} obtained his \phd\ in Computer Science from
the \href{http://www.weizmann.ac.il/}{Weizmann Institute of Science}
in 2006. His \phd\ and \msc\ advisor was
\href{http://www.wisdom.weizmann.ac.il/~feige}{Uriel Feige}. His
research interests include optimization algorithms, random
structures, and average case complexity. On his spare time he likes
to play soccer, volleyball, or spend time with his two sons.
\end{tocabout}
\end{tocaboutauthors}

\end{document}
