.comment-link {margin-left:.6em;}

A Blogscursion into the World of Expanders

Friday, October 14, 2005

Summary of first few lectures

We defined an expander family as a familty of undirected, regular graphs on n vertices (for increasing n) such that the degree is constant and
  • the edge expansion (hG=min|S|≤1/2 |E(S,S')|/|S|) is at least a constant, OR
  • the spectral gap (d-λ1) is at least a constant.

(The equivalence follows from (d-λ1)/2 ≤ hG ≤ sqrt[2d(d-λ1)], proven later.)

A slightly stronger definition of spectral expansion requires d-λ to be at least a constant, where λ=max(|λ1|,|λn-1|). Then we have the following for an expander G (writing α for λ/d).


  • Optimal Expanders: If we allowed "fractional edges," a clique (with self-loops) with all edge weights d/n would be the best degree d expander (with λ=0). However, with "integral edges," λ ≥ 2 sqrt[d-1] - o(1). So called Ramanujan Graphs have λ = 2 sqrt[d-1]. So one can think of α being O(1/sqrt d).

  • Expander Mixing Lemma: For any cut (S,T) in G, number of edges across the cut is in the range (d/n)|S| |T| ± dα sqrt[|S| |T|]. (Note that the smaller the α the closer this is to the value for the "d/n-clique.")

    An application of this lemma is in a simple "deterministic amplification" of BPP algorithms: the algorithm is run not just on one random-tape, but on all its d "neighbors" too -- neighbors in an arbitrary expander with the space of all random-tapes as the vertex set -- driving the error probability down to O(α2), starting from say 1/4.

  • Rapid Mixing: Starting from a distribution p on the n vertices, a t-step random walk on G results in a new distribution Mtp exponentially close to the uniform distribution u. More precisely, the L2-norm of the difference Mtp - u is at most αt (and hence the L1-norm, which measures twice the statistical difference, is at most (sqrt n)αt).

  • Sampling by Random Walks: Probability that a t-step random walk is confined to a set B of size βn is between (β-2α)t and (β+α)t. (Note that the smaller the α the closer this is to βt corresponding to the random walk on the clique.)

    The upperbound lets one amplify a BPP algorithm to reduce the error probability exponentially in t using only t log d extra random bits than the original BPP algorithm (if its error probability is a sufficiently small constant to begin with).

    The two bounds together can be used to deterministically sample a graph such that the relative size of the max-clique is preserved. This lets one reduce the approximation factor of an algorithm for approximating the size of a largest clique, by first using a graph exponentiation which preserves the relative log-size of the max-clique and then sampling it down as above to a polynomial sized graph in which the original approximation algorthm is then run. (Such an improvement in approximation factor can be used to amplify hardness of approximation results.)

Thursday, October 13, 2005

Amplifying the hardness of one-way functions

We began by defining a negligible function. A function β(n) is said to be negligible if ∀ c > 0 β(n) < 1/(n^c) for sufficiently large n.

We then defined a one-way permutation: a polynomial-time computable permutation f is α(n)-one way (denoted by α-OWP) if for any probabilistic polynomial-time algorithm A:
Pr[A(f(x)) = x] ≤ 1 - α(n) + β(n)
where the probability is over all strings x of length n and the coin-flips used by the algorithm, and β(n) is some negligible function. Intuitively, f is α-OWP if the fraction of 'hard' inputs is close to α.

Finally, for sets X and Y where X ⊆ Y, we say that X has density μ if |X| = μ|Y|

It is not obvious that one-way permutations exist, but our first theorem (due to Yao) shows that we can construct strong one-way permutations from weak ones. Given f, a (1/p(n))-OWP on n-bit strings for some polynomial p(n), we define:
F(x1, x2, ..., xt) = f(x1)f(x2)...f(xt)
where |xi| = n, and t = n⋅p(n). That is, F is a function on n2⋅p(n)-bit strings formed by applying f to each part of size n, and concatenating the results. To invert F, we must invert f on each part; the large number of parts means that we expect several of them to be hard for f. It can be shown that in fact, F is hard to invert on all but a negligible fraction of the range.

The chief problem with this method is that the input length blows up by a large polynomial factor. If f used n bits, F uses n2p(n) bits. Therefore, the hardness of inverting F on large inputs (say of length m) relates to the hardness of inverting f on much smaller inputs (those of length mε, where ε is a constant ≤ 1/2).

A better idea to amplify hardness is to run f on many inputs, but instead of selecting the inputs independently at random, construct an expander graph where each node corresponds to an n-bit input string. Now, select the first node randomly, and then perform a random walk in the graph. Run f on every node v in the walk; the nodes are 'independent enough' that with good probability, we will find some node v such that f(v) is hard to invert.

Lemma 1: Let α(n) be at least 1/p(n) for some polynomial p. The following are equivalent:
(a) f is α-OWP
(b) For any polynomial-time randomized algorithm A, ∃ H ⊆ Σn of density ≥ α such that
Pr[A(f(x)) = x | x ∈ H] is negligible.

Intuitively, H is the 'hard' set for algorithm A; that is, the lemma says that if f is an α-OWP, there is some set H (of density at least α) of inputs that is hard to invert, and the converse. It follows almost immediately from the definition of an α-OWP that (b) implies (a). The other direction is also not difficult to prove; if there is an algorithm A with a hard set of density < α, repeated applications of this algorithm allow us to invert f with probability > 1 - α, contradicting the fact that f is α-OWP.

Lemma 2: Let G be an (N, d, λ) expander. (G has N vertices, is d-regular, and λ is the normalized second-largest eigenvalue.) For any subset S of the vertices, with density μ, the probability that a random walk of length k is contained entirely within S is ≤ μk/2.

The proof of this lemma uses a technique we saw earlier. Let P be the projection vector with 1s in positions corresponding to elements of S, and A the normalized adjacency matrix of G. If u is the uniform unit vector, the desired probability is ||(PA)k u|| 1. (To see this, observe that we begin with a vertex picked uniformly at random; the probability that we are in S after a single step is given by the L1-norm of PA(u) because Au gives us the probability distribution over vertices after a single step, and we sum over all the probabilities in S.)

The algebra required to bound ||(PA)k u|| is a little too tedious for a blog post (with applications of the law of cosines, the Cauchy-Schwartz inequality, etc.), but it is fairly straightforward.

Theorem 1: (Main) If f is (1-α)-OWP on n-bit input strings (where α ≥ 1/2), we can construct Fk, which is (1 - αk/2)-OWP on input strings of length n + k log d, where k is a polynomial and d a constant.

Proof: Let {Gn} be an expander family on 2n vertices, with degree d, and λ ≤ 1/2. We label edges such that RG_n(v, i) = (u,i) for all v and i. That is, we label edges such that if the edge between v and u is the ith edge of v, it is also the ith edge of u; an alternate way to think about this is that the edge labels give us d disjoint perfect matchings in the graph. We now define Fk(x, σ1, σ2, ..., σk) (where |x| = n, and each σi ∈ {1...d}) as the output of the function:
for i ← 1 to k
   x ← f(x)
   x ← RG_n(x, σi)
return (σ1, σ2, ..., σk, x)

That is, we apply f on x, take a step in the graph, and repeat this process k times. Another interpretation is that we take a k-step walk based on the σis in the graph G'n, which has the same vertex set as Gn, but has an edge between u and v ⇔ Gn has an edge between f(u) and v. Since f is a permutation, G'n is also an expander with the same properties as Gn.

To see that the function Fk is a permutation, it is sufficient to see that it is injective. Suppose two different inputs produced the same output; this implies that the walks ended at the same vertex and that the sequence of &sigmais is identical. But two such walks that begin at distinct vertices can never converge, because f is one-to-one, and the &sigmaith edges from two different vertices cannot lead to the same vertex, because of the way edges are labelled.

We show that Fk is a (1-αk/2)-OWP, by using any algorithm A that inverts Fk to construct an algorithm A' that inverts f; if the hard set for A has size < 1-αk/2, the hard set for A' will have size < α. Given A, the algorithm A' works as follows to invert an input y:
Pick i from {1, 2, ..., k}
∀ 1 ≤ j ≤ k Pick σj uniformly at random from {1, 2, ..., d}
Set y' ← Fk-i-1(RG_n(y, σi), σi+1, ..., σk
Use A to find (x', σ1, σ2,...,σk) ← Fk-1(y', σ1, σ2,...,σk)
Return x ← Fi-1(x', σ1, σ2,...,σi-1)

Intuitively, to find the pre-image of y, A' picks i to be some position in a k-step walk. The algorithm follows the walk to the end, inverts it, and then follows the first part of the walk, stopping just before it reaches y. This vertex will be the predecessor of y in the walk. If A successfully inverts Fk, A' will invert f.

Since f is a (1-α)-OWP, A' must have a hard set of density ≥ 1-α. Let H be the set of vertices corresponding to the inputs in the hard set, and H* be the set of length k-walks in the graph that intersect H. From lemma 1, we know that H* has density at least (1-αk/2). If we can show that H* is a hard set for A, it follows that Fk is a (1-αk/2)-OWP.

If H* is not a hard set for A, then the algorithm succeeds with non-negligible probability (≥ 1/p(n)) on H*. For every invocation of A by A', with probability ≥ 1/k, the index i points to some vertex x ∈ H. Considering all paths in H*, the probability of A succeeding is ≥ 1/p(n). Therefore, the probability of A' succeeding on H is ≥ 1/(k ⋅p(n)), which is non-negligible; this implies that H is not hard for A'. But this is a contradiction, and so H* must be a hard set for A. This completes our proof of the main theorem.

Theorem 2: If f is a 1/p(n)-OWP for some polynomial p(n), then for any polynomial q(n), ∃ a (1- 1/q(n))-OWP with only an O(log n) (additive) increase in input length.

Proof: We use the construction of theorem 1 in two stages: First, starting with a (1 - (1-1/p(n)))-OWP, set k = O(1) and construct a (1 - (1-1/p(n))k/2)-OWP by applying theorem 1; this only increases the input length by Θ(1) bits. Repeating this process O(log p(n)) = O(n) times, we can achieve a (1- 1/2)-OWP, with a total of O(log n) increase in input size. Now, set k = log q(n) = O(log n) and apply the construction to the (1- 1/2)-OWP; this gives us the desired (1-1/q(n))-OWP. The total increase in the iterations of stage 1 was O(log n), as it was in the single iteration of stage 2.

Saturday, October 01, 2005

Introduction to Zig-Zag

Starting September 21st, Erin introduced us to the Zig-Zag product as a means of expander construction.

In leading up to the construction, we first defined a rotation map of a graph G = (V,E) to be a function from pairs (v, i) to (w, j) where v,w ∈ V and the i-th incident edge to v leads to w, and it leads through the j-th incident edge to w. Further, we will use the notation that G is a (N, D, λ)-graph if G has N-vertices, is D-regular, and has second largest eigenvalue in absolute value, λ = max α ⊥ 1n | ⟨ α, M α ⟩ | / ⟨ α, α ⟩.

Next we explored the results of squaring and the tensor product of a graph. Namely, we considered Gt for G an (N,D, λ)-graph. Proposition 1 stated that Gt is an (N, Dt, λt)-graph. To see this, consider the normalized adjacenty matrix for Mt for Gt. Then Mtu = λtu from repeated iterations of Mu = λu.

We then defined the tensor product G of G1 and G2 where G1 is N1-vertex D1-regular and G2 is N2-vertex D2-regular to be a N1 · N2-vertex D1 · D2-regular graph such that the rotation map of the tensor product G is

RG( (v,w), (i,j) ) = ( (v',w'), (i',j') ) where v ∈ G1, w ∈ G2, i ≤ D1, j ≤ D2 and RG1(v,i) = (v',i'), RG2(w,j) = (w',j')

This is a replacement of each vertex in G1 with a copy G2 and connecting all vertices based on the vertices they represent in G1.

As the graph can be represented by the adjacency matrix, we consider the tensors effect on the two adjacency matricies A and B. If we let Au = λ1u and Bv = λ2v and let C be the tensor of A and B and let w be the tensor of u and v, then the product of Cw = λ1λ2w. We see this by considering the tensor of Au and Bv, extracting the appropriate position (i,j) in the resulting matrix.

Now, we consider the effect of the tensor. We considered Proposition 1 which states,

Let G1 be a (N1, D1 λ1) and G2 be a (N2, D2, λ2) graph. Then the tensor of the two is (N1 · N2, D1 · D2, max{ λ1, λ2 }).

To see this, consider the eigenvalues of the normalized adjacency matrix of the tensor. These eigenvalues depend on the normalized adjancency matricies of the original graphs G1 and G2. As the result of the tensoring of these normalized adjacency matricies results in a largest eigenvalue of 1, the second largest value is either λ1 or λ2.

From here we defined the Zig-Zag product. Consider G1 a D1-regular graph on N-vertices and G2 a D2-regular graph on D1-vertices. Replacing each vertex in G1 with a copy of G2, we create an edge in this resulting graph on [N] × [D1] by adding an edge for every path of length 3 of the following form:

(1) Take a random step inside a copy of G2
(2) Take an edge inside G1 which represents the jump from the current embedded G2 to another.
(3) Take a random step inside the copy of G2

Thus, an edge between vertex i and j exists if there is some verticies i' and j' such that there is an edge between i and i' in am embedded copy of G2, an edge in the tensor product between i' and j', and an edge in an embedded copy of G2 between j' and j.

This can be expressed formally via the rotation map of G, G1, and G2 as follows:

Let v ∈ [N], k ∈ [D1], and i,j ∈ [D2].
Let (k', i') = RG2(k,i), the first step within a cloud.
Let (w, l') = RG1(v, k), the step in G1.
Let (l, j') = RG2(l' j), the second step within a cloud.

Then RG( (v,k), (i,j)) = ((w,l), (j',i')).

The point of exploring the zig-zag product is that if the two graphs G1 and G2 are good expanders we expect that their product G will also be a good expander.

To this end, we have Theorem 4 which follows. Let G1 be a (N, D1, λ1) graph and G2 be a (D1, D2, λ2) graph. Then the zig-zag G of G1 and G2 is (N · D1, D22, f( λ1, λ2 )) where f( λ1, λ2 ) ≤ λ1 + λ2 + λ22 and f( λ1, λ2 ) < 1 if λ1, λ2 < 1.

Intuitively, we can think of the construction as trying evenly distribute entropy thoughout the resulting graph. We claim that a step in the zig-zag results in a more uniform distribution of the entropy given that the entropy was not already uniformly distributed. To see this we consider two cases based on the initial probability distribution.

If the distributions are not too close to uniform then the random first step used to produce an edge in the zig-zag contributes some amount of entropy. As the deterministic step across clouds and the following random step cannot decrease the contribution from step one, some amount of reallocation is done making the graph entropy more uniform.

If instead the entropy is close to uniform then the first step is unproductive. However the deterministic step is now effectively random due to the uniformity of the choice of vertex in the cloud and so a contribution is still made for the resulting step in the zig-zag.

The proof follows along the lines of considering any action in terms of its component breakdown into the above two cases, making use of the above propositions. While I believe I understand the intuition behind the idea, the algebraic manipulation seems a bit tedious for this post.