
A Blogscursion into the World of Expanders

Monday, December 12, 2005

Recap: Expanders and Coding Theory

We looked at applications of Expanders to Coding Theory and saw that they could be used to obtain efficient encoding and decoding algorithms, with good error-detection and recovery properties.

We first considered codes which use the adjacency matrix of a bipartite expander as the parity check matrix. The simple FLIP algorithm allowed us to decode from a constant fraction of errors in linear time. In this algorithm, vertices of the left partite set are labelled with the bits of the received word, and a vertex of the right partite set is satisfied if an even number of its neighbors are set to '1'. As long as some vertex on the left has more unsatisfied than satisfied neighbors, we flip the bit corresponding to that vertex.
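
To make the procedure concrete, here is a minimal Python sketch of the FLIP decoder under an assumed representation: right_nbrs[v] lists the check (right) vertices adjacent to left vertex v, and word is the received bit vector. It recomputes the check parities in every round for clarity (the linear-time version maintains them incrementally); this is an illustration, not the exact algorithm from the lecture notes.

def flip_decode(word, right_nbrs, num_checks, max_rounds=None):
    word = list(word)
    if max_rounds is None:
        max_rounds = len(word)  # a linear number of flips suffices in the analysis
    for _ in range(max_rounds):
        # a check is satisfied when the parity of its neighbourhood is even
        parity = [0] * num_checks
        for v, bit in enumerate(word):
            if bit:
                for c in right_nbrs[v]:
                    parity[c] ^= 1
        # look for a left vertex with more unsatisfied than satisfied checks
        flipped = False
        for v in range(len(word)):
            unsat = sum(parity[c] for c in right_nbrs[v])
            if 2 * unsat > len(right_nbrs[v]):
                word[v] ^= 1
                flipped = True
                break
        if not flipped:
            return word  # no vertex qualifies: either decoded or stuck
    return word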

A similar idea could be tried for efficient encoding, but in order to obtain a linear-time algorithm, the generating matrix (here, the adjacency matrix of the underlying graph) must be sparse. Codes with low-density generating matrices have poor error-correcting capabilities, but Spielman showed that such codes can be used for error reduction. Using Spielman's error-reducing codes, we can inductively construct error-correcting codes which allow linear-time encoding.

Another use of expanders is in the construction of Alon, Bruck, Naor, Naor and Roth (ABNNR); their construction allows us to obtain linear-time encodable and decodable codes with very large minimum distance (and hence good error correction); the price we pay is a significantly increased alphabet size. Finally, we considered list decoding, in which we output all codewords (not just the nearest codeword) that are sufficiently close to the received word. The ABNNR construction allowed us to construct a (1,2,2)-list-recoverable code: that is, if for each position of the received word we have a list of two possible symbols, we output each of the (at most 2) codewords that are compatible with all the lists. We also saw that a similar construction could be used to obtain list-decodable codes from list-recoverable codes.
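
The heart of the ABNNR construction is a simple symbol-aggregation step, sketched below under an assumed representation: left_nbrs[u] lists the left neighbours of right vertex u, and the amplified codeword has one tuple-valued symbol per right vertex. This is only an illustrative sketch of the aggregation step, not the full construction.

def abnnr_encode(inner_codeword, left_nbrs):
    # each right vertex collects the symbols of its left neighbours into one
    # large-alphabet symbol
    return [tuple(inner_codeword[v] for v in nbrs) for nbrs in left_nbrs]

When the bipartite graph is a good expander, the positions where two inner codewords differ get spread over almost all right vertices, which is exactly the distance amplification (and the source of the larger alphabet).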

Saturday, December 10, 2005

Recap: The PCP Theorem by Gap Amplification

The PCP theorem states (roughly) the following surprising property: given a purported proof of the answer to some YES/NO problem, there is a verification scheme that reads only a constant number of pieces of the proof and decides whether to accept or reject, and with high probability this decision is correct.

It is not hard to see that the PCP theorem is equivalent to the NP-hardness of the GAP-3SAT problem, where GAP-3SAT is the following: given a 3-SAT formula and a real number c, decide whether the formula is satisfiable or whether under every truth assignment at least a c fraction of the clauses remain unsatisfied. For very small values of c this problem is clearly NP-hard (it is essentially just SAT), so the difficult question is the case of larger c values. Arora and Safra established a proof of the PCP theorem which was long and algebraic. This paper presents a more combinatorial proof, via a reduction from a variant of GAP-3SAT with small values of c to the same variant with larger (in fact constant) values of c. The variant of 3SAT is this: given a graph where each vertex is a variable over a finite alphabet and each edge is a relation over that alphabet, and given a real value c, decide whether there is a satisfying assignment to the vertices of the graph or whether every assignment leaves at least a c fraction of the edges unsatisfied.

The tool to prove this is an almost standard approach: first we convert the graph into an expander using replacement and graph superimposition (union of the graph with an expander). Then, using powering (an operation similar to the zig-zag product), we increase the "expansion" in the graph, and it turns out that this also increases the fraction of unsatisfied edges. In the last step, a technical method is used to decrease the size of the alphabet of the graph (which the powering operation blows up by an exponential factor). All these operations can be done in polynomial time, and as a result the fraction of unsatisfied edges doubles. If we repeat these steps a logarithmic number of times, the fraction of unsatisfied edges becomes as large as a constant, which is the desired property.
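
Schematically, the whole proof is just a loop over these three operations. The sketch below is purely illustrative: the three subroutines are passed in as stand-ins for the preprocessing, powering, and alphabet-reduction steps described above, not real implementations.

def amplify_gap(constraint_graph, preprocess, power, reduce_alphabet, rounds):
    G = constraint_graph
    for _ in range(rounds):        # O(log n) rounds in total
        G = preprocess(G)          # make the constraint graph an expander
        G = power(G)               # powering: roughly doubles the unsat fraction
        G = reduce_alphabet(G)     # composition step: shrink the blown-up alphabet
    return G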

Thursday, December 01, 2005

Recap: Extractors

For randomization it is common practice to assume access to a uniformly random source. However, such an assumption is unrealistic, as most sources found in nature are biased. We therefore wish to utilize these biased sources of randomness to produce a (nearly) perfectly random output. In some cases this can be accomplished deterministically; however, for many sources it cannot. An example is the class of Santha-Vazirani sources (δ-SV sources, defined in the post of 10/22/05).
In Proposition 1 we showed this: for any deterministic extractor that outputs even a single bit, one can construct a δ-SV source on which the extractor's output is as biased as the source itself.

A (k, ε)-extractor is then a function which takes a purely random seed of length d and a sample from any k-source X on {0,1}^n, and outputs an m-bit string whose distribution is ε-close to uniform. The proof that such extractors exist comes from an application of the probabilistic method to a random choice of the function. Constructive proofs such as the Leftover Hash Lemma guarantee extractors with specific bounds (a table of some of these bounds is located in the post of 10/22/05).
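
As an illustration of the hashing approach (not the exact construction or parameter choices from the lecture), here is a sketch of a universal-hash extractor: the seed specifies a random Toeplitz matrix over GF(2), and the output is that matrix applied to the sample drawn from the source.

import numpy as np

def toeplitz_extract(x_bits, seed_bits, m):
    # seed of length n + m - 1 bits defines an m x n Toeplitz matrix over GF(2);
    # the output is the matrix-vector product, computed row by row
    n = len(x_bits)
    assert len(seed_bits) == n + m - 1
    x = np.array(x_bits, dtype=np.uint8)
    s = np.array(seed_bits, dtype=np.uint8)
    out = []
    for i in range(m):
        row = s[i:i + n][::-1]          # i-th row of the Toeplitz matrix
        out.append(int(np.bitwise_and(row, x).sum() % 2))
    return out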

Extractors and expanders are interrelated: any extractor Ext: {0,1}^n × {0,1}^d → {0,1}^m can be viewed as a bipartite multigraph G = ([N], [M], E), where we place an edge between v ∈ [N] and u ∈ [M] for every seed on which the extractor maps v to u. Inspecting the resulting graph, the extraction property is essentially equivalent to the conclusion of the expander mixing lemma, and thus the graph is an expander. Similarly, an expander can be used to create an extractor by making the expander into a bipartite graph in the usual way.

In the end we examined extractors for block sources. A block source is a generalization of a δ-SV source in which single bits are replaced by constant-sized blocks of bits (each block retaining some randomness even conditioned on the previous blocks). Such sources can be handled by repeatedly applying a seeded extractor to the individual blocks: the truly random seed is used only for the first application, which produces a bitstring r_i z_i; in the next application (indexed i - 1) the r_i bits resulting from the previous extraction are used as the seed, and so on. The resulting output is r_1 z_1 z_2 ... z_i (note that the first application has index i and the indices decrease from there). The proof that these bits are close to random goes by induction.
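
A small sketch of this chaining, with the extractor itself left as an assumed input: ext(block, seed) is taken to return seed_len fresh seed bits followed by the output bits for that block. The interface and names are illustrative only.

def extract_blocks(blocks, seed, ext, seed_len):
    r, zs = seed, []
    for block in blocks:              # the truly random seed is consumed only by the first call
        bits = ext(block, r)
        r, z = bits[:seed_len], bits[seed_len:]
        zs.append(z)
    # final output: the last seed followed by all extracted blocks, as in the recap
    return r + [bit for z in zs for bit in z]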

Saturday, November 26, 2005

The PCP Theorem by Gap Amplification

I have temporarily put my post here: http://www.cs.uiuc.edu/homes/zamani/mypost.xml

Seems I have to change something(s) in blog template to incorporate this soundly.

Friday, October 14, 2005

Summary of first few lectures

We defined an expander family as a family of undirected, regular graphs on n vertices (for increasing n) such that the degree is constant and
  • the edge expansion (hG = min over |S| ≤ n/2 of |E(S,S')|/|S|) is at least a constant, OR
  • the spectral gap (d - λ1) is at least a constant.

(The equivalence follows from (d-λ1)/2 ≤ hG ≤ sqrt[2d(d-λ1)], proven later.)
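
As a quick numerical sanity check of the two definitions and the bounds above, one can compute both quantities for a small example; the choice of graph here (the 3-dimensional hypercube, 3-regular on 8 vertices) is arbitrary and purely illustrative.

import itertools
import numpy as np

n, d = 8, 3
A = np.zeros((n, n))
for v in range(n):
    for b in range(d):
        A[v, v ^ (1 << b)] = 1          # hypercube edges: flip one coordinate

eigs = sorted(np.linalg.eigvalsh(A), reverse=True)
spectral_gap = d - eigs[1]              # d - lambda_1

def edge_boundary(S):
    return sum(A[u, v] for u in S for v in range(n) if v not in S)

h = min(edge_boundary(S) / len(S)
        for k in range(1, n // 2 + 1)
        for S in itertools.combinations(range(n), k))

print(spectral_gap, h)                  # 2.0 and 1.0 for the 3-cube
# consistent with (d - lambda_1)/2 <= hG <= sqrt[2d(d - lambda_1)]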

A slightly stronger definition of spectral expansion requires d - λ to be at least a constant, where λ = max(|λ1|, |λ_{n-1}|). Then we have the following for an expander G (writing α for λ/d).


  • Optimal Expanders: If we allowed "fractional edges," a clique (with self-loops) with all edge weights d/n would be the best degree-d expander (with λ = 0). However, with "integral edges," λ ≥ 2 sqrt[d-1] - o(1). So-called Ramanujan graphs achieve λ ≤ 2 sqrt[d-1]. So one can think of α as being O(1/sqrt d).

  • Expander Mixing Lemma: For any cut (S,T) in G, the number of edges across the cut is in the range (d/n)|S| |T| ± dα sqrt[|S| |T|]. (Note that the smaller the α, the closer this is to the value for the "d/n-clique.")

    An application of this lemma is a simple "deterministic amplification" of BPP algorithms: the algorithm is run not just on one random tape, but on all of its d "neighbors" too -- neighbors in an arbitrary expander whose vertex set is the space of all random tapes -- driving the error probability down to O(α^2), starting from say 1/4.

  • Rapid Mixing: Starting from a distribution p on the n vertices, a t-step random walk on G results in a new distribution M^t p exponentially close to the uniform distribution u. More precisely, the L2-norm of the difference M^t p - u is at most α^t (and hence the L1-norm, which measures twice the statistical difference, is at most (sqrt n) α^t).

  • Sampling by Random Walks: The probability that a t-step random walk is confined to a set B of size βn is between (β - 2α)^t and (β + α)^t. (Note that the smaller the α, the closer this is to β^t, corresponding to the random walk on the clique.)

    The upper bound lets one amplify a BPP algorithm, reducing the error probability exponentially in t while using only t log d more random bits than the original BPP algorithm (provided its error probability is a sufficiently small constant to begin with); a small sketch of this walk-based amplification appears after this list.

    The two bounds together can be used to deterministically sample a graph in such a way that the relative size of the max-clique is preserved. This lets one reduce the approximation factor of an algorithm for approximating the size of a largest clique: first use a graph exponentiation which preserves the relative log-size of the max-clique, and then sample it down as above to a polynomial-sized graph in which the original approximation algorithm is then run. (Such an improvement in approximation factor can be used to amplify hardness-of-approximation results.)
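
Here is the promised sketch of walk-based amplification, with the expander and the BPP algorithm left as assumed inputs: neighbors(v) returns the d neighbours of random tape v in some fixed expander on the space of random tapes, and algo(x, tape) runs the algorithm on input x with that tape. The interface is illustrative, not a particular construction.

import random

def walk_amplified(algo, x, neighbors, start_tape, t):
    votes, v = [], start_tape
    for _ in range(t):
        votes.append(algo(x, v))
        v = random.choice(neighbors(v))       # one walk step costs log2(d) fresh random bits
    votes.append(algo(x, v))
    return max(set(votes), key=votes.count)   # majority vote over the t+1 visited tapes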

Thursday, October 13, 2005

Amplifying the hardness of one-way functions

We began by defining a negligible function. A function β(n) is said to be negligible if, for every c > 0, β(n) < 1/n^c for all sufficiently large n.

We then defined a one-way permutation: a polynomial-time computable permutation f is α(n)-one way (denoted by α-OWP) if for any probabilistic polynomial-time algorithm A:
Pr[A(f(x)) = x] ≤ 1 - α(n) + β(n)
where the probability is over a uniformly random string x of length n and the coin flips used by the algorithm, and β(n) is some negligible function. Intuitively, f is α-OWP if the fraction of 'hard' inputs is close to α.

Finally, for sets X and Y with X ⊆ Y, we say that X has density μ if |X| = μ|Y|.

It is not obvious that one-way permutations exist, but our first theorem (due to Yao) shows that we can construct strong one-way permutations from weak ones. Given f, a (1/p(n))-OWP on n-bit strings for some polynomial p(n), we define:
F(x1, x2, ..., xt) = f(x1)f(x2)...f(xt)
where |xi| = n, and t = n⋅p(n). That is, F is a function on n^2⋅p(n)-bit strings, formed by applying f to each part of size n and concatenating the results. To invert F, we must invert f on each part; the large number of parts means that we expect several of them to be hard for f. It can be shown that, in fact, F is hard to invert on all but a negligible fraction of the range.
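
In code, the construction is just a direct product; the sketch below assumes f maps n-bit strings to n-bit strings and that the input has already been split into t blocks of n bits, as described above.

def direct_product(f, x, n):
    blocks = [x[i:i + n] for i in range(0, len(x), n)]
    return "".join(f(block) for block in blocks)   # F(x_1,...,x_t) = f(x_1)f(x_2)...f(x_t)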

The chief problem with this method is that the input length blows up by a large polynomial factor. If f used n bits, F uses n^2 p(n) bits. Therefore, the hardness of inverting F on large inputs (say of length m) relates to the hardness of inverting f on much smaller inputs (those of length m^ε, where ε is a constant ≤ 1/2).

A better idea to amplify hardness is to run f on many inputs, but instead of selecting the inputs independently at random, construct an expander graph where each node corresponds to an n-bit input string. Now, select the first node randomly, and then perform a random walk in the graph. Run f on every node v in the walk; the nodes are 'independent enough' that with good probability, we will find some node v such that f(v) is hard to invert.

Lemma 1: Let α(n) be at least 1/p(n) for some polynomial p. The following are equivalent:
(a) f is α-OWP
(b) For any polynomial-time randomized algorithm A, ∃ H ⊆ Σ^n of density ≥ α such that
Pr[A(f(x)) = x | x ∈ H] is negligible.

Intuitively, H is the 'hard' set for algorithm A; that is, the lemma says that if f is an α-OWP, then for every algorithm A there is some set H (of density at least α) of inputs on which A fails to invert f, and conversely. It follows almost immediately from the definition of an α-OWP that (b) implies (a). The other direction is also not difficult to prove: if some algorithm A has no hard set of density ≥ α, repeated applications of this algorithm allow us to invert f with probability > 1 - α, contradicting the fact that f is α-OWP.

Lemma 2: Let G be an (N, d, λ) expander. (G has N vertices, is d-regular, and λ is the normalized second-largest eigenvalue.) For any subset S of the vertices with density μ, the probability that a random walk of length k is contained entirely within S is at most μ^(k/2).

The proof of this lemma uses a technique we saw earlier. Let P be the projection matrix with 1s in the diagonal positions corresponding to elements of S, and A the normalized adjacency matrix of G. If u is the uniform distribution vector, the desired probability is ||(PA)^k u||_1. (To see this, observe that we begin with a vertex picked uniformly at random; the probability that we are in S after a single step is the L1-norm of PAu, because Au gives the probability distribution over vertices after a single step, and P restricts it to S, so its L1-norm sums the probabilities over S.)

The algebra required to bound ||(PA)^k u|| is a little too tedious for a blog post (with applications of the law of cosines, the Cauchy-Schwarz inequality, etc.), but it is fairly straightforward.
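
The quantity being bounded is easy to compute directly, though, which makes for a quick sanity check; the graph (the 3-cube), the set S, and the walk length k below are arbitrary illustrative choices.

import numpy as np

n, d = 8, 3
A = np.zeros((n, n))
for v in range(n):
    for b in range(d):
        A[v, v ^ (1 << b)] = 1.0 / d          # normalized adjacency matrix of the 3-cube

S = {0, 1, 2, 3}
P = np.diag([1.0 if v in S else 0.0 for v in range(n)])
u = np.full(n, 1.0 / n)

k = 5
w = u.copy()
for _ in range(k):
    w = P @ (A @ w)                           # one walk step, then restrict to S
print(w.sum())                                # ||(PA)^k u||_1, since all entries are nonnegative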

Theorem 1: (Main) If f is a (1-α)-OWP on n-bit input strings (where α ≥ 1/2), we can construct Fk, which is a (1 - α^(k/2))-OWP on input strings of length n + k log d, where k is polynomially bounded and d is a constant.

Proof: Let {Gn} be an expander family on 2^n vertices, with degree d and λ ≤ 1/2. We label edges such that RG_n(v, i) = (u, i) for all v and i. That is, we label edges such that if the edge between v and u is the i-th edge of v, it is also the i-th edge of u; an alternate way to think about this is that the edge labels give us d disjoint perfect matchings in the graph. We now define Fk(x, σ1, σ2, ..., σk) (where |x| = n, and each σi ∈ {1...d}) as the output of the function:
for i ← 1 to k
   x ← f(x)
   x ← RG_n(x, σi)
return (σ1, σ2, ..., σk, x)

That is, we apply f to x, take a step in the graph, and repeat this process k times. Another interpretation is that we take a k-step walk, driven by the σi's, in the graph G'n, which has the same vertex set as Gn but has an edge between u and v ⇔ Gn has an edge between f(u) and v. Since f is a permutation, G'n is also an expander with the same properties as Gn.
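
For concreteness, here is the same construction as a short Python sketch, with the permutation f and the rotation map R of Gn left as assumed inputs (R(x, sigma) returns the neighbour reached along edge label sigma, together with the label back).

def F_k(f, R, x, sigmas):
    for sigma in sigmas:          # k iterations: apply f, then take one step in G_n
        x = f(x)
        x, _ = R(x, sigma)        # with the consistent labelling, the returned label equals sigma
    return tuple(sigmas), x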

To see that the function Fk is a permutation, it is sufficient to see that it is injective. Suppose two different inputs produced the same output; this implies that the walks ended at the same vertex and that the sequences of σi's are identical. But two such walks that begin at distinct vertices can never converge, because f is one-to-one, and the σi-th edges out of two different vertices cannot lead to the same vertex, because of the way edges are labelled.

We show that Fk is a (1 - α^(k/2))-OWP by using any algorithm A that inverts Fk to construct an algorithm A' that inverts f; if the hard set for A has density < 1 - α^(k/2), the hard set for A' will have density < 1 - α. Given A, the algorithm A' works as follows to invert an input y:
Pick i uniformly at random from {1, 2, ..., k}
∀ 1 ≤ j ≤ k, pick σj uniformly at random from {1, 2, ..., d}
Set y' ← F_{k-i}(RG_n(y, σi), σ_{i+1}, ..., σk)
Use A to find (x', σ1, σ2, ..., σk) ← F_k^(-1)(y', σ1, σ2, ..., σk)
Return x ← F_{i-1}(x', σ1, σ2, ..., σ_{i-1})

Intuitively, to find the pre-image of y, A' guesses a position i in a k-step walk. The algorithm completes the walk from y to its end, uses A to invert the whole walk, and then follows the first part of the walk again, stopping just before it reaches y. This vertex is the predecessor of y in the walk, i.e., the desired pre-image. If A successfully inverts Fk, then A' inverts f.

Since f is a (1-α)-OWP, A' must have a hard set of density ≥ 1-α. Let H be the set of vertices corresponding to the inputs in this hard set, and let H* be the set of length-k walks in the graph that intersect H. From Lemma 2 (applied to the complement of H, which has density at most α), we know that H* has density at least 1 - α^(k/2). If we can show that H* is a hard set for A, it follows that Fk is a (1 - α^(k/2))-OWP.

If H* is not a hard set for A, then the algorithm succeeds on it with non-negligible probability, say ≥ 1/p(n). For every invocation of A by A', with probability ≥ 1/k the index i points to some vertex x ∈ H on the walk. Considering all walks in H*, the probability of A' succeeding on H is therefore ≥ 1/(k⋅p(n)), which is non-negligible; this implies that H is not hard for A'. But this is a contradiction, and so H* must be a hard set for A. This completes our proof of the main theorem.

Theorem 2: If f is a 1/p(n)-OWP for some polynomial p(n), then for any polynomial q(n), ∃ a (1- 1/q(n))-OWP with only an O(log n) (additive) increase in input length.

Proof: We use the construction of Theorem 1 in two stages. First, starting with a (1 - (1 - 1/p(n)))-OWP, set k = O(1) and construct a (1 - (1 - 1/p(n))^(k/2))-OWP by applying Theorem 1; this only increases the input length by Θ(1) bits. Repeating this process O(log p(n)) = O(log n) times, we can achieve a (1 - 1/2)-OWP, with a total of O(log n) increase in input size. Now, set k = O(log q(n)) = O(log n) and apply the construction to the (1 - 1/2)-OWP; this gives us the desired (1 - 1/q(n))-OWP. The total increase over the iterations of stage 1 was O(log n), as it is for the single iteration of stage 2.

Saturday, October 01, 2005

Introduction to Zig-Zag

Starting September 21st, Erin introduced us to the Zig-Zag product as a means of expander construction.

In leading up to the construction, we first defined a rotation map of a graph G = (V,E) to be a function taking pairs (v, i) to pairs (w, j), where v, w ∈ V, the i-th edge incident to v leads to w, and that edge is the j-th edge incident to w. Further, we will use the notation that G is an (N, D, λ)-graph if G has N vertices, is D-regular, and has second largest eigenvalue in absolute value λ = max_{α ⊥ 1} |⟨α, Mα⟩| / ⟨α, α⟩, where M is the normalized adjacency matrix and 1 is the all-ones vector.

Next we explored the results of squaring and the tensor product of graphs. Namely, we considered G^t for G an (N, D, λ)-graph. Proposition 1 stated that G^t is an (N, D^t, λ^t)-graph. To see this, consider the normalized adjacency matrix M^t of G^t: if Mu = λu, then M^t u = λ^t u by repeated application of M.

We then defined the tensor product G of G1 and G2, where G1 is N1-vertex D1-regular and G2 is N2-vertex D2-regular, to be an N1 · N2-vertex, D1 · D2-regular graph whose rotation map is

RG( (v,w), (i,j) ) = ( (v',w'), (i',j') ), where v ∈ G1, w ∈ G2, i ≤ D1, j ≤ D2, and RG1(v,i) = (v',i'), RG2(w,j) = (w',j').

Intuitively, each vertex of G1 is paired with every vertex of G2, and a single step in the tensor product corresponds to taking one step in G1 and one step in G2 simultaneously.

As a graph can be represented by its adjacency matrix, we consider the effect of the tensor product on the two adjacency matrices A and B. If we let Au = λ1u and Bv = λ2v, let C be the tensor of A and B, and let w be the tensor of u and v, then Cw = λ1λ2w. We see this by considering the tensor of Au and Bv and extracting the appropriate position (i,j) in the result.

Now, we consider the effect of the tensor on the eigenvalues. We considered the following proposition:

Let G1 be an (N1, D1, λ1)-graph and G2 an (N2, D2, λ2)-graph. Then the tensor of the two is an (N1 · N2, D1 · D2, max{ λ1, λ2 })-graph.

To see this, consider the eigenvalues of the normalized adjacency matrix of the tensor: they are exactly the pairwise products of the eigenvalues of the normalized adjacency matrices of G1 and G2. The largest such product is 1 · 1 = 1, and the second largest in absolute value is max{ λ1, λ2 }.
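
This is easy to check numerically with a Kronecker product; the two graphs below (K3 and K4) are arbitrary small examples used only for illustration.

import numpy as np

M1 = (np.ones((3, 3)) - np.eye(3)) / 2.0      # K_3: eigenvalues 1, -1/2, -1/2
M2 = (np.ones((4, 4)) - np.eye(4)) / 3.0      # K_4: eigenvalues 1, -1/3, -1/3, -1/3

C = np.kron(M1, M2)                           # normalized adjacency matrix of the tensor
eigs = sorted(abs(np.linalg.eigvalsh(C)), reverse=True)
print(eigs[:3])                               # 1.0, then max(1/2, 1/3) = 0.5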

From here we defined the zig-zag product. Consider G1, a D1-regular graph on N vertices, and G2, a D2-regular graph on D1 vertices. Replacing each vertex of G1 with a copy of G2 (a "cloud"), we create a graph on the vertex set [N] × [D1] by adding an edge for every path of length 3 of the following form:

(1) Take a random step inside a copy of G2.
(2) Take the edge of G1 which represents the jump from the current embedded copy of G2 to another.
(3) Take a random step inside the new copy of G2.

Thus, an edge between vertices i and j exists if there are vertices i' and j' such that there is an edge between i and i' in an embedded copy of G2, an edge between i' and j' crossing between the two clouds (corresponding to an edge of G1), and an edge between j' and j in an embedded copy of G2.

This can be expressed formally via the rotation map of G, G1, and G2 as follows:

Let v ∈ [N], k ∈ [D1], and i,j ∈ [D2].
Let (k', i') = RG2(k,i), the first step within a cloud.
Let (w, l') = RG1(v, k'), the step in G1.
Let (l, j') = RG2(l', j), the second step within a cloud.

Then RG( (v,k), (i,j)) = ((w,l), (j',i')).
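
Written as code, the rotation map of the zig-zag product is a direct transcription of these three steps; R1 and R2 below stand for the rotation maps of G1 and G2, supplied as functions (an assumed interface).

def R_zigzag(R1, R2, v, k, i, j):
    k1, i1 = R2(k, i)        # first step, within the cloud of v
    w, l1 = R1(v, k1)        # step in G1, leaving the cloud via label k1
    l, j1 = R2(l1, j)        # second step, within the cloud of w
    return (w, l), (j1, i1)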

The point of exploring the zig-zag product is that if the two graphs G1 and G2 are good expanders we expect that their product G will also be a good expander.

To this end, we have Theorem 4, which follows. Let G1 be an (N, D1, λ1)-graph and G2 be a (D1, D2, λ2)-graph. Then the zig-zag product G of G1 and G2 is an (N · D1, D2^2, f(λ1, λ2))-graph, where f(λ1, λ2) ≤ λ1 + λ2 + λ2^2 and f(λ1, λ2) < 1 if λ1, λ2 < 1.

Intuitively, we can think of the construction as trying to evenly distribute entropy throughout the resulting graph. We claim that a step in the zig-zag product results in a more uniform distribution of the entropy, given that the entropy was not already uniformly distributed. To see this we consider two cases, based on the initial probability distribution.

If the distribution within a cloud is not too close to uniform, then the random first step used to produce an edge in the zig-zag product contributes some amount of entropy. Since the deterministic step across clouds and the following random step cannot decrease the contribution from step one, some amount of reallocation is done, making the entropy in the graph more uniform.

If instead the distribution within the cloud is close to uniform, then the first step is unproductive. However, the deterministic step across clouds is now effectively random, due to the uniformity of the choice of vertex within the cloud, and so a contribution is still made by the resulting step of the zig-zag product.

The proof follows along these lines, decomposing an arbitrary distribution into components corresponding to the two cases above and making use of the above propositions. While I believe I understand the intuition behind the idea, the algebraic manipulation seems a bit too tedious for this post.