.comment-link {margin-left:.6em;}

A Blogscursion into the World of Expanders

Saturday, October 22, 2005

Extractors (Phillip Mienk, lecturer)

Motivation: For most of what we've been talking about in the past, we've been assuming we've had perfectly random sources, but these creatures simply can't always be found in "nature." The question then arises -- can we take a source which is biased and bring it as close as possible to purely random?

We take to start a result from vonNeumann: For any X1, X2,...,Xn∈{0,1} with Pr[Xi = 1] = δi, where 0<δ≤δi≤1-δ, then for l such bits, we can bound the parity to be exponentially close to 1/2. That is, |Pr[ XORi=1...l Xi = 1] - 1/2| = 2 -Ω(l).

δ-SV-sources (Santha-Vazirani): ∀i ∀x1, ..., xi-1 such that δ≤Pr[Xi=1 | X1=x1, ..., Xi-1=xi-1] ≤ 1-δ

Proposition 1: ∀δ for all deterministic extractors f(x)=b (functions which output one bit), ∃ an SV-source X such that Pr[b = 1] is as biased as the SV-source.
Proof: Consider Xi, ..., Xn ∈ D = {0,1}n. Let f(x)=b. Note that |D|=2n. Divide the domain, D, into two pieces, D0 where the extractor outputs 0, and D1 where the extractor outputs 1. D0={x: f(x)=0}, D1={x: f(x)=1}. We want to show there is an SV-source such that f is δ-biased; if the uniform distribution doesn't work, then the difference between the sizes of our two domains must be | |D0| - |D1| | ≤ 2nδ. WLOG, we say |D0| ≥ |D1|. Then there are |D0|≤ (2n/2) + (2nδ/2) = (1 + δ)/2 ·2n. Thus, |D1|≥ (1 - δ)/2 ·2n. Let X be an SV-source such that (1 + δ)/2 on D0 and (1 - δ)/2 on D1. If x ∈ D0, Pr[x] = (1 + δ)/2 · 1/|D0|. If x ∈ D1, Pr[x] = (1 - δ)/2 · 1/|D1|. Now let's say we have strings a=y0 and b=y1, where |y|=n-1 (we would like to show that |Pr[a|y]-Pr[b|y]|<δ). It is enough to bound the ratio of Pr[a|y]/Pr[b|y] = Pr[a]/Pr[b]. If a,b∈D0 or a,b∈D1, this ratio is obviously exactly 1; thus we assume WLOG that a∈D0 and b∈D1. Then, Pr[a]/Pr[b] = ((1 + δ)/2 · 1/|D0|)/((1 - δ)/2 · 1/|D1|) = (1 + δ)/(1 - δ) · |D1|/|D0|(1 + δ)/(1 - δ). With some more elementary algebra (which would just look overly intimidating with the terrible typsetting on this blog), we can also get a lower bound of 1. Thus, for all f we have constructed a δ-SV-source, namely (1+δ)/2 on D0, (1-δ)/2 on D1.

More motivation: Why are we doing this? We want to run randomized algorithms given an extractor and input. If it doesn't yield significantly random output, then we can allow a purely random seed into the extractor, and we hope that this will make the output more uniformly random. From an algorithm A(w,r) we can make a new algorithm A'(w,X) with the same input and k-source X and generates all possible bits it can from a given seed, d. For small d, we can iterate over all seeds until we find the one that gives optimal output, and thus we'd like to prove there exist extractors with small seeds.

Definition: Ext: {0,1}n × {0,1}d → {0,1}m is a (k,ε)-extractor if for any k-source X on {0,1}n then Ext(X,Ud) is ε-close to Um.

Definition: The statistical difference, Δ, is half the L1-difference. That is, Δ(x,y) = maxT⊆U |Pr[x∈T]-Pr[y∈T]|

Definition: Supp(X)={x : x∈X, Pr[x]>0}

Proposition 2: For any n and for any k≤n, and a flat (uniform over {0,1}k) k-source X, ∃ Ext: {0,1}n → {0,1}m with m=k-2log(1/ε)-O(1) such that Ext(X) is ε-close to Um.
Proof: Consider a random deterministic Extractor. We want ∀T ⊆ [2m] |Pr[Ext(X)∈T] - Pr[Um∈T]|≤ε. This would imply that |{x∈supp(X) : Ext(x)∈T}|/2k differs from density μ(T) by less than ε. (The rest of the proof is left to the reader, as class ended and notes were left incomplete.)

Deterministic Extractors: Fix Y a flat l-source on {0,1}t. As there are L=2l desired elements from T=2t possible, we have (T choose L) candidates for Y. By Stirling's approximation, this is ≈(Te/L)L. Consider a random extractor Ext: {0,1}t → {0,1}m. Pr[Ext not an extractor on Y] ≤ 2Ω(Lε2). This implies that Pr[Ext not an extractor for some Y] ≤ (T choose L) · 2-Ω(Lε2) ≤ (Te/L)L · 2-Ω(Lε2) > 1. Thus the deterministic extractor fails.

Randomized Extractors: Consider Y=(X,Ud), where X is a flat k-source with t=n+d. (Notation: T=2t, N=2n, D=2d, K=2k). Then T=ND, L=KD, and there are (N choose K) candidates for Y. Pr[Ext fails for some Y] ≤ (N choose K) · 2-Ω(KDε2) ≤ (Ne/K)K · 2-Ω(KDε2), which is less than 1 if d=log(n-k) + 2log(1/ε)+O(1). Since this probability is less than one, there is a high probability we have an extractor for all Y, and thus we can have randomized algorithms.

Recall that we're looking to find a small seed, d, while giving good output, m. What do we know?






dm
Necessary for BPP simulationO(log n)kΩ(1)
Pairwise Independant HashingO(n)k+d-O(1)
Spectral ExpandersO(n-k)n
Lu, Reingold, Vadhan, WidgersonO(log n)(1-γ)k, ∀γ>0
Zig-Zag ProductO(log2n)k+d-O(1)

Definition: A family of functions H={hi : [N] → [M]} is pairwise independent if
     (1) ∀x∈[N], h(x) is uniformally distributed in [M]
     (2) ∀x,y∈[N], x≠y, h(x) and h(y) are independent.

Leftover Hash Lemma: If H={h : {0,1}n → [0,1}l} is a pairwise independent family where l=k-2log(1/ε)-O(1), then Ext(x,h) = (h, h(x)) is a (k,ε)-extractor. (Not enough time for the Proof in class)

Extractors and Expanders: We can interpret Ext: {0,1}n × {0,1}d → {0,1}m as a bipartite multigraph G=([N],[M],E) where (u,v)∈E iff ∃r∈Ud such that Ext(u,r)=v. Consider the extraction property that Ext is a (k,ε)-extractor iff ∀S⊆[N], |S|=K, Δ(Ext(Us,Ud),Um) = maxT⊆[M] |Pr[Ext(Us,Ud)∈T] - Pr[Um∈T]| ≤ ε Therefore, for any T⊆[M], this implies that |(e(S,T)/|S|·D) - (|T|/M)|≤ε. This looks familiar; recall the...

Expander Mixing Lemma: For S,T⊆G and λ the second largest eigenvalue of the adjacency matrix A, |(E(S,T)/|S|·|T|) - (D/N)| ≤ λ√(|S|·|T|). Note that we can make G bipartite by making a copy of G, selecting S in G1, T in G2, and if uv∈G, ∃uv∈G1G2 such that u∈G1 and v∈G2. Through some manipulation of the two similar equations, one can see that if λ is "small enough," an expander yields an extractor (and vice versa).

Definition: X is a (k,l) block source if X=B1...Bt where ∀i∈[t], |Bi|=l and ∀b1, ..., bi-1∈{0,1}l, Bi|B1=b1, ..., Bi-1=bi-1 is a k-source. Alternatively, X is a (k,l) block source if it can be partitioned into l blocks of equal size such taht conditional distribution of each block given previous history is a k-source.

Theorem: Given explicit construction for (k,ε)-extractor Ext: {0,1}l × {0,1}d → {0,1}d+m, then ∀t≥1 ∃ Ext': {0,1}tl × {0,1}d → {0,1}d+tm such that Ext'(X,Ud is tε-close to Ud+tm for all (k,l) block sources X with t blocks. Formally, consider Ext'. For x∈{0,1}tl, x=b1...bt for |bi|=l. For each bi, Ext yields rizi with |ri|=d. Then Ext(bi,ri+1), where ri+1 is the initial seed, d. Thus, &delta(Ext'(X,Ud), Ud+tm)≤tε
Proof: X=B1...Bt. Let R1, ..., Rt+1 and Z1, ..., Zt be random variables representing the output of Ext'(X,Ud) and internal output. Let M1, ..., Mt be samples of Um. Define ∀i∈[t+1]: Di = (B1, ..., Bi-1, Ri, Zi, ..., Zt) and Ei = (B1, ..., Bi-1, Ud, Mi, ..., Mt). We want to show that Δ(Di,Ei)≤(t+1-i)ε, and we shall proceed by induction. Base case: i=t+1. Since Rt+1 is a seed, it is uniform in Ud. Inductive hypothesis: Δ(Di+1,Ei+1) ≤ ((t+1) - (i+1))ε ≤ (t-i)ε. Inductive step: &Delta (Di+1,Ei+1) = Δ( (B1, ..., Bi, Ri+1, Zi+1, ..., Zt), (B1, ..., Bi, Ud, Mi+1, ..., Mt) ) = Δ( (B1, ..., Bi, Ext(Bi, Ri+1), Zi+1, ..., Zt), (B1, ..., Bi, Ext(Bi, Ud), Mi+1, ..., Mt) ). But Ext(Bi, Ri+1) = RiZi. Thus, the same quantity = Δ(Di, (B1, ..., Bi, Ext(Bi, Ud), Mi+1, ..., Mt)). We know that Bi is a k-source conditioned on B1, ..., Bi-1, that Ext is a (k,ε)-extractor, and that Mi, ..., Mt are independent. Thus, our quantity = Δ((B1, ..., Bi, Ext(Bi, Ud), Mi+1, ..., Mt), (Bi, ..., Bi-1, Ud+m, Mi+1, ..., Mt))≤ε Consider the triangle inequality: &Delta(Di, Ei)≤ Δ(Di, (B1, ..., Bi, Ext(Bi, Ud), Mi+1, ..., Mt)) + Δ((B1, ..., Bi, Ext(Bi, Ud), Mi+1, ..., Mt), Ei) ≤ (t-i)ε + ε ≤ (t+1-i)ε

Since this has worked for any block source, we can then generate similarly for an SV-source.

0 Comments:

Post a Comment

Links to this post:

Create a Link

<< Home