.comment-link {margin-left:.6em;}

A Blogscursion into the World of Expanders

Friday, December 16, 2005

Recap: A Randomness-Efficient Sampler for Matrix-valued Functions and Applications

Recap: A Randomness-Efficient Sampler for Matrix-valued Functions and Applications

We are given an expander graph G.

We are interested in functions f:V->[-1, 1] and aim to calculate the
expectation E[f]. For this analysis we are given that E[f] = 0, but we can also consider estimating E[g] for functions g where
E[g] != 0. To do so let f = g - E[g], so that E[f] = 0.


We consider taking a sample W=(Y1, ..., Yk) of vertices of V. We consider three bounds:

Chernoff
e ( -γ^2 k/2.4)

Walk on G (strong bound)
e -γ^2 ε k/60

Walk on G (weak bound)
((2/ε log 8/(γ ε)) e -(γ^2 k ε^3)/(12(2 log 8/(γ ε)))


In the first case, W is a random, independent sample, resulting in a trivial Chernoff bound.


In the second and third cases, W is a random walk on G.


Chernoff bound and the weak bound for the random walk.
The proof for the Chernoff bound uses Markov's inequality and other simple algebraic
manipulations to arrive at a point where the independence of the Y i
can be used.


When W is a walk, we cannot use the independence of the Y i , since the
Y i are not independent. Instead we apply linear algebra to show
that a summation is bounded by a dot product, and then bound the dot product by
an algebraic expression. We minimize over an arbitrary constant t to get the desired bound.
We then boost this result by dividing the walk W into samples and applying the union bound.



Lastly, we consider another paper, A Simple Chernoff Bound for Random Walks on a Weighted Graph, by David Gillman. In this paper a walk is taken on a
weighted graph, and the objective is to estimate the weight of the vertices in A, where
A is a subset of V, by sampling
(the weight of a vertex is equal to the sum of the weights of its edges).


In the setting of the previous paper, it is as though f is being taken as a 0-1 function, 0 for the vertices not in A and 1 for the vertices in A.


The main result of the paper quantifies the rate convergence of estimates of the weight of A to the actual weight of A as the length of the sample walk increases.

Then we explore three approximation algorithms, the Standard Procedure, Aldous's Procedure, and
Aldous's Procedure with Statistical Technique. The Standard Procedure takes multiple walks, sampling the last vertex of each walk.
Aldous's Procedure takes a walk and samples every verex after a certain count.
Aldous's Procedure with statistical techniques runs Aldous's Procedure multiple times to get
more accurate results.

The paper shows that Aldous's Procedure with Statistical Technique always beats the Standard Procedure, as does Aldous's Procedure given certain conditions.

Monday, December 12, 2005

Recap: Expanders and Coding Theory

We looked at applications of Expanders to Coding Theory and saw that they could be used to obtain efficient encoding and decoding algorithms, with good error-detection and recovery properties.

We first considered codes which used the adjacency matrix of a bipartite expander as the parity check matrix. The simple FLIP algorithm allowed us to decode from a constant fraction of errors in linear time. In this algorithm, vertices of the left partite set are labelled with the bits of the received word, and a vertex of the right partite set is satisfed if an even number of its neighbors are set to '1'. As long as any vertex on the left has more unsatisfied than satisfied neighbors, we flip the bit corresponding to that vertex.

A similar idea could be tried for efficient encoding, but in order to obtain a linear time algorithm, the adjacency matrix must be sparse. Codes with low-density generating matrices have poor error correcting capabilities, but Spielman showed that such codes could be used for error reduction. Using Spielman's error-reducing codes, we can inductively construct error-correcting codes which allow linear-time encoding.

Another use of expanders is in the algorithm of Alon, Bruck, Naor, Naor and Roth; their construction allows us to obtain linear-time encodable and decodable codes with very large minimum distance (and hence good error correction); the price we pay is a significantly increased alphabet size. Finally, we considered list decoding, in which we output all codewords (not just the nearest codeword) that are sufficiently close to the received word. The ABNNR construction allowed us to construct a (1,2,2)-list recoverable code: that is, if in the received codeword, we have a list of two possible symbols for each position of the codeword, we output each of the (at most 2) codewords that are compatible with all the lists. We also saw that a similar construction could be used to obtain list-decodable codes from list-recoverable codes.

Saturday, December 10, 2005

Recap: The PCP Theorem by Gap Amplification

The PCP theorem (almost) states this surprising property that given a proof of answer of some YES/NO problem, there is some scheme that checks just some constant pieces of the proof and decides whether the proof is correct or not, and this decision is with high probability a correct decision.

It is almost easily known that NP-hardness of GAP-3SAT problem is equivalent to correctness of PCP theorem where GAP-3SAT is the following problem: given a 3-SAT formula and a real number c, whether the formula is satisfiable or under every truth assignment at least c fraction of clauses remain unsatisfied. It is obvious that for small values of c this problem is NP-hard (since it is just SAT for small c), so the difficult question is the case of larger c values. Arora and Safra established a proof of PCP theorem which was long and algebraic. This paper presents a more combinatorial proof using reduction of a variant of GAP-3SAT for small values of c to the same variant of GAP-3SAT for larger values (actually constant values) of c. The variant of 3SAT is this: given a graph where each vertex of graph is a variable over a finite alphabet and each edge is a relation over that alphabet, and given a real value c, whether there is satisfying assignment to the vertices of the graph or always c fraction of edges remain unsatisfied.

The tool to prove this, is almost a standard approach: first we convert the graph to an expander using replacement and graph superimposition (union of a graph with an expander). Then using powering (an operation similar to zigzag product) we increase the "expansion" in the graph and it turns out that this increases the fraction of unsatisfied edges. In the last step using a technical method we decrease the size of the alphabet of the graph (which the powering operation increases that by an exponential factor). All these operations can be done in polytime and as a result the farction of unsatisfied edges double. If we repeat these steps for a logarithmic number of times, the fraction of unsatisfied edges becomes as large as a constant, which is the desired property.

Thursday, December 08, 2005

David's Lectures

Sorry, I forgot to post the url of the lecture notes for David's lectures. Meanwhile, I made a few changes. I could not write it up in html so prepared a pdf file:

http://www.geocities.com/hemanta_maji/lecture.pdf

Recap: Are Bitvectors Optimal?

In the paper "Are Bitvectors Optimal?" we aim to answer the static membership problem -- that is, given a Universe U = {1, 2, ..., m} and a set S of keys from U, we want to know whether an element u known to be in the universe is in S. This, of course, could be answered with a simple idea of lying out all elements and querying a large number of elements to find if u is among them; however, we aim to minimize storage space and use as few probes as possible. We look at two types of randomized schemes to implement this, each using only one bitprobe. One has one-sided error (where an answer of "Yes, u is in S" might be incorrect) and another has two-sided error (where either a "Yes" or a "No" answer might be incorrect).

We start by making a bipartite graph G=(U,V,E) where U is the Universe and V are locations in memory. For the two-sided error problem we seek a two-coloring of V such that ∀ u ∈ S, at least (1-ε)|Γ(u)| locations are colored 1, and ∀ u' ∉ S, at least (1-ε)|Γ(u)| locations are colored 0 (where 0 maps to an answer of "no" and 1 to "yes"). This is also referred to as being "(n,ε)-two-colorable." The general idea about how we go about this is to separate out those vertices which can be colored 0 and 1 and have no potential for error, and then show that the error on the remaining vertices is bounded by ε. In particular, we use specific properties of the graph as an expander, showing that V has the "(n,ε) Intersection Property," which in essense says that the number of sets of locations which would be "offending" a simple greedy 0/1 coloring is less than the original number of sets. Thus, we take this smaller set of offenders and inductively find a good (n,ε)-two-coloring for it, then adding back in the original non-offender sets which were already properly colored.

Thursday, December 01, 2005

Recap: Extractors

For randomization it is common practice to assume some uniformally random source. However, such an assumption is unrealistic as most sources found in nature are biased. We therefore wish to utilize these biased sources of randomness to produce a perfectly random source. For some cases, this can be accomplished deterministically, however for many sources this is not the case. An example of which are Santha-Vazirani sources (δ-SV sources defined in the post of 10/22/05).
By construction of a δ-SV-source for any deterministic extractor of even a single bit such that the extractor using the source is as biased as the source itself in Proposition 1 we showed this.

A (k, ε)-extractor is then a function which takes some purely random seed of length d and any k-source X on {0,1}n and outputs some m length bitstring which is ε-close to unifom.
The proof of the existence of such extractors comes from the application of the probabilistic method considering a random choice of deterministic extractors (zero length seeds).
Constructive proofs such as the Leftover Hash Lemma guarantee extractors with specific bounds (A table of some of these bounds is located in the post of 10/22/05).

Extractors and expanders are interrelated as any extractor Ext: {0,1}n × {0,1}d → {0,1}m can be viewed as a bipartite multigraph G = { [N], [M], E } where an edge exists between v ∈ [N] and u ∈ [M] if and only if for some random seed the extractor on u outputs v.
On expection of the resulting graph the extraction property is equivalent to the expander mixing lemma thus the graph is an expander.
Similarly an expander can be used to create an extractor by making the expander into a bipartite graph in the usual way.

In the end we examined extractors over block sources. A block source is a generalization of a δ-SV-source where single bits are replaced with constant sized blocks of bits. Such sources can be used by extractors via the repeated application of an extractor on the individual blocks of the source. In doing so, the initial seed is applied only to the first iteration of the extractor generating some random bitstring rizi. On the next iteration i - 1 the ri bits resulting from the previous extraction are used as the seed. The resulting output is then r1z1z2...zi (note that the first iteration is i and each subsequent iteration has index i - 1). The proof of these bits' randomness is found by induction.

Saturday, November 26, 2005

The PCP Theorem by Gap Amplification

I have temporarily put my post here: http://www.cs.uiuc.edu/homes/zamani/mypost.xml

Seems I have to change something(s) in blog template to incorporate this soundly.

Monday, November 21, 2005

Are Bitvectors Optimal?

Static Membership problem

Given
Universe U = [m]
Set S containing keys of U

Query
For some u, is u in S?

Main goal
Use few probes
Small storage space

Errors in Responding to Query

One-sided
if u is in S, correct ("yes")
if u is not in S, sometimes correct ("no"), sometimes incorrect ("yes")

Two-sided
if us is in S, sometimes correct ("yes"), sometimes incorrect ("no")
if us is not in S, sometimes correct ("no"), sometimes incorrect ("yes")

Space requirements


Error Probes Space (upper) Space (lower)
2-sided 1 O(n/ε2 log m) Ω
(n/(ε log(1/ε)) log m)

1-sided 1 O((n/ε)2 log m)
Ω(n2/(ε2 log n/ε)log m)



One-sided Error (minor result)
S is a subset of U.

Let G = (U,V) be a bipartite graph where U is the universe and
V is locations in memory.

For each u in S, color each of u's neighbors 1.

Note that if u is NOT in S, there may nevertheless
exist an edge from u to one of its neighbors.

Thus the one-sided error.

Query

Suppose we are given the query "is u in S?" for some u in U.

Then we probe a random neighbor of u. If this neighbor is 1, then
we return true, else false.

Result

We aim to exhibit a construction of G such that for all u' not in S,
fewer than ε |Γ(u')| neighboring locations are 1.

This guarantees that when probing u', the probability of an incorrect
response is less than ε |Γ(u')|.

We assume that |S| ≤ n.

Our construction is an (m,l,a)-design .

A family F is an (m,l,a)-design if F has m sets each of size l,
and any two sets in F have at most a elements in common.

A Theorem from an earlier work shows that an (m,l,a)-design can be guaranteed
when m, s, a and l are related in a certain way.

We use an (m,l,a)-design with a = log m, l = (n log m)/ε.

Correctness

If u is in S, then all of Γ are 1, thus we correctly return "yes".
If u is NOT in S, then we have that |U ∩ Γ(u)| ≤ na. Thus
we have that the probability of erroneously getting a 1 is bounded by
na/|Γ(u)| = na/l ≤ ε.

Two-sided Error (main result)

Consider bipartite graph (U, V, E) where U=[m] and V=[s].

We store the set S, where |S| ≤ n.

We use the same query scheme as before: for the query "is u in S?" we return
true if a random vertex of Γ(u) is colored 1, false otherwise.

We want the error probability to be at most ε. We seek a two-coloring
&X:V →{0,1} such that:

∀ u ∈ S, at least (1-ε)|Γ(u)| locations are colored 1
∀ u' ∉ S, at least (1-ε)|Γ(u)| locations are colored 0

Definition: Let C1, C0 be subsets of 2V. We say that
(C1,C0) is ε-2-colorable if there exists
X:V →{0,1} such that:


For all T in C0   |X-1(1) ∩ T| ≤ ε |T|
For all T in C1   |X-1(0) ∩ T| ≤ ε |T|


Further, we say that C is (n,ε)-2-colorable if for all subsets

(C1,C0) of C
such that |C1| ≤ n and C0=C-C1, we have that

(C1,C0) is ε-2-colorable.

Our Goal

We wish to find a bipartite graph G=(U,V,E) such that U=[m], V=[s]
and {Γ(u)}u in U is (n,ε)-2-colorable. This will give
ensure that our query procedure is never wrong more than ε fraction of the
time.

We show that such a graph exists, with |V|=O(n/ε2 log m):


G = (U, V, E) is an (m, s, n, d, ε)-expander
\/

{Γ(u)}u in U has the (n, ε)-intersection property
\/

{Γ(u)}u in U is (n, ε)-2-colorable
\/

(n, m, s, 1)-randomized scheme with error ε


G = (U, V, E) is an (m, s, n, d ε)-expander


G=(U,V,E) is an ε-expander => {Γ(u)}u in U has (n,ε)
intersection property




This is proved via induction on |C1| and |C0|.

C has (n,ε)-intersection property => C is (n,ε)-two-colorable

Definition: G=(U,V,E) is an (ms,n,d,ε)-expander if U=[m], V=[s],
each vertex in U has degree d and for all subsets S of size less than or equal
to 2n we have that |Γ(s)| ≥ (1-ε/2) |S|d
.

We prove the contrapositive. We assume that we do not have the (n,ε)-intersection
property. Then we show that:

|Γ(S)| < (1-ε/2)|S|d, a violation of the definition of an expander.