Published 2017-10-10

Contents

  1. 1.Johnson-Lindenstrauss lemma
  2. 2.Random Projection
  3. 3.Application
  4. 4.Reference

Nowadays, dimensionality is a serious problem of data analysis as the huge data we experience today results in very sparse sets and very high dimensions. Although, data scientists have long used tools such as principal component analysis (PCA) and independent component analysis (ICA) to project the high-dimensional data onto a subspace, but all those techniques reply on the computation of the eigenvectors of a

matrix, a very expensive operation (e.g., spectral decomposition) for high dimension

. Moreover, even though eigenspace has many important properties, it does not lead good approximations for many useful measures such as vector norms. We discuss another method random projection to reduce dimensionality.

In 1984, two mathematicians introduced and proved the following lemma.

For any

,

, there exists a matrix

with

such that

, we have

Remark: This lemma states that for any pair vector

in

dimension, there exist a sketch matrix

which maps

and the Euclidean distance is preserved within

factor. The result dimension does not relate to origin dimension

(only relates to the number of vector pairs

).

During a long time, no one can figure out how to get this sketch matrix.

Random Projection

Until 2003, some researches point out that this sketch matrix can be created using Gaussian distribution.

Consider the following matrix

, where

and all

are independent. We claim that this matrix satisfies the statement of JL lemma.

Proof. It is obvious that sketch has an additional property,

. In other word, Gaussian distribution is 2-stable distribution. Then we can obtain

, where

. That is to say,

follows a

(chi-squared) distribution with degrees of freedom

. For tail bound of

distribution, we can get

for a constant

.

Fix two index

, and let

and

, and set

to get

Take the union bound to obtain,

Thus,

which is same as the guarantee in Johnson-Lindenstrauss lemma.

In this or other forms, the JL lemma has been used for a large variety of computational tasks, especially in streaming algorithm, such as

The post is used for study purpose only.