Subspace Embedding
Subspace embedding is a powerful tool for simplifying matrix computations and analyzing high-dimensional data, especially for sparse matrices.
Subspace Embedding
A random matrix $\Pi \in \mathbb{R}^{m \times n}$ is a $(d, \epsilon, \delta)$-subspace embedding if for every $d$-dimensional subspace $U \subseteq \mathbb{R}^n$, with probability at least $1 - \delta$, every $x \in U$ satisfies

$$(1-\epsilon)\|x\|_2 \le \|\Pi x\|_2 \le (1+\epsilon)\|x\|_2.$$

Essentially, the sketch matrix maps every vector $x$ in the subspace $U$ down to $\mathbb{R}^m$ while preserving its $\ell_2$ norm up to a factor of $1 \pm \epsilon$, with high probability.
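As a quick illustration (not from the original notes), the sketch below empirically checks this definition using a dense Gaussian matrix scaled by $1/\sqrt{m}$, a classical subspace embedding; the dimensions and number of trials are arbitrary choices.

```python
import numpy as np

# Illustrative check of the definition: a dense Gaussian matrix scaled by
# 1/sqrt(m) is a classical subspace embedding.
rng = np.random.default_rng(0)
n, d, m = 10_000, 10, 400

U, _ = np.linalg.qr(rng.standard_normal((n, d)))  # orthonormal basis of a subspace
Pi = rng.standard_normal((m, n)) / np.sqrt(m)     # Gaussian sketch

ratios = []
for _ in range(100):
    x = U @ rng.standard_normal(d)                # random vector in the subspace
    ratios.append(np.linalg.norm(Pi @ x) / np.linalg.norm(x))
print(f"||Pi x|| / ||x|| in [{min(ratios):.3f}, {max(ratios):.3f}]")
```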
Matrix Multiplication via Subspace Embedding
Consider a simple problem: given two matrices $A, B \in \mathbb{R}^{n \times d}$, what is the complexity of computing $C = A^{\top} B$? The naive algorithm takes $O(nd^2)$ time. Now we use a subspace embedding to speed this up: the approximate result is simply $C' = (\Pi A)^{\top} (\Pi B)$. Applying the embedding guarantee to the normalized columns of $A$ and $B$ and their pairwise sums (a union bound over $3d^2$ vectors, each failing with probability at most $\delta$), and using the polarization identity $\langle u, v \rangle = \frac{1}{2}\left(\|u+v\|_2^2 - \|u\|_2^2 - \|v\|_2^2\right)$, we can prove that with probability at least $1 - 3d^2\delta$,

$$|C'_{ij} - C_{ij}| \le O(\epsilon)\, \|a_i\|_2\, \|b_j\|_2 \quad \text{for all } i, j \in [d]$$

holds, where $a_i$ and $b_j$ denote the columns of $A$ and $B$.
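To make this concrete, here is a minimal sketched matrix product, again assuming a Gaussian sketch and arbitrary sizes:

```python
import numpy as np

# Illustrative sketched matrix product (Gaussian sketch, arbitrary sizes):
# C' = (Pi A)^T (Pi B) multiplies m x d sketches instead of n x d matrices.
rng = np.random.default_rng(1)
n, d, m = 5_000, 10, 500

A = rng.standard_normal((n, d))
B = rng.standard_normal((n, d))
Pi = rng.standard_normal((m, n)) / np.sqrt(m)

C = A.T @ B                        # exact product, O(n d^2)
C_approx = (Pi @ A).T @ (Pi @ B)   # approximate product from the sketches

scale = np.outer(np.linalg.norm(A, axis=0), np.linalg.norm(B, axis=0))
print("max |C'_ij - C_ij| / (||a_i|| ||b_j||):", (np.abs(C_approx - C) / scale).max())
```

Note that with a dense Gaussian $\Pi$, forming $\Pi A$ already costs $O(mnd)$; the point of the sparse construction introduced later (CountSketch) is to make the sketching step itself cheap.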
Least Squares Regression via Subspace Embedding
Before constructing subspace embeddings, let us consider a classical problem: least squares regression. The exact least squares regression problem is the following: given $A \in \mathbb{R}^{n \times d}$ and $b \in \mathbb{R}^n$, solve

$$x^{*} = \arg\min_{x \in \mathbb{R}^d} \|Ax - b\|_2.$$
It is well known that the solution is $x^{*} = (A^{\top}A)^{+} A^{\top} b$, where $(A^{\top}A)^{+}$ is the Moore-Penrose pseudoinverse of $A^{\top}A$; it can be computed via the SVD in $O(nd^2)$ time. However, if we allow approximation, can we decrease the time complexity? We can formalize the question as follows: instead of finding the exact solution $x^{*}$, we would like to find $x' \in \mathbb{R}^d$ such that

$$\|Ax' - b\|_2 \le (1 + \Delta)\, \|Ax^{*} - b\|_2, \tag{1}$$

where $\Delta$ is a small constant.
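As a quick sanity check (illustrative only), the closed-form solution agrees with a standard least squares solver:

```python
import numpy as np

# Check that the closed form (A^T A)^+ A^T b matches the least squares
# solution returned by a standard solver.
rng = np.random.default_rng(8)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)

x_closed = np.linalg.pinv(A.T @ A) @ A.T @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_closed, x_lstsq))  # True
```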
Suppose we have a $(d+1, \epsilon, \delta)$-subspace embedding matrix $\Pi$. Can we solve the following, much smaller problem instead?

$$x' = \arg\min_{x \in \mathbb{R}^d} \|\Pi A x - \Pi b\|_2 \tag{2}$$

We claim that, with probability at least $1 - \delta$, any optimum $x'$ of equation (2) satisfies equation (1) with $\Delta = \frac{2\epsilon}{1-\epsilon}$.
Proof: Every vector of the form $Ax - b$ lies in the $(d+1)$-dimensional subspace spanned by the $d$ columns of $A$ together with $b$. Hence, by the definition of a $(d+1, \epsilon, \delta)$-subspace embedding, with probability at least $1 - \delta$ the following holds simultaneously for every $x \in \mathbb{R}^d$:

$$(1-\epsilon)\|Ax - b\|_2 \le \|\Pi(Ax - b)\|_2 \le (1+\epsilon)\|Ax - b\|_2 \tag{3}$$

Since $x'$ is an optimum of equation (2), we have

$$\|\Pi(Ax' - b)\|_2 \le \|\Pi(Ax^{*} - b)\|_2 \tag{4}$$

Substituting $x = x^{*}$ into equation (3), we have

$$\|\Pi(Ax^{*} - b)\|_2 \le (1+\epsilon)\|Ax^{*} - b\|_2 \tag{5}$$

Substituting $x = x'$ into equation (3), we have

$$(1-\epsilon)\|Ax' - b\|_2 \le \|\Pi(Ax' - b)\|_2 \tag{6}$$

Combining equations (4), (5), and (6), we get

$$(1-\epsilon)\|Ax' - b\|_2 \le \|\Pi(Ax' - b)\|_2 \le \|\Pi(Ax^{*} - b)\|_2 \le (1+\epsilon)\|Ax^{*} - b\|_2.$$

Since $\frac{1+\epsilon}{1-\epsilon} = 1 + \frac{2\epsilon}{1-\epsilon}$, taking $\Delta = \frac{2\epsilon}{1-\epsilon}$ shows that the solution of equation (2) satisfies the desired statement (1).
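The following minimal sketch-and-solve experiment illustrates this guarantee, assuming a Gaussian sketch and arbitrary sizes:

```python
import numpy as np

# Illustrative sketch-and-solve least squares (Gaussian sketch, arbitrary sizes).
rng = np.random.default_rng(2)
n, d, m = 10_000, 10, 400

A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
Pi = rng.standard_normal((m, n)) / np.sqrt(m)

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)             # exact solution
x_prime, *_ = np.linalg.lstsq(Pi @ A, Pi @ b, rcond=None)  # sketched problem (2)

ratio = np.linalg.norm(A @ x_prime - b) / np.linalg.norm(A @ x_star - b)
print(f"||Ax' - b|| / ||Ax* - b|| = {ratio:.4f}")  # should be close to 1
```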
So far, we have seen how to solve the approximate least squares regression problem using a subspace embedding. However, a fundamental question arises: how do we construct a subspace embedding matrix? In the following section, we demonstrate that CountSketch is a subspace embedding.
Subspace Embedding Via CountSketch
A CountSketch matrix $S \in \mathbb{R}^{B \times n}$ is defined as follows: fix the number of buckets $B$, a hash function $h: [n] \rightarrow [B]$, and a sign function $\phi: [n] \rightarrow \{-1, +1\}$. For $r \in [B]$ and $a \in [n]$, let

$$S_{r,a} = \begin{cases} \phi(a) & \text{if } h(a) = r, \\ 0 & \text{otherwise.} \end{cases}$$

In other words, each column $a$ of $S$ has exactly one nonzero entry, $\phi(a)$, located in row $h(a)$.
CountSketch example: with $n = 4$ and $B = 2$, if $h(1) = h(3) = 1$, $h(2) = h(4) = 2$, and $\phi = (+1, -1, +1, -1)$, then

$$S = \begin{pmatrix} +1 & 0 & +1 & 0 \\ 0 & -1 & 0 & -1 \end{pmatrix}.$$
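A minimal implementation might look as follows (a sketch assuming fully random $h$ and $\phi$; the formal analysis only needs hash families of limited independence):

```python
import numpy as np
from scipy.sparse import csr_matrix

# A minimal CountSketch construction, assuming fully random h and phi.
def countsketch(n: int, B: int, rng: np.random.Generator) -> csr_matrix:
    h = rng.integers(0, B, size=n)         # bucket h(a) of each column a
    phi = rng.choice([-1.0, 1.0], size=n)  # sign phi(a) of each column a
    # Exactly one nonzero per column: S[h(a), a] = phi(a).
    return csr_matrix((phi, (h, np.arange(n))), shape=(B, n))

rng = np.random.default_rng(3)
n, d, B = 100_000, 10, 5_000
A = rng.standard_normal((n, d))

S = countsketch(n, B, rng)
SA = S @ A  # row i of A is added (with sign phi(i)) to bucket h(i)
print(SA.shape)  # (5000, 10)
```

Applying $S$ to a vector or matrix touches each nonzero entry exactly once, which is the key to the $O(\mathrm{nnz}(A))$ running time discussed below.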
We can show the following: for every $d$-dimensional subspace of $\mathbb{R}^n$, represented by a matrix $U \in \mathbb{R}^{n \times d}$ with orthonormal columns, with probability at least $1 - \frac{2d^2}{B\epsilon^2}$ every $x$ in the column span of $U$ satisfies

$$(1-\epsilon)\|x\|_2^2 \le \|Sx\|_2^2 \le (1+\epsilon)\|x\|_2^2.$$
Proof:
Since $x$ is in the column span of $U$, we can write $x = Uy$ for some $y \in \mathbb{R}^d$. The desired condition is then

$$(1-\epsilon)\|Uy\|_2^2 \le \|SUy\|_2^2 \le (1+\epsilon)\|Uy\|_2^2 \quad \text{for all } y \in \mathbb{R}^d,$$

equivalent to

$$\left|\, y^{\top}\left(U^{\top}S^{\top}SU - U^{\top}U\right) y \,\right| \le \epsilon \|Uy\|_2^2 \quad \text{for all } y \in \mathbb{R}^d.$$

Since $U^{\top}U = I$, we have $\|Uy\|_2 = \|y\|_2$, so this is equivalent to the spectral norm bound

$$\left\|U^{\top}S^{\top}SU - I\right\|_2 \le \epsilon.$$

Since the Frobenius norm upper bounds the spectral norm, it suffices to show that

$$\left\|U^{\top}S^{\top}SU - I\right\|_F \le \epsilon.$$
One can show that (we omit the detailed proof)

$$\mathbb{E}\left[\left\|U^{\top}S^{\top}SU - I\right\|_F^2\right] \le \frac{2d^2}{B}.$$

By Markov's inequality,

$$\Pr\left[\left\|U^{\top}S^{\top}SU - I\right\|_F^2 > \epsilon^2\right] \le \frac{2d^2}{B\epsilon^2}.$$

Then we obtain

$$\Pr\left[\left\|U^{\top}S^{\top}SU - I\right\|_F \le \epsilon\right] \ge 1 - \frac{2d^2}{B\epsilon^2}.$$

Thus, with probability at least $1 - \frac{2d^2}{B\epsilon^2}$, every $x$ in the column span of $U$ satisfies $(1-\epsilon)\|x\|_2^2 \le \|Sx\|_2^2 \le (1+\epsilon)\|x\|_2^2$, and hence $(1-\epsilon)\|x\|_2 \le \|Sx\|_2 \le (1+\epsilon)\|x\|_2$,
which implies that CountSketch is a $(d, \epsilon, \frac{2 d^2}{B \epsilon^2})$-subspace embedding. Setting $B = \frac{C d^2}{\epsilon^2}$ for a large enough absolute constant $C$ gives a subspace embedding that succeeds with probability $1 - \frac{2}{C}$, a constant arbitrarily close to $1$.
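To see this bound in action, the following illustrative check (with an arbitrary constant $C = 10$) measures the distortion of a CountSketch on a random subspace via the eigenvalues of $U^{\top}S^{\top}SU$:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative distortion check for CountSketch on a random subspace, with
# B = C d^2 / eps^2 for an arbitrary constant C = 10.
rng = np.random.default_rng(5)
n, d, eps = 50_000, 5, 0.5
B = int(10 * d**2 / eps**2)

h = rng.integers(0, B, size=n)
phi = rng.choice([-1.0, 1.0], size=n)
S = csr_matrix((phi, (h, np.arange(n))), shape=(B, n))

U, _ = np.linalg.qr(rng.standard_normal((n, d)))  # orthonormal basis, U^T U = I
SU = S @ U
# ||Sx||^2 = (1 +/- eps)||x||^2 on the subspace iff eig(U^T S^T S U) in [1-eps, 1+eps]
eigs = np.linalg.eigvalsh(SU.T @ SU)
print(f"eigenvalues of U^T S^T S U in [{eigs.min():.3f}, {eigs.max():.3f}]")
```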
Complexity Analysis
The matrix $\Pi A$ is a $B \times d$ matrix, where $B = \frac{C d^2}{\epsilon^2}$. Thus solving $\min_x \|\Pi A x - \Pi b\|_2$ via the SVD takes $\mathrm{poly}(d, \frac{1}{\epsilon})$ time. How much time does it take to form the matrix $\Pi A$ and the vector $\Pi b$? Since every column of $\Pi$ has exactly one nonzero entry, this takes time proportional to the number of nonzeros of $A$ and $b$. The overall time is therefore $O(\mathrm{nnz}(A) + \mathrm{poly}(d, \frac{1}{\epsilon}))$. Note that if the matrix is sparse, this is very efficient.
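Putting everything together, here is an illustrative end-to-end run on a sparse matrix (the density, sizes, and constant $C = 10$ are arbitrary choices):

```python
import numpy as np
from scipy import sparse

# Illustrative end-to-end sketch-and-solve on a sparse matrix, with
# B = C d^2 / eps^2 for an arbitrary constant C = 10.
rng = np.random.default_rng(6)
n, d, eps = 200_000, 8, 0.5
B = int(10 * d**2 / eps**2)

A = sparse.random(n, d, density=0.01, format="csr", random_state=7)
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)

h = rng.integers(0, B, size=n)
phi = rng.choice([-1.0, 1.0], size=n)
S = sparse.csr_matrix((phi, (h, np.arange(n))), shape=(B, n))

SA = (S @ A).toarray()  # B x d sketch, formed in O(nnz(A)) time
x_prime, *_ = np.linalg.lstsq(SA, S @ b, rcond=None)   # sketched solution
x_star, *_ = np.linalg.lstsq(A.toarray(), b, rcond=None)  # exact solution

ratio = np.linalg.norm(A @ x_prime - b) / np.linalg.norm(A @ x_star - b)
print(f"residual ratio: {ratio:.4f}")
```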