This paper derives a perturbation bound on the optimal subspace estimator obtained from a subset of its canonical projections contaminated by noise. This fundamental result has important implications in matrix completion, subspace clustering, and related problems.
It is often useful to model data using linear subspaces for computational efficiency, model interpretability, and extensive theoretical framework around estimation error and sensitivity to noisy data. A common problem, however, is that while we build linear models using observed data as proxies for basis vectors (for example, in Low Rank Matrix Completion and Robust PCA), data is incomplete. Therefore, we only observe partial components of basis vectors. In this setting, there has been theoretical work done in discovering necessary and sufficient conditions for when reconstruction of a subspace from partially observed basis vectors is possible (1, 2, 3), this was done in the noiseless-data setting. In this work, we present a novel method of reconstructing the optimal subspace estimator from noisy observations and present an upper bound for our estimation error. We also present experimental results to demonstrate how our method performs under synthetic settings.
You can find relevant code here. You can also find my ISIT 22 presentation slides on this paper here .