Redefining Open Minds: 如何找研究題目(四)：使用有效的工具，尋找適合的問題

如何找研究題目？

(How to come up with new research ideas?)

Jia-Bin Huang

jbhuang0604@gmail.com

Latest update: April 12, 2010

If the only tool you have is a hammer, you tend to see every problem as a nail. - Abraham Maslow

四、使用有效的工具，尋找適合的問題 　neXt = T(X), T: X->Y

Dimensionality Reduction

1. Introduction

　　在非常多科學的問題中我們往往會取得並使用高維(High-dimensional )的資料來描述某個物件，舉例來說一張大小100x100的灰階影像可以表示成一個10,000維的向量、氣候的大量資料、人類基因等，資料的高維度一方面大幅增加計算時的複雜度，另一方面在從統計的角度來看，高維空間需要以指數成長的樣本數才能正確地分析資料(curse of dimensionality)。

　　然而雖然資料表示在高維空間中，往往這些資料會落於一個有意義的較低維的空間中，也許是線性子空間(Linear subspace)或是非線性流形(Nonlinear manifold)，因此，資料降維(dimension reduction)可以1. 降低運算複雜度、2.取得更有本質意義的資料表示方式、3.，容易將高維資料視覺化(Visualization)。降維方法即是透過分析資料來尋找一種Embedding的方式，將資料從原先的高維空間映射到低維空間。

2. Dimensionality Reduction Methods

降維的方法非常多，根據兩種特質可以將這些方法分為四大類，第一種特質是線性或非線性(Linearity: Linear v.s. Non-linear)，第二種特質則是降維時保留全面性或是區域性的結構(Locality: Global v.s. local)：

第一及第二大類線性的方法利用分析資料的分布，來計算一個投影的矩陣已達到降維的功效，不同的目標則產生不同的降維方法

圖一降維方法分類圖 [19]

2.1 Linear, global structure preserved

i. Principal component analysis (PCA) [1]

PCA minimizes reconstruction error or maximizes the data variance in the low-dimensional representation of data.

ii. Linear discriminant analysis (LDA) [2]

LDA maximizes the linear class separability in the low-dimensional representation of the data.

iii. Independent component analysis (ICA) [3]

ICA makes sure that the low-dimensional representation of data are maximally
statistically independent.

iv. Factor analysis (FA) [4]

FA represent the original high-dimensional data with low-dimensional common factors and specific factors.

2.2 Linear, local structure preserved

i. Locality Preserving Projections (LPP) [5]

LPP approximates Laplacian eigenmaps (LE), preserves the neighborhood structure of the high-dimensional data.

ii. Neighborhood Preserving Embedding (NPE) [6]

NPE approximates Locally Linear Embedding (LLE), preserves the local neighborhood structure on the high-dimensional data manifold.

第三第四類為Nonlinear manifold learning，從2000年Science期刊上的兩篇文章(ISOMAP和LLE)之後開始蓬勃發展。

2.3 Non-linear, global structure preserved

i. Multidimensional scaling (MDS) [7]

MDS is a set of nonlinear techniques that maps the high-dimensional data to a low-dimensional representation while preserving the pairwise distances between the data points as much as possible.

ii. Isomap (ISOMAP) [8]

ISOMAP combines geodesic distances on a weighted graph and MDS.

iii. Kernel principal component analysis (Kernel PCA) [9]

Kernel PCA extends PCA using kernel tricks [20-22] to handle nonlinear data (without explicit nonlinear mapping of data).

iv. General Discriminant Analysis (GDA) [10]

GDA maximizes the Fisher criterion used in LDA in high-dimensional space constructed by a kernel function.

v. Diffusion maps (DM) [11]

DM embeds high-dimensional data onto low-dimensional space via the eigenvectors of suitably defined random walks defined on the given datasets.

2.4 Non-linear, local structure preserved

i. Locally Linear Embedding (LLE) [12-13]

LLE constructs a K-NN graph W, computes weights of data points by linear combination of neighbors (i.e., the manifold is locally linear), attempts to retain the reconstruction weights in the linear combinations in the low-dimensional space as well as possible.

ii. Laplacian Eigenmaps (LE) [14]

LE constructs a K-NN graph W, computes weights to minimize the norm of the gradient in least square sense, minimizes the distance between a datapoint and its k nearest neighbors in the low-dimensional space.

iii. Hessian Eigenmaps (HE) [15]

With the constraint that low-dimensional data representation is locally isometric, HE

minimizes the curviness of the high-dimensional manifold when constructing a low dimensional representation of data.

iv. Local Tangent Space Alignment (LTSA) [16]

Based on the assumption that the manifold is locally linear, LTSA simultaneously searches for the low-dimensional representation of data, and the linear mappings of the low dimensional datapoints to the local tangent space of high-dimensional manifold.

v. Semidefinite embedding (SDE) or maximum variance unfolding (MVU) [17]

SDE builds a K-NN graph, compute an inner product matrix that maximizes the pairwise distances between two points that are not connected, then apply MDS to obtain the low-dimensional representation of data.

vi. Visualizing Data using t-SNE (t-SNE) [18]

t-SNE converts the pairwise distances between the datapoints into probabilities of

selecting pairs of data points, achieving good visualization results.

註1：其實上面介紹的許多方法可以被歸類為Spectral method的一種，Spectral method牽涉到eigenvector/eigenvalue的計算(of specially constructed matrices)，並以藉此來分析高維空間的結構(e.g., PCA, LDA, metric MDS, ISOMAP, LLE, Kernel PCA, GDA SDE, LE, HE等等)

下一個段落介紹的Spectral clustering也屬與此類，只不過運用在資料的分群(clustering)，

註2：降維的方法可以看成是將存在於高維空間的圖Embed到低維空間(e.g., 線性投影)，所以可以以Graph Embedding將所有方法放在同一個架構下去做比較。[25]

註3：儘管Nonlinear方法能夠學習較複雜的流形結構，在真實世界的問題上未必能夠較線性的方法效果好 [19]

Dimensionality Reduction

Neil D. Lawrence

Geometric Methods and Manifold Learning

Partha Niyogi, Mikhail Belkin

Introduction to feature selection

Isabelle Guyon

Introduction to kernel methods

Bernhard Schölkopf

MIT 18.06 Linear Algebra Spring 2005

(對於線性代數的觀念不是非常清楚的話可以看這個線上課程，非常推薦)

[1] Analysis of a complex of statistical variables into principal components, Journal of educational psychology 1933

[2] The use of multiple measurements in taxonomic problems, Annals of Eugenics 1936

[3] Independent Component Analysis: Algorithms and Application, Neural Networks 2000

[4] Introduction to Factor Analysis

[5] Locality preserving projections, NIPS 2003

[6] Neighborhood Preserving Embedding, ICCV 2005

[7] Multidimensional scaling

[8] A global geometric framework for nonlinear dimensionality reduction, Science 2000

[9] Kernel principal component analysis, ICANN 1997

[10] Generalized discriminant analysis using a kernel approach, Neural Computation,2000

[11] Diffusion maps, spectral clustering and reaction coordinates of dynamical systems, Applied and Computational Harmonic Analysis 2006

[12] Nonlinear dimensionality reduction by locally linear embedding, Science 2000

[13] Think globally, fit locally: unsupervised learning of low dimensional manifolds, JMLR 2003

[14] Laplacian eigenmaps and spectral techniques for embedding and clustering, NIPS 2002

[15] Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data, Proceedings of the National Academy Science 2003

[16] Principal Manifolds and Nonlinear Dimension Reduction via Tangent Space Alignment, SIAM Journal of Scientific Computing, 2004

[17] Unsupervised learning of image manifolds by semidefinite programming, CVPR 2004

[18] Visualizing High-Dimensional Data Using t-SNE, JMLR 2008

[19] Dimensionality reduction: A Comparative Survey,

[20] Learning with kernels, 2002

[21] Nonlinear component analysis as a kernel eigenvalue problem, Neural computation 1998

[22] An introduction to kernel-based learning algorithms, TNN 2001

[23] A kernel view of the dimensionality reduction of manifolds, ICML 2004

[24] Spectral methods for dimensionality reduction

[25] Graph Embedding and Extensions: A General Framework for Dimensionality Reduction, PAMI 2007

3. Applications

不論是在科學研究的哪個領域(甚至部分的社會科學)，Dimensionality reduction幾乎都是必備的工具，這邊討論在電腦視覺中Linear subspace和Nonlinear manifold learning方法是如何被應用在各類問題中。

3.1 Subspace as constraints: Structure from motion, Optical flow, Layer extraction, Face alignment

某些問題我們事先可以透過分析得知要解的未知量(e.g., Optical flow, Structure of a scene)存在於一個線性子空間中，在計算時便可以加上subspace的限制得到更正確的估計，這種技巧可以運用在估測3D Structure [1]、2D Optical flow [2]、Planar homogrpahy [3]、Scene layer extraction [4]、和Face alignment [5]等應用。

[1] Shape and motion from image streams under orthography: a factorization method, IJCV 1992

[2] Multi-Frame Optical Flow Estimation Using Subspace Constraints, ICCV 1999

[3] Multi-view subspace constraints on homographies, ICCV 1999

[4] A subspace approach to layer extraction, CVPR 2001

[5] Face Alignment through Subspace Constrained Mean-shifts, ICCV 2009

3.2 Face Recognition

人臉辨識大概是線性子空間應用最多的一個問題，因為人臉影像在不同光線下的變化往往可以由低維的線性子空間[1]來表示，各種降維的方式如PCA [2]、LDA [3]、LPP [4]、ICA [5]或是經由optimization得來的subspace [6]都曾被提出來以求更佳的辨識率。過去一般的認知往往是Discriminative的方法(e.g., LDA)會比Generative方法(e.g., PCA)來得有效，但實際上並不是所有的情況下都是如此 [8]。另外除了線性的方法之外，Nonlinear manifold learning的觀念也被帶進來處理非線性的問題 [7]。

[1] Lambertian reflectance and linear subspaces, PAMI 2003

[2] Face recognition using eigenfaces, CVPR 1991

[3] Eigenfaces vs. fisherfaces: Recognition using class specific linear projection, ECCV 1996

[4] Face recognition using laplacianfaces, PAMI 2005

[5] Recognizing faces with PCA and ICA, CVIU 2003

[6] On Optimizing Subspaces for Face Recognition, ICCV 2009

[7] Local discriminant embedding and its variants, CVPR 2005

[8] PCA versus LDA, PAMI 2001

3.3 Motion segmentation

在同一場景時常同時有許多運動產生，Motion segmentation的問題即是將這些不同的Motion切割出來，此時subspace方法便可以非常有效地去model每一種motion，於是原先的motion segmentation這個問題便轉化成一個Subspace separation的問題了 [1-4]。