Friday, April 23, 2010

How to Find a Research Topic (Part 4): Using Effective Tools to Find Suitable Problems - Dimensionality Reduction


How to come up with new research ideas?

Jia-Bin Huang 

jbhuang0604@gmail.com

                                                                                          
Latest update: April 12, 2010


If the only tool you have is a hammer, you tend to see every problem as a nail. - Abraham Maslow


4. Using effective tools to find suitable problems  (neXt = T(X), T: X -> Y)



Dimensionality Reduction


1.    Introduction

In a great many scientific problems we acquire and work with high-dimensional data to describe an object. For example, a 100x100 grayscale image can be represented as a 10,000-dimensional vector; large volumes of climate data and human gene data are similar. High dimensionality greatly increases computational complexity, and, from a statistical point of view, the number of samples needed to analyze the data correctly grows exponentially with the dimension (the curse of dimensionality).
Although the data are represented in a high-dimensional space, they often lie in a meaningful lower-dimensional space, perhaps a linear subspace or a nonlinear manifold. Dimensionality reduction can therefore (1) lower the computational complexity, (2) yield a more intrinsic representation of the data, and (3) make high-dimensional data easier to visualize. Dimensionality reduction methods analyze the data to find an embedding that maps it from the original high-dimensional space to a low-dimensional one.

2.    Dimensionality Reduction Methods






There are a great many dimensionality reduction methods. They can be divided into four major categories according to two properties: the first is linearity (linear vs. nonlinear), and the second is locality, i.e., whether the global or the local structure of the data is preserved during the reduction (global vs. local).

The first two categories are linear methods: they analyze the distribution of the data to compute a projection matrix that performs the reduction, and different objectives lead to different methods.



Figure 1. Taxonomy of dimensionality reduction methods [19]



2.1  Linear, global structure preserved

        i.     Principal component analysis (PCA) [1]
PCA minimizes the reconstruction error or, equivalently, maximizes the variance of the data in its low-dimensional representation (a minimal NumPy sketch is given after this list).

       ii.     Linear discriminant analysis (LDA) [2]
LDA maximizes the linear class separability in the low-dimensional representation of the data.

      iii.     Independent component analysis (ICA) [3]
ICA finds a low-dimensional representation of the data whose components are maximally statistically independent.

      iv.     Factor analysis (FA) [4]
FA represents the original high-dimensional data in terms of a small number of latent common factors plus variable-specific factors.
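
To make these linear, global methods concrete, here is a minimal PCA sketch in Python/NumPy. The function name pca and the toy usage are my own illustration, not a reference implementation: it centers the data, eigendecomposes the sample covariance matrix, and projects onto the top-d eigenvectors, i.e., the directions of maximum variance.

import numpy as np

def pca(X, d):
    # X: n samples x D features; returns the n x d projection, the D x d basis W, and the mean
    mean = X.mean(axis=0)
    Xc = X - mean                               # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)          # D x D sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues returned in ascending order
    top = np.argsort(eigvals)[::-1][:d]         # keep the d largest-variance directions
    W = eigvecs[:, top]                         # projection matrix (D x d)
    return Xc @ W, W, mean

# example usage on random data: Y, W, mean = pca(np.random.rand(200, 50), d=2)

The other linear methods differ mainly in which matrices supply the eigenvectors; LDA, for instance, solves a generalized eigenproblem on the between-class and within-class scatter matrices instead of the covariance matrix.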

2.2 Linear, local structure preserved

        i.     Locality Preserving Projections (LPP) [5]
LPP is a linear approximation of Laplacian Eigenmaps (LE) and preserves the neighborhood structure of the high-dimensional data (see the sketch after this list).

       ii.     Neighborhood Preserving Embedding (NPE) [6]
NPE is a linear approximation of Locally Linear Embedding (LLE) and preserves the local neighborhood structure of the high-dimensional data manifold.
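
Below is a rough sketch of the LPP idea, assuming heat-kernel edge weights and a small regularizer for numerical stability; the function name lpp and the parameters k and t are illustrative choices, not those of the original paper.

import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def lpp(X, d, k=10, t=1.0):
    # X: n samples x D features; returns the n x d embedding and the D x d projection A
    n = X.shape[0]
    sqdist = cdist(X, X, 'sqeuclidean')
    nbrs = np.argsort(sqdist, axis=1)[:, 1:k+1]          # k nearest neighbors (skip self)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nbrs[i]] = np.exp(-sqdist[i, nbrs[i]] / t)  # heat-kernel edge weights
    W = np.maximum(W, W.T)                               # symmetrize the adjacency graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                            # graph Laplacian
    # generalized eigenproblem  X^T L X a = lambda X^T D X a ; keep the smallest eigenvalues
    B = X.T @ D @ X + 1e-9 * np.eye(X.shape[1])          # regularized to stay positive definite
    vals, vecs = eigh(X.T @ L @ X, B)
    A = vecs[:, :d]
    return X @ A, A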

The third and fourth categories are nonlinear manifold learning methods, which have flourished since the two papers (ISOMAP and LLE) published in Science in 2000.

2.3 Non-linear, global structure preserved

i.      Multidimensional scaling (MDS) [7]
     MDS is a family of nonlinear techniques that map the high-dimensional data to a low-dimensional representation while preserving the pairwise distances between the data points as well as possible.

ii.     Isomap (ISOMAP) [8]
    ISOMAP estimates geodesic distances as shortest paths on a weighted neighborhood graph and then applies MDS to them (see the sketch after this list).

iii.    Kernel principal component analysis (Kernel PCA) [9]
    Kernel PCA extends PCA with the kernel trick [20-22] to handle nonlinear data (without explicitly computing a nonlinear mapping of the data).

iv.    General Discriminant Analysis (GDA) [10]
    GDA maximizes the Fisher criterion used in LDA in the high-dimensional feature space induced by a kernel function.

v.     Diffusion maps  (DM) [11]
    DM embeds high-dimensional data into a low-dimensional space via the eigenvectors of a suitably defined random walk on the given dataset.
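
As an illustration of the global nonlinear methods, here is a rough ISOMAP sketch, assuming the k-NN graph is connected: build the neighborhood graph, approximate geodesic distances by graph shortest paths, and then apply classical MDS to those distances. The helper isomap and the parameter k are illustrative only.

import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import shortest_path

def isomap(X, d, k=10):
    # X: n samples x D features; returns an n x d embedding
    n = X.shape[0]
    dist = cdist(X, X)
    G = np.full((n, n), np.inf)                  # inf marks "no edge" in the dense graph
    nbrs = np.argsort(dist, axis=1)[:, 1:k+1]
    for i in range(n):
        G[i, nbrs[i]] = dist[i, nbrs[i]]         # keep only edges to the k nearest neighbors
    G = np.minimum(G, G.T)                       # make the graph undirected
    geo = shortest_path(G, method='D')           # geodesic distances via Dijkstra
    # classical MDS on the squared geodesic distances
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (geo ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:d]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))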

2.4 Non-linear, local structure preserved

i.      Locally Linear Embedding (LLE) [12-13]
    LLE constructs a k-NN graph, computes weights that reconstruct each data point as a linear combination of its neighbors (i.e., assuming the manifold is locally linear), and then finds a low-dimensional embedding that preserves these reconstruction weights as well as possible (see the sketch after this list).

ii.     Laplacian Eigenmaps (LE) [14]
    LE constructs a k-NN graph with edge weights and finds the low-dimensional embedding that minimizes the weighted distances between each data point and its k nearest neighbors, which corresponds to minimizing the norm of the gradient on the manifold in a least-squares sense.

iii.    Hessian Eigenmaps (HE) [15]
    Under the constraint that the low-dimensional representation is locally isometric, HE minimizes the curviness of the high-dimensional manifold when constructing the low-dimensional representation of the data.

iv.    Local Tangent Space Alignment (LTSA) [16]
    Based on the assumption that the manifold is locally linear, LTSA simultaneously searches for the low-dimensional representation of the data and for linear mappings from the low-dimensional data points to the local tangent space of the high-dimensional manifold.

v.     Semidefinite embedding (SDE) or maximum variance unfolding (MVU) [17]
    SDE builds a k-NN graph, computes an inner-product matrix that maximizes the pairwise distances between points that are not connected (while preserving the distances between connected ones), and then applies MDS to obtain the low-dimensional representation of the data.

vi.     Visualizing Data using t-SNE (t-SNE) [18]
    t-SNE converts the pairwise distances between the data points into probabilities of selecting pairs of data points, and is particularly effective for visualization.
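
To tie the local nonlinear methods to something concrete, here is a rough LLE sketch following the two steps described above (local reconstruction weights, then a bottom-eigenvector embedding); the function name lle and the regularizer reg are illustrative.

import numpy as np
from scipy.spatial.distance import cdist

def lle(X, d, k=10, reg=1e-3):
    # X: n samples x D features; returns an n x d embedding
    n = X.shape[0]
    nbrs = np.argsort(cdist(X, X), axis=1)[:, 1:k+1]   # k nearest neighbors (skip self)
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                          # neighbors centered at x_i
        C = Z @ Z.T                                    # local Gram matrix (k x k)
        C += reg * np.trace(C) * np.eye(k)             # regularize for stability
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()                    # reconstruction weights sum to one
    # embedding: bottom eigenvectors of (I - W)^T (I - W), dropping the constant eigenvector
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d+1]

In practice, off-the-shelf implementations of LLE, ISOMAP, spectral (Laplacian) embeddings, and t-SNE are available, for example in scikit-learn's sklearn.manifold module.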

Note 1: Many of the methods introduced above can be classified as spectral methods. Spectral methods involve computing eigenvectors/eigenvalues of specially constructed matrices and use them to analyze the structure of the high-dimensional space (e.g., PCA, LDA, metric MDS, ISOMAP, LLE, Kernel PCA, GDA, SDE, LE, HE, and so on).
Spectral clustering, introduced in the next part, also belongs to this family; it is simply applied to clustering the data instead.

Note 2: Dimensionality reduction can be viewed as embedding a graph that lives in a high-dimensional space into a low-dimensional one (e.g., by a linear projection), so Graph Embedding provides a single framework under which all of these methods can be compared [25].

Note 3: Although nonlinear methods can learn more complex manifold structures, they do not necessarily perform better than linear methods on real-world problems [19].







Neil D. Lawrence

Geometric Methods and Manifold Learning - Partha Niyogi, Mikhail Belkin

Introduction to feature selection - Isabelle Guyon

Introduction to kernel methods - Bernhard Schölkopf

MIT 18.06 Linear Algebra, Spring 2005
(If you are not entirely clear on the concepts of linear algebra, this online course is highly recommended.)



[11] Diffusion Maps, Spectral Clustering and Reaction Coordinates of Dynamical Systems, Applied and Computational Harmonic Analysis, 2006
[16] Principal Manifolds and Nonlinear Dimension Reduction via Tangent Space Alignment, SIAM Journal on Scientific Computing, 2004
[19] Dimensionality Reduction: A Comparative Survey


3.    Applications


No matter which field of scientific research you work in (even parts of the social sciences), dimensionality reduction is almost an indispensable tool. This section discusses how linear subspace and nonlinear manifold learning methods are applied to various problems in computer vision.

 3.1 Subspace as constraints: Structure from motion, Optical flow, Layer extraction, Face alignment

For some problems we can determine in advance that the unknowns to be solved for (e.g., optical flow, the structure of a scene) lie in a linear subspace. Adding this subspace constraint during estimation then yields more accurate results. This technique has been applied to estimating 3D structure [1], 2D optical flow [2], planar homography [3], scene layer extraction [4], and face alignment [5].


3.2  Face Recognition

Face recognition is probably the problem to which linear subspaces have been applied most heavily, because the variation of a face image under different lighting conditions can often be represented by a low-dimensional linear subspace [1]. Various dimensionality reduction methods such as PCA [2], LDA [3], LPP [4], ICA [5], and subspaces obtained via optimization [6] have been proposed in pursuit of better recognition rates (a minimal sketch is given after the references below). The conventional wisdom was that discriminative methods (e.g., LDA) are more effective than generative ones (e.g., PCA), but in practice this does not hold in all situations [8]. Beyond linear methods, the ideas of nonlinear manifold learning have also been brought in to handle the nonlinear cases [7].

[6] On Optimizing Subspaces for Face Recognition, ICCV 2009
[8] PCA versus LDA, PAMI 2001
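
As an illustration of how a linear subspace is used here, below is a minimal eigenfaces-style sketch: PCA on vectorized face images followed by nearest-neighbor matching in the subspace. The function names and the choice of d are hypothetical and only show the structure of the approach, not any of the cited systems.

import numpy as np

def eigenfaces_train(faces, d=50):
    # faces: n x D matrix, one vectorized face image per row (assumes n << D)
    mean = faces.mean(axis=0)
    Xc = faces - mean
    vals, vecs = np.linalg.eigh(Xc @ Xc.T)       # eigendecompose the small n x n Gram matrix
    top = np.argsort(vals)[::-1][:d]
    U = Xc.T @ vecs[:, top]                      # map back to image space: eigenfaces (D x d)
    U /= np.linalg.norm(U, axis=0)               # normalize each eigenface
    return mean, U

def recognize(probe, mean, U, gallery_codes, gallery_labels):
    # project the probe face into the subspace and match it to the nearest gallery face
    code = (probe - mean) @ U
    dists = np.linalg.norm(gallery_codes - code, axis=1)
    return gallery_labels[np.argmin(dists)]

# gallery_codes would be (faces - mean) @ U, computed once for the labeled training faces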

3.3 Motion segmentation

A single scene often contains several motions at the same time, and motion segmentation is the problem of separating these different motions. Subspace methods can model each motion very effectively, so the original motion segmentation problem turns into a subspace separation problem [1-4].

[2] A General Framework for Motion Segmentation: Independent, Articulated, Rigid, Non-rigid, Degenerate and Non-degenerate, ECCV 2006


 3.4  Lighting

Lighting changes drastically alter the appearance of an object, which makes it difficult for computer vision algorithms to capture the object's intrinsic appearance. However, the effects of lighting variation are known to be well represented by a linear subspace, so linear subspace methods can capture the effects of all lighting conditions, including unseen ones [1-4].


3.5 Visual tracking

Since the appearance of an object can be represented compactly by a subspace, the subspace idea can also be applied to object tracking. During tracking, however, the object's appearance changes over time due to lighting, viewpoint, occlusion, and so on, so subspace-based trackers must consider how to update the subspace incrementally [1-2], as sketched below.
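
As a sketch of the incremental-update idea (not of any specific tracker in [1-2]), scikit-learn's IncrementalPCA can refine a low-dimensional appearance subspace batch by batch as new frames arrive. The patch size, number of components, and helper functions below are hypothetical.

import numpy as np
from sklearn.decomposition import IncrementalPCA

# hypothetical setup: each tracked image patch is flattened into a 32*32 = 1024-d vector
ipca = IncrementalPCA(n_components=16)

def update_appearance_model(new_patches):
    # new_patches: recently tracked patches, shape (n_new, 1024); each batch needs
    # at least n_components samples for partial_fit to update the subspace
    ipca.partial_fit(new_patches)

def reconstruction_error(patch):
    # a low error means the candidate patch is well explained by the current appearance subspace
    code = ipca.transform(patch.reshape(1, -1))
    recon = ipca.inverse_transform(code)
    return np.linalg.norm(patch - recon.ravel())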


Besides the applications above, subspace methods can also be applied to object recognition, action recognition, and many problems in computer graphics.