Saturday, May 29, 2010

Valder Fields


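Recently I have been listening to a playlist a friend recommended to me half a year ago, and I happened upon this song, Valder Fields. From the moment the music started I was deeply drawn in by its light rhythm and clean vocals: simple and plain, yet profound.

So I went looking online for what the lyrics say, but could not find a truly correct version (perhaps there is no such thing as a correct version). The moods of the translations based on these different versions also differ enormously, so in the end I simply practiced my listening and reconstructed the lyrics directly from the music.

Valder Fields describes the pressure a person feels when facing two ways of life: social responsibility on one side, ease and freedom on the other. On the one hand, one wants to be a useful person and pursue the success that society recognizes, material comfort, and the praise of others (as in the lyrics about applying for a job, or the man who cried when he said that he loved his life). On the other hand, one also longs for inner peace, free of the shackles formed by the gaze of others (sleeping in the sun on the ground by the fountain).

I think this song resonates with me because I know very well that I am always in this state of inner conflict. I am often pulled between these two ways of life, trying to find a balance. I used to be quite calculating: I cared about grades, about other people's opinions and approval, even about whether I could beat others. But looking back a few years later, the things I used to care about no longer seem so important. I have started learning to open my mind and accept each spontaneous decision, to enjoy a brilliant sunset glimpsed by chance or a perfectly blue sky, and to immerse myself in simple, beautiful things.

Learning to live in the moment.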




Valder Fields by Tamas Wells

I was found on the ground by the fountain
At Valder Fields and was almost dry
Lying in the sun after I had tried
Lying in the sun by the side

We had agreed that the council would end
At three hours over time
Shoelaces were tied at the traffic lights
I was running late, (I) could apply

For another one I guess
If department stores are best
They said there would be delays
On the temporary pay

For another one I guess
If department stores are best
They said there would be delays
On the temporary pay

She was found on the ground in a gown
Made at Valder Fields and was sound asleep
(On the) stairs outside the door
To the man who cried
When he said that he loved his life

We had agreed that the council should
Take his keys to the bedroom door
In case he slept outside and was found in two
Days in Valder Fields with a mountain view

Monday, May 24, 2010

Talk: How to Come Up with Research Topics (in Computer Vision)?





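I will give this talk in two weeks. It will probably be the last time I speak on this topic before the summer break. Anyone interested in computer vision research (e.g., master's and PhD students) is welcome to join the discussion.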


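Note: the time has been corrected
Time: June 8, 4:00-5:30 PM
Venue: Room 106, Institute of Information Science, Academia Sinica, Nankang, Taipei

Time: 6/8 (Tue.) 2:00 - 3:30 PM (3:30-4:00 Q&A)
Venue: Room 107, Institute of Information Science, Academia Sinica, Nankang, Taipei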





Title: How to come up with new research ideas (in Computer Vision)?

Abstract
Computer vision has been studied for more than 40 years. Because the topics in vision and its related fields (e.g., machine learning, signal processing, cognitive science) are increasingly diverse and rapidly developing, coming up with new research ideas is usually a daunting task for junior graduate students in this field. In this talk, I will present five methods for coming up with new research ideas. For each method, I will give several examples (i.e., existing works in the literature) to illustrate how the method works in practice.


This is a common-sense talk and will not involve complicated mathematical equations or theories.

Note: The content of this talk is inspired by Prof. Ramesh Raskar's talk on "How to come up with new Ideas".


Slides

Saturday, May 22, 2010

How to Solve Problems in Computer Vision? Preface







Jia-Bin Huang
Jbhuang0604@gmail.com

Latest Update: May 20, 2010


The Feynman Problem-Solving Algorithm:
                      (1) Write down the problem
                      (2) Think very hard
                      (3) Write down the answer
         -Richard Feynman


Motivation

While writing the article series “How to come up with new research ideas in computer vision?”, I realized that there is another thing that is also essential to the research process, namely, “How to solve problems in computer vision?”. It may not be difficult for junior researchers (especially the imaginative ones) to come up with new research ideas. However, it often takes years of research experience to learn how to describe and analyze a problem via scientific approaches, deal with data uncertainty through statistical approaches, and realize an effective and efficient implementation using engineering approaches.

The gap between abstract thinking and problem-solving skills is usually the major communication barrier between graduate students and their advisors. For example, students might feel that the feedback from their advisor is useless or impractical, while advisors might worry that students are lost in technical details. In fact, these two things (i.e., knowing what to do and knowing how to do it) are both of great importance and complement each other. As the MIT motto says:

Mens et Manus (Mind and Hand) - MIT's motto

In “How to come up with new research ideas?” I addressed the Mens (mind) part. In this series, I will focus on the Manus (hand) part.

After introducing the basic concepts in computer vision, I will follow the Feynman Problem-Solving Algorithm to show how to solve problems in computer vision. The three problem-solving steps will be accompanied by various examples in the following articles. Through these examples, readers can learn how vision problems have been addressed and how to apply the same methodologies to their own problems. In other words, I am a believer in this quote:

Do not tell people how to live their lives. Just tell them stories. And they will figure out how those stories apply to them. - Randy Pausch


Note: The content of this article series is inspired by the new computer vision textbook: Computer Vision: Algorithms and Applications, by Richard Szeliski.


Three Levels of Analysis in Computer Vision

David Marr, one of the pioneers in computer vision, proposed to understand vision (i.e., an information processing system) at three different levels of analysis.

1.    Computational Level
What does the system do, and why?

2.    Representation and Algorithm Level
How does the system do what it wants to do? What are the representations, and what are the algorithms that build and manipulate them?

3.    Implementation Level
How is the system physically realized?

Even after thirty years, Marr’s view of vision remains useful and serves as a good guide for formulating and solving problems in computer vision.


What Kinds of Problems Are There in Computer Vision?

Computer vision processes and interprets visual information (e.g., images or videos). We can roughly classify vision problems into three categories by the level of the output labels: 1) Low-level, 2) Mid-level, and 3) High-level vision.
(Note that this is my own interpretation. There may be other viewpoints on how to define low-, mid-, and high-level vision.)

The output from low-level vision problems is a label map, i.e., one label for each pixel. Mid-level vision problems produce meaningful representations from images or videos. High-level vision predicts (semantic) class labels.

Input: Visual information (e.g., image or video)

=====Low-level vision=====

Output: A label map
Example problems                Label means

Depth estimation                Depth from the viewer
Figure / ground estimation      Foreground or background
Edge detection                  Edge or non-edge
Segmentation                    Region membership
Motion (optical flow)           Motion vector
Intrinsic image                 Reflectance
Image restoration               High-resolution, clear, clean images
  (denoising, demosaicking, deblurring, inpainting,
  contrast enhancement, dehazing, super-resolution)

=====Mid-level vision=====

Output: Representation of images
Example problems

Shape reconstruction
Human/head pose estimation
Hand gesture representation
Face representation
(Snakes / Active shape models / Active appearance models)
Texture representation and synthesis
Image representation 
(Bag-of-features / Spatial pyramid, i.e., coding and pooling of low-level features)

=====High-level vision=====

Output: Semantic label of an image
Example problems

Object detection / localization
Object recognition
Optical character recognition
Hand-written digit recognition
Event recognition
Scene understanding
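
To make the difference in output concrete, here is a minimal sketch in plain Python. It is only an illustration, not a real vision algorithm: `edge_label_map` and `scene_label` are hypothetical names, the finite-difference edge test and the brightness rule are made-up stand-ins, and the tiny image is synthetic. The point is the shape of the output: a low-level problem returns one label per pixel, while a high-level problem returns one label per image.

```python
def edge_label_map(image, threshold):
    """Low-level vision: return one label per pixel (1 = edge, 0 = non-edge)."""
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            # Forward differences; the last row/column use a zero difference.
            dx = image[r][c + 1] - image[r][c] if c + 1 < width else 0
            dy = image[r + 1][c] - image[r][c] if r + 1 < height else 0
            if abs(dx) + abs(dy) > threshold:
                labels[r][c] = 1
    return labels


def scene_label(image, brightness_cutoff=100):
    """High-level vision (toy stand-in): return one label for the whole image."""
    pixels = [v for row in image for v in row]
    mean = sum(pixels) / len(pixels)
    return "bright scene" if mean >= brightness_cutoff else "dark scene"


# A 4x4 synthetic image with a vertical step edge between columns 1 and 2.
image = [[0, 0, 255, 255] for _ in range(4)]
labels = edge_label_map(image, threshold=100)  # same size as the image
label = scene_label(image)                     # a single string
```

Here `labels` has the same dimensions as the input image, whereas `label` is a single value no matter how large the image is.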

Relationship between Computer Vision and Computer Graphics
From the three levels of vision problems above, we can see that all computer vision problems aim to recover the information Y we need from the input visual information X. In other words, the goal of computer vision is to accurately estimate the conditional probability P(Y|X). Computer graphics, on the other hand, cares about P(X|Y): given the information Y, what would a realistic image X look like?
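
One way to connect the two directions is Bayes' rule, P(Y|X) = P(X|Y) P(Y) / P(X). The sketch below is a toy illustration only: the labels, the observations, and every probability are made-up numbers, and `posterior` is a hypothetical helper name. It shows how a "graphics-style" model P(X|Y) together with a prior P(Y) yields the "vision-style" quantity P(Y|X).

```python
def posterior(prior, likelihood, observation):
    """Compute P(Y | X = observation) from P(Y) and P(X | Y) via Bayes' rule."""
    # Unnormalized posterior: P(X = observation | Y = y) * P(Y = y)
    joint = {y: likelihood[y][observation] * prior[y] for y in prior}
    evidence = sum(joint.values())  # P(X = observation)
    return {y: p / evidence for y, p in joint.items()}


# Hypothetical toy numbers: Y is the scene content, X is a coarse observation.
prior = {"cat": 0.5, "dog": 0.5}
likelihood = {  # "graphics" direction: P(X | Y)
    "cat": {"pointy ears": 0.8, "floppy ears": 0.2},
    "dog": {"pointy ears": 0.25, "floppy ears": 0.75},
}

# "Vision" direction: P(Y | X)
post = posterior(prior, likelihood, "pointy ears")
```

With these numbers the posterior favors "cat", because pointy ears are more likely under the cat model and the prior is uniform.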

Take the Natal Project as an example: computer vision attempts to recover the underlying human pose (or facial expression) from image and depth sensors, while computer graphics generates the corresponding animation (the one you see on your TV) from the estimated pose.


Another practical example is the 3D cities in Google Earth. Computer vision reconstructs the 3D structure from a large collection of photos, while computer graphics renders realistic scenes from these 3D models.


Over the last decades, we have witnessed an increasing interaction between computer vision and computer graphics, such as image-based modeling and rendering, morphing, light field capture and rendering, panoramic image stitching and computational photography.


How to Solve Problems in Computer Vision?

How do we solve these difficult inverse problems in computer vision? We will follow Richard Feynman’s Problem-Solving Algorithm and show how three kinds of approaches, namely scientific, statistical, and engineering approaches, are applied within it.

The Feynman Problem-Solving Algorithm
Step 1: Write down the problem.
Step 2: Think very hard.
Step 3: Write down the answer.

The problem-solving algorithm seems trivial at first glance. However, it can be widely applied to solve complex problems. Below is a tentative outline:

1.    Write down the problem (Scientific)
  1. Low-level vision
  2. Mid-level vision
  3. High-level vision

2.    Think very hard (Statistical)
  1. Bayesian Modeling
  2. Inference: Maximum Likelihood (ML), Maximum a Posteriori (MAP) and Minimum Mean Squared Error estimation (MMSE)
  3. Approximate inference: Monte Carlo methods, variational methods, and (loopy) belief propagation
  4. Structured prediction: Markov random fields, conditional random fields, Max-Margin Markov Networks, Structured SVM, Joint Kernel Support Estimation

3.    Write down the answer (Engineering)
  1. Implementation algorithms
  2. Testing: synthetic experiments, noise and deviation from the model, real-world images

Friday, May 21, 2010

The Computer Vision Genealogy Project






The Computer Vision Genealogy Project (CVGP) aims to catalog people in computer vision by organizing them into a family tree according to dissertation supervision relationships. Computer vision has been studied for more than 40 years, and the field has become increasingly active over the past decades. Although other related academic genealogy projects exist (e.g., The AI Genealogy Project or The Mathematics Genealogy Project), we believe that building a genealogy database specific to this area can provide more useful and interesting information for people in this community.


This is an ongoing project, so you may find missing researchers or incorrect entries in our database. Please help us build a better computer vision genealogy database by adding researchers or correcting wrong profiles.




People









Thursday, May 20, 2010

How to Solve Research Problems in Computer Vision? Preface



(How to solve problems in computer vision?)

Jia-Bin Huang 
jbhuang0604@gmail.com

Latest update: May 20, 2010

The Feynman Problem-Solving Algorithm:
             (1) Write down the problem
             (2) Think very hard
             (3) Write down the answer                                                                                                   
                                                        -Richard Feynman

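While writing the article series How to come up with new research ideas?, I came to think that there is another process that is indispensable to research: How to solve problems? Observing and thinking about a problem and then proposing an abstract new idea may not be hard for many people (especially imaginative students). However, many still do not know where to start when it comes to analyzing and describing a problem with scientific methods, handling data uncertainty with statistical and probabilistic methods, and realizing an efficient solution with engineering methods. As a result, their research easily gets trapped within problem frameworks defined by others.

This gap between abstract ideas and practical problem-solving is often the source of communication problems between graduate students and their advisors. For example, students often feel that their advisors respond only with abstract and scattered thoughts (i.e., all talk) that do not help solve the problem at hand. Advisors, on the other hand, may worry that students are buried in implementation techniques and neglect the essence of the problem. In fact the two are complementary, and neither can be missing, as the MIT motto says:

Mens et Manus (Mind and Hand) - MIT's motto

The earlier series, How to come up with new research ideas?, dwelt on Mens (Mind), while this series, How to solve research problems?, emphasizes the importance of Manus (Hand).

This preface introduces the basic concepts of computer vision and then follows Feynman's problem-solving algorithm to give a brief overview of the three-step problem-solving process. Later articles will give detailed treatments through examples.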



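Three Levels of Computer Vision

David Marr, one of the pioneers of computer vision research, proposed three different levels for describing a computer vision system:
1. Computational level
What should the computer vision system do (What)? Why should it do these things (Why)?

2. Representation and algorithm level
How are the input, the output, and the intermediate image information represented (How to represent)? And how are these representations processed to solve a problem (Which algorithms)?

3. Implementation level
How is the system physically realized?

Although this description is nearly thirty years old, it still applies to modern computer vision problems. First, understand what problem to solve and its motivation (i.e., application-oriented). Next, define the representation of the input and output, and then design an effective algorithm to find the unknowns of interest. Finally, realize the function in software or hardware.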



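What Problems Are There in Computer Vision Research?

Computer vision deals with visual information (e.g., images, videos) and processes it to obtain the information we want to know. Classified by the level of the information obtained (the output), the problems fall into three categories: low-level, mid-level, and high-level vision:

Input: Visual information (e.g., image or video)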

=====Low-level vision=====

Output: label of every pixel
Example problems                Label means
Depth estimation                Depth from the viewer
Figure / ground estimation      Foreground or background
Edge detection                  Edge or non-edge
Segmentation                    Region membership
Motion (optical flow)           Motion vector
Intrinsic image                 Reflectance
Restoration                     High-resolution, clear/clean image
  (denoising, demosaicking, deblurring, inpainting,
  contrast enhancement, dehazing, super-resolution)


=====Mid-level vision=====

Output: representation of an image
Example problems
Shape reconstruction
Human/head pose estimation
Hand gesture representation
Face representation (Snakes / Active shape models / Active appearance models)
Texture representation and synthesis
Image representation (Bag-of-features / Spatial pyramid, i.e., coding and pooling of low-level features)

=====High-level vision=====

Output: Label of a window or an image
Example problems:
Object detection / localization
Object recognition
Optical character recognition
Hand-written digit recognition
Event recognition
Scene understanding



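What Is the Relationship between Computer Vision and Computer Graphics?

From the three levels of computer vision problems above, we can view every problem as follows: from the observed visual information (image, video), the input X, obtain the information we want, the output Y. In other words, computer vision cares about P(Y|X), while computer graphics cares about the opposite: P(X|Y).

Take the recently popular Natal Project as an example. The computer vision problem is how to obtain the user's pose from visual information (image + depth sensor), i.e., pose estimation, while computer graphics generates the corresponding imagery (animation) from a known (or estimated) pose representation.


Another practical example is the 3D cities in Google Earth. This involves inferring 3D models of the real world from images (computer vision: 3D reconstruction) and synthesizing correct images from these 3D models (computer graphics: rendering).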





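How to Solve Problems?

Next comes how to solve problems in computer vision. Richard Feynman proposed an amusing yet effective three-step problem-solving algorithm (The Feynman Problem-Solving Algorithm):

Step 1: Write down the problem
Step 2: Think very hard
Step 3: Write down the answer

This algorithm looks trivial, but on reflection our problem solving can never escape these three steps. The following sections follow them to analyze and explain how to solve research problems in computer vision.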



(1) Write down the problem (Scientific)

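The first step, which is also the hardest step in problem solving, is to write down the problem. How do we write a problem down? This part usually takes a scientific approach: express the relationship between the information we want and the observed image as a mathematical formula (i.e., a forward model). Factors that may need to be considered include
a. Image formation process
b. Physics of light and reflectance
c. Geometry properties and constraints
d. Simplifying assumptions
and so on.

Let us start with the simplest example of how to write down a problem: removing noise from an image (image denoising).

To write down this problem, we assume that the observed image y is the clean, noise-free image x plus noise n (an additive model). We can then write the relationship between input and output as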

y = x + n

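This equation forms the most basic inverse problem.

With this equation (i.e., the forward model), we know that image denoising is the problem of estimating x from y. However, the first step is not yet complete. We must also give a precise definition of what a good x is. That is, we use our understanding of the problem, together with a prior, to constrain the problem so that the inverse problem can be solved (otherwise there are infinitely many solutions).

A commonly used objective function for image denoising is

f(x) = 0.5*| y - x |^2 + Pr(x)

where Pr(x) is our prior / regularization on the image x.

At this point we can state the image denoising problem in the language of mathematics: find an x that minimizes f(x).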
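
As a rough illustration of this formulation, the sketch below minimizes f(x) on a 1D signal by plain gradient descent. Several things here are my own assumptions rather than part of the text above: Pr(x) is taken to be a quadratic smoothness prior, 0.5 * lam * sum of squared differences between neighboring samples, and the values of `lam`, `step`, and `iterations` are arbitrary. Other priors and solvers are equally valid choices.

```python
import random


def objective(x, y, lam):
    """f(x) = 0.5 * |y - x|^2 + Pr(x), with an assumed smoothness prior Pr(x)."""
    data_term = 0.5 * sum((yi - xi) ** 2 for xi, yi in zip(x, y))
    prior_term = 0.5 * lam * sum((x[i + 1] - x[i]) ** 2 for i in range(len(x) - 1))
    return data_term + prior_term


def gradient(x, y, lam):
    """Gradient of the objective with respect to every sample of x."""
    grad = [xi - yi for xi, yi in zip(x, y)]  # derivative of the data term
    for i in range(len(x) - 1):
        diff = x[i + 1] - x[i]
        grad[i] -= lam * diff      # d/dx[i] of 0.5 * lam * diff^2
        grad[i + 1] += lam * diff  # d/dx[i+1] of 0.5 * lam * diff^2
    return grad


def denoise(y, lam=2.0, step=0.1, iterations=200):
    """Minimize f(x) by gradient descent, starting from the observation y."""
    x = list(y)
    for _ in range(iterations):
        g = gradient(x, y, lam)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Synthetic 1D "image": a constant signal plus additive Gaussian noise (y = x + n).
random.seed(0)
clean = [1.0] * 50
noisy = [c + random.gauss(0.0, 0.3) for c in clean]
restored = denoise(noisy)
```

The step size is kept below the reciprocal of the largest curvature of f (at most 1 + 4 * lam for this prior), so each iteration lowers the objective. A prior that penalizes all differences equally also blurs real edges, which is one reason the choice of Pr(x) matters.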



(2) Think very hard (Statistical / Probabilistic)

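After completing the first step, writing down the problem, the next step is to think about how to solve it. In computer vision the data we obtain are often noisy or even corrupted. Together with the many uncertainties of the real world, this means we often have to use probabilistic and statistical methods to handle problems in computer vision.

Predicting unknown quantities (unknown parameters, i.e., labels) from observations falls within the scope of estimation theory. Here I only introduce Bayesian modeling and inference, which is widely used in computer vision, such as Maximum Likelihood (ML), Maximum a Posteriori (MAP), and Minimum Mean Squared Error (MMSE) estimation. There are also approximation methods for cases where no closed-form solution is available, such as Monte Carlo methods, variational methods, and (loopy) belief propagation.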
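
To see how the three estimators can differ, consider a toy problem that does have closed-form answers: estimating the probability theta that a pixel is foreground, given that it was labeled foreground in k of n frames. The scenario, the Beta(a, b) prior, and the function names below are illustrative assumptions. The formulas themselves are the standard results for a Bernoulli likelihood with a Beta prior, whose posterior is Beta(a + k, b + n - k).

```python
def ml_estimate(k, n):
    """Maximum Likelihood: maximizes P(data | theta) for a Bernoulli parameter."""
    return k / n


def map_estimate(k, n, a, b):
    """Maximum a Posteriori: mode of the Beta(a + k, b + n - k) posterior.

    Valid when a + k > 1 and b + n - k > 1.
    """
    return (k + a - 1) / (n + a + b - 2)


def mmse_estimate(k, n, a, b):
    """Minimum Mean Squared Error: mean of the Beta(a + k, b + n - k) posterior."""
    return (k + a) / (n + a + b)


# Hypothetical data: a pixel was labeled foreground in k = 3 of n = 4 frames.
k, n = 3, 4
a, b = 2, 2  # assumed Beta(2, 2) prior, mildly favoring theta near 0.5

theta_ml = ml_estimate(k, n)            # 3 / 4
theta_map = map_estimate(k, n, a, b)    # 4 / 6
theta_mmse = mmse_estimate(k, n, a, b)  # 5 / 8
```

ML ignores the prior, while MAP and MMSE are both pulled toward it. They differ from each other because the posterior is not symmetric, so its mode and its mean do not coincide.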


(3) Write down the answer (Engineering)

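After the second step, the problem is still not really solved unless engineering methods yield an effective and efficient implementation. In the first step we obtained the definition of the problem (the forward model), and in the second step we found the tool for solving it (an inference algorithm to invert the image formation process). The third step is to implement it efficiently in a programming language (or in hardware) and test it at three levels.

a. Synthetic experiments

Generate many noise-free synthetic images yourself. The purpose of this level of testing is to verify the correctness of the implementation.

b. Noise or deviation from the model

At this stage, try adding a certain amount of noise to the synthetic images. If you made many assumptions about the problem in the first step, this is where you can see what happens to your method when the assumptions do not hold. This stage tests how sensitive the method is to noise or to its assumptions.

c. Real-World Images

Computer vision is an applied discipline, so the final stage of testing a method is to apply it to the real world. Forty years ago we had only a single Lena image, which served as both the training image and the test image. With the spread of cameras, there are now millions upon millions of photos on Flickr. So if you want your proposed method to be of practical use, the best way is to test it on real-world photos (or on your own photos).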
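
The first two testing levels can be sketched in a few lines. Everything below is a hypothetical stand-in chosen for brevity: `box_filter` is a toy denoiser rather than a recommended method, the signals are 1D, and the noise levels are arbitrary. Level (c) is omitted because it needs real photographs.

```python
import random


def box_filter(signal, radius=2):
    """Toy denoiser used only for illustration: average over a sliding window."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def rmse(a, b):
    """Root mean squared error between two equally long signals."""
    return (sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)) ** 0.5


random.seed(0)
clean = [1.0] * 100  # synthetic ground truth that satisfies the smoothness assumption

# (a) Synthetic, noise-free input: the output should match the ground truth.
error_clean = rmse(box_filter(clean), clean)

# (b) Increasing noise: record how the error grows with the noise level.
errors_by_sigma = {}
for sigma in (0.1, 0.5, 1.0):
    noisy = [c + random.gauss(0.0, sigma) for c in clean]
    errors_by_sigma[sigma] = rmse(box_filter(noisy), clean)

# (b, continued) Violated assumption: a step edge is blurred even without noise.
step = [0.0] * 50 + [1.0] * 50
error_step = rmse(box_filter(step), step)
```

A nonzero `error_clean` would point to a bug in the implementation, while `errors_by_sigma` and `error_step` show how gracefully the method degrades under noise and under a broken assumption.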