Saturday, May 22, 2010

How to Solve Problems in Computer Vision? Preface

Jia-Bin Huang

Latest Update: May 20, 2010

The Feynman Problem-Solving Algorithm:
                      (1) Write down the problem
                      (2) Think very hard
                      (3) Write down the answer
         -Richard Feynman


While writing the article series “How to come up with new research ideas in computer vision?”, I realized that there is another question that is just as essential to the research process, namely, “How to solve problems in computer vision?”. It may not be difficult for junior researchers (especially the imaginative ones) to come up with new research ideas. However, it often takes years of research experience to learn how to describe and analyze a problem via scientific approaches, deal with data uncertainty through statistical approaches, and realize an effective and efficient implementation using engineering approaches.

The gap between abstract thinking and problem-solving skills is often the major communication barrier between graduate students and their advisors. For example, students might feel that the feedback from the advisor is useless or impractical, while advisors might worry that students have gotten lost in technical details. In fact, these two things (i.e., knowing what to do and knowing how to do it) are both of great importance and complement each other. As the MIT motto says:

Mens et Manus (Mind and Hand) - MIT's motto

In “How to come up with new research ideas?” I addressed the Mens (mind) part. In this series, I will focus on the Manus (hand) part.

After introducing the basic concepts in computer vision, I will follow the Feynman Problem-Solving Algorithm to show how to solve problems in computer vision. The three problem-solving steps will be accompanied by various examples in the following articles. Through these examples, readers can learn how vision problems have been addressed and how to apply the same methodologies to their own problems. In other words, I am a believer in this quote:

Do not tell people how to live their lives. Just tell them stories. And they will figure out how those stories apply to them. - Randy Pausch

Note: The content of this article series is inspired by the new computer vision textbook: Computer Vision: Algorithms and Applications, by Richard Szeliski.

Three Levels of Analysis in Computer Vision

David Marr, one of the pioneers in computer vision, proposed to understand vision (i.e., an information processing system) at three different levels of analysis.

1.    Computational Level
What does the system do, and why?

2.    Representation and Algorithm Level
How does the system do what it wants to do? What are the representations, and what are the algorithms that build and manipulate them?

3.    Implementation Level
How is the system physically realized?

Even after thirty years, Marr’s view of vision remains useful and serves as a good guide for formulating and solving problems in computer vision.

What Kinds of Problems Are There in Computer Vision?

Computer vision processes and interprets visual information (e.g., images or videos). We can roughly classify vision problems into three categories by the level of the output labels: 1) Low-level, 2) Mid-level, and 3) High-level vision.
(Note that this is my own interpretation. There may be other viewpoints on how to define low-, mid-, and high-level vision.)

The output from low-level vision problems is a label map, i.e., one label for each pixel. Mid-level vision problems produce meaningful representations from images or videos. High-level vision predicts (semantic) class labels.

Input: Visual information (e.g., image or video)

=====Low-level vision=====

Output: A label map
Example problems                  Label means

Depth estimation                  Depth from the viewer
Figure / ground estimation        Foreground or background
Edge detection                    Edge or non-edge
Segmentation                      Region membership
Motion (optical flow)             Motion vector
Intrinsic image                   Reflectance
Image restoration                 High-resolution, clear, clean images
  (denoising, demosaicking,
  deblurring, inpainting,
  contrast enhancement,
  dehazing, super-resolution)
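
To make the idea of a label map concrete, here is a minimal sketch in Python with NumPy. It is only an illustration: the synthetic image, the threshold value, and the function name `edge_label_map` are my own assumptions, and thresholding the gradient magnitude is a deliberately crude stand-in for a real edge detector. The point is simply that the output has one label per input pixel.

```python
import numpy as np

def edge_label_map(image, threshold=0.25):
    """Return one binary label per pixel: 1 = edge, 0 = non-edge.

    Illustrative only: finite-difference gradient magnitude plus a fixed,
    hand-picked threshold (an assumption, not a recommended detector).
    """
    gy, gx = np.gradient(image.astype(float))  # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold).astype(np.uint8)

# Synthetic 8x8 image: dark left half, bright right half.
image = np.zeros((8, 8))
image[:, 4:] = 1.0

labels = edge_label_map(image)
print(labels.shape)  # same height and width as the input image
print(labels[0])     # labels along the first row
```

The other problems in the table follow the same pattern; only the meaning of the per-pixel label changes (a depth value, a motion vector, a region index, and so on).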

=====Mid-level vision=====

Output: Representation of images
Example problems

Shape reconstruction
Human/head pose estimation
Hand gesture representation
Face representation
(Snakes / Active shape models / Active appearance models)
Texture representation and synthesis
Image representation 
(Bag-of-features / Spatial pyramid, i.e., coding and pooling of low-level features)
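
The "coding and pooling" idea behind bag-of-features can be sketched in a few lines of Python. This is a toy version under my own assumptions: the function name `bag_of_features`, the tiny hand-written codebook, and the 2-D descriptors are all hypothetical, coding is done by hard nearest-neighbor assignment, and pooling is a normalized count. A spatial pyramid would repeat the pooling step over sub-regions of the image, which is not shown here.

```python
import numpy as np

def bag_of_features(descriptors, codebook):
    """Encode local descriptors against a codebook and pool them into one histogram.

    descriptors: (n, d) array of local features (hypothetical values here)
    codebook:    (k, d) array of visual words (in practice often learned by clustering)
    """
    # Coding: assign each descriptor to its nearest codeword (hard assignment).
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    assignments = np.argmin(sq_dists, axis=1)
    # Pooling: count the assignments per codeword, then normalize to sum to 1.
    counts = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return counts / counts.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.1, 0.9]])

histogram = bag_of_features(descriptors, codebook)
print(histogram)  # one fixed-length vector representing the whole image
```

Whatever the number of local descriptors, the result is a fixed-length vector, which is what makes it a convenient image representation for later stages.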

=====High-level vision=====

Output: Semantic label of an image
Example problems

Object detection / localization
Object recognition
Optical character recognition
Hand-written digit recognition
Event recognition
Scene understanding

Relationship between Computer Vision and Computer Graphics
From the three levels of vision problems above, we can see that all computer vision problems aim to recover the information Y we need from the input visual information X. In other words, the goal of computer vision is to accurately estimate the conditional probability P(Y|X). Computer graphics, on the other hand, cares about P(X|Y): given the information Y, what would a realistic image X look like?

Take the Natal Project as an example: computer vision attempts to recover the underlying human pose (or facial expression) from image and depth sensors, while computer graphics generates the corresponding animation (the one you see on your TV) from the estimated pose.

Another practical example is the 3D cities in Google Earth. Computer vision reconstructs the 3D structure from a large collection of photos, while computer graphics renders realistic scenes from these 3D models.
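
The two directions, P(X|Y) for graphics and P(Y|X) for vision, are linked by Bayes' rule: P(Y|X) = P(X|Y) P(Y) / P(X). The toy Python sketch below shows this inversion on a two-label, two-appearance world. All the numbers and names (`prior`, `likelihood`, `posterior`) are made up for illustration and do not come from any real system.

```python
# Toy world: Y is the hidden label, X is the observed pixel appearance.
prior = {"foreground": 0.3, "background": 0.7}  # P(Y), assumed values

# P(X | Y): the "graphics" direction, i.e., how each label tends to look.
likelihood = {
    "foreground": {"bright": 0.8, "dark": 0.2},
    "background": {"bright": 0.1, "dark": 0.9},
}

def posterior(x):
    """Return P(Y | X = x), the "vision" direction, via Bayes' rule."""
    joint = {y: likelihood[y][x] * prior[y] for y in prior}  # P(X = x, Y = y)
    evidence = sum(joint.values())                           # P(X = x)
    return {y: p / evidence for y, p in joint.items()}

post_bright = posterior("bright")
print(post_bright)
```

Even though the foreground is less likely a priori in this toy setup, observing a bright pixel makes it the more probable label, because bright pixels are much more typical of the foreground.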

Over the past decades, we have witnessed increasing interaction between computer vision and computer graphics in areas such as image-based modeling and rendering, morphing, light field capture and rendering, panoramic image stitching, and computational photography.

How to Solve Problems in Computer Vision?

How do we solve these difficult inverse problems in computer vision? We will follow Richard Feynman’s Problem-Solving Algorithm and show how three different kinds of approaches, namely, the scientific, statistical, and engineering approaches, are applied within it.

The Feynman Problem-Solving Algorithm
Step 1: Write down the problem.
Step 2: Think very hard.
Step 3: Write down the answer.

The problem-solving algorithm seems trivial at first glance. However, it can be widely applied to solve complex problems. Below is a tentative outline:

1.    Write down the problem (Scientific)
  1. Low-level vision
  2. Mid-level vision
  3. High-level vision

2.    Think very hard (Statistical)
  1. Bayesian Modeling
  2. Inference: Maximum Likelihood (ML), Maximum a Posteriori (MAP) and Minimum Mean Squared Error estimation (MMSE)
  3. Approximate inference: Monte Carlo methods, Variational methods, and (loopy) belief propagation.
  4. Structured prediction: Markov Random Field, Conditional Random Field, Max-Margin Markov Networks, Structured SVM, Joint Kernel Support Estimation

3.    Write down the answer (Engineering)
  1. Implementation algorithms
  2. Testing: synthetic experiments, noise and deviation from the model, real-world images
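
As a small preview of the inference and testing items in the outline, the following Python sketch compares the ML, MAP, and MMSE estimates on a synthetic experiment where the ground truth is known. The setup is my own assumption, chosen because every estimate has a closed form: we estimate theta, the probability that a pixel is an edge, from binary observations, with a Beta(a, b) prior (a, b > 1). The function name `estimate_edge_rate` and all parameter values are hypothetical.

```python
import numpy as np

def estimate_edge_rate(observations, a=2.0, b=2.0):
    """Estimate theta = P(pixel is an edge) from binary observations.

    Assumes a Beta(a, b) prior on theta with a, b > 1 (an illustrative choice),
    so that the posterior is Beta(k + a, n - k + b) for k edges among n pixels.
    """
    n = len(observations)
    k = int(np.sum(observations))
    ml = k / n                            # maximizes the likelihood alone
    map_ = (k + a - 1) / (n + a + b - 2)  # mode of the posterior
    mmse = (k + a) / (n + a + b)          # mean of the posterior
    return ml, map_, mmse

# Synthetic experiment: the true theta is known, so the estimates can be checked.
rng = np.random.default_rng(0)
true_theta = 0.2
for n in (10, 1000):
    data = rng.random(n) < true_theta
    ml, map_, mmse = estimate_edge_rate(data)
    print(f"n={n}: ML={ml:.3f} MAP={map_:.3f} MMSE={mmse:.3f}")
```

With few observations the prior pulls the MAP and MMSE estimates away from the ML estimate; as n grows, the three estimates converge. Checking this kind of behavior on synthetic data first makes failures on real images easier to interpret.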
