
Zhaoyin Jia


The problem of object recognition has been studied extensively in recent years. In many circumstances the user has control over the vision system, e.g., a guided robot. In that case we can not only acquire multiple views, but also actively steer the system toward a particular viewing angle. This is the setting of active view recognition.


3D reconstruction and modeling are useful in many applications. Reconstruction is usually achieved either by laser scanning or by Structure from Motion algorithms, both of which produce sparse point clouds. We propose an algorithm that interpolates sparse 3D points into denser clouds by estimating the latent surface geometry of the point cloud and exploiting color information. The algorithm estimates the latent surface underlying the 3D points, taking normals, distances, etc. into account, and performs a more robust segmentation using the 3D points together with the 2D colors. The resulting algorithm improves on the baseline method, achieving lower interpolation error.
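To illustrate the general idea of color-aware densification (this is only a toy sketch under my own assumptions, not the proposed algorithm): new points are inserted only where neighboring samples are both geometrically close and similar in color, so interpolation does not bridge across object boundaries.

```python
import math

def densify(points, colors, radius=1.0, color_thresh=30.0):
    """Toy densification sketch (hypothetical, not the paper's method):
    insert a midpoint between every pair of points that are within
    `radius` in 3D AND within `color_thresh` in color space, so the
    added points respect both local geometry and appearance."""
    new_pts, new_cols = [], []
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            geo_d = math.dist(points[i], points[j])    # 3D distance
            col_d = math.dist(colors[i], colors[j])    # color distance
            if geo_d <= radius and col_d <= color_thresh:
                new_pts.append(tuple((a + b) / 2
                                     for a, b in zip(points[i], points[j])))
                new_cols.append(tuple((a + b) / 2
                                      for a, b in zip(colors[i], colors[j])))
    return points + new_pts, colors + new_cols
```

A real implementation would fit a local surface (e.g., via estimated normals) rather than taking raw midpoints, but the pairing of a geometric and a photometric test is the key idea.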


We built a camera array of around 100 network cameras for image-based rendering, which synthesizes the view of a virtual camera by interpolating among multiple real cameras. With a dense array, on the order of hundreds of cameras, we can render the scene from any virtual camera within the array's range. The cameras are synchronized to capture images at relatively high speed, and the final rendering results are appealing.
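The core interpolation step can be sketched as follows (a minimal illustration under my own simplifying assumptions, not the array's actual renderer): for a virtual camera position on a line of real cameras, blend the two nearest cameras' images with distance-based weights.

```python
import bisect

def render_virtual(cam_xs, images, x):
    """Toy view interpolation sketch: cam_xs are sorted 1D camera
    positions, images are same-size 2D grayscale grids. The virtual
    view at position x is a distance-weighted blend of the two
    bracketing cameras' images."""
    j = bisect.bisect_left(cam_xs, x)
    if j == 0:                      # left of the array: clamp
        return images[0]
    if j == len(cam_xs):            # right of the array: clamp
        return images[-1]
    i = j - 1
    t = (x - cam_xs[i]) / (cam_xs[j] - cam_xs[i])  # blend weight in [0, 1]
    return [[(1 - t) * a + t * b for a, b in zip(row_i, row_j)]
            for row_i, row_j in zip(images[i], images[j])]
```

A dense, synchronized array keeps neighboring views close enough that this simple cross-fade looks plausible; with sparse cameras one would need per-pixel disparity compensation instead.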


We implemented a C++ Structure from Motion library that takes multi-view images and produces a 3D reconstruction of the scene. Structure from Motion is the computer vision term for a family of algorithms that reconstruct the three-dimensional structure of an object or scene from a set of two-dimensional images: 'structure' refers to recovering the geometry of the scene/object, and 'motion' refers to recovering the motion of the camera. The library takes a set of images of the same scene and the known intrinsic parameters of the camera. First, feature detection and matching are applied. Second, the camera poses are estimated from the feature correspondences and the three-dimensional locations of all features are computed, using algorithms such as factorization or triangulation. Finally, a refinement step such as sparse bundle adjustment minimizes the reconstruction error. The library is written with OpenCV and is available upon request. [man] [code]
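The triangulation step above can be illustrated with the classic midpoint method (a minimal stand-in, written here in Python for clarity rather than taken from the library): given two camera centers and the viewing rays back-projected through a matched feature, the 3D point is taken midway between the closest points on the two rays.

```python
def triangulate_midpoint(p1, d1, p2, d2):
    """Midpoint triangulation: p1, p2 are camera centers, d1, d2 are ray
    directions through a matched feature in each image. Returns the 3D
    point halfway between the closest points on the two rays. Real
    pipelines typically use DLT triangulation followed by sparse bundle
    adjustment; this shows the underlying geometry."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    w = [a - b for a, b in zip(p1, p2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b           # near zero when the rays are parallel
    s = (b * e - c * d) / denom     # parameter of closest point on ray 1
    t = (a * e - b * d) / denom     # parameter of closest point on ray 2
    q1 = [pi + s * di for pi, di in zip(p1, d1)]
    q2 = [pi + t * di for pi, di in zip(p2, d2)]
    return [(x + y) / 2 for x, y in zip(q1, q2)]
```

For two cameras at (-1, 0, 0) and (1, 0, 0) both observing a point at (0, 0, 2), the rays intersect exactly and the midpoint recovers the point; with noisy correspondences the rays are skew and the midpoint is the least-squares compromise.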

Up-to-date projects, work, and information can be found at my personal website.