CS180 - Project 4b - Auto Stitching Photo Mosaics

Part A: Image Warping and Mosaicing

Recover Homographies

In order to do alignment, we must first be able to warp the images onto each other so we can overlay them. To do this we need to compute the matrix transformation that will warp the images; in this case we need 8 degrees of freedom, so we use a $3 \times 3$ matrix $H$ known as a homography. This has the following setup, where each $(x, y)$ point in one image has a corresponding point $(x', y')$ in the other image:

$$p' = Hp \qquad \begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Here $i$ acts as our overall scaling factor, so we set it equal to $1$.

Expanding out the multiplication and substituting $w = gx + hy + 1$:

$$(gx + hy + 1)\,x' = ax + by + c \qquad \text{and} \qquad (gx + hy + 1)\,y' = dx + ey + f$$

Given that we have multiple points, we can combine this into a system of $2n$ equations with $8$ unknowns (not $9$, because $i = 1$), leaving us with the following:

$$\begin{bmatrix} x_1' \\ y_1' \\ \vdots \\ x_n' \\ y_n' \end{bmatrix} = \begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1 x_1' & -y_1 x_1' \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -x_1 y_1' & -y_1 y_1' \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_n & y_n & 1 & 0 & 0 & 0 & -x_n x_n' & -y_n x_n' \\ 0 & 0 & 0 & x_n & y_n & 1 & -x_n y_n' & -y_n y_n' \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix}$$

To solve for the homography matrix, we solve this overdetermined system $Ah = b$ in the least-squares sense. (Equivalently, keeping all nine entries as unknowns, the homogeneous system $Ah = 0$ can be solved with the SVD: the least-squares solution is the right singular vector associated with the smallest singular value in $\Sigma$.)
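As a concrete sketch of that solve (compute_H is my own name, not from any starter code; it assumes (n, 2) NumPy point arrays):

```python
import numpy as np

def compute_H(pts, pts_prime):
    """Least-squares homography from n >= 4 correspondences.

    pts, pts_prime: (n, 2) arrays of (x, y) points, with pts_prime
    the target coordinates. Builds the 2n x 8 system above, fixing i = 1.
    """
    n = len(pts)
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for k, ((x, y), (xp, yp)) in enumerate(zip(pts, pts_prime)):
        A[2 * k]     = [x, y, 1, 0, 0, 0, -x * xp, -y * xp]
        A[2 * k + 1] = [0, 0, 0, x, y, 1, -x * yp, -y * yp]
        b[2 * k], b[2 * k + 1] = xp, yp
    h, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solve of Ah = b
    return np.append(h, 1).reshape(3, 3)       # [[a,b,c],[d,e,f],[g,h,1]]
```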

Warp the Images

In order to warp the images, we use the homography $H$ that we computed and then apply inverse warping as we did in Project 3: each pixel of the output is mapped back through $H^{-1}$ and sampled from the source image. To compute the pixel values I use bilinear interpolation. By computing $H$ and applying it, we are essentially taking the points from one image and transforming them into the other image's coordinate system.
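Here is a minimal sketch of that inverse warp with bilinear sampling, assuming a single-channel float image and a known output canvas size (it ignores the canvas-offset bookkeeping a full mosaic needs):

```python
import numpy as np

def inverse_warp(img, H, out_shape):
    """Inverse-warp img onto an output canvas of shape out_shape.

    H maps source (x, y, 1) to output coordinates, so every output
    pixel is pulled back through H^-1 and sampled bilinearly from img.
    """
    H_inv = np.linalg.inv(H)
    ys, xs = np.mgrid[0:out_shape[0], 0:out_shape[1]]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = H_inv @ coords
    sx, sy = src[0] / src[2], src[1] / src[2]  # divide out w

    # Bilinear interpolation at the (non-integer) source coordinates.
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    valid = (x0 >= 0) & (x0 < img.shape[1] - 1) & (y0 >= 0) & (y0 < img.shape[0] - 1)
    x0, y0, sx, sy = x0[valid], y0[valid], sx[valid], sy[valid]
    wx, wy = sx - x0, sy - y0
    vals = ((1 - wx) * (1 - wy) * img[y0, x0] + wx * (1 - wy) * img[y0, x0 + 1]
            + (1 - wx) * wy * img[y0 + 1, x0] + wx * wy * img[y0 + 1, x0 + 1])

    out = np.zeros(out_shape)
    out.ravel()[np.flatnonzero(valid)] = vals
    return out
```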

Image Rectification

For rectification, I took pictures of objects whose original shape I knew, so that I could define correspondence points matching that shape. We can then apply the warp function to map the image onto these new correspondence points, giving the desired non-skewed shape and rectifying the object so it appears as if we were looking at it head on.
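For example, using the compute_H and inverse_warp sketches above, rectifying a roughly rectangular object might look like this (all coordinates here are made up for illustration, not my actual clicked points):

```python
import numpy as np

# Four clicked corners of the skewed object in the photo, ordered
# top-left, top-right, bottom-right, bottom-left (illustrative values).
src = np.array([[412, 310], [905, 355], [880, 720], [390, 680]])

# Target corners of the ideal axis-aligned rectangle; the aspect
# ratio is chosen by eye to roughly match the real object.
w, h = 500, 350
dst = np.array([[0, 0], [w, 0], [w, h], [0, h]])

H = compute_H(src, dst)                      # maps photo -> rectified plane
# rectified = inverse_warp(img, H, (h, w))   # then warp as described above
```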

Here are two examples. Note that in the original images, the green points are the correspondence points for the non-skewed, ideal shape.

Original Image + Correspondence
Rectified Image

Here we can now see what the computer looks like from above, and the Apple logo is visible more clearly.

Original Image + Correspondence
Rectified Image

Blend the Images into a Mosaic

To blend the images into a mosaic, I took photos that overlapped and defined correspondence points between the two. I then compute the homography matrix to warp one image into the other's frame, and for a smooth blend I make an alpha mask for both images: as you get further from the overlap region, the mask becomes less intense, so I don't get harsh seams. Finally, I normalize the resulting blend by the total alpha contribution at each pixel, to prevent overlapping areas from having a higher intensity/brightness.
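One way to get that falloff is a distance transform on each warped image's valid-pixel region; here is a minimal sketch assuming single-channel float images already warped onto a shared canvas (for RGB, the alpha arrays would need a trailing channel axis):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def feather_mask(valid):
    """Alpha mask that fades toward the edge of an image's valid region.

    valid: boolean mask of where a (warped) image has pixels. The
    Euclidean distance transform gives each pixel its distance to the
    nearest invalid pixel, so alpha peaks in the interior and fades
    out near the seams.
    """
    dist = distance_transform_edt(valid)
    return dist / (dist.max() + 1e-8)

def blend(img_a, alpha_a, img_b, alpha_b):
    """Alpha-weighted average, normalized by the total alpha per pixel."""
    total = alpha_a + alpha_b
    total[total == 0] = 1.0  # avoid division by zero outside both images
    return (img_a * alpha_a + img_b * alpha_b) / total
```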

Example 1:

Original A + Correspondence
Original B + Correspondence
Resulting Mosaic Image

Example 2A:

I tried making another mosaic (from a different viewpoint this time), but unfortunately the correspondence points were slightly off and not well spread out, so you get the blur at the bottom seen below. I think this is mostly due to the lack of spread in the points, especially since the shift in angle is more dramatic toward the bottom; as a result, the mosaic looks blurry at the bottom.

Original A + Poor Spread Correspondence
Original B + Poor Spread Correspondence
Resulting (Poor) Mosaic Image

Example 2B:

Now, with a more widely spread set of correspondence points, here is the resulting image, which is much clearer.

Original A + Better Spread Correspondence Pts
Original B + Better Spread Correspondence Pts
Resulting (Clear) Mosaic Image

Example 3 (more of the same skyline):

Original A + Correspondence Pts
Original B + Correspondence Pts

Resulting Mosaic Image (Golden Gate Bridge)

Example 4:

Original A + Correspondence Pts

Original B + Correspondence Pts

Resulting Mosaic Image


Part B: Feature Matching for Autostitching

In Part A, a large part of the project was manually choosing the correspondence points that would match up. In Part B we automate that process.

Detecting Corners

The first step is to find corners across all of the images, as these make good key points. I used the provided harris.py file, which under the hood uses the following idea.

We can mathematically write out the appearance change of a window $W$ given a shift $[u, v]$ as:

$$E(u, v) = \sum_{(x, y) \in W} \left[\, I(x + u,\, y + v) - I(x, y) \,\right]^2$$
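For small shifts, a first-order Taylor expansion of $I$ turns this sum of squared differences into a quadratic form in the shift, governed by a second-moment matrix $M$ built from the image gradients $I_x$ and $I_y$:

$$E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}, \qquad M = \sum_{(x, y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$

A corner is a point where $E$ grows quickly in every shift direction, i.e., where both eigenvalues of $M$ are large; the Harris response condenses this into a single score computed from $M$ (commonly $\det(M) - k\,\operatorname{tr}(M)^2$, or the harmonic mean $\det(M)/\operatorname{tr}(M)$).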

I’ll demonstrate this using a pair of images provided by TA Ryan T.

Running this corner detection resulted in the following:

Adaptive Non-Maximal Suppression (ANMS)

As we can see, this results in far too many points, and it would be hard to match all of them. In fact, many of the points won't have a good match simply because they don't exist in the other picture. To thin them out while keeping a good spatial spread, I followed the ANMS algorithm detailed in the MOPS paper, which calculates

$$r_i = \min_j \left\| \mathbf{x}_i - \mathbf{x}_j \right\|, \ \text{ s.t. } f(\mathbf{x}_i) < c_{\text{robust}}\, f(\mathbf{x}_j), \ \mathbf{x}_j \in \mathcal{I}$$

for each point. Here $r_i$ is the suppression radius: the distance from $\mathbf{x}_i$ to the nearest point that is sufficiently stronger than it. From here I chose the 200 points with the highest $r_i$.
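A minimal sketch of that selection, assuming the corner coordinates and their Harris strengths come in as NumPy arrays (the O(n²) distance matrix is fine at this scale):

```python
import numpy as np

def anms(coords, strengths, n_keep=200, c_robust=0.9):
    """Adaptive non-maximal suppression (MOPS-style).

    coords: (n, 2) corner locations; strengths: (n,) Harris responses.
    For each point, r_i is the distance to the nearest point whose
    robustified strength dominates it; keep the n_keep largest radii.
    """
    dists = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    # Point j suppresses point i if f(x_i) < c_robust * f(x_j).
    dominated = strengths[:, None] < c_robust * strengths[None, :]
    dists = np.where(dominated, dists, np.inf)
    radii = dists.min(axis=1)  # r_i = inf for the globally strongest point
    return coords[np.argsort(radii)[::-1][:n_keep]]
```

This left me with the following: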

Feature Descriptor Extraction

In order to match features across the images, we need local information about each point so we can compare it against the points in the image we are mapping to. The dimensions suggested in the spec were a $40 \times 40$ patch around each feature point, which then gets downsampled to an $8 \times 8$ patch.
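A minimal sketch of that extraction, assuming a grayscale float image (a real version would blur before subsampling to avoid aliasing; the zero-mean/unit-variance step follows the MOPS paper's bias/gain normalization):

```python
import numpy as np

def extract_descriptors(img, coords, patch=40, out=8):
    """40x40 window around each point, downsampled to a flat 8x8 descriptor.

    Points within patch/2 pixels of the border are skipped.
    """
    half, step = patch // 2, patch // out   # step = 5: keep every 5th pixel
    descs, kept = [], []
    for x, y in np.round(coords).astype(int):
        if not (half <= x < img.shape[1] - half and half <= y < img.shape[0] - half):
            continue
        window = img[y - half:y + half, x - half:x + half]
        d = window[::step, ::step]               # crude 40x40 -> 8x8 downsample
        d = (d - d.mean()) / (d.std() + 1e-8)    # bias/gain normalization (MOPS)
        descs.append(d.ravel())
        kept.append((x, y))
    return np.array(descs), np.array(kept)
```

Here are some examples of feature patches that I calculated: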

Matching Feature Descriptors

To match the feature descriptors, I use a distance metric along with Lowe's thresholding so that I only retain the best matches. This is done by looking at the ratio of each feature descriptor's distance to its best match versus its second-best match. Certain images worked better with different distance metrics, but for all of the examples below I used the Euclidean distance.
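A sketch of the matcher under those choices (the 0.8 ratio threshold is an illustrative value, tuned per image in practice):

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.8):
    """Match descriptors with Euclidean distance + Lowe's ratio test.

    A match is kept only when the best candidate in desc_b is clearly
    better than the runner-up (1-NN/2-NN distance ratio below threshold).
    """
    # Pairwise Euclidean distances between all descriptor pairs.
    d = np.linalg.norm(desc_a[:, None] - desc_b[None, :], axis=-1)
    matches = []
    for i in range(len(desc_a)):
        nn1, nn2 = np.argsort(d[i])[:2]
        if d[i, nn1] / (d[i, nn2] + 1e-8) < ratio:
            matches.append((i, nn1))
    return np.array(matches)
```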

RANSAC

Even after all of this we may still have some poor matches, so to get the best possible homography we repeatedly take a random sample of four matching points and compute the corresponding homography. Using an epsilon threshold on the reprojection error, we count how many inliers that homography produces. After repeating this 2000 times, we choose the run with the highest number of inliers and use its inlier set as our best set of points and homography.
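A sketch of that loop, reusing the compute_H helper sketched in Part A (the 3-pixel epsilon is an illustrative choice):

```python
import numpy as np

def ransac_homography(pts, pts_prime, n_iters=2000, eps=3.0):
    """4-point RANSAC: keep the homography with the most inliers.

    pts, pts_prime: (n, 2) matched points; eps is the reprojection
    error threshold in pixels. A robust version would also guard
    against degenerate (e.g. collinear) samples.
    """
    homog = np.hstack([pts, np.ones((len(pts), 1))]).T  # points as 3 x n columns
    best_inliers = np.array([], dtype=int)
    for _ in range(n_iters):
        sample = np.random.choice(len(pts), 4, replace=False)
        H = compute_H(pts[sample], pts_prime[sample])
        proj = H @ homog
        proj = (proj[:2] / proj[2]).T                   # back to (x, y)
        err = np.linalg.norm(proj - pts_prime, axis=1)
        inliers = np.flatnonzero(err < eps)
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on every inlier of the best model for the final homography.
    return compute_H(pts[best_inliers], pts_prime[best_inliers]), best_inliers
```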

Results

With our correspondence points automatically detected, we can now use the methods detailed in Part A to stitch everything together. These were the results:

Example 1:

Image 1
Image 2

Combined Image - Auto-stitched

Example 2:

Image 1
Image 2
Combined Image - Auto-stitched

Example 3 (one of the examples from Part A):

Image 1
Image 2
Combined Image - Auto-stitched

One of the biggest learning points for me from reading the paper was how to do adaptive non-maximal suppression (ANMS), which lets you select more useful points by ensuring a better spatial distribution across the image. Both RANSAC and ANMS were the coolest things to learn about in this project!