# Homework 1

This notebook includes both coding and written questions. Please hand in this notebook file with all the outputs and your answers to the written questions.

## Part 1: Convolutions

### 1.1 Commutative Property (10 points)

Recall that the convolution of an image $f:\mathbb{R}^2\rightarrow \mathbb{R}$ and a kernel $h:\mathbb{R}^2\rightarrow\mathbb{R}$ is defined as follows:
$$(f*h)[m,n]=\sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty f[i,j]\cdot h[m-i,n-j]$$

Or equivalently,

\begin{align} (f*h)[m,n] &= \sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty h[i,j]\cdot f[m-i,n-j]\\ &= (h*f)[m,n] \end{align}

Show that this is true (i.e. prove that the convolution operator is commutative: $f*h = h*f$).

Substitute $x=m-i$ and $y=n-j$, so that $i=m-x$ and $j=n-y$. As $i$ and $j$ run over all integers, so do $x$ and $y$; assuming the sums converge absolutely, reordering the terms does not change their value:

\begin{align} (f*h)[m,n]&=\sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty f[i,j]\cdot h[m-i,n-j] \\ &=\sum_{x=-\infty}^{\infty}\sum_{y=-\infty}^{\infty} f[m-x,n-y]\cdot h[x,y] \\ &=\sum_{x=-\infty}^{\infty}\sum_{y=-\infty}^{\infty} h[x,y]\cdot f[m-x,n-y] \\ &=(h*f)[m,n] \end{align}

### 1.2 Linear and Shift Invariance (10 points)

Let $f$ be a function $\mathbb{R}^2\rightarrow\mathbb{R}$. Consider a system $f\xrightarrow{S}g$, where $g=(f*h)$ with some kernel $h:\mathbb{R}^2\rightarrow\mathbb{R}$. Show that $S$ defined by any kernel $h$ is a Linear Shift Invariant (LSI) system. In other words, for any $h$, show that $S$ satisfies both of the following:

• $S[a\cdot{f_1}+b\cdot{f_2}]= a\cdot{S[f_1]}+b\cdot{S[f_2]}$
• If $f[m,n]\xrightarrow{S}g[m,n]$ then $f[m-m_0,n-n_0]\xrightarrow{S}g[m-m_0,n-n_0]$

\begin{align} S[a\cdot{f_1}+b\cdot{f_2}][m,n] &= ((a\cdot{f_1}+b\cdot{f_2})*h)[m,n]\\ &=\sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty (a\cdot f_1[i,j]+b \cdot f_2[i,j])\cdot h[m-i,n-j] \\ &=\sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty (a \cdot f_1[i,j] \cdot h[m-i,n-j] + b \cdot f_2[i,j] \cdot h[m-i,n-j]) \\ &= a \cdot \sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty f_1[i,j] \cdot h[m-i,n-j] + b \cdot \sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty f_2[i,j] \cdot h[m-i,n-j] \\ &=a \cdot (f_1*h)[m,n] + b \cdot (f_2*h)[m,n] \\ &= a \cdot S[f_1][m,n] + b\cdot S[f_2][m,n] \end{align}
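
The derivation above establishes linearity. Shift invariance can be sketched in the same way. Write $f'[m,n]=f[m-m_0,n-n_0]$ for the shifted input (the names $f'$, $i'$ and $j'$ are introduced here only for illustration) and substitute $i'=i-m_0$, $j'=j-n_0$:

\begin{align} S[f'][m,n] &= \sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty f[i-m_0,j-n_0]\cdot h[m-i,n-j] \\ &= \sum_{i'=-\infty}^\infty\sum_{j'=-\infty}^\infty f[i',j']\cdot h[(m-m_0)-i',(n-n_0)-j'] \\ &= (f*h)[m-m_0,n-n_0] \\ &= g[m-m_0,n-n_0] \end{align}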

### 1.3 Implementation (30 points)

In this section, you will implement two versions of convolution:

• conv_nested
• conv_fast

First, run the code cell below to load the image to work with.

Now, implement the function conv_nested in filters.py. This is a naive implementation of convolution which uses 4 nested for-loops. It takes an image $f$ and a kernel $h$ as inputs and outputs the convolved image $(f*h)$ that has the same shape as the input image. This implementation should take a few seconds to run.

• Hint: It may be easier to implement $(h*f)$

We’ll first test your conv_nested function on a simple input.

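• Compute directly from the definition of convolution:
$$(f*h)[m,n]=\sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty f[i,j]\cdot h[m-i,n-j]$$

• Follow the visual picture of convolution: flip the kernel, slide it across the image, and accumulate the products at each position.

conv_nested is implemented in filters.py.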
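
For reference, a minimal sketch of what a four-loop conv_nested could look like. It is not the contents of filters.py: it assumes a 2D grayscale image, a kernel centered at index (Hk // 2, Wk // 2), and zeros outside the image borders.

```python
import numpy as np

def conv_nested(image, kernel):
    """Naive same-size convolution; pixels outside the image count as zero."""
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    out = np.zeros((Hi, Wi))
    for m in range(Hi):
        for n in range(Wi):
            total = 0.0
            for i in range(Hk):
                for j in range(Wk):
                    # Kernel entry (i, j) has centered offset (i - Hk//2, j - Wk//2);
                    # convolution reads the image at the mirrored offset.
                    r = m + Hk // 2 - i
                    c = n + Wk // 2 - j
                    if 0 <= r < Hi and 0 <= c < Wi:
                        total += image[r, c] * kernel[i, j]
            out[m, n] = total
    return out
```

This follows the hint by iterating over the kernel, i.e. it evaluates $(h*f)$. A kernel that is 1 at its center and 0 elsewhere should return the image unchanged.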

Now let’s test your conv_nested function on a real image.

Let us implement a more efficient version of convolution using array operations in numpy. As shown in the lecture, a convolution can be considered as a sliding window that computes the sum of the pixel values weighted by the flipped kernel. The faster version will i) zero-pad the image, ii) flip the kernel horizontally and vertically, and iii) compute the weighted sum of the neighborhood at each pixel.

First, implement the function zero_pad in filters.py.

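1. Zero-pad the borders of the image
2. Flip the kernel
3. Slide the kernel and compute the convolution result at each pixel position

1. Create a larger zero-filled background sized according to the padding
2. Copy the original image into the center of the background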
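
The two padding steps can be sketched as follows. The signature (pad_height and pad_width as the padding added on each side) is an assumption for illustration, not necessarily the one used in filters.py.

```python
import numpy as np

def zero_pad(image, pad_height, pad_width):
    """Return a copy of a 2D image surrounded by a border of zeros."""
    H, W = image.shape
    # Step 1: allocate the larger zero-filled background.
    out = np.zeros((H + 2 * pad_height, W + 2 * pad_width), dtype=image.dtype)
    # Step 2: copy the original image into the center.
    out[pad_height:pad_height + H, pad_width:pad_width + W] = image
    return out
```

For a 2D input this should match np.pad(image, ((pad_height, pad_height), (pad_width, pad_width))), which pads with zeros by default.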

Next, complete the function conv_fast in filters.py using zero_pad. Run the code below to compare the outputs by the two implementations. conv_fast should run significantly faster than conv_nested.
Depending on your implementation and computer, conv_nested should take a few seconds and conv_fast should be around 5 times faster.

conv_nested: took 5.063853 seconds.
conv_fast: took 1.080498 seconds.
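
A possible shape for conv_fast, shown only as a sketch under the assumption of a 2D image and a kernel with odd height and width. It uses np.pad in place of zero_pad so that the snippet stands alone.

```python
import numpy as np

def conv_fast(image, kernel):
    """Same-size convolution: two loops over pixels, array ops per window."""
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    # i) zero-pad so that every pixel has a full Hk x Wk neighborhood
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)), mode="constant")
    # ii) flip the kernel along both axes
    flipped = np.flip(kernel)
    out = np.zeros((Hi, Wi))
    for m in range(Hi):
        for n in range(Wi):
            # iii) weighted sum of the neighborhood centered at (m, n)
            out[m, n] = np.sum(padded[m:m + Hk, n:n + Wk] * flipped)
    return out
```

The speed-up over the four-loop version comes from replacing the two inner Python loops with one elementwise multiply and sum per pixel.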

### Extra Credit 1 (1% of final grade)

Devise a faster version of convolution and implement conv_faster in filters.py. You will earn extra credit only if conv_faster runs faster (by a fair margin) than conv_fast and outputs the same result.

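1. Vectorize the kernel
2. Vectorize each block visited during the iteration (a center pixel and its neighborhood); because every block is combined with the same kernel, the vectorized blocks can be stacked into a matrix
3. Take the product of the two and reshape the result to the original image size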

• image: $(H_i,W_i)$
• kernel: $(H_k,W_k)$
• kernel_vectorized: $(H_k\cdot W_k,1)$
• patch_vectorized: $(1, H_k\cdot W_k)$
• image_vectorized2Matrix: $(H_i\cdot W_i, H_k\cdot W_k)$
• image_vectorized2Matrix * kernel_vectorized: $(H_i\cdot W_i,1)$
• reshaped: $(H_i,W_i)$

conv_fast: took 1.084435 seconds.
conv_faster: took 0.310908 seconds.
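
The patch-matrix idea and the shapes listed above can be sketched as below. This is one possible realization, not the submitted conv_faster: it assumes an odd-sized kernel and relies on numpy.lib.stride_tricks.sliding_window_view, which is available in NumPy 1.20 and later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_faster(image, kernel):
    """Same-size convolution as a single matrix product over all patches."""
    Hi, Wi = image.shape
    Hk, Wk = kernel.shape
    padded = np.pad(image, ((Hk // 2, Hk // 2), (Wk // 2, Wk // 2)), mode="constant")
    # View of every Hk x Wk neighborhood: (Hi, Wi, Hk, Wk) -> (Hi*Wi, Hk*Wk)
    patches = sliding_window_view(padded, (Hk, Wk)).reshape(Hi * Wi, Hk * Wk)
    # Flipped kernel as a column vector: (Hk*Wk, 1)
    kernel_vec = np.flip(kernel).reshape(Hk * Wk, 1)
    # (Hi*Wi, Hk*Wk) @ (Hk*Wk, 1) -> (Hi*Wi, 1) -> (Hi, Wi)
    return (patches @ kernel_vec).reshape(Hi, Wi)
```

The reshape materializes the patch matrix, so memory use grows by a factor of roughly $H_k\cdot W_k$ in exchange for removing the Python loops.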

## Part 2: Cross-correlation

Cross-correlation of two 2D signals $f$ and $g$ is defined as follows:
$$(f\star{g})[m,n]=\sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty f[i,j]\cdot g[i-m,j-n]$$

### 2.1 Template Matching with Cross-correlation

Suppose that you are a clerk at a grocery store. One of your responsibilities is to check the shelves periodically and stock them up whenever there are sold-out items. You got tired of this laborious task and decided to build a computer vision system that keeps track of the items on the shelf.

Luckily, you have learned in CS131 that cross-correlation can be used for template matching: a template $g$ is multiplied with regions of a larger image $f$ to measure how similar each region is to the template.

The template of a product (template.jpg) and the image of the shelf (shelf.jpg) are provided. We will use cross-correlation to find the product on the shelf.

Implement the cross_correlation function in filters.py and run the code below.

- Hint: you may use the conv_fast function you implemented in the previous question.

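Note: the earlier convolution function can be reused here, but keep in mind that convolution flips the kernel, whereas cross-correlation applies the template like a plain filter and does not flip it.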
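
A minimal standalone sketch of cross-correlation, assuming a 2D image, an odd-sized template centered on each pixel, and zero padding at the borders. It slides the template without flipping it.

```python
import numpy as np

def cross_correlation(f, g):
    """Same-size cross-correlation of image f with template g (no flip)."""
    Hf, Wf = f.shape
    Hg, Wg = g.shape
    padded = np.pad(f, ((Hg // 2, Hg // 2), (Wg // 2, Wg // 2)), mode="constant")
    out = np.zeros((Hf, Wf))
    for m in range(Hf):
        for n in range(Wf):
            # Multiply the unflipped template with the patch centered at (m, n).
            out[m, n] = np.sum(padded[m:m + Hg, n:n + Wg] * g)
    return out
```

With a convolution routine that flips its kernel internally, the same result would be obtained by passing in the template flipped along both axes (for example np.flip(g)), so that the two flips cancel.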

#### Interpretation

What does the output of the cross-correlation filter look like? Was it able to detect the product correctly? Explain what might be the problem with using the raw template as a filter.

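• One possible cause is that the image and the template differ in brightness, which leads to incorrect matches.
• Another issue is that the elements of a typical filter kernel sum to one, whereas the raw template used as a kernel does not. The responses therefore become very large, and bright regions score high regardless of how well they actually match the template.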

### 2.2 Zero-mean cross-correlation (6 points)

A solution to this problem is to subtract off the mean value of the template so that it has zero mean.

Implement the zero_mean_cross_correlation function in filters.py and run the code below.
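
As an illustration of the mean-subtraction step, here is a standalone sketch under the same assumptions as before (2D image, odd-sized template, zero padding). It is not the code from filters.py.

```python
import numpy as np

def zero_mean_cross_correlation(f, g):
    """Cross-correlation of f with the template g after removing g's mean."""
    g0 = g - g.mean()  # template entries now sum to zero
    Hf, Wf = f.shape
    Hg, Wg = g0.shape
    padded = np.pad(f, ((Hg // 2, Hg // 2), (Wg // 2, Wg // 2)), mode="constant")
    out = np.zeros((Hf, Wf))
    for m in range(Hf):
        for n in range(Wf):
            out[m, n] = np.sum(padded[m:m + Hg, n:n + Wg] * g0)
    return out
```

Because the shifted template sums to zero, a patch of uniform brightness produces a response of zero, so flat bright regions no longer dominate the output.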

You can also determine whether the product is present with appropriate scaling and thresholding.

The product is on the shelf

The product is not on the shelf

### 2.3 Normalized Cross-correlation (12 points)

One day the light near the shelf goes out and the product tracker starts to malfunction. The zero_mean_cross_correlation function is not robust to changes in lighting conditions. The code below demonstrates this.

A solution is to normalize the pixels of the image and template at every step before comparing them. This is called normalized cross-correlation.

The mathematical definition for normalized cross-correlation of $f$ and template $g$ is:
$$(f\star{g})[m,n]=\sum_{i,j} \frac{f[i,j]-\overline{f_{m,n}}}{\sigma_{f_{m,n}}} \cdot \frac{g[i-m,j-n]-\overline{g}}{\sigma_g}$$

where:

• $f_{m,n}$ is the patch image at position $(m,n)$
• $\overline{f_{m,n}}$ is the mean of the patch image $f_{m,n}$
• $\sigma_{f_{m,n}}$ is the standard deviation of the patch image $f_{m,n}$
• $\overline{g}$ is the mean of the template $g$
• $\sigma_g$ is the standard deviation of the template $g$

Implement the normalized_cross_correlation function in filters.py and run the code below.
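
The formula above can be sketched directly, normalizing each patch and the template before taking the sum of products. The zero padding, the odd template size, and the choice to leave the response at 0 for patches with zero standard deviation are assumptions made for this illustration.

```python
import numpy as np

def normalized_cross_correlation(f, g):
    """Same-size normalized cross-correlation of image f with template g."""
    Hf, Wf = f.shape
    Hg, Wg = g.shape
    padded = np.pad(f, ((Hg // 2, Hg // 2), (Wg // 2, Wg // 2)), mode="constant")
    # Normalize the template once: zero mean, unit standard deviation.
    g_norm = (g - g.mean()) / g.std()
    out = np.zeros((Hf, Wf))
    for m in range(Hf):
        for n in range(Wf):
            patch = padded[m:m + Hg, n:n + Wg]
            sigma = patch.std()
            if sigma > 0:  # assumption: flat patches keep a response of 0
                out[m, n] = np.sum((patch - patch.mean()) / sigma * g_norm)
    return out
```

Scaling the whole image by a positive constant, as a dimmed light would, leaves each normalized patch unchanged, which is why this measure tolerates the lighting change that breaks the zero-mean version.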

## Part 3: Separable Filters

### 3.1 Theory (10 points)

Consider an $M_1\times{N_1}$ image $I$ and an $M_2\times{N_2}$ filter $F$. A filter $F$ is separable if it can be written as a product of two 1D filters: $F=F_1F_2$.

For example,
$$F= \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}$$
can be written as a matrix product of
$$F_1= \begin{bmatrix} 1 \\ 1 \end{bmatrix}, F_2= \begin{bmatrix} 1 & -1 \end{bmatrix}$$
Therefore $F$ is a separable filter.

Prove that for any separable filter $F=F_1F_2$,
$I*F=(I*F_1)*F_2$

Since $F=F_1F_2$ with $F_1$ a column and $F_2$ a row, $F[a,b]=F_1[a]\cdot F_2[b]$. Writing $[m,n]$ for the output position:

\begin{align} (I*F)[m,n] &= \sum_{i=-\infty}^\infty \sum_{j=-\infty}^\infty I[i,j]\cdot F[m-i,n-j] \tag{1}\\ &= \sum_{i=-\infty}^\infty\sum_{j=-\infty}^\infty I[i,j]\cdot F_1[m-i]\cdot F_2[n-j]\\ &= \sum_{j=-\infty}^\infty F_2[n-j] \cdot \left(\sum_{i=-\infty}^\infty I[i,j]\cdot F_1[m-i]\right) \tag{2}\\ &= \sum_{j=-\infty}^\infty (I*F_1)[m,j]\cdot F_2[n-j]\\ &= ((I * F_1) * F_2)[m,n] \end{align}
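
The identity can also be checked numerically on the example filter from this section. The helper conv2d_full is a hypothetical name introduced here; it computes a full convolution of finite arrays, treating everything outside them as zero.

```python
import numpy as np

def conv2d_full(f, h):
    """Full 2D convolution of two finite arrays (zero outside their support)."""
    Hf, Wf = f.shape
    Hh, Wh = h.shape
    out = np.zeros((Hf + Hh - 1, Wf + Wh - 1))
    for i in range(Hf):
        for j in range(Wf):
            # Each sample f[i, j] adds a scaled copy of h at offset (i, j).
            out[i:i + Hh, j:j + Wh] += f[i, j] * h
    return out

I = np.arange(12, dtype=float).reshape(3, 4)  # small test image
F1 = np.array([[1.0], [1.0]])                 # column filter
F2 = np.array([[1.0, -1.0]])                  # row filter
F = F1 @ F2                                   # the separable 2x2 filter

direct = conv2d_full(I, F)
separable = conv2d_full(conv2d_full(I, F1), F2)
print(np.allclose(direct, separable))
```

The same helper can be used to check commutativity from Part 1 by comparing conv2d_full(I, F) with conv2d_full(F, I).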

### 3.2 Complexity comparison (10 points)

(i) How many multiplications do you need to do a direct 2D convolution (i.e. $I*F$)?

(ii) How many multiplications do you need to do 1D convolutions on rows and columns (i.e. $(I*F_1)*F_2$)?

(iii) Use Big-O notation to argue which one is more efficient in general: direct 2D convolution or two successive 1D convolutions?

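1. A direct 2D convolution $I*F$ evaluates $M_2\cdot N_2$ products for each of the $M_1\cdot N_1$ output pixels (assuming a same-size output), so it needs $M_1 N_1 M_2 N_2$ multiplications.
2. $(I*F_1)*F_2$ needs $M_1 N_1 M_2$ multiplications for the pass with $F_1$ and $M_1 N_1 N_2$ for the pass with $F_2$, i.e. $M_1 N_1 (M_2+N_2)$ in total.
3. In Big-O terms the direct convolution is $O(M_1 N_1 M_2 N_2)$ and the separable one is $O(M_1 N_1 (M_2+N_2))$. Since $M_2+N_2\le M_2 N_2$ whenever $M_2,N_2\ge 2$, two successive 1D convolutions are more efficient in general.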

Now, we will empirically compare the running time of a 2D convolution with a separable kernel and its equivalent pair of 1D convolutions. The Gaussian kernel, widely used for blurring images, is one example of a separable filter. Run the code below to see its effect.

In the code cell below, define the two 1D arrays (k1 and k2) whose product is equal to the Gaussian kernel.
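
One way to obtain such a pair, sketched on a hypothetical 5x5 Gaussian with sigma 1 (the notebook's actual kernel is not shown here, and gaussian_1d is a helper name made up for this example): a normalized 2D Gaussian factors into the outer product of a normalized 1D Gaussian with itself.

```python
import numpy as np

def gaussian_1d(size, sigma):
    """Normalized 1D Gaussian sampled at integer offsets around the center."""
    x = np.arange(size) - size // 2
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

g = gaussian_1d(5, 1.0)
k1 = g.reshape(-1, 1)  # column vector, shape (5, 1)
k2 = g.reshape(1, -1)  # row vector, shape (1, 5)
kernel = k1 @ k2       # 5x5 separable Gaussian kernel, sums to 1
```

Convolving with k1 and then with k2 should then give the same blurred image as convolving once with kernel.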

We now apply the two versions of convolution to the same image, and compare their running time. Note that the outputs of the two convolutions must be the same.

Normal convolution: took 13.643008 seconds.
Separable convolution: took 5.948668 seconds.