CS131 Computer Vision: Foundations and Applications, Homework 7, Object Detection. Implement face detection: use the hog representation and a sliding window to detect faces, use an image pyramid to handle scale, and use a DPM (deformable parts model) for face detection.
Homework 7
In this homework, we will implement a simplified version of the object detection process. Note that the tests in the notebook are not comprehensive; the autograder will contain more tests.
from __future__ import print_function
import random
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from skimage import io
from skimage.feature import hog
from skimage import data, color, exposure
from skimage.transform import rescale, resize, downscale_local_mean
import glob, os
import fnmatch
import time
import math
import warnings
warnings.filterwarnings('ignore')
from detection import *
from util import *
# This code is to make matplotlib figures appear inline in the
# notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0)  # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
%reload_ext autoreload
Part 1: Hog Representation (10 points)
In this section, we will compute the average hog representation of human faces.
There are 31 face images provided in the \face folder; they are all aligned and have the same size. We will compute an average face from these images and then compute a hog feature representation for the averaged face.
Use the hog function provided by the skimage library to implement a hog representation of objects. Implement the hog_feature function in detection.py.
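A minimal sketch of what hog_feature could look like is shown below. Follow the skeleton in detection.py for the exact signature; the pixel_per_cell parameter and the returned (feature, visualization) pair are assumptions here.

from skimage.feature import hog

def hog_feature(image, pixel_per_cell=8):
    """Compute the hog feature vector and its visualization for an image.

    Note: older versions of skimage call the flag `visualise`
    instead of `visualize`.
    """
    hogFeature, hogImage = hog(
        image,
        pixels_per_cell=(pixel_per_cell, pixel_per_cell),
        visualize=True)
    return hogFeature, hogImage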
plt.subplot(1, 2, 1)
plt.imshow(avg_face)
plt.axis('off')
plt.title('average face image')

plt.subplot(1, 2, 2)
plt.imshow(face_hog)
plt.title('hog representation of face')
plt.axis('off')

plt.show()
Part 2: Sliding Window (30 points)
Implement the sliding_window function so that a window of a specific size slides across the image. At every location, the window is scored to check whether an object is detected there with a high score. These scores form a response map, from which you can find the location of the window with the highest hog score.
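The sketch below shows one way sliding_window could be structured; the signature and the use of a hog-template dot product as the score are assumptions, so follow the skeleton in detection.py. It assumes the image is at least as large as the window and that base_score was computed with the same hog parameters.

import numpy as np
from skimage.feature import hog

def sliding_window(image, base_score, stepSize, windowSize, pixel_per_cell=8):
    """Slide a window across `image`, score each window against the face
    hog template `base_score`, and return the best window and the map."""
    winH, winW = windowSize
    H, W = image.shape
    max_score = -float('inf')
    maxr = maxc = 0
    response_map = np.zeros(((H - winH) // stepSize + 1,
                             (W - winW) // stepSize + 1))
    for i, r in enumerate(range(0, H - winH + 1, stepSize)):
        for j, c in enumerate(range(0, W - winW + 1, stepSize)):
            window = image[r:r + winH, c:c + winW]
            # hog parameters must match those used for the face template
            feat = hog(window, pixels_per_cell=(pixel_per_cell, pixel_per_cell))
            score = feat.dot(base_score)
            response_map[i, j] = score
            if score > max_score:
                # (maxr, maxc) is the top-left corner of the best window
                max_score, maxr, maxc = score, r, c
    return max_score, maxr, maxc, response_map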
The sliding window successfully found the human face in the example above. However, in the cell below we only change the scale of the image, and you can see that the sliding window no longer works once the scale of the image is changed.
Part 3: Image Pyramid

3.1 Image Pyramid

In order to make the sliding window work for different scales of images, you need to implement an image pyramid, where you resize the image to different scales and run the sliding window method on each resized image. This way you scale the objects and can detect both small and large objects.
Implement the pyramid function in detection.py; it creates a pyramid of images at different scales. A sketch of one possible implementation is given below. After that, run the visualization code that follows, and you will see the shape of the original image get smaller until it reaches a minimum size.
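One possible pyramid implementation is sketched here; the minSize default and the exact signature are assumptions, so follow the skeleton in detection.py.

from skimage.transform import rescale

def pyramid(image, scale=0.9, minSize=(200, 100)):
    """Repeatedly downscale `image` by `scale` until it falls below minSize.
    Returns a list of (scale, resized_image) pairs, starting at scale 1.0."""
    images = [(1.0, image)]
    current_scale = 1.0
    while True:
        current_scale *= scale
        resized = rescale(image, current_scale)
        if resized.shape[0] < minSize[0] or resized.shape[1] < minSize[1]:
            break
        images.append((current_scale, resized))
    return images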
images = pyramid(image, scale=0.9)
sum_r = 0
sum_c = 0
for i, result in enumerate(images):
    (scale, image) = result
    if i == 0:
        sum_c = image.shape[1]
    sum_r += image.shape[0]

composite_image = np.zeros((sum_r, sum_c))

pointer = 0
for i, result in enumerate(images):
    (scale, image) = result
    composite_image[pointer:pointer + image.shape[0], :image.shape[1]] = image
    pointer += image.shape[0]

plt.imshow(composite_image)
plt.axis('off')
plt.title('image pyramid')
plt.show()
3.2 Pyramid Score (20 points)
After getting the image pyramid, we will run the sliding window on all of the images to find the location with the highest score. Implement the pyramid_score function in detection.py; it returns the highest score and its related information across the image pyramid.
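Here is a rough sketch of pyramid_score, building on the pyramid and sliding_window sketches above; the return values and default parameters are assumptions, so adapt them to the skeleton in detection.py.

def pyramid_score(image, base_score, shape, stepSize=20,
                  scale=0.9, pixel_per_cell=8):
    """Run sliding_window at every pyramid level and keep the best window."""
    max_score = -float('inf')
    maxr = maxc = 0
    max_scale = 1.0
    max_response_map = None
    for s, img in pyramid(image, scale):
        score, r, c, response_map = sliding_window(
            img, base_score, stepSize, shape, pixel_per_cell)
        if score > max_score:
            max_score, maxr, maxc = score, r, c
            max_scale = s
            max_response_map = response_map
    return max_score, maxr, maxc, max_scale, max_response_map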
From the above example, we can see that the image pyramid has fixed the problem of scaling. In the example below, we will try another image and implement the deformable parts model.
Part 4: Deformable Parts Detection

In order to solve the problem above, you will implement the deformable parts model (DPM) in this section and apply it to human faces.
The first step is to get a detector for each part of the face, including left eye, right eye, nose and mouth.
For example, for the left eye, we have provided the ground-truth location of the left eye for each image in the \face directory. These locations are stored in the lefteyes array with shape (n, 2), where each row is the (r, c) location of the center of a left eye. You will then find the average hog representation of the left eyes in the images.
Implement compute_displacement to get an average shift vector mu and a standard deviation sigma for each part of the face. The vector mu is the displacement from the main center, i.e., the center of the face, to the center of the part.
Find the location of each part of the face.
What we compute here is the relative position, i.e., the displacement from each part's center to the center of the face.
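Below is one implementation of compute_displacement that is consistent with the test cell that follows; rounding mu to integers (so it can later shift heatmaps by whole pixels) is an assumption inferred from the expected test output.

import numpy as np

def compute_displacement(part_centers, face_shape):
    """Return the mean displacement mu from each part center to the face
    center, and the per-axis standard deviation sigma."""
    face_center = np.array([face_shape[0] // 2, face_shape[1] // 2])
    d = face_center - part_centers          # (n, 2) displacements
    mu = np.mean(d, axis=0).astype(int)     # integer mean shift (r, c)
    sigma = np.std(d, axis=0)               # spread of the displacements
    return mu, sigma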
# test for compute_displacement
test_array = np.array([[0,1],[1,2],[2,3],[3,4]])
test_shape = (6,6)
mu, std = compute_displacement(test_array, test_shape)
assert(np.all(mu == [1,0]))
assert(np.sum(std - [1.11803399, 1.11803399]) < 1e-5)
print("Your implementation is correct!")
After getting the shift vectors, we can run our detector on a test image. We will first run the following code to detect each part (left eye, right eye, nose, and mouth) in the image. You will see a response map for each of them.
After getting the response maps for each part of the face, we will shift these maps so that they all have the same center as the face. We calculated the shift vector mu in compute_displacement, so we shift based on the vector mu. Implement the shift_heatmap function in detection.py.
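A possible shift_heatmap sketch is shown below; normalizing by the maximum and using np.roll for the shift are implementation choices, not requirements stated in the assignment.

import numpy as np

def shift_heatmap(heatmap, mu):
    """Normalize a part heatmap and translate it by the shift vector mu."""
    # normalize so that heatmaps from different parts are comparable
    new_heatmap = heatmap / np.max(np.abs(heatmap))
    # shift by mu[0] rows and mu[1] columns
    new_heatmap = np.roll(new_heatmap, int(mu[0]), axis=0)
    new_heatmap = np.roll(new_heatmap, int(mu[1]), axis=1)
    return new_heatmap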
In this part, apply a Gaussian filter to each shifted part heatmap. Blur each heatmap with a kernel of standard deviation sigma, and then add the heatmaps of the parts to the heatmap of the face. On the combined heatmap, find the maximum value and its location. You can use the filter function provided by skimage to implement gaussian_heatmap.
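The sketch below uses skimage.filters.gaussian for the blurring; the signature of gaussian_heatmap (face heatmap, list of shifted part heatmaps, list of sigmas) is an assumption, so adapt it to the skeleton in detection.py.

import numpy as np
from skimage.filters import gaussian

def gaussian_heatmap(heatmap_face, heatmaps, sigmas):
    """Blur each shifted part heatmap by its sigma, add it to the face
    heatmap, and return the combined map plus the location of its maximum."""
    heatmap = heatmap_face.copy()
    for part_heatmap, sigma in zip(heatmaps, sigmas):
        heatmap += gaussian(part_heatmap, sigma=sigma)
    r, c = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return heatmap, r, c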
Does your DPM work on detecting human faces? Can you think of a case where DPM may work better than the detector we had in part 3 (sliding window + image pyramid)? You can also have examples that are not faces.
The idea behind DPM is easy to understand: accumulate the response values of the individual parts and find the region with the highest combined response.
The detection in the figure above failed; the likely cause is that the detectors for the individual face parts are not very accurate.
Your Answer: Write your answer in this markdown cell.
Extra Credit (1 point)
You have tried detecting one face in the image; the next step is to extend this to detecting multiple occurrences of the object. For example, in the following image, how do you detect more than one face from your response map? Implement the function detect_multiple, and write code to visualize your detected faces in the cell below.
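One simple approach, sketched below, is to keep every local maximum of the response map above a threshold; the threshold and min_distance values, and the use of skimage's peak_local_max, are illustrative choices rather than part of the assignment.

import numpy as np
from skimage.feature import peak_local_max

def detect_multiple(response_map, threshold=0.8):
    """Return the (r, c) locations of all local peaks in the response map
    whose score is at least `threshold` times the maximum response."""
    abs_threshold = threshold * response_map.max()
    detected_faces = peak_local_max(response_map,
                                    min_distance=20,
                                    threshold_abs=abs_threshold)
    return detected_faces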