Bag of Tricks for CNN Image Classification (Paper Notes)
Bag of Tricks for Image Classification with Convolutional Neural Networks
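Reading notes on "Bag of Tricks for Image Classification with Convolutional Neural Networks".
Drawing on the many strong papers of recent years on optimizing the training procedure, the paper collects a bag of tricks for improving model performance: data augmentation, large-batch training, low-precision training, model tweaks, learning rate schedules, and more.
These notes follow the paper's outline and summarize the key points.
Outline
- Baseline training procedure: preprocessing and data augmentation for the training and test sets, parameter initialization, the optimizer, and the learning rate schedule.
- Efficient training: large-batch training, learning rate warmup, low-precision training, and so on.
- Model tweaks: compares the strengths and weaknesses of three ResNet variants.
- Training refinements: cosine learning rate decay, label smoothing, knowledge distillation, and mixup training.
- Image classification models and transfer learning: examines how a classification model's accuracy correlates with its accuracy after transfer to other domains.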
Training Procedure
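ResNet is chosen as the baseline model. For ResNet implementation details, see Training and investigating Residual Nets.
Preprocessing (data augmentation)
Training and test-time preprocessing differ, mainly in that no random augmentation is applied at test time. The differences are as follows:
Training-time preprocessing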
- Randomly sample an image and decode it into 32-bit floating point raw pixel values in [0, 255].
- Randomly crop a rectangular region whose aspect ratio is randomly sampled in [3/4, 4/3] and area randomly sampled in [8%, 100%], then resize the cropped region into a 224-by-224 square image.
- Flip horizontally with 0.5 probability.
- Scale hue, saturation, and brightness with coefficients uniformly drawn from [0.6, 1.4].
- Add PCA noise with a coefficient sampled from a normal distribution N(0, 0.1).
- Normalize RGB channels by subtracting 123.68, 116.779, 103.939 and dividing by 58.393, 57.12, 57.375, respectively.
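The crop, flip, and normalization steps above can be sketched in plain Python. This is only an illustration, not the paper's implementation: the function names `sample_crop_box` and `normalize_pixel` are hypothetical, the area and aspect ratio are assumed to be sampled uniformly, and the retry limit with a whole-image fallback is an assumption made so the sketch always returns a valid box. Color jitter and PCA noise are left out.

```python
import math
import random

# Per-channel RGB mean and standard deviation quoted in the list above.
RGB_MEAN = (123.68, 116.779, 103.939)
RGB_STD = (58.393, 57.12, 57.375)


def sample_crop_box(width, height, rng, max_tries=10):
    """Return (left, top, crop_w, crop_h) for a random crop.

    Assumptions of this sketch: area and aspect ratio are sampled
    uniformly, and after max_tries failed attempts the whole image is used.
    """
    for _ in range(max_tries):
        area = rng.uniform(0.08, 1.0) * width * height  # 8% to 100% of the image
        ratio = rng.uniform(3 / 4, 4 / 3)               # crop_w / crop_h
        crop_w = int(round(math.sqrt(area * ratio)))
        crop_h = int(round(math.sqrt(area / ratio)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            left = rng.randint(0, width - crop_w)
            top = rng.randint(0, height - crop_h)
            return left, top, crop_w, crop_h
    return 0, 0, width, height  # fallback: keep the whole image


def normalize_pixel(rgb):
    """Subtract the channel mean and divide by the channel std."""
    return tuple((v - m) / s for v, m, s in zip(rgb, RGB_MEAN, RGB_STD))


rng = random.Random(0)
box = sample_crop_box(500, 375, rng)  # region to cut out, then resize to 224x224
flip = rng.random() < 0.5             # horizontal flip with probability 0.5
```

The cropped region would then be resized to 224 by 224 with an image library of your choice before normalization.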
Test-time preprocessing (validation and test)
- Resize each image's shorter edge to 256 pixels while keeping its aspect ratio.
- Crop out the 224-by-224 region in the center.
- Normalize RGB channels in the same way as during training.
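The geometry of the test-time steps can be worked through with a small sketch. The helper names `resize_shorter_edge` and `center_crop_box` are hypothetical, and rounding the longer edge to the nearest integer is an assumption; image libraries may round differently.

```python
def resize_shorter_edge(width, height, target=256):
    """Return the new (width, height) with the shorter edge equal to target."""
    if width <= height:
        return target, round(height * target / width)
    return round(width * target / height), target


def center_crop_box(width, height, size=224):
    """Return (left, top, right, bottom) of the centered size-by-size crop."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


new_w, new_h = resize_shorter_edge(500, 375)  # (341, 256)
box = center_crop_box(new_w, new_h)           # (58, 16, 282, 240)
```

For a 500-by-375 image, the shorter edge (375) is scaled to 256, the longer edge becomes 341, and the central 224-by-224 window is kept.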
Xavier Initialization
The parameters of the convolutional and fully connected layers are initialized with the Xavier algorithm.
The Xavier initialization paper: Understanding the difficulty of training deep feedforward neural networks
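Concretely, the weights follow
$$
W \sim U\left[-\sqrt{\frac{6}{n_j + n_{j+1}}},\ \sqrt{\frac{6}{n_j + n_{j+1}}}\right]
$$
The parameters of each layer are drawn uniformly at random from this range, where $n_j$ is the number of input channels of layer $j$ and $n_{j+1}$ is its number of output channels.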
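As a minimal sketch of this rule, the snippet below draws a weight matrix from $U[-a, a]$ with $a = \sqrt{6 / (n_j + n_{j+1})}$. The function name `xavier_uniform` and the arguments `fan_in` and `fan_out` (standing in for $n_j$ and $n_{j+1}$) are hypothetical, and it uses plain Python lists rather than a deep learning framework.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Return a fan_out x fan_in weight matrix drawn from U[-a, a],
    where a = sqrt(6 / (fan_in + fan_out))."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]


rng = random.Random(0)
weights = xavier_uniform(fan_in=256, fan_out=128, rng=rng)
bound = math.sqrt(6.0 / (256 + 128))  # 0.125 for this layer size
```

A uniform distribution on $[-a, a]$ has variance $a^2 / 3$, so this choice gives each weight a variance of $2 / (n_j + n_{j+1})$.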