Bag of Tricks for Image Classification with Convolutional Neural Networks

Reading notes on "Bag of Tricks for Image Classification with Convolutional Neural Networks".

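Drawing on the many strong papers of recent years on improving the training process, the paper collects a large bag of tricks for boosting model performance. They cover data augmentation, large-batch training, low-precision training, model tweaks, learning-rate schedules, and more.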

These notes follow the paper's outline and summarize the key points.

Outline

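  • Baseline training procedure: data augmentation for training and testing, parameter initialization, optimization method, and learning-rate schedule.
  • Efficient training: large-batch training, learning-rate warmup, low-precision training, etc.
  • Model tweaks: a comparison of the strengths and weaknesses of three ResNet variants.
  • Training refinements: cosine learning rate decay, label smoothing, knowledge distillation, and mixup training.
  • Image classification models and transfer learning: how a classification model's accuracy correlates with its accuracy after transfer to other tasks.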

Training Procedure

ResNet is chosen as the baseline model. For ResNet implementation details, see Training and investigating Residual Nets.

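Baseline

Preprocessing (data augmentation)

Training and testing use different preprocessing. The main difference is that no data augmentation is applied at test time: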

Training preprocessing

  1. Randomly sample an image and decode it into 32-bit floating point raw pixel values in [0, 255].
  2. Randomly crop a rectangular region whose aspect ratio is randomly sampled in [3/4, 4/3] and area randomly sampled in [8%, 100%], then resize the cropped region into a 224-by-224 square image.
  3. Flip horizontally with 0.5 probability.
  4. Scale hue, saturation, and brightness with coefficients uniformly drawn from [0.6, 1.4].
  5. Add PCA noise with a coefficient sampled from a normal distribution N (0, 0.1).
  6. Normalize RGB channels by subtracting 123.68, 116.779, 103.939 and dividing by 58.393, 57.12, 57.375, respectively.
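The random crop, horizontal flip, and normalization steps above can be sketched in plain Python. This is a minimal sketch, not the paper's code. Several parts are assumptions borrowed from common RandomResizedCrop-style implementations: the helper names (`sample_crop_box`, `random_hflip`, `normalize_pixel`), the 10-attempt retry, uniform (rather than log-uniform) aspect-ratio sampling, and the full-image fallback. The sketch also leaves out the actual resize to 224-by-224, the hue/saturation/brightness jitter, and the PCA noise, which would be handled by an image library.

```python
import math
import random

# Per-channel mean/std from step 6 (RGB order, pixel values in [0, 255]).
MEAN = (123.68, 116.779, 103.939)
STD = (58.393, 57.12, 57.375)


def sample_crop_box(width, height, rng, attempts=10):
    """Sample (x, y, w, h) for step 2: crop area in [8%, 100%] of the image,
    aspect ratio w/h in [3/4, 4/3]. Retrying and falling back to the full
    image are assumed policies, not taken from the paper."""
    area = width * height
    for _ in range(attempts):
        target_area = area * rng.uniform(0.08, 1.0)
        ratio = rng.uniform(3 / 4, 4 / 3)
        w = int(round(math.sqrt(target_area * ratio)))
        h = int(round(math.sqrt(target_area / ratio)))
        if 0 < w <= width and 0 < h <= height:
            # Place the box uniformly at random inside the image.
            x = rng.randint(0, width - w)
            y = rng.randint(0, height - h)
            return x, y, w, h
    return 0, 0, width, height


def random_hflip(rows, rng, p=0.5):
    """Step 3: mirror an image stored as a list of pixel rows with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in rows]
    return rows


def normalize_pixel(rgb):
    """Step 6 applied to a single (R, G, B) pixel."""
    return tuple((v - m) / s for v, m, s in zip(rgb, MEAN, STD))
```

For example, `sample_crop_box(500, 375, random.Random(0))` always returns a box that lies inside a 500-by-375 image. Because of integer rounding, the box's area and aspect ratio only approximately match the sampled targets.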

Test-time preprocessing (validation and test)

  1. Resize each image's shorter edge to 256 pixels while keeping its aspect ratio.
  2. Crop out the 224-by-224 region in the center.
  3. Normalize RGB channels in the same way as in training.
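The test-time geometry reduces to a little arithmetic. This is a sketch with some assumptions: the resized size is rounded to the nearest integer and the crop offset is floored, and libraries differ on both. `resize_shorter_edge` and `center_crop_box` are hypothetical helper names.

```python
def resize_shorter_edge(width, height, target=256):
    """Step 1: new (width, height) after scaling the shorter edge to `target`
    while keeping the aspect ratio (rounding policy is an assumption)."""
    scale = target / min(width, height)
    return int(round(width * scale)), int(round(height * scale))


def center_crop_box(width, height, size=224):
    """Step 2: (x, y, w, h) of the centered size-by-size crop."""
    x = (width - size) // 2
    y = (height - size) // 2
    return x, y, size, size
```

For a 500-by-375 image, the shorter edge is scaled to 256, which gives 341-by-256. The 224-by-224 center crop then starts at (58, 16).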

Xavier Initialization

The convolutional and fully connected layers are initialized with the Xavier algorithm.

Xavier initialization paper: Understanding the difficulty of training deep feedforward neural networks

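Concretely, the formula is
$$
W \sim U\left[-\frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}},\ \frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}\right]
$$
Each layer's weights are drawn uniformly at random from this range. Here $n_j$ and $n_{j+1}$ are the numbers of input and output units of the layer (fan-in and fan-out). For a convolution, each is the channel count multiplied by the kernel area.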
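The bound $a = \sqrt{6/(n_j + n_{j+1})}$ is easy to compute once fan-in and fan-out are known. The sketch below assumes PyTorch-style weight shapes: `(out_features, in_features)` for fully connected layers and `(out_channels, in_channels, kh, kw)` for convolutions. The helper names are hypothetical, and it returns a flat list instead of a tensor to stay dependency-free.

```python
import math
import random


def fans(shape):
    """Return (fan_in, fan_out) for an assumed PyTorch-style weight shape."""
    if len(shape) == 2:
        # Fully connected layer: (out_features, in_features).
        fan_out, fan_in = shape
        return fan_in, fan_out
    # Convolution: (out_channels, in_channels, kh, kw).
    out_ch, in_ch, kh, kw = shape
    receptive = kh * kw
    return in_ch * receptive, out_ch * receptive


def xavier_bound(shape):
    """Half-width a of the uniform range [-a, a]."""
    fan_in, fan_out = fans(shape)
    return math.sqrt(6.0 / (fan_in + fan_out))


def xavier_uniform(shape, rng):
    """Flat list of prod(shape) weights drawn from U[-a, a]."""
    a = xavier_bound(shape)
    n = math.prod(shape)
    return [rng.uniform(-a, a) for _ in range(n)]
```

For a 3-by-3 convolution from 64 to 64 channels, fan-in and fan-out are both 64 * 9 = 576. The bound is therefore a = sqrt(6/1152), about 0.072.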

To be continued.