How PyTorch initializes convolution layers (a deep dive into kaiming_uniform_)
Author: 两只蜡笔的小新
Abstract:
I recently wrote a paper, and the reviewers raised several comments. One of them was: are the network initialization conditions the same across different configurations, and how exactly are the networks initialized?
I had never paid attention to this question before; PyTorch initializes the convolution kernel parameters by default. This post walks through the initialization process of PyTorch's convolution operations in detail.
1. The convolution classes in PyTorch
In the PyCharm IDE, Ctrl+clicking torch.nn.Conv2d jumps into the source of PyTorch's convolution modules (conv.py). The modules commonly used when building a network are listed below:
```python
class _ConvNd(Module):
class Conv1d(_ConvNd):
class Conv2d(_ConvNd):
class Conv3d(_ConvNd):
class _ConvTransposeNd(_ConvNd):
class ConvTranspose1d(_ConvTransposeNd):
class ConvTranspose2d(_ConvTransposeNd):
class ConvTranspose3d(_ConvTransposeNd):
```
As you can see, all of the commonly used convolution classes share the same parent class, _ConvNd(Module).
Opening class Conv2d(_ConvNd), however, reveals no concrete parameter-initialization method of its own, so the initialization logic presumably lives in the parent class _ConvNd(Module).
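We can verify this conjecture directly. The following is a quick check of my own (not part of the original source walkthrough), assuming a PyTorch build matching the source quoted below:

```python
import torch.nn as nn

# Conv2d -> _ConvNd -> Module in the method resolution order.
print(nn.Conv2d.__mro__[:3])
# (<class 'torch.nn.modules.conv.Conv2d'>,
#  <class 'torch.nn.modules.conv._ConvNd'>,
#  <class 'torch.nn.modules.module.Module'>)

# reset_parameters is defined on the parent class, not on Conv2d itself.
print('reset_parameters' in vars(nn.Conv2d))                # False
print('reset_parameters' in vars(nn.modules.conv._ConvNd))  # True
```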
2. The parent class of PyTorch's convolution modules
Below is the source of the parent class _ConvNd; the method that initializes the parameters is reset_parameters().
```python
class _ConvNd(Module):

    __constants__ = ['stride', 'padding', 'dilation', 'groups',
                     'padding_mode', 'output_padding', 'in_channels',
                     'out_channels', 'kernel_size']
    __annotations__ = {'bias': Optional[torch.Tensor]}

    def _conv_forward(self, input: Tensor, weight: Tensor, bias: Optional[Tensor]) -> Tensor:
        ...

    _in_channels: int
    out_channels: int
    kernel_size: Tuple[int, ...]
    stride: Tuple[int, ...]
    padding: Tuple[int, ...]
    dilation: Tuple[int, ...]
    transposed: bool
    output_padding: Tuple[int, ...]
    groups: int
    padding_mode: str
    weight: Tensor
    bias: Optional[Tensor]

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Tuple[int, ...],
                 stride: Tuple[int, ...],
                 padding: Tuple[int, ...],
                 dilation: Tuple[int, ...],
                 transposed: bool,
                 output_padding: Tuple[int, ...],
                 groups: int,
                 bias: bool,
                 padding_mode: str) -> None:
        super(_ConvNd, self).__init__()
        if in_channels % groups != 0:
            raise ValueError('in_channels must be divisible by groups')
        if out_channels % groups != 0:
            raise ValueError('out_channels must be divisible by groups')
        valid_padding_modes = {'zeros', 'reflect', 'replicate', 'circular'}
        if padding_mode not in valid_padding_modes:
            raise ValueError("padding_mode must be one of {}, but got padding_mode='{}'".format(
                valid_padding_modes, padding_mode))
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.transposed = transposed
        self.output_padding = output_padding
        self.groups = groups
        self.padding_mode = padding_mode
        # `_reversed_padding_repeated_twice` is the padding to be passed to
        # `F.pad` if needed (e.g., for non-zero padding types that are
        # implemented as two ops: padding + conv). `F.pad` accepts paddings in
        # reverse order than the dimension.
        self._reversed_padding_repeated_twice = _reverse_repeat_tuple(self.padding, 2)
        if transposed:
            self.weight = Parameter(torch.Tensor(
                in_channels, out_channels // groups, *kernel_size))
        else:
            self.weight = Parameter(torch.Tensor(
                out_channels, in_channels // groups, *kernel_size))
        if bias:
            self.bias = Parameter(torch.Tensor(out_channels))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self) -> None:
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

    def extra_repr(self):
        s = ('{in_channels}, {out_channels}, kernel_size={kernel_size}'
             ', stride={stride}')
        if self.padding != (0,) * len(self.padding):
            s += ', padding={padding}'
        if self.dilation != (1,) * len(self.dilation):
            s += ', dilation={dilation}'
        if self.output_padding != (0,) * len(self.output_padding):
            s += ', output_padding={output_padding}'
        if self.groups != 1:
            s += ', groups={groups}'
        if self.bias is None:
            s += ', bias=False'
        if self.padding_mode != 'zeros':
            s += ', padding_mode={padding_mode}'
        return s.format(**self.__dict__)

    def __setstate__(self, state):
        super(_ConvNd, self).__setstate__(state)
        if not hasattr(self, 'padding_mode'):
            self.padding_mode = 'zeros'
```
3. def reset_parameters(self) -> None
The default initialization for the convolution modules:
```python
def reset_parameters(self) -> None:
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in)
        init.uniform_(self.bias, -bound, bound)
```
The parameters in this class are initialized with Kaiming initialization, a method proposed by the computer-vision researcher Kaiming He for ReLU-family activations. By default, PyTorch initializes convolution-layer parameters with the Kaiming uniform distribution. The PyTorch documentation describes it as follows:
Fills the input Tensor with values according to the method described in "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification" - He, K. et al. (2015), using a uniform distribution.
The resulting tensor will have values sampled from U(−bound, bound) where bound = gain × √(3 / fan_mode). Also known as He initialization.
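To see this formula in action, here is a small check of my own (not from the PyTorch docs), using nonlinearity='relu', for which the recommended gain is √2:

```python
import math
import torch
import torch.nn as nn

w = torch.empty(64, 32, 3, 3)
nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')

fan_in = 32 * 3 * 3                     # input channels * kernel height * kernel width
gain = math.sqrt(2.0)                   # recommended gain for ReLU
bound = gain * math.sqrt(3.0 / fan_in)

# Every sample should lie inside U(-bound, bound).
print(w.abs().max().item() <= bound)    # True
```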
3.1 Initialization of the convolution weights
```python
init.kaiming_uniform_(self.weight, a=math.sqrt(5))
```
The source of the init.kaiming_uniform_ function is as follows:
```python
def kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    r"""Fills the input `Tensor` with values according to the method
    described in `Delving deep into rectifiers: Surpassing human-level
    performance on ImageNet classification` - He, K. et al. (2015), using a
    uniform distribution. The resulting tensor will have values sampled from
    :math:`\mathcal{U}(-\text{bound}, \text{bound})` where

    .. math::
        \text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\_mode}}}

    Also known as He initialization.

    Args:
        tensor: an n-dimensional `torch.Tensor`
        a: the negative slope of the rectifier used after this layer (only
            used with ``'leaky_relu'``)
        mode: either ``'fan_in'`` (default) or ``'fan_out'``. Choosing ``'fan_in'``
            preserves the magnitude of the variance of the weights in the
            forward pass. Choosing ``'fan_out'`` preserves the magnitudes in the
            backwards pass.
        nonlinearity: the non-linear function (`nn.functional` name),
            recommended to use only with ``'relu'`` or ``'leaky_relu'`` (default).

    Examples:
        >>> w = torch.empty(3, 5)
        >>> nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
    """
    fan = _calculate_correct_fan(tensor, mode)
    gain = calculate_gain(nonlinearity, a)
    std = gain / math.sqrt(fan)
    bound = math.sqrt(3.0) * std  # Calculate uniform bounds from standard deviation
    with torch.no_grad():
        return tensor.uniform_(-bound, bound)
```
Written out with all of its arguments, the default weight initialization in torch is therefore:
```python
init.kaiming_uniform_(self.weight, a=math.sqrt(5), mode='fan_in', nonlinearity='leaky_relu')
```
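Since mode='fan_in' and nonlinearity='leaky_relu' are the defaults, the two calls are interchangeable. A quick sanity check of my own:

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
w1 = torch.empty(8, 4, 3, 3)
nn.init.kaiming_uniform_(w1, a=math.sqrt(5))  # what _ConvNd actually calls

torch.manual_seed(0)
w2 = torch.empty(8, 4, 3, 3)
nn.init.kaiming_uniform_(w2, a=math.sqrt(5),
                         mode='fan_in', nonlinearity='leaky_relu')

print(torch.equal(w1, w2))  # True: the extra arguments are just the defaults
```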
The helper functions used inside init.kaiming_uniform_ are not analyzed in depth here, but they deserve a brief introduction.
```python
_calculate_correct_fan(tensor, mode)  # computes the layer's fan_in (number of input
                                      # units) or fan_out (number of output units),
                                      # depending on whether mode is 'fan_in' or 'fan_out'
calculate_gain(nonlinearity, param)   # returns the recommended gain value for the given
                                      # nonlinearity: a single number looked up from a
                                      # table of recommended values
```
- _calculate_correct_fan: here mode = 'fan_in', so what is computed is the layer's fan_in (the number of input units).
- calculate_gain: here nonlinearity = 'leaky_relu' and param = a = math.sqrt(5), so with negative_slope = param = math.sqrt(5) the result is gain = math.sqrt(2.0 / (1 + negative_slope ** 2)) = √(2/6) = 1/√3 ≈ 0.577 (see the numerical check after this list).
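Here is that numerical check (my own snippet, not from the source):

```python
import math
import torch.nn as nn

negative_slope = math.sqrt(5)
gain = nn.init.calculate_gain('leaky_relu', negative_slope)

print(gain)                                        # 0.5773502691896257, i.e. 1/sqrt(3)
print(math.sqrt(2.0 / (1 + negative_slope ** 2)))  # same value
```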
As quoted earlier, "The resulting tensor will have values sampled from U(−bound, bound) where bound = gain × √(3 / fan_mode)."
Plugging in the values computed above: std = gain / √fan_in = 1 / √(3 · fan_in), and bound = √3 × std = 1 / √fan_in. That is what the whole chain of computation above boils down to.
Finally, tensor.uniform_(-bound, bound) fills the tensor in place with values sampled from that uniform distribution.
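Putting it together: for the default convolution init, the bound simplifies to 1/√fan_in, which we can confirm on a freshly constructed layer (my own snippet; the layer sizes are arbitrary):

```python
import math
import torch.nn as nn

conv = nn.Conv2d(16, 32, kernel_size=3)

# With a = sqrt(5): gain = 1/sqrt(3), so bound = sqrt(3) * gain / sqrt(fan_in)
#                                               = 1 / sqrt(fan_in)
fan_in = 16 * 3 * 3   # = 144
bound = 1 / math.sqrt(fan_in)

print(conv.weight.abs().max().item() <= bound)  # True
```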
3.2 Initialization of the bias
This part needs little explanation; if you followed the weight-initialization walkthrough above, it should be self-explanatory.
```python
if self.bias is not None:
    fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
    bound = 1 / math.sqrt(fan_in)
    init.uniform_(self.bias, -bound, bound)
```
One additional note: init._calculate_fan_in_and_fan_out(self.weight) computes both the layer's fan_in (the number of input units) and its fan_out (the number of output units); the bias bound uses only fan_in.
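For completeness, here is a small demonstration of my own showing both values and the resulting bias bound (layer sizes are arbitrary):

```python
import math
import torch.nn as nn
from torch.nn import init

conv = nn.Conv2d(16, 32, kernel_size=3)

# For a weight of shape (32, 16, 3, 3):
#   fan_in  = 16 * 3 * 3 = 144
#   fan_out = 32 * 3 * 3 = 288
fan_in, fan_out = init._calculate_fan_in_and_fan_out(conv.weight)
print(fan_in, fan_out)  # 144 288

# The bias was drawn from U(-1/sqrt(fan_in), 1/sqrt(fan_in)).
bound = 1 / math.sqrt(fan_in)
print(conv.bias.abs().max().item() <= bound)  # True
```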
Summary
The above is based on my personal experience. I hope it gives you a useful reference, and I hope you will continue to support 脚本之家.