Synchronized BatchNorm¶
The current BN implementation is unsynchronized across the GPUs, which is a big problem for memory-consuming tasks such as Semantic Segmentation, since the per-GPU mini-batch is very small.
Synchronizing the batchnorm across multiple GPUs is not easy to implement within the current DataParallel framework. We address this difficulty by making each layer 'self-parallel' encoding.parallel.SelfDataParallel
, that is, accepting the inputs from multiple GPUs. Therefore, we can handle the synchronization across GPUs.
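To see why the synchronization matters, here is a minimal numpy sketch (illustration only, not the library's code): with a small per-GPU mini-batch, the statistics each device computes independently differ from the statistics of the full batch, while reducing the per-device sums recovers the global statistics that synchronized BN uses.

```python
import numpy as np

# Simulate one mini-batch of 16 samples x 8 channels, split across two "GPUs".
rng = np.random.default_rng(0)
full_batch = rng.normal(loc=2.0, scale=3.0, size=(16, 8))
dev0, dev1 = full_batch[:8], full_batch[8:]

# Unsynchronized BN: each device normalizes with its own (noisy) statistics.
per_device_means = [dev0.mean(axis=0), dev1.mean(axis=0)]
global_mean = full_batch.mean(axis=0)

# Synchronized BN: reduce the per-device sums so every device normalizes
# with the statistics of the whole mini-batch.
synced_mean = (dev0.sum(axis=0) + dev1.sum(axis=0)) / full_batch.shape[0]
assert np.allclose(synced_mean, global_mean)
```

The per-device means deviate noticeably from the global mean when the per-device batch is small, which is exactly the regime of Semantic Segmentation workloads.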
Modules¶
BatchNorm1d¶

class
encoding.nn.
BatchNorm1d
(num_features, eps=1e-05, momentum=0.1, affine=True)[source]¶ Synchronized Batch Normalization 1d
Implementation ideas. Please use the compatible
encoding.parallel.SelfDataParallel
with encoding.nn.
 Reference:
 We provide this code for a coming paper.
Applies Batch Normalization over a 2d or 3d input that is seen as a minibatch.
\[y = \frac{x - \mu[x]}{\sqrt{var[x] + \epsilon}} * \gamma + \beta\]The mean and standard deviation are calculated per-dimension over the mini-batches, and gamma and beta are learnable parameter vectors of size C (where C is the input size).
During training, this layer keeps a running estimate of its computed mean and variance. The running sum is kept with a default momentum of 0.1.
During evaluation, this running mean/variance is used for normalization.
Parameters:  num_features – num_features from an expected input of size batch_size x num_features [x width]
 eps – a value added to the denominator for numerical stability. Default: 1e-5
 momentum – the value used for the running_mean and running_var computation. Default: 0.1
 affine – a boolean value that when set to true, gives the layer learnable affine parameters. Default: True
 Shape:
 Input: \((N, C)\) or \((N, C, L)\)
 Output: \((N, C)\) or \((N, C, L)\) (same shape as input)
Examples
>>> m = encoding.nn.BatchNorm1d(100).cuda()
>>> input = autograd.Variable(torch.randn(20, 100)).cuda()
>>> output = m(input)
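The running-statistics update described above can be sketched in numpy, assuming the usual exponential-moving-average convention (new batch statistics contribute with weight momentum; this mirrors the description, not the library's internals):

```python
import numpy as np

momentum = 0.1          # default momentum as documented above
num_features = 4

# Running estimates, initialized as in standard BatchNorm.
running_mean = np.zeros(num_features)
running_var = np.ones(num_features)

# One training mini-batch: batch_size x num_features.
batch = np.random.default_rng(1).normal(size=(32, num_features))
batch_mean = batch.mean(axis=0)
batch_var = batch.var(axis=0)

# Exponential moving average with the documented momentum of 0.1.
running_mean = (1 - momentum) * running_mean + momentum * batch_mean
running_var = (1 - momentum) * running_var + momentum * batch_var
```

During evaluation these running estimates replace the per-batch statistics in the normalization formula.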
BatchNorm2d¶

class
encoding.nn.
BatchNorm2d
(num_features, eps=1e-05, momentum=0.1, affine=True)[source]¶ Synchronized Batch Normalization 2d
Implementation ideas. Please use the compatible
encoding.parallel.SelfDataParallel
with encoding.nn.
 Reference:
 We provide this code for a coming paper.
Applies Batch Normalization over a 4d input that is seen as a minibatch of 3d inputs
\[y = \frac{x - \mu[x]}{\sqrt{var[x] + \epsilon}} * \gamma + \beta\]The mean and standard deviation are calculated per-dimension over the mini-batches, and gamma and beta are learnable parameter vectors of size C (where C is the input size).
During training, this layer keeps a running estimate of its computed mean and variance. The running sum is kept with a default momentum of 0.1.
During evaluation, this running mean/variance is used for normalization.
Parameters:  num_features – num_features from an expected input of size batch_size x num_features x height x width
 eps – a value added to the denominator for numerical stability. Default: 1e-5
 momentum – the value used for the running_mean and running_var computation. Default: 0.1
 affine – a boolean value that when set to true, gives the layer learnable affine parameters. Default: True
 Shape:
 Input: \((N, C, H, W)\)
 Output: \((N, C, H, W)\) (same shape as input)
Examples
>>> m = encoding.nn.BatchNorm2d(100).cuda()
>>> input = autograd.Variable(torch.randn(20, 100, 35, 45)).cuda()
>>> output = m(input)
Functions¶
batchnormtrain¶

encoding.functions.
batchnormtrain
(input, gamma, beta, mean, std)[source]¶ Applies Batch Normalization over a 3d input that is seen as a minibatch.
\[y = \frac{x - \mu[x]}{\sqrt{var[x] + \epsilon}} * \gamma + \beta\] Shape:
 Input: \((N, C)\) or \((N, C, L)\)
 Output: \((N, C)\) or \((N, C, L)\) (same shape as input)
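The formula above can be sketched as a plain numpy function (a hypothetical analogue of batchnormtrain, not the library's CUDA implementation; here the precomputed mean and std are passed in, matching the signature):

```python
import numpy as np

def batchnormtrain_sketch(x, gamma, beta, mean, std):
    """Normalize x per channel with the supplied mean/std, then
    apply the learnable affine scale (gamma) and shift (beta)."""
    return (x - mean) / std * gamma + beta

# A (N, C) input, normalized with its own batch statistics.
x = np.random.default_rng(2).normal(size=(20, 100))
mean, std = x.mean(axis=0), x.std(axis=0)
gamma, beta = np.ones(100), np.zeros(100)

y = batchnormtrain_sketch(x, gamma, beta, mean, std)
```

With gamma = 1 and beta = 0, the output has zero mean and unit standard deviation per channel, which is the defining property of the training-mode normalization.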
batchnormeval¶

encoding.functions.
batchnormeval
(input, gamma, beta, mean, std)[source]¶ Applies Batch Normalization over a 3d input that is seen as a minibatch.
Please see encoding.functions.batchnormtrain.