Synchronized BatchNorm
The standard BN implementation is unsynchronized across GPUs, which is a big problem for memory-consuming tasks such as semantic segmentation, where the minibatch on each GPU is very small.
Synchronizing batch normalization across multiple GPUs is not easy to implement within the current DataParallel framework. We address this difficulty by making each layer ‘self-parallel’ (see encoding.parallel.SelfDataParallel), that is, able to accept inputs from multiple GPUs. The layer can therefore handle the synchronization across GPUs itself.
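To make the motivation concrete, the following is a minimal pure-Python sketch, not the library's implementation. It assumes each device reports the count, sum, and sum of squares of its slice of the batch, and that these are combined into statistics for the whole minibatch. The helper names `local_stats` and `synced_mean_var` are hypothetical, and plain lists stand in for per-GPU tensors.

```python
# Illustrative sketch only: plain Python lists stand in for per-GPU tensors.

def local_stats(chunk):
    """Return (count, sum, sum of squares) for one device's slice of a feature."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def synced_mean_var(chunks):
    """Combine per-device statistics into the mean and biased variance
    of the whole minibatch."""
    n = s = sq = 0.0
    for chunk in chunks:
        c, cs, csq = local_stats(chunk)
        n, s, sq = n + c, s + cs, sq + csq
    mean = s / n
    var = sq / n - mean * mean  # E[x^2] - E[x]^2
    return mean, var


# One feature, a minibatch of 4 samples split across 2 hypothetical devices.
chunks = [[1.0, 2.0], [3.0, 6.0]]

# Unsynchronized: each device normalizes with its own statistics.
per_device = [synced_mean_var([c]) for c in chunks]

# Synchronized: every device uses the statistics of the full minibatch.
global_mean, global_var = synced_mean_var(chunks)
```

In this toy case the per-device means (1.5 and 4.5) both differ from the full-batch mean (3.0), which is the discrepancy that synchronization removes.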
Note
This code is provided together with the paper
Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. “Context Encoding for Semantic Segmentation” The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018:
@InProceedings{Zhang_2018_CVPR,
  author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit},
  title = {Context Encoding for Semantic Segmentation},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2018}
}
Modules
BatchNorm1d

class encoding.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True)
Synchronized Batch Normalization 1d
Implementation ideas. Please use compatible encoding.parallel.SelfDataParallel and encoding.nn.
Applies Batch Normalization over a 2d or 3d input that is seen as a minibatch.
\[y = \frac{x - \mu[x]}{\sqrt{var[x] + \epsilon}} * \gamma + \beta\]
The mean and standard-deviation are calculated per-dimension over the minibatches, and gamma and beta are learnable parameter vectors of size C (where C is the input size).
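As a worked instance of the formula, here is a small pure-Python sketch for a \((N, C)\) input on a single device. It is an illustration rather than the library code: the function name `batch_norm_1d` is hypothetical, and it assumes the biased (divide by N) variance is used for normalization.

```python
import math


def batch_norm_1d(x, gamma, beta, eps=1e-5):
    """Normalize a (N, C) input given as a list of N rows with C features each."""
    n, c = len(x), len(x[0])
    out = [[0.0] * c for _ in range(n)]
    for j in range(c):
        col = [row[j] for row in x]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n  # biased variance (assumed)
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            # y = (x - mean) / sqrt(var + eps) * gamma + beta
            out[i][j] = (x[i][j] - mean) * inv_std * gamma[j] + beta[j]
    return out


# Two samples, two features with very different scales.
x = [[1.0, 10.0], [3.0, 30.0]]
y = batch_norm_1d(x, gamma=[1.0, 1.0], beta=[0.0, 0.0])
```

With gamma set to 1 and beta to 0, both features end up close to -1 and +1 despite their different input scales.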
During training, this layer keeps a running estimate of its computed mean and variance. The running sum is kept with a default momentum of 0.1.
During evaluation, this running mean/variance is used for normalization.
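The running estimate can be sketched as an exponential moving average. This sketch assumes the update convention used by PyTorch's own batch normalization layers, where `momentum` weights the newest batch statistic; the source does not spell out the exact update rule, and the helper name `update_running` is hypothetical.

```python
def update_running(running, batch_value, momentum=0.1):
    """Exponential moving average update (PyTorch-style convention, assumed)."""
    return (1.0 - momentum) * running + momentum * batch_value


# Start from 0 and observe three batches whose mean is 2.0 each time.
running_mean = 0.0
for batch_mean in [2.0, 2.0, 2.0]:
    running_mean = update_running(running_mean, batch_mean)
```

With a momentum of 0.1 the estimate moves slowly toward the batch value, so many iterations are needed before it settles.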
Parameters:
 num_features – num_features from an expected input of size batch_size x num_features [x width]
 eps – a value added to the denominator for numerical stability. Default: 1e-5
 momentum – the value used for the running_mean and running_var computation. Default: 0.1
 affine – a boolean value that when set to true, gives the layer learnable affine parameters. Default: True
 Shape:
 Input: \((N, C)\) or \((N, C, L)\)
 Output: \((N, C)\) or \((N, C, L)\) (same shape as input)
 Reference:
 Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. “Context Encoding for Semantic Segmentation.” CVPR 2018
Examples
>>> import torch
>>> from torch import autograd
>>> import encoding
>>> m = encoding.nn.BatchNorm1d(100).cuda()
>>> input = autograd.Variable(torch.randn(20, 100)).cuda()
>>> output = m(input)
BatchNorm2d

class encoding.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True)
Synchronized Batch Normalization 2d
Implementation ideas. Please use compatible encoding.parallel.SelfDataParallel and encoding.nn.
Applies Batch Normalization over a 4d input that is seen as a minibatch of 3d inputs.
\[y = \frac{x - \mu[x]}{\sqrt{var[x] + \epsilon}} * \gamma + \beta\]
The mean and standard-deviation are calculated per-dimension over the minibatches, and gamma and beta are learnable parameter vectors of size C (where C is the input size).
During training, this layer keeps a running estimate of its computed mean and variance. The running sum is kept with a default momentum of 0.1.
During evaluation, this running mean/variance is used for normalization.
Parameters:
 num_features – num_features from an expected input of size batch_size x num_features x height x width
 eps – a value added to the denominator for numerical stability. Default: 1e-5
 momentum – the value used for the running_mean and running_var computation. Default: 0.1
 affine – a boolean value that when set to true, gives the layer learnable affine parameters. Default: True
 Shape:
 Input: \((N, C, H, W)\)
 Output: \((N, C, H, W)\) (same shape as input)
 Reference:
 Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. “Context Encoding for Semantic Segmentation.” CVPR 2018
Examples
>>> import torch
>>> from torch import autograd
>>> import encoding
>>> m = encoding.nn.BatchNorm2d(100).cuda()
>>> input = autograd.Variable(torch.randn(20, 100, 35, 45)).cuda()
>>> output = m(input)
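For a 4d input, "per-dimension" statistics amount to one mean and one variance per channel, reduced over the N, H and W axes. The pure-Python sketch below illustrates that reduction on nested lists. It is an illustration only: `channel_mean_var` is a hypothetical helper, and the biased variance is an assumption.

```python
def channel_mean_var(x):
    """x is a nested list shaped (N, C, H, W).

    Returns two lists of length C: per-channel mean and biased variance,
    each reduced over the N, H and W axes.
    """
    num_channels = len(x[0])
    means, variances = [], []
    for c in range(num_channels):
        # Gather every value of channel c across all samples and positions.
        values = [v for sample in x for row in sample[c] for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        means.append(mean)
        variances.append(var)
    return means, variances


# N=2 samples, C=2 channels, H=1, W=2.
x = [
    [[[0.0, 2.0]], [[1.0, 1.0]]],
    [[[4.0, 6.0]], [[3.0, 3.0]]],
]
means, variances = channel_mean_var(x)
```

Each channel therefore gets a single pair of statistics regardless of the spatial size, which is why gamma and beta have size C.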
Functions
batchnormtrain

encoding.functions.batchnormtrain(input, gamma, beta, mean, std)
Applies Batch Normalization over a 3d input that is seen as a minibatch.
\[y = \frac{x - \mu[x]}{\sqrt{var[x] + \epsilon}} * \gamma + \beta\]
 Shape:
 Input: \((N, C)\) or \((N, C, L)\)
 Output: \((N, C)\) or \((N, C, L)\) (same shape as input)
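The functional form takes the statistics as arguments, which is what allows them to be computed once across devices and then applied on each one. The sketch below mirrors the argument order of the signature on a \((N, C)\) input in pure Python. It is a hypothetical stand-in named `batchnorm_apply`, not the library function, and it assumes `std` already equals \(\sqrt{var[x] + \epsilon}\).

```python
def batchnorm_apply(x, gamma, beta, mean, std):
    """Hypothetical stand-in: normalize (N, C) rows with precomputed
    per-feature mean and std (std is assumed to already include epsilon)."""
    return [
        [(v - m) / s * g + b for v, g, b, m, s in zip(row, gamma, beta, mean, std)]
        for row in x
    ]


x = [[1.0, 10.0], [3.0, 30.0]]
y = batchnorm_apply(
    x,
    gamma=[2.0, 1.0],
    beta=[0.5, 0.0],
    mean=[2.0, 20.0],   # e.g. statistics gathered over the whole minibatch
    std=[1.0, 10.0],
)
```

Because the statistics are inputs rather than being computed inside the function, the same call shape can serve both training (batch statistics) and evaluation (running statistics).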
batchnormeval

encoding.functions.batchnormeval(input, gamma, beta, mean, std)
Applies Batch Normalization over a 3d input that is seen as a minibatch.
Please see encoding.functions.batchnormtrain.