encoding.functions

Encoding Autograd Functions

batchnormtrain

encoding.functions.batchnormtrain(input, gamma, beta, mean, std)[source]

Applies Batch Normalization over a 2d or 3d input that is seen as a mini-batch.

\[y = \frac{x - \mu[x]}{ \sqrt{var[x] + \epsilon}} * \gamma + \beta\]
Shape:
  • Input: \((N, C)\) or \((N, C, L)\)
  • Output: \((N, C)\) or \((N, C, L)\) (same shape as input)
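
As a shape and semantics check, the formula above can be sketched with plain tensor ops. This is a minimal reference, not the library kernel; the helper name is hypothetical, and it assumes mean and std are the per-channel statistics \(\mu[x]\) and \(\sqrt{var[x] + \epsilon}\) from the formula:

import torch

def batchnorm_reference(x, gamma, beta, mean, std):
    # Hypothetical reference, not the library kernel: normalize each
    # channel with the given statistics, then apply the affine transform.
    # x: (N, C); gamma, beta, mean, std: (C,)
    return (x - mean) / std * gamma + beta

x = torch.randn(8, 4)
mean = x.mean(dim=0)
std = x.var(dim=0, unbiased=False).add(1e-5).sqrt()  # sqrt(var[x] + eps)
gamma, beta = torch.ones(4), torch.zeros(4)
y = batchnorm_reference(x, gamma, beta, mean, std)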

batchnormeval

encoding.functions.batchnormeval(input, gamma, beta, mean, std)[source]

Applies Batch Normalization over a 2d or 3d input that is seen as a mini-batch.

Please see encoding.functions.batchnormtrain for the formula and shapes.

dilatedavgpool2d

encoding.functions.dilatedavgpool2d(input, kernel_size, stride=None, padding=0, dilation=1)[source]

Dilated 2d average pooling, used when dilating a DenseNet backbone.

Reference:

Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. “Context Encoding for Semantic Segmentation.” CVPR 2018

Applies a 2D average-pooling operation over kh x kw regions, with step size sh x sw. The number of output features is equal to the number of input planes. (A reference sketch follows the parameter list below.)

See DilatedAvgPool2d for details and output shape.

Parameters:
  • input – input tensor (minibatch x in_channels x iH x iW)
  • kernel_size – size of the pooling region, a single number or a tuple (kh x kw)
  • stride – stride of the pooling operation, a single number or a tuple (sh x sw). Default: kernel_size
  • padding – implicit zero padding on the input, a single number or a tuple (padh x padw). Default: 0
  • dilation – spacing between the elements within each pooling window, similar to the dilation parameter of Conv2d. Default: 1
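
As a shape reference (not the library's implementation), dilated average pooling can be sketched with torch.nn.functional.unfold, which supports dilation. The helper name is hypothetical, and integer arguments are assumed to keep the sketch short:

import torch
import torch.nn.functional as F

def dilated_avg_pool2d_ref(x, kernel_size, stride=None, padding=0, dilation=1):
    # Hypothetical reference: gather every dilated kernel_size x kernel_size
    # window as a column, then average within each window. Padded zeros are
    # included in the average.
    k, d = kernel_size, dilation
    s = stride if stride is not None else k
    B, C, H, W = x.shape
    oh = (H + 2 * padding - d * (k - 1) - 1) // s + 1
    ow = (W + 2 * padding - d * (k - 1) - 1) // s + 1
    cols = F.unfold(x, k, dilation=d, padding=padding, stride=s)  # (B, C*k*k, oh*ow)
    return cols.view(B, C, k * k, oh * ow).mean(dim=2).view(B, C, oh, ow)

x = torch.randn(1, 3, 8, 8)
y = dilated_avg_pool2d_ref(x, kernel_size=2, dilation=2)  # (1, 3, 3, 3)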

aggregate

encoding.functions.aggregate(A, X, C)[source]

Aggregate operation: aggregates the residuals of the inputs (\(X\)) with respect to the codewords (\(C\)), using the assignment weights (\(A\)).

\[e_{k} = \sum_{i=1}^{N} a_{ik} (x_i - c_k)\]
Shape:
  • Input: \(A\in\mathcal{R}^{B\times N\times K}\), \(X\in\mathcal{R}^{B\times N\times D}\), \(C\in\mathcal{R}^{K\times D}\) (where \(B\) is the batch size, \(N\) is the total number of features, \(K\) is the number of codewords, and \(D\) is the feature dimension)
  • Output: \(E\in\mathcal{R}^{B\times K\times D}\)

Examples

>>> import torch
>>> import encoding
>>> B, N, K, D = 2, 3, 4, 5
>>> A = torch.empty(B, N, K, dtype=torch.float64, device='cuda').uniform_(-0.5, 0.5).requires_grad_()
>>> X = torch.empty(B, N, D, dtype=torch.float64, device='cuda').uniform_(-0.5, 0.5).requires_grad_()
>>> C = torch.empty(K, D, dtype=torch.float64, device='cuda').uniform_(-0.5, 0.5).requires_grad_()
>>> E = encoding.functions.aggregate(A, X, C)
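
The operation can also be cross-checked against a plain broadcasting implementation of the formula above. The helper name below is hypothetical; the library function is the authoritative version:

import torch

def aggregate_ref(A, X, C):
    # Plain-tensor reference: e_k = sum_i a_ik * (x_i - c_k).
    # A: (B, N, K), X: (B, N, D), C: (K, D)  ->  E: (B, K, D)
    residual = X.unsqueeze(2) - C.view(1, 1, *C.shape)  # (B, N, K, D)
    return (A.unsqueeze(-1) * residual).sum(dim=1)      # (B, K, D)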

scaledL2

encoding.functions.scaledL2(X, C, S)[source]

Scaled L2 distance: the squared L2 distance between each input \(x_i\) and each codeword \(c_k\), weighted by the per-codeword scale \(s_k\).

\[sl_{ik} = s_k \|x_i-c_k\|^2\]
Shape:
  • Input: \(X\in\mathcal{R}^{B\times N\times D}\), \(C\in\mathcal{R}^{K\times D}\), \(S\in \mathcal{R}^K\) (where \(B\) is the batch size, \(N\) is the total number of features, \(K\) is the number of codewords, and \(D\) is the feature dimension)
  • Output: \(SL\in\mathcal{R}^{B\times N\times K}\)
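
The same distance can be written with broadcasting, which is handy for verifying the shapes listed above. The helper name is hypothetical, not the library kernel:

import torch

def scaled_l2_ref(X, C, S):
    # Plain-tensor reference: sl_ik = s_k * ||x_i - c_k||^2.
    # X: (B, N, D), C: (K, D), S: (K,)  ->  SL: (B, N, K)
    residual = X.unsqueeze(2) - C.view(1, 1, *C.shape)  # (B, N, K, D)
    return residual.pow(2).sum(dim=-1) * S              # (B, N, K)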

sum_square

encoding.functions.sum_square(input)[source]

Calculates the per-channel sum of elements and sum of squares of the input, used to compute batch statistics for Batch Normalization.
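
Assuming the statistics are reduced per channel (the usual convention for Batch Normalization), the computation can be sketched as follows; the helper name is hypothetical:

import torch

def sum_square_ref(x):
    # Hypothetical reference: per-channel sum and sum of squares,
    # reduced over the batch and any spatial dimensions.
    # x: (N, C) or (N, C, H, W)  ->  two tensors of shape (C,)
    dims = [d for d in range(x.dim()) if d != 1]
    return x.sum(dim=dims), x.pow(2).sum(dim=dims)

x = torch.randn(8, 4, 16, 16)
xsum, xsqsum = sum_square_ref(x)  # both of shape (4,)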