# My NN Layers

## Modules

### Encoding

class encoding.nn.Encoding(D, K)[source]

Encoding Layer: a learnable residual encoder that operates on 3D or 4D input, which is treated as a mini-batch.

$e_{ik} = \frac{\exp(-s_k\|x_i-c_k\|^2)}{\sum_{j=1}^K \exp(-s_j\|x_i-c_j\|^2)} (x_i - c_k)$

Please see the example of training Deep TEN.

Reference:
Hang Zhang, Jia Xue, and Kristin Dana. “Deep TEN: Texture Encoding Network.” The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017
Parameters:
• D – dimension of the features, or number of feature channels
• K – number of codewords
Shape:
• Input: $$X\in\mathcal{R}^{B\times N\times D}$$ or $$\mathcal{R}^{B\times D\times H\times W}$$ (where $$B$$ is the batch size and $$N$$ is the total number of features, or $$H\times W$$.)
• Output: $$E\in\mathcal{R}^{B\times K\times D}$$
Variables:
• codewords (Tensor) – the learnable codewords of shape ($$K\times D$$)
• scale (Tensor) – the learnable scale factors of the visual centers

Examples

>>> import torch
>>> import encoding
>>> B,C,H,W,K = 2,3,4,5,6
>>> X = torch.randn(B, C, H, W).double().cuda()
>>> layer = encoding.nn.Encoding(C, K).double().cuda()
>>> E = layer(X)
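For intuition, here is a minimal plain-PyTorch sketch of the computation the formula above describes: soft assignment weights over scaled residual distances, followed by residual aggregation. It is illustrative only (the library ships an optimized kernel), and the function name is hypothetical.

import torch

def encoding_sketch(X, codewords, scale):
    # X: (B, N, D) features; a 4D input (B, D, H, W) is first reshaped
    # to (B, H*W, D). codewords: (K, D) c_k; scale: (K,) s_k
    R = X.unsqueeze(2) - codewords.unsqueeze(0).unsqueeze(0)  # (B, N, K, D) residuals x_i - c_k
    A = torch.softmax(-scale * R.pow(2).sum(-1), dim=2)       # (B, N, K) assignment weights a_ik
    return (A.unsqueeze(-1) * R).sum(1)                       # (B, K, D) encodings e_k = sum_i a_ik r_ik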


### Inspiration

class encoding.nn.Inspiration(C, B=1)[source]

Inspiration Layer (CoMatch Layer) enables multi-style transfer in a feed-forward network by learning to match the target feature statistics during training. This module is differentiable and can be inserted into a standard feed-forward network, to be learned directly from the loss function without additional supervision.

$Y = \phi^{-1}[\phi(\mathcal{F}^T)W\mathcal{G}]$

Please see the example of training MSG-Net, a multi-style generative network for real-time transfer.

Reference:
Hang Zhang and Kristin Dana. “Multi-style Generative Network for Real-time Transfer.” arXiv preprint arXiv:1703.06953 (2017)
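As a rough sketch of one plausible reading of the formula, assume $$\phi$$ flattens a feature map to $$C\times HW$$, $$W$$ is the learnable $$C\times C$$ weight, and $$\mathcal{G}$$ is the $$C\times C$$ Gram matrix of the target style. Names and shapes here are assumptions, not the library's exact API:

import torch

def comatch_sketch(F, W, G):
    # F: (B, C, H, Wd) content features; W: (C, C) learnable; G: (C, C) target Gram
    B, C, H, Wd = F.shape
    phiF = F.view(B, C, H * Wd)               # phi: flatten the spatial dimensions
    P = (W @ G).unsqueeze(0).expand(B, C, C)  # learned transform of the target statistics
    Y = torch.bmm(P.transpose(1, 2), phiF)    # match the target feature statistics
    return Y.view(B, C, H, Wd)                # phi^{-1}: restore the spatial layout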

### UpsampleConv2d

class encoding.nn.UpsampleConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, scale_factor=1, bias=True)[source]

To avoid the checkerboard artifacts of the standard fractionally-strided convolution, we adopt an integer-stride convolution that produces a $$2\times 2$$ block of outputs for each convolutional window.

Reference:
Hang Zhang and Kristin Dana. “Multi-style Generative Network for Real-time Transfer.” arXiv preprint arXiv:1703.06953 (2017)
Parameters:
• in_channels (int) – Number of channels in the input image
• out_channels (int) – Number of channels produced by the convolution
• kernel_size (int or tuple) – Size of the convolving kernel
• stride (int or tuple, optional) – Stride of the convolution. Default: 1
• padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
• output_padding (int or tuple, optional) – Zero-padding added to one side of the output. Default: 0
• groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
• dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
• scale_factor (int) – Scaling factor for the upsampling convolution. Default: 1
Shape:
• Input: $$(N, C_{in}, H_{in}, W_{in})$$
• Output: $$(N, C_{out}, H_{out}, W_{out})$$ where
$$H_{out} = scale * (H_{in} - 1) * stride[0] - 2 * padding[0] + kernel\_size[0] + output\_padding[0]$$
$$W_{out} = scale * (W_{in} - 1) * stride[1] - 2 * padding[1] + kernel\_size[1] + output\_padding[1]$$
Variables:
• weight (Tensor) – the learnable weights of the module of shape (in_channels, scale * scale * out_channels, kernel_size[0], kernel_size[1])
• bias (Tensor) – the learnable bias of the module of shape (scale * scale * out_channels)

Examples

>>> import torch
>>> import torch.nn as nn
>>> import encoding
>>> # With square kernels and equal stride
>>> m = encoding.nn.UpsampleConv2d(16, 33, 3, stride=2)
>>> # non-square kernels, unequal stride, and padding
>>> m = encoding.nn.UpsampleConv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
>>> input = torch.randn(20, 16, 50, 100)
>>> output = m(input)
>>> # the exact output size can also be specified as an argument
>>> input = torch.randn(1, 16, 12, 12)
>>> downsample = nn.Conv2d(16, 16, 3, stride=2, padding=1)
>>> upsample = encoding.nn.UpsampleConv2d(16, 16, 3, stride=2, padding=1)
>>> h = downsample(input)
>>> h.size()
torch.Size([1, 16, 6, 6])
>>> output = upsample(h, output_size=input.size())
>>> output.size()
torch.Size([1, 16, 12, 12])
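Conceptually this is sub-pixel (pixel-shuffle) upsampling: a stride-1 convolution emits scale * scale output maps per channel, which are then rearranged into a feature map that is scale times larger in each spatial dimension. A minimal sketch with standard PyTorch ops (illustrative, not the library's implementation; the class name is hypothetical):

import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleConv2dSketch(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, scale_factor=2):
        super().__init__()
        # emit scale*scale outputs per convolutional window, as described above
        self.conv = nn.Conv2d(in_channels, out_channels * scale_factor ** 2,
                              kernel_size, padding=kernel_size // 2)
        self.scale = scale_factor

    def forward(self, x):
        # rearrange (B, s*s*C, H, W) -> (B, C, s*H, s*W)
        return F.pixel_shuffle(self.conv(x), self.scale)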


### DilatedAvgPool2d

class encoding.nn.DilatedAvgPool2d(kernel_size, stride=None, padding=0, dilation=1)[source]

We provide dilated average pooling for the dilated version of DenseNet; see encoding.dilated.DenseNet.

Reference:

Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. “Context Encoding for Semantic Segmentation.” CVPR 2018

Applies a 2D average pooling over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size $$(B, C, H, W)$$, output $$(B, C, H_{out}, W_{out})$$, kernel_size $$(k_H, k_W)$$, stride $$(s_H, s_W)$$, and dilation $$(d_H, d_W)$$ can be precisely described as:

$out(b, c, h, w) = \frac{1}{k_H \cdot k_W} \sum_{m=0}^{k_H-1} \sum_{n=0}^{k_W-1} input(b, c, s_H \cdot h + d_H \cdot m, s_W \cdot w + d_W \cdot n)$
If padding is non-zero, then the input is implicitly zero-padded on both sides by padding points.
The parameters kernel_size, stride, padding, dilation can either be:
• a single int – in which case the same value is used for the height and width dimension
• a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
Parameters:
• kernel_size – the size of the window
• stride – the stride of the window. Default: kernel_size
• padding – implicit zero padding to be added on both sides
• dilation – the dilation parameter, similar to Conv2d
Shape:
• Input: $$(B, C, H_{in}, W_{in})$$
• Output: $$(B, C, H_{out}, W_{out})$$ where
$$H_{out} = floor((H_{in} + 2 * padding[0] - kernel\_size[0]) / stride[0] + 1)$$
$$W_{out} = floor((W_{in} + 2 * padding[1] - kernel\_size[1]) / stride[1] + 1)$$
For stride=1, the output feature map has the same size as the input.

Examples:

>>> import torch
>>> import encoding
>>> # pool with a square window of size=3, stride=2, dilation=2
>>> m = encoding.nn.DilatedAvgPool2d(3, stride=2, dilation=2)
>>> input = torch.randn(20, 16, 50, 32)
>>> output = m(input)
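The same operation can be expressed with standard ops as a depthwise convolution whose weights are constant $$1/(k_H \cdot k_W)$$. A sketch of the equivalence (illustrative only; unlike the module above, it defaults stride to 1 rather than kernel_size):

import torch
import torch.nn.functional as F

def dilated_avg_pool2d_sketch(x, kernel_size, stride=1, padding=0, dilation=1):
    # average over dilated k x k windows via a constant depthwise kernel
    B, C, H, W = x.shape
    weight = x.new_full((C, 1, kernel_size, kernel_size),
                        1.0 / (kernel_size * kernel_size))
    return F.conv2d(x, weight, stride=stride, padding=padding,
                    dilation=dilation, groups=C)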


## Functions

### aggregate

encoding.functions.aggregate(A, X, C)[source]

Aggregate operation: aggregates the residuals of the inputs ($$X$$) with respect to the codewords ($$C$$), using the assignment weights ($$A$$).

$e_{k} = \sum_{i=1}^{N} a_{ik} (x_i - c_k)$
Shape:
• Input: $$A\in\mathcal{R}^{B\times N\times K}$$, $$X\in\mathcal{R}^{B\times N\times D}$$, $$C\in\mathcal{R}^{K\times D}$$ (where $$B$$ is the batch size, $$N$$ is the total number of features, $$K$$ is the number of codewords, and $$D$$ is the feature dimension.)
• Output: $$E\in\mathcal{R}^{B\times K\times D}$$

Examples

>>> import torch, encoding
>>> B,N,K,D = 2,3,4,5
>>> A = torch.randn(B, N, K).double().cuda()
>>> X = torch.randn(B, N, D).double().cuda()
>>> C = torch.randn(K, D).double().cuda()
>>> E = encoding.functions.aggregate(A, X, C)
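Equivalently, since $$e_k = \sum_i a_{ik} x_i - (\sum_i a_{ik}) c_k$$, the aggregation can be written with einsum without materializing the full $$B\times N\times K\times D$$ residual tensor. A plain-PyTorch sketch (the library provides a fused kernel; the function name is hypothetical):

import torch

def aggregate_sketch(A, X, C):
    AX = torch.einsum('bnk,bnd->bkd', A, X)           # sum_i a_ik x_i
    AC = A.sum(dim=1).unsqueeze(-1) * C.unsqueeze(0)  # (sum_i a_ik) c_k
    return AX - AC                                    # (B, K, D)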


### dilatedavgpool2d

encoding.functions.dilatedavgpool2d(input, kernel_size, stride=None, padding=0, dilation=1)[source]

Dilated average pooling 2D, for the dilated version of DenseNet.

Reference:

Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal. “Context Encoding for Semantic Segmentation.” CVPR 2018

Applies a 2D average-pooling operation over kh x kw regions, with step size sh x sw. The number of output features is equal to the number of input planes.

See DilatedAvgPool2d for details and output shape.

Parameters:
• input – input tensor (minibatch x in_channels x iH x iW)
• kernel_size – size of the pooling region, a single number or a tuple (kh x kw)
• stride – stride of the pooling operation, a single number or a tuple (sh x sw). Default: equal to kernel_size
• padding – implicit zero padding on the input, a single number or a tuple (padh x padw). Default: 0
• dilation – the dilation parameter, similar to Conv2d
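Assuming the signature above, usage mirrors the module form:

>>> import torch
>>> import encoding
>>> input = torch.randn(20, 16, 50, 32)
>>> output = encoding.functions.dilatedavgpool2d(input, 3, stride=2, dilation=2)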