Data Parallel

  • The current PyTorch DataParallel does not support multi-GPU loss calculation, which makes GPU memory usage very inefficient. We address this issue here with CriterionDataParallel.
  • encoding.parallel.SelfDataParallel is compatible with Synchronized Batch Normalization, encoding.nn.BatchNorm2d.

ModelDataParallel

class encoding.parallel.ModelDataParallel(module, device_ids=None, output_device=None, dim=0)[source]

Implements data parallelism at the module level.

Reference::
We provide this code for a coming paper.

This container parallelizes the application of the given module by splitting the input across the specified devices, chunking in the batch dimension. In the forward pass, the module is replicated on each device, and each replica handles a portion of the input. During the backward pass, gradients from each replica are summed into the original module. Note that the outputs are not gathered; please use the compatible encoding.parallel.CriterionDataParallel.

The batch size should be larger than the number of GPUs used. It should also be an integer multiple of the number of GPUs so that each chunk is the same size and each GPU processes the same number of samples.

Parameters:
  • module – module to be parallelized
  • device_ids – CUDA devices (default: all devices)

Example:

>>> net = encoding.parallel.ModelDataParallel(model, device_ids=[0, 1, 2])
>>> output = net(input_var)

CriterionDataParallel

class encoding.parallel.CriterionDataParallel(module, device_ids=None, output_device=None, dim=0)[source]

Calculates the loss on multiple GPUs, which balances the memory usage for semantic segmentation.

Reference::
We provide this code for a coming paper.

The targets are split across the specified devices by chunking in the batch dimension. Please use it together with encoding.parallel.ModelDataParallel.
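
A minimal usage sketch together with ModelDataParallel (the wrapped nn.CrossEntropyLoss and the names model, input_var, and target_var are illustrative assumptions, not part of the API):

>>> import torch.nn as nn
>>> net = encoding.parallel.ModelDataParallel(model, device_ids=[0, 1, 2])
>>> criterion = encoding.parallel.CriterionDataParallel(nn.CrossEntropyLoss(), device_ids=[0, 1, 2])
>>> outputs = net(input_var)               # per-GPU outputs, not gathered
>>> loss = criterion(outputs, target_var)  # targets are split across the GPUs
>>> loss.backward()                        # assumes the parallel criterion reduces to a scalar loss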

SelfDataParallel

class encoding.parallel.SelfDataParallel(module, device_ids=None, output_device=None, dim=0)[source]

SelfDataParallel; please make sure you understand it before using it.

Reference::
We provide this code for a coming paper.

Each module in the network should be in self-parallel mode, which allows a list of inputs from multiple GPUs. Please see encoding.nn for details; use with caution.
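
A minimal usage sketch, assuming net is built entirely from self-parallel encoding.nn modules (e.g. encoding.nn.BatchNorm2d) and input_var is an ordinary batched input:

>>> net = encoding.parallel.SelfDataParallel(net, device_ids=[0, 1, 2])
>>> output = net(input_var)  # the wrapped modules handle the per-GPU list of inputs internally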

AllReduce

class encoding.parallel.AllReduce[source]

Cross-GPU all-reduce autograd operation for calculating the mean and variance in SyncBN.
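
The snippet below only illustrates the statistics that the all-reduce makes global; it does not call AllReduce itself. Each GPU contributes a per-channel sum and sum of squares, the partial results are added across devices (simulated here by a plain sum), and every device then derives the same mean and variance:

>>> import torch
>>> xs = [torch.randn(8, 3, 4, 4) for _ in range(2)]                       # per-GPU input chunks
>>> xsum = torch.stack([x.sum(dim=(0, 2, 3)) for x in xs]).sum(0)          # all-reduced sum
>>> xsqsum = torch.stack([(x * x).sum(dim=(0, 2, 3)) for x in xs]).sum(0)  # all-reduced sum of squares
>>> count = sum(x.numel() // x.size(1) for x in xs)                        # elements per channel
>>> mean = xsum / count
>>> var = xsqsum / count - mean * mean                                     # E[x^2] - E[x]^2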

Broadcast

class encoding.parallel.Broadcast(target_gpus)[source]

Multi-GPU broadcast autograd function.
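
Conceptually, a multi-GPU broadcast copies a tensor to every target GPU in the forward pass, with the gradients from each copy summed back in the backward pass (a standard property of broadcast under autograd, assumed here). The sketch below shows only the forward behaviour using the standard torch.cuda.comm.broadcast utility; it requires at least two visible GPUs and does not exercise the autograd function itself:

>>> import torch
>>> x = torch.randn(4).cuda(0)
>>> copies = torch.cuda.comm.broadcast(x, devices=[0, 1])  # one copy of x per listed device
>>> [c.device for c in copies]
[device(type='cuda', index=0), device(type='cuda', index=1)]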