Tensor Parallelism and Pipeline Parallelism


These notes summarize multiple parallelism strategies from a video.

1 Data Parallelism

This basic strategy splits the data between GPUs, with every GPU holding a full copy of the model. In the LLM era, the model itself no longer fits on a single GPU, so model parallelism is actually used. Note: NVLink connects GPUs within a node, and InfiniBand connects nodes. SXM-version GPUs are GPUs with NVLink connections instead of PCIe.
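
A minimal numpy sketch of the idea (simulated replicas, not a real distributed setup): each replica computes the gradient on its own shard of the batch, and averaging those gradients, which an all-reduce does in practice, reproduces the single-GPU gradient.

```python
import numpy as np

# Data parallelism in miniature: two simulated "GPUs" each hold the
# same weights and compute gradients on their own shard of the batch.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))   # global batch of 8 examples
y = rng.standard_normal(8)
w = rng.standard_normal(3)        # identical on every replica

def grad(Xs, ys, w):
    # gradient of the mean squared error 0.5 * mean((Xs @ w - ys) ** 2)
    return Xs.T @ (Xs @ w - ys) / len(ys)

shards = np.split(np.arange(8), 2)             # 2 equal data shards
local = [grad(X[i], y[i], w) for i in shards]  # per-replica gradients
averaged = np.mean(local, axis=0)              # the all-reduce step

assert np.allclose(averaged, grad(X, y, w))    # matches single-GPU math
```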

2 Pipeline Parallelism

Also called inter-layer parallelism (inter- means between): the layers of the model are partitioned into stages, one stage per GPU. A naive schedule has bubbles (idle GPUs), so Google's GPipe paper introduced micro-batches to mitigate the wasted time. NVIDIA introduced the 1F1B (one forward, one backward) schedule in a blog post.
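
A quick sketch of the bubble argument (a simplification that counts one equal time slot per stage per micro-batch): with p stages and m micro-batches the schedule occupies m + p - 1 slots, of which only m are useful work, so the bubble fraction is (p - 1) / (m + p - 1) and shrinks as m grows.

```python
# Bubble fraction under the simplified slot-counting model above.
def bubble_fraction(p: int, m: int) -> float:
    return (p - 1) / (m + p - 1)

for m in (1, 4, 16, 64):
    print(f"p=4 stages, m={m:>2} micro-batches -> "
          f"bubble = {bubble_fraction(4, m):.2%}")
# p=4 stages, m= 1 micro-batches -> bubble = 75.00%
# p=4 stages, m=64 micro-batches -> bubble = 4.48%
```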

3 Tensor Parallelism

Also called intra-layer parallelism (intra- means within). A matrix multiplication can be divided across GPUs, which leads to the TP algorithm: each GPU holds one shard of a layer's weight matrix and computes a partial result.

Examples can be found in the Megatron-LM paper.
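
A toy numpy sketch of a column-parallel split in the Megatron-LM style (two simulated devices, not actual multi-GPU code): each device holds half the columns of the weight matrix and computes its slice of the output, and the slices are concatenated, which an all-gather does in practice.

```python
import numpy as np

# Column-parallel linear layer: Y = X @ W, with W split by columns.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))        # activations: batch x d_in
W = rng.standard_normal((8, 6))        # weights: d_in x d_out

W0, W1 = np.split(W, 2, axis=1)        # one column shard per device
Y0, Y1 = X @ W0, X @ W1                # partial outputs, no communication

Y = np.concatenate([Y0, Y1], axis=1)   # the all-gather step
assert np.allclose(Y, X @ W)           # matches the unsharded result
```

The complementary row-parallel split shards W by rows and combines the partial outputs with an all-reduce instead.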

4 3D Parallelism

All these methods can be used together, and that's the idea behind DeepSpeed's 3D parallelism.
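
A minimal sketch of how the three degrees compose (rank_to_3d is a hypothetical helper, not DeepSpeed's API, and the axis ordering is just one convention): the product of the data-, pipeline-, and tensor-parallel degrees must equal the GPU count, and each flat rank gets one coordinate along each axis.

```python
# Hypothetical helper, not DeepSpeed's API: map a flat GPU rank to
# (data, pipeline, tensor) coordinates in a 3D parallel layout.
def rank_to_3d(rank: int, dp: int, pp: int, tp: int) -> tuple[int, int, int]:
    assert 0 <= rank < dp * pp * tp, "rank out of range"
    tp_rank = rank % tp              # fastest-varying axis: tensor parallel
    pp_rank = (rank // tp) % pp      # middle axis: pipeline stage
    dp_rank = rank // (tp * pp)      # slowest axis: data-parallel replica
    return dp_rank, pp_rank, tp_rank

# 16 GPUs = 2 (data) x 4 (pipeline) x 2 (tensor)
for r in range(16):
    print(r, rank_to_3d(r, dp=2, pp=4, tp=2))
```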
