GPU and related techs 101
The last time I touched CUDA and GPUs was in 2018, when I was preparing for a job hunt at CGG. It’s time to review some GPU basics.
Ray Tracing
Ray tracing models the behavior of light in a scene. Rays of light are followed, or “traced”, to determine the color of every object in the scene.
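To make the tracing idea concrete, here is a minimal Python sketch. It assumes a single sphere, one directional light and a pinhole camera at the origin, and every scene value is a made-up illustration value. Each pixel gets one ray; if the ray hits the sphere, the pixel's brightness comes from the angle between the surface normal and the light direction (Lambertian shading). The ray/geometry intersection test in `hit_sphere` is the kind of work that RT cores accelerate in hardware, although real scenes use triangles and bounding volumes rather than analytic spheres.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hit_sphere(origin, direction, center, radius):
    """Return the nearest positive distance t along the ray, or None on a miss."""
    oc = tuple(o - c for o, c in zip(origin, center))
    b = dot(oc, direction)            # direction is unit length, so the t^2 coefficient is 1
    c = dot(oc, oc) - radius * radius
    disc = b * b - c
    if disc < 0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0 else None


def render(width, height, center=(0.0, 0.0, -3.0), radius=1.0, light=(1.0, 1.0, 1.0)):
    """Trace one ray per pixel; return rows of brightness values in [0, 1]."""
    light = normalize(light)          # direction pointing toward the light
    image = []
    for j in range(height):
        row = []
        for i in range(width):
            # Map pixel (i, j) onto an image plane at z = -1.
            x = (i + 0.5) / width * 2 - 1
            y = 1 - (j + 0.5) / height * 2
            d = normalize((x, y, -1.0))
            t = hit_sphere((0.0, 0.0, 0.0), d, center, radius)
            if t is None:
                row.append(0.0)       # ray missed: background
            else:
                p = tuple(t * c for c in d)
                n = normalize(tuple(pc - cc for pc, cc in zip(p, center)))
                row.append(max(0.0, dot(n, light)))
        image.append(row)
    return image


if __name__ == "__main__":
    # Print a small ASCII rendering of the sphere.
    for row in render(20, 10):
        print("".join(" .:-=+*#%@"[min(9, int(v * 9))] for v in row))
```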
Resolutions
Displays consist of pixels. The color of each pixel is determined by small programs called shaders. A 4K display (3840 x 2160) refreshing at 60 Hz requires about 500 million pixels per second (3840 x 2160 x 60 = 497,664,000).
- Virtual GPUs
- Virtual PC
- Virtual Applications: app streaming
- RTX Virtual Workstations: high-performance graphics
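The 4K figure in the Resolutions note can be checked with a few lines of Python. This assumes every pixel is shaded once per frame; a 1080p display is included only for comparison.

```python
def pixels_per_second(width: int, height: int, refresh_hz: int) -> int:
    """Pixels that must be produced each second if every pixel is redrawn every frame."""
    return width * height * refresh_hz


if __name__ == "__main__":
    print(f"4K frame: {3840 * 2160:,} pixels")                           # 8,294,400
    print(f"4K @ 60 Hz: {pixels_per_second(3840, 2160, 60):,} px/s")     # 497,664,000
    print(f"1080p @ 60 Hz: {pixels_per_second(1920, 1080, 60):,} px/s")  # 124,416,000
```

A 4K frame has exactly four times as many pixels as a 1080p frame, so at the same refresh rate the shading workload is four times larger.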
GPU Cores
- CUDA cores for parallel work
- RT cores for ray tracing acceleration
- Tensor cores for AI acceleration
- Networking
- HPC: InfiniBand
- AI: Mellanox GPUDirect RDMA (Remote Direct Memory Access)
- Storage: Ethernet Solutions/storage fabric
- Cloud: BlueField DPU
- Zero-Trust Security: NVIDIA DOCA and BlueField DPU
- Zero Trust is a security framework requiring all users, whether inside or outside the organization’s network, to be authenticated and continuously validated for security configuration and posture before being granted, or keeping, access to applications and data.
- By contrast, the traditional approach automatically trusts users and endpoints inside the organization’s perimeter, which puts the organization at risk from malicious insiders.
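The difference between the two access models can be sketched as two policy functions. This is a hypothetical illustration: the `Request` fields and function names are invented here and are not the API of DOCA, BlueField or any zero-trust product. The key point is that the zero-trust check ignores network location and is meant to run on every request, not just once at login.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request attributes, for illustration only.
    user: str
    credential_valid: bool   # identity freshly verified (e.g. unexpired token, MFA passed)
    device_compliant: bool   # posture check passed (patched OS, endpoint agent running, ...)
    inside_perimeter: bool   # request originates from the corporate network


def perimeter_allows(req: Request) -> bool:
    # Traditional model: being inside the network is enough; outsiders must authenticate.
    return req.inside_perimeter or req.credential_valid


def zero_trust_allows(req: Request) -> bool:
    # Zero trust: location grants nothing; identity AND posture are checked on every request.
    return req.credential_valid and req.device_compliant


if __name__ == "__main__":
    insider = Request("mallory", credential_valid=False, device_compliant=False, inside_perimeter=True)
    remote = Request("alice", credential_valid=True, device_compliant=True, inside_perimeter=False)
    for req in (insider, remote):
        print(req.user, perimeter_allows(req), zero_trust_allows(req))
    # mallory True False
    # alice True True
```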
- Cybersecurity
- Digital fingerprinting
- Spear phishing
- Deployment
- Docker Engine vs Podman
- Container Runtimes
- containerd
- CRI-O
- K8s distributions
- Upstream K8s
- OpenShift Container Platform (OCP)
- HPE Ezmeral Runtime Enterprise
- RHEL KVM (Kernel-based Virtual Machine)
- VMware vSphere with Tanzu
(Figure: a summary of how a container system works)
- Application management
- NVIDIA Container Toolkit with Docker/Podman
- GPU Operator works with K8s clusters
- Workload management
- Altair PBS Pro
- Slurm
- K8s
- Doherty Threshold
Productivity soars when a computer and its users interact at a pace (<400 ms) that ensures neither has to wait on the other.
- RAPIDS techs
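- Apache Arrow: standard columnar in-memory format
- Unified Communication X (UCX): high-performance communication framework for GPU communication that can use faster transports than TCP, such as RDMA and shared memory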
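To show what "standard columnar memory format" means in practice, here is a hand-rolled Python sketch of how Arrow lays out a nullable string column: a validity bitmap (one bit per value, least-significant bit first, 1 = valid), an int32 offsets buffer with n + 1 entries, and one contiguous data buffer. The function names are hypothetical, buffer padding and alignment are omitted, and in practice a library such as pyarrow builds these buffers for you. Because every process and library agrees on this layout, data can be shared without re-serializing it.

```python
import struct
from typing import List, Optional


def to_arrow_string_buffers(values: List[Optional[str]]):
    """Build (validity, offsets, data) buffers in Arrow's variable-size binary layout."""
    n = len(values)
    validity = bytearray((n + 7) // 8)     # 1 bit per value, LSB-first
    offsets = [0]
    data = bytearray()
    for i, v in enumerate(values):
        if v is not None:
            validity[i // 8] |= 1 << (i % 8)
            data += v.encode("utf-8")
        offsets.append(len(data))          # nulls get a zero-length slot here
    offsets_buf = struct.pack(f"<{n + 1}i", *offsets)  # little-endian int32
    return bytes(validity), offsets_buf, bytes(data)


def get(validity: bytes, offsets_buf: bytes, data: bytes, i: int) -> Optional[str]:
    """Random access to element i: one bit test plus two offset reads."""
    if not (validity[i // 8] >> (i % 8)) & 1:
        return None
    start, end = struct.unpack_from("<2i", offsets_buf, 4 * i)
    return data[start:end].decode("utf-8")


if __name__ == "__main__":
    validity, offsets, data = to_arrow_string_buffers(["gpu", None, "arrow", ""])
    print(format(validity[0], "08b"))      # 00001101  (bit 0 is element 0)
    print(struct.unpack("<5i", offsets))   # (0, 3, 3, 8, 8)
    print(data)                            # b'gpuarrow'
```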
- DGX SuperPOD storage reference architectures
- DDN
- NetApp BeeGFS
- IBM Spectrum Scale
- VAST
- DPU
- BlueField
- DOCA
- SDN: Software-Defined Networking
- SR-IOV: Single Root I/O Virtualization, which lets one NIC appear as multiple virtual NICs
- OVS: Open vSwitch
- VirtIO: Virtual I/O
- Storage
- DAS: Direct Attached Storage
- NVMe: Non-Volatile Memory Express
- BlueField SNAP: hardware-accelerated NVMe storage virtualization
- Ethernet
- ConnectX
- RDMA over Converged Ethernet (RoCE) enables reading and writing remote memory without involving the CPU
- VXLAN offload performs TCP/IP offloads for VXLAN-encapsulated traffic in the NIC instead of the CPU
- SR-IOV gives VMs direct access to network adapters
- Cumulus Linux
- OS for Open Ethernet Switches
- Virtual environment: NVIDIA Air (air.nvidia.com)
- LinkX cables and transceivers (up to 40 km)
- SONiC: Software for Open Networking in the Cloud
- NetQ: telemetry collection
- What Just Happened (WJH): reports why packets were dropped
- InfiniBand
- EDR/HDR/NDR switches
- 100/200/400 Gb/s
- Enhanced/High/Next Data Rate
- S
- iSER: RDMA storage protocol
- IPoIB: IP over InfiniBand
- DAC: Direct Attach Copper cables
- AOC: Active Optical Cables
- HCA: Host Channel Adapters (EDR/HDR/NDR)
- SHARP: Scalable Hierarchical Aggregation and Reduction Protocol
- UFM (Unified Fabric Manager): telemetry collection
- RDMA: one-sided communication enabled by hardware
- No header
- bypassing the OS
- socket application
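SHARP moves the reduction step of collectives such as allreduce into the switches themselves. Below is a toy Python model of that reduction half, assuming a made-up two-level tree of switches with two hosts per leaf switch: each switch sums the vectors from its children and forwards one partial sum upward, so every link carries one vector exactly once. An allreduce would then send the total back down the same tree. This models the idea only; it is not SHARP's protocol or API, and the `Switch` class is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    # Children are either hosts (a list of numbers) or other Switch objects.
    children: list = field(default_factory=list)


def tree_reduce(node, stats):
    """Element-wise sum of all host vectors below `node`, counting upward messages."""
    if isinstance(node, Switch):
        partials = [tree_reduce(child, stats) for child in node.children]
        stats["messages"] += len(partials)   # one vector per child link, sent once
        return [sum(column) for column in zip(*partials)]
    return list(node)                        # a host's local buffer (e.g. gradients)


def in_network_sum(root):
    stats = {"messages": 0}
    total = tree_reduce(root, stats)
    return total, stats["messages"]


if __name__ == "__main__":
    fabric = Switch([
        Switch([[1, 2], [3, 4]]),   # leaf switch with two hosts
        Switch([[5, 6], [7, 8]]),   # leaf switch with two hosts
    ])
    total, messages = in_network_sum(fabric)
    print(total, messages)          # [16, 20] 6
```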
- Spectrum-X
- Spectrum-4 switch
- BlueField-3 SuperNIC
- RoCE: standardized by the InfiniBand Trade Association (IBTA)