Detailed Course Outline
Introduction
- Meet the instructor.
- Create an account at courses.nvidia.com/join.
Stochastic Gradient Descent and the Effects of Batch Size
- Learn the significance of stochastic gradient descent when training on multiple GPUs.
- Understand the issues with sequential single-thread data processing and the theory behind speeding up applications with parallel processing.
- Understand loss function, gradient descent, and stochastic gradient descent (SGD).
- Understand the effect of batch size on accuracy and training time, with an eye toward its use on multi-GPU systems (a minimal sketch of an SGD loop with a configurable batch size follows this list).
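The sketch below illustrates the concepts named in this section: a loss function, an SGD optimizer, and a batch size that controls how many samples contribute to each gradient estimate. The model, dataset, and hyperparameters are hypothetical placeholders, not the workshop's actual code.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset and model; the workshop uses a real dataset and network.
dataset = TensorDataset(torch.randn(1024, 20), torch.randn(1024, 1))
model = nn.Linear(20, 1)

loss_fn = nn.MSELoss()                                     # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # stochastic gradient descent

BATCH_SIZE = 64  # larger batches raise throughput but can change convergence behavior
loader = DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

for inputs, targets in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)   # loss averaged over the mini-batch
    loss.backward()                          # gradient estimated from this batch only
    optimizer.step()                         # one SGD update
```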
Training on Multiple GPUs with PyTorch Distributed Data Parallel (DDP)
- Learn to convert single-GPU training to multi-GPU training using PyTorch Distributed Data Parallel.
- Understand how DDP coordinates training among multiple GPUs.
- Refactor single-GPU training programs to run on multiple GPUs with DDP (the core steps of the refactor are sketched after this list).
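A minimal sketch of the core DDP refactor, assuming the script is launched with torchrun (which sets LOCAL_RANK): initialize a process group, pin each process to one GPU, wrap the model in DistributedDataParallel, and shard the data with DistributedSampler. The model and dataset are hypothetical placeholders rather than the course's code.

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group(backend="nccl")            # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(20, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])         # gradients synchronized across GPUs

    dataset = TensorDataset(torch.randn(1024, 20), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)                # each rank sees a distinct shard
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    loss_fn = nn.MSELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for epoch in range(2):
        sampler.set_epoch(epoch)                         # reshuffle shards each epoch
        for inputs, targets in loader:
            inputs, targets = inputs.cuda(local_rank), targets.cuda(local_rank)
            optimizer.zero_grad()
            loss_fn(model(inputs), targets).backward()
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched, for example, with `torchrun --nproc_per_node=4 train_ddp.py`, this runs one process per GPU while keeping the training loop essentially unchanged from the single-GPU version.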
Maintaining Model Accuracy when Scaling to Multiple GPUs
- Understand and apply key algorithmic considerations to retain accuracy when training on multiple GPUs.
- Understand what might cause accuracy to decrease when parallelizing training on multiple GPUs.
- Learn techniques for maintaining accuracy when scaling training to multiple GPUs (one commonly used approach is sketched below).
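One commonly cited technique for retaining accuracy at the larger effective batch sizes that come with multi-GPU training is scaling the learning rate with the number of GPUs and warming it up over the first epochs. The sketch below uses hypothetical values; the workshop's actual techniques and hyperparameters may differ.

```python
import torch
import torch.nn as nn

world_size = 8                      # assumed number of GPUs
base_lr = 0.01                      # learning rate tuned for a single GPU
scaled_lr = base_lr * world_size    # linear scaling rule for the larger global batch

model = nn.Linear(20, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr, momentum=0.9)

# Warm up from a small learning rate to the scaled value over the first 5 epochs,
# then decay; gradual warmup helps avoid early divergence at large batch sizes.
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1, total_iters=5)
decay = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=45)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, decay], milestones=[5]
)

for epoch in range(50):
    # ... one training epoch goes here ...
    scheduler.step()                # advance the schedule once per epoch
```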
Workshop Assessment
- Use what you have learned during the workshop: complete the workshop assessment to earn a certificate of competency.
Final Review
- Review key learnings and address any remaining questions.
- Take the workshop survey.