AutoDist: A Distributed Deep Learning Training Engine
Petuum is excited to announce our latest open source project, AutoDist, a distributed deep learning training engine. It provides an easy-to-use interface that automatically distributes the training of a wide variety of deep learning models across many CPUs and GPUs at scale, with minimal code changes.
Why use AutoDist?
Are you searching for an intuitive library for distributed training on TensorFlow? As an alternative to Horovod that aims for better performance, AutoDist lets a developer scale a model from a single GPU to many without changing the model-building scripts. We have approached the problem from a different perspective: graph optimization with a composable representation of strategies, which enables manual crafting or automated selection of the best options for your model training.
How to use AutoDist?
AutoDist is designed to be easy and intuitive to use. It minimizes code modification by exposing a small set of interfaces that can distribute arbitrary single-node code with little to no change.
For example, after a developer prototypes and finalizes the model design, they may want to accelerate training with multiple GPU devices or nodes. However, changing a model built for a single device to fit parallel training takes a lot of effort, and so does configuring the cluster, installing the required software, adopting another launcher, and so on. AutoDist aims to free developers from those concerns: once you have a TensorFlow environment for prototyping models, all you need to do to use AutoDist is install one package.
Next, you make AutoDist aware of your resource specification.
This setup allows you to keep the single-device model-building script in graph mode while training it in a distributed way. (If you are curious about eager mode, the FAQ page may have the answers.)
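Taken together, the steps above might look like the following minimal sketch. It rests on several assumptions that should be checked against the project documentation for your version: that the package is installed under the name `autodist`, that the resource specification is a YAML file with a `nodes` list of `address` and `gpus` entries, and that the entry points are `AutoDist(resource_spec_file=...)`, `autodist.scope()` and `autodist.create_distributed_session()`. The helper names `write_resource_spec` and `train_distributed` are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def write_resource_spec(path, nodes):
    """Write a minimal resource specification as YAML text.

    `nodes` maps a node address to the list of GPU indices on that node.
    The file layout (a `nodes` list with `address` and `gpus` keys) is an
    assumption made for this sketch.
    """
    lines = ["nodes:"]
    for address, gpus in nodes.items():
        gpu_list = ", ".join(str(g) for g in gpus)
        lines.append(f"  - address: {address}")
        lines.append(f"    gpus: [{gpu_list}]")
    text = "\n".join(lines) + "\n"
    with open(path, "w") as f:
        f.write(text)
    return text


def train_distributed(spec_path, steps=10):
    """Run unchanged single-device graph code under AutoDist (assumed API)."""
    # Imported lazily so the helper above works without TensorFlow or AutoDist.
    import tensorflow as tf
    from autodist import AutoDist

    autodist = AutoDist(resource_spec_file=spec_path)
    loss_value = None
    with tf.Graph().as_default(), autodist.scope():
        # Ordinary single-device model code: fit w toward 1.0.
        w = tf.Variable(5.0, name="w")
        loss = tf.square(w - 1.0)
        optimizer = tf.optimizers.SGD(0.1)
        grads = tf.gradients(loss, [w])
        train_op = optimizer.apply_gradients(zip(grads, [w]))

        # The session is the point where the graph gets distributed.
        sess = autodist.create_distributed_session()
        for _step in range(steps):
            loss_value, _ = sess.run([loss, train_op])
    return loss_value


if __name__ == "__main__":
    spec_path = os.path.join(tempfile.mkdtemp(), "resource_spec.yml")
    print(write_resource_spec(spec_path, {"localhost": [0, 1]}))
    # Uncomment on a machine with TensorFlow, AutoDist and the GPUs listed above:
    # print(train_distributed(spec_path))
```

Only the construction of the `AutoDist` object and the `scope()` wrapper are specific to distribution here; the model code inside the scope is what you would write for a single device.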
As deep learning models become more structurally complex, existing distributed machine learning (ML) systems often struggle to provide good all-round performance across a wide variety of models, since some of them are specialized for a single monolithic architecture or technology, such as MPI.
AutoDist accounts for the fact that different ML models (or different components of a complex model) exhibit different runtime characteristics, and that different learning algorithms demonstrate distinct computational patterns. This demands model-aware and algorithm-aware systems or parallelization treatments for distributed execution. AutoDist develops a unified strategy representation space to adapt different computational units or states to different synchronization configurations.
Such a representation is composed of multiple atomic deep learning parallelization techniques and factors contributing to performance. A complete strategy representation directs how the model should be parallelized on the target cluster, specifically how a single-device model is transformed into a distributed model. This approach isolates strategy prototyping from low-level distributed systems, allows composition, assessment and execution of complex synchronization strategies via a unified interface, and is extensible to emerging synchronization techniques.
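To make the idea of a composable representation more concrete, here is a small illustrative sketch. It is not AutoDist's actual data model: the class names `SyncConfig` and `Strategy`, their fields, and the device strings are all hypothetical. It only shows how per-variable synchronization choices (parameter server or all-reduce, with optional partitioning) could be composed into one description of a distributed model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SyncConfig:
    """How one variable is synchronized (hypothetical, for illustration)."""
    method: str                 # "ps" (parameter server) or "allreduce"
    reduction_device: str = ""  # where a PS variable is aggregated
    num_shards: int = 1         # values above 1 partition the variable


@dataclass
class Strategy:
    """Replica devices plus one synchronization choice per variable."""
    replicas: List[str]
    var_configs: Dict[str, SyncConfig] = field(default_factory=dict)

    def assign(self, var_name: str, config: SyncConfig) -> "Strategy":
        """Attach an atomic synchronization choice to a variable."""
        if config.method not in ("ps", "allreduce"):
            raise ValueError(f"unknown method: {config.method}")
        if config.method == "ps" and not config.reduction_device:
            raise ValueError("a PS variable needs a reduction device")
        if config.num_shards < 1:
            raise ValueError("num_shards must be at least 1")
        self.var_configs[var_name] = config
        return self

    def describe(self) -> List[str]:
        """Return one human-readable line per variable, sorted by name."""
        out = []
        for name in sorted(self.var_configs):
            cfg = self.var_configs[name]
            if cfg.method == "ps":
                out.append(
                    f"{name}: ps on {cfg.reduction_device}, shards={cfg.num_shards}"
                )
            else:
                out.append(f"{name}: allreduce")
        return out


def example_strategy() -> Strategy:
    """Mix techniques: a sharded PS for a large embedding, all-reduce for a dense layer."""
    strategy = Strategy(replicas=["node0:gpu:0", "node0:gpu:1"])
    strategy.assign(
        "embedding", SyncConfig("ps", reduction_device="node0:cpu:0", num_shards=4)
    )
    strategy.assign("dense/kernel", SyncConfig("allreduce"))
    return strategy


if __name__ == "__main__":
    for line in example_strategy().describe():
        print(line)
```

Because each variable carries its own configuration, different components of one model can be synchronized differently, which is the property the representation is meant to provide.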
The introduced representation spans a combinatorial space enclosing all possible strategies. AutoDist already provides some built-in strategy builders, which can be passed as a constructor argument to direct the automatic graph transformation and optimization.
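As a rough illustration of the kind of decision a strategy builder makes, the sketch below places variables on parameter servers with a generic greedy load-balancing heuristic. This is an assumption-laden example, not AutoDist's implementation: the function name `build_ps_load_balanced`, the variable sizes, and the server names are hypothetical.

```python
from typing import Dict, List, Tuple


def build_ps_load_balanced(
    var_sizes: Dict[str, int], servers: List[str]
) -> Tuple[Dict[str, str], Dict[str, int]]:
    """Greedily place each variable on the least-loaded parameter server.

    `var_sizes` maps a variable name to its size (for example in bytes).
    Variables are visited from largest to smallest; ties between servers
    go to the one listed first.
    """
    if not servers:
        raise ValueError("at least one server is required")
    loads = {server: 0 for server in servers}
    placement = {}
    # Largest first, with the name as a tie-breaker for a stable order.
    ordered = sorted(var_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, size in ordered:
        target = min(servers, key=lambda s: loads[s])
        placement[name] = target
        loads[target] += size
    return placement, loads


if __name__ == "__main__":
    sizes = {"embedding": 100, "dense1": 40, "dense2": 40, "bias": 1}
    placement, loads = build_ps_load_balanced(sizes, ["node0:cpu", "node1:cpu"])
    for name in sorted(placement):
        print(name, "->", placement[name])
    print(loads)
```

A builder of this kind takes the model's variables and the cluster resources as input and returns a complete strategy, so swapping one builder for another changes the distribution plan without touching the model code.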
More importantly, AutoDist generates an auto-strategy through an auto-optimization pipeline that efficiently proposes a strategy adapted to a given model and set of resources. Our team at Petuum designed AutoDist not only to improve parallel performance but also to offer greater convenience by eliminating the need to select a strategy manually. The experimental auto-strategy optimizer is built on data-driven ML models trained on low-shot trial-run data and will improve as more data is acquired.
We, as ML technologists, constantly face the challenge of training and tuning models of ever-increasing complexity and data scale. Rather than defaulting to the common develop-a-new-model-optimizer technique and further evolving expert-designed optimizers for better performance, we chose a different, principled path: here at Petuum we set out to build a system that optimizes, compiles and transforms a computational graph together with the cluster resources in a distributed setting. Note that, for now, the system supports only declarative frameworks like TensorFlow, but in the near future it can be extended to eager frameworks with JIT support.
This article is a brief introduction to AutoDist, one of the exciting projects the Scalable ML team at Petuum is working on, which has recently been open-sourced. We invite you to join us on our journey to explore novel approaches that push the limits of distributed machine learning, and we welcome any feedback and ideas you might have.
Originally published at https://petuum.com on July 21, 2020.