Hops Enterprise Scale-Out Deep Learning

Hops supports GPUs as managed resources and enables Data Scientists to easily parallelize experiments (such as hyperparameter optimization) and run distributed training. Once a satisfactory model has been trained, just a few clicks are needed to roll that trained model out into production by deploying it on TensorFlow Serving. Client applications can then send inference requests to the deployed model over gRPC.
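To illustrate what parallelizing a hyperparameter search means, here is a minimal, self-contained sketch. Hops schedules each trial on its own GPU-equipped executor; in this stand-in, thread workers play that role, and the trial function, hyperparameters (`learning_rate`, `batch_size`), and scoring formula are all hypothetical placeholders, not Hops APIs.

```python
# Conceptual sketch: evaluate a grid of hyperparameters in parallel.
# In Hops, each trial would run on its own GPU-equipped executor;
# here, thread workers stand in for those executors.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def train_and_evaluate(params):
    """Placeholder for one training run; returns (params, metric)."""
    learning_rate, batch_size = params
    # A real trial would train a model and return validation accuracy;
    # this toy formula just makes the example deterministic.
    accuracy = 1.0 - abs(learning_rate - 0.01) - 0.001 * batch_size
    return params, accuracy

# Hypothetical search grid: 3 learning rates x 2 batch sizes = 6 trials.
grid = list(product([0.1, 0.01, 0.001], [32, 64]))

with ThreadPoolExecutor() as pool:
    results = list(pool.map(train_and_evaluate, grid))

best_params, best_acc = max(results, key=lambda r: r[1])
print(best_params)  # the best (learning_rate, batch_size) pair found
```

All six trials run concurrently, and only the best configuration is kept, which is the essence of a parallel grid search.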


Scale out deep learning by parallelizing experiments (hyperparameter optimization) and distributed training over many GPUs.


Systems Support for Scale-Out Deep Learning

The Hopsworks platform is richer than the simplified model above suggests. To close the loop and build a continuous learning platform, we need to collect feedback data from client applications. This requires data management both for the client application feedback (typically ingested through Kafka) and for the long-term storage of training data, experiments, and programs.
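The feedback loop described above can be sketched with standard-library pieces: an in-memory queue stands in for a Kafka topic, and a plain list stands in for long-term training-data storage. The topic contents and record fields (`features`, `label`) are hypothetical.

```python
# Conceptual sketch of closing the loop: client feedback flows through
# a message queue (standing in for Kafka) and is appended to long-term
# training-data storage for later retraining.
import json
import queue

feedback_topic = queue.Queue()  # stands in for a Kafka topic
training_store = []             # stands in for distributed storage

# Client applications publish feedback on served predictions.
feedback_topic.put(json.dumps({"features": [1.2, 3.4], "label": 1}))
feedback_topic.put(json.dumps({"features": [0.5, 2.1], "label": 0}))

# An ingestion job consumes the feedback and persists it as new
# training examples, ready for the next training run.
while not feedback_topic.empty():
    record = json.loads(feedback_topic.get())
    training_store.append((record["features"], record["label"]))

print(len(training_store))  # new examples available for retraining
```

In the real platform, a Kafka consumer and distributed storage replace the queue and list, but the data flow is the same: serve, collect feedback, append to training data, retrain.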