Hopsworks

Hopsworks is an open-source platform for developing and operating ML models at scale.

The Hopsworks Platform

Hopsworks is an open-source enterprise platform for developing and operating Machine Learning (ML) pipelines at scale, built around the industry’s first Feature Store for ML. You can progress from data exploration and model development in Python, using Jupyter notebooks and conda, to running production-quality end-to-end ML pipelines, without having to learn how to manage a Kubernetes cluster.

Connect your datasources

Hopsworks can ingest data from the data sources you use, whether they are in the cloud, on-premises, in IoT networks, or part of your Industry 4.0 solution.

On-premise
Cloud
IoT-devices
Industry 4.0
Hopsworks' White Paper

Data Pipelines and Feature Engineering

The starting point for building ML applications is to ingest data from a variety of structured and unstructured sources, then wrangle and validate it so that it can be used to build the features on which models are trained. Hopsworks supports data preparation and feature engineering in the following frameworks:

  • Python programs that use Pandas for feature engineering;
  • Spark or PySpark applications that process larger volumes of data faster by using more resources;
  • Beam on Flink, which enables the use of TensorFlow Extended (TFX) components: TensorFlow Data Validation and TensorFlow Transform.
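The validate-then-build-features flow described above can be sketched in a few lines. This is a minimal illustration in plain Python, not Hopsworks code: the record fields (customer_id, amount), the validation rules, and the function names are all assumptions made for the example. In practice the same steps would typically be written with Pandas or PySpark.

```python
from collections import defaultdict
from statistics import mean


def validate(rows):
    """Keep only rows with a customer id and a non-negative numeric amount."""
    clean = []
    for row in rows:
        amount = row.get("amount")
        if row.get("customer_id") and isinstance(amount, (int, float)) and amount >= 0:
            clean.append(row)
    return clean


def build_features(rows):
    """Aggregate validated rows into one feature record per customer."""
    by_customer = defaultdict(list)
    for row in rows:
        by_customer[row["customer_id"]].append(row["amount"])
    return {
        cid: {"n_purchases": len(a), "avg_amount": mean(a), "max_amount": max(a)}
        for cid, a in by_customer.items()
    }


# Hypothetical raw input: two rows fail validation and are dropped.
raw = [
    {"customer_id": "c1", "amount": 10.0},
    {"customer_id": "c1", "amount": 30.0},
    {"customer_id": "c2", "amount": 5.0},
    {"customer_id": None, "amount": 7.0},   # rejected: missing id
    {"customer_id": "c2", "amount": -1.0},  # rejected: negative amount
]
features = build_features(validate(raw))
```

Keeping validation separate from aggregation means bad records are rejected before they can distort any feature value.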

Feature Store

World's first publicly available Feature Store

Hopsworks’ Feature Store is a new data layer in horizontally scalable machine learning pipelines that:

  • Enables features to be discovered, analyzed, and reused across applications
  • Ensures consistency of feature engineering between training and model serving
  • Enables time-travel queries that read historical feature values
  • Ensures high-quality feature data through integration with data validation tooling
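To make the idea of a time-travel query concrete, here is a toy in-memory sketch. It is not the Hopsworks Feature Store API: the class ToyFeatureStore, its methods, and the integer timestamps are hypothetical. It only shows the principle of reading a feature value as it was at a given point in time.

```python
import bisect
from collections import defaultdict


class ToyFeatureStore:
    """Toy store: keeps every written value, ordered by timestamp."""

    def __init__(self):
        # (entity, feature) -> parallel lists of timestamps and values
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def write(self, entity, feature, timestamp, value):
        key = (entity, feature)
        # Insert in timestamp order so reads can use binary search.
        idx = bisect.bisect_right(self._times[key], timestamp)
        self._times[key].insert(idx, timestamp)
        self._values[key].insert(idx, value)

    def read(self, entity, feature, as_of=None):
        """Return the latest value, or the value valid at time `as_of`."""
        key = (entity, feature)
        times = self._times.get(key, [])
        if not times:
            return None
        if as_of is None:
            return self._values[key][-1]
        idx = bisect.bisect_right(times, as_of)
        return self._values[key][idx - 1] if idx else None


store = ToyFeatureStore()
store.write("c1", "avg_amount", timestamp=100, value=20.0)
store.write("c1", "avg_amount", timestamp=200, value=25.0)

latest = store.read("c1", "avg_amount")                # most recent value
historical = store.read("c1", "avg_amount", as_of=150)  # value as of t=150
before_any = store.read("c1", "avg_amount", as_of=50)   # nothing written yet
```

Reading with as_of is what lets training data be assembled from the feature values that were actually available at the time of each training example.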

Experimentation & Model Training

Tools and Frameworks loved by most

Hopsworks provides framework support for enabling Machine Learning code to:

  • Easily move from development to production: Jupyter notebooks can be run directly in ML pipelines;
  • Easily scale from a single container/GPU to a cluster of tens or hundreds of GPUs using the Maggy framework, which provides a unified framework for the parallel execution of Hyperparameter Optimization and Ablation Studies using PySpark;
  • Easily scale from a single container/GPU to a cluster of tens or hundreds of GPUs for distributed training on TensorFlow with the CollectiveAllReduce strategy;
  • Be reproducible through management of experiments, showing the output models of training runs, the notebook used to generate each model, and the conda environment used; and
  • Produce and validate models that are managed by Hopsworks.
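The parallel execution of hyperparameter trials mentioned above can be illustrated with a generic sketch. This does not use Maggy or PySpark and does not reflect their APIs: it runs a simple random search on local threads, and train_and_score is a hypothetical stand-in for a real training run.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def train_and_score(params):
    """Hypothetical stand-in for a training run; returns a validation score."""
    lr, dropout = params["lr"], params["dropout"]
    # Toy objective that peaks at lr=0.01, dropout=0.3.
    return -((lr - 0.01) ** 2) * 1e4 - (dropout - 0.3) ** 2


def random_search(n_trials, n_workers=4, seed=0):
    """Sample trials up front, evaluate them in parallel, return the best."""
    rng = random.Random(seed)
    trials = [
        {"lr": rng.uniform(0.001, 0.1), "dropout": rng.uniform(0.0, 0.5)}
        for _ in range(n_trials)
    ]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        scores = list(pool.map(train_and_score, trials))
    best = max(range(n_trials), key=scores.__getitem__)
    return trials[best], scores[best]


best_params, best_score = random_search(n_trials=20)
```

Because the trials are independent, the same pattern scales by swapping the local thread pool for workers on a cluster.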

Putting ML Pipelines in Production

Hopsworks includes Airflow as an orchestration engine for managing the execution of ML pipelines.

  • Notebooks or Jobs in Hopsworks can be run as stages in ML pipelines through the HopsworksJob Airflow Operator;
  • Airflow provides error handling and notification support to ensure that any production problems with pipelines can be immediately discovered and acted upon;
  • Pipelines are data-aware: because Hopsworks manages versioned models, you can check whether a newly trained model outperforms the currently deployed model before deploying it to production.
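The check described in the last point can be sketched as a small gate function that a pipeline stage might call before deployment. The function name, the metrics dictionaries, and the min_gain parameter are assumptions for illustration, not part of Hopsworks or Airflow.

```python
def should_deploy(candidate_metrics, deployed_metrics, metric="accuracy", min_gain=0.0):
    """Return True if the candidate model should replace the deployed one.

    Deploys unconditionally when nothing is deployed yet; otherwise requires
    the candidate to beat the deployed model by more than `min_gain`.
    """
    if deployed_metrics is None:
        return True
    return candidate_metrics[metric] > deployed_metrics[metric] + min_gain


# Hypothetical metrics for a newly trained model and the one in production.
candidate = {"accuracy": 0.91}
deployed = {"accuracy": 0.89}

deploy_now = should_deploy(candidate, deployed)
deploy_with_margin = should_deploy(candidate, deployed, min_gain=0.05)
```

A non-zero min_gain avoids replacing a production model for an improvement that is within measurement noise.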

Model Serving & Monitoring

Hopsworks supports the deployment and management of both batch and real-time models:

  • Batch applications use models either by downloading them via the REST API or, within Hopsworks, by having Spark/Beam/Flink applications read them as files from HopsFS;
  • Online applications can use models that are served in real time from either a TensorFlow Serving server or a Flask application (Scikit-learn / H2O). Hopsworks Enterprise serves these models on elastic Kubernetes containers;
  • Logs for real-time models (prediction requests and responses) can be stored in a Kafka topic so that streaming applications can monitor the performance and behaviour of the model in real time;
  • Online model serving provides a TLS-enabled REST API with self-service access control.
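As an illustration of monitoring a model from its prediction logs, the sketch below tracks a rolling mean of predictions and flags when it drifts from a baseline. It reads from a plain Python list standing in for messages consumed from a Kafka topic; the record layout, class name, and thresholds are assumptions, not the Hopsworks log format.

```python
from collections import deque


class RollingMonitor:
    """Flags when the rolling mean prediction drifts away from a baseline."""

    def __init__(self, window, baseline_mean, tolerance):
        self.window = deque(maxlen=window)  # keeps only the latest predictions
        self.baseline_mean = baseline_mean
        self.tolerance = tolerance

    def observe(self, record):
        """Add one log record; return True if the rolling mean is out of range."""
        self.window.append(record["prediction"])
        current = sum(self.window) / len(self.window)
        return abs(current - self.baseline_mean) > self.tolerance


# Hypothetical log records, in the order they would arrive on the topic.
logs = [{"prediction": p} for p in (0.1, 0.2, 0.3, 0.9, 0.9, 0.9)]

monitor = RollingMonitor(window=3, baseline_mean=0.2, tolerance=0.3)
alerts = [monitor.observe(record) for record in logs]
```

Using a window rather than single predictions keeps one unusual request from raising an alert while still reacting to a sustained shift.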

On-Premises or in the Cloud

Deploy on‑premises on your own hardware or at your preferred cloud provider. Hopsworks will provide the same user experience in the cloud or in the most secure of air‑gapped deployments.

On-premise
AWS
GCP, Azure
OpenStack

Hopsworks at a glance

Community Edition vs Enterprise Edition

Feature
Community Edition
Enterprise Edition
Distributed Deep Learning
Offline Feature Store
Online Feature Store
Jupyter Notebooks for Development
Dynamic Role-based Access Control
Encryption At-Rest, In-Motion
Data Provenance & Lineage
Jupyter Notebooks on Kubernetes
Model Serving on Kubernetes
Support via email, chat & phone

