Manage your own Feature Store on Databricks with Hopsworks
Wednesday, July 15th @ 8:00 PM CEST

Hopsworks

Hopsworks is an open-source platform for developing and operating Machine Learning models at scale.

From Data to Models in Production

Hopsworks is an open-source Enterprise platform for the development and operation of Machine Learning (ML) pipelines at scale, built around the industry’s first Feature Store for ML. You can progress easily from data exploration and model development in Python, using Jupyter notebooks and conda, to running production-quality end-to-end ML pipelines, without having to learn how to manage a Kubernetes cluster.

Hopsworks can ingest data from the data sources you use, whether they are in the cloud, on‑premises, on IoT networks, or part of your Industry 4.0 solution.

On-premise
Cloud
IoT-devices
Industry 4.0
Hopsworks' White Paper

Data Pipelines and Feature Engineering

The starting point for building ML applications is to ingest data from a variety of structured and unstructured sources, then wrangle and validate it so that it can be used to build the features that will train models. Hopsworks supports data preparation and feature engineering in the following frameworks:

  • Python programs that use Pandas for feature engineering;
  • Spark or PySpark applications to process larger volumes of data faster, with more resources;
  • Beam on Flink, which enables the use of TensorFlow Extended (TFX) components: TensorFlow Data Validation and TensorFlow Transform.
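As a minimal sketch of the first option, this is the kind of feature engineering a Python job might perform before writing features to the store — the transaction records and feature names here are hypothetical, invented purely for illustration:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw transaction records, as they might arrive from a source system.
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 40.0},
]

def engineer_features(rows):
    """Aggregate raw transactions into per-customer features."""
    by_customer = defaultdict(list)
    for row in rows:
        by_customer[row["customer_id"]].append(row["amount"])
    return {
        cid: {
            "txn_count": len(amounts),
            "avg_amount": mean(amounts),
            "max_amount": max(amounts),
        }
        for cid, amounts in by_customer.items()
    }

features = engineer_features(transactions)
print(features[1])  # {'txn_count': 2, 'avg_amount': 100.0, 'max_amount': 120.0}
```

The same aggregation written with Pandas `groupby` or a PySpark job scales the identical logic to larger datasets.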

Feature Store

World's first publicly available Feature Store
Hopsworks’ Feature Store is a new data layer in horizontally scalable machine learning pipelines that:

  • Enables features to be discovered, analyzed, and reused across applications;
  • Ensures consistency of feature engineering between training and model serving;
  • Enables time-travel queries to read historical values for features;
  • Ensures high-quality feature data through integration with data validation tooling.
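The idea behind a time-travel query can be sketched in a few lines — this is not the Feature Store API, just a stdlib illustration of a point-in-time lookup over a hypothetical feature history:

```python
import bisect

# Hypothetical feature history: (timestamp, value) pairs kept in time order,
# conceptually what a feature store's time-travel storage maintains per feature.
history = {
    "avg_amount": [(100, 50.0), (200, 75.0), (300, 90.0)],
}

def value_as_of(feature, ts):
    """Return the latest value recorded at or before `ts` —
    the point-in-time lookup at the core of a time-travel query."""
    versions = history[feature]
    times = [t for t, _ in versions]
    i = bisect.bisect_right(times, ts) - 1
    if i < 0:
        return None  # feature did not exist yet at `ts`
    return versions[i][1]

print(value_as_of("avg_amount", 250))  # 75.0 — the value recorded at timestamp 200
```

Joining such lookups against the event times of a training set is what keeps training data free of leakage from the future.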

Experimentation & Model Training

The tools and frameworks practitioners love
Hopsworks provides framework support for enabling Machine Learning code to:

  • Move easily from development to production: Jupyter notebooks can be run directly in ML pipelines;
  • Scale easily from a single container/GPU to a cluster of tens or hundreds of GPUs using the Maggy framework, which provides a unified framework for the parallel execution of hyperparameter optimization and ablation studies on PySpark;
  • Scale easily to distributed training on TensorFlow with the CollectiveAllReduce strategy;
  • Be reproducible through experiment management, which records the output models of training runs, the notebook used to generate each model, and the conda environment used;
  • Produce and validate models that are managed by Hopsworks.
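Maggy distributes hyperparameter trials across PySpark executors; the underlying idea — many independent trials evaluated in parallel, best one kept — can be sketched with stdlib parallelism. The objective function and search space below are made up for illustration, and this is not Maggy’s API:

```python
import random
from concurrent.futures import ThreadPoolExecutor

random.seed(42)

def objective(lr, dropout):
    """Stand-in for a training run that returns a validation metric.
    A real trial would train a model; here the 'metric' is synthetic."""
    return -((lr - 0.01) ** 2) - ((dropout - 0.3) ** 2)

# Random search over a hypothetical search space, one trial per worker.
trials = [
    {"lr": random.uniform(0.001, 0.1), "dropout": random.uniform(0.0, 0.5)}
    for _ in range(16)
]

with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(lambda t: objective(**t), trials))

best = trials[scores.index(max(scores))]
print(best)  # the trial whose hyperparameters scored highest
```

In Maggy the same pattern runs on Spark executors with GPUs, so the trial budget scales with the cluster instead of a thread pool.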

Putting ML Pipelines in Production

Hopsworks includes Airflow as an orchestration engine for managing the execution of ML pipelines.

  • Notebooks or Jobs in Hopsworks can be run as stages in ML pipelines through the HopsworksJob Airflow Operator;
  • Airflow provides error handling and notification support to ensure that any production problems with pipelines can be immediately discovered and acted upon;
  • Pipelines are data-aware: because Hopsworks manages versioned models, you can check whether a newly trained model outperforms the currently deployed model before promoting it to production.
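The data-aware deployment gate in the last bullet can be sketched as the final task of a pipeline. The function and metric names here are hypothetical — this illustrates the decision logic, not the Hopsworks or Airflow API:

```python
def should_deploy(candidate_metrics, deployed_metrics, metric="auc", min_gain=0.0):
    """Gate a deployment stage: promote the candidate model only if it
    beats the currently deployed model on the chosen metric."""
    if deployed_metrics is None:  # nothing deployed yet, so deploy the candidate
        return True
    return candidate_metrics[metric] > deployed_metrics[metric] + min_gain

# Hypothetical evaluation results attached to two model versions.
candidate = {"auc": 0.91}
deployed = {"auc": 0.88}

print(should_deploy(candidate, deployed))  # True: candidate outperforms
```

An orchestrator runs a check like this between the training stage and the serving stage, so a worse model never reaches production automatically.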

Model Serving & Monitoring

Hopsworks supports the deployment and management of both batch and real-time models:

  • Batch applications use models either by downloading them via the REST API or, within Hopsworks, by Spark/Beam/Flink applications reading them as files from HopsFS;
  • Online applications can use models that are served in real time from either TensorFlow Serving or a Flask application (Scikit-learn / H2O). Hopsworks Enterprise serves these models on elastic Kubernetes containers;
  • Logs for real-time models (prediction requests and responses) can be stored in a Kafka topic so that streaming applications can monitor the performance/behaviour of the model in real time;
  • Online model serving provides a TLS-enabled REST API with self-service access control.
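For TensorFlow Serving, the REST predict API expects a JSON body of the form `{"instances": [...]}`. A client-side sketch — the host path and feature values below are hypothetical:

```python
import json

def predict_payload(instances):
    """Build the JSON body TensorFlow Serving's REST predict API expects:
    {"instances": [...]}, one row per prediction request."""
    return json.dumps({"instances": instances})

# Hypothetical feature vectors for two predictions; a client would POST this
# body to the serving endpoint (e.g. .../v1/models/<model-name>:predict) over TLS.
body = predict_payload([[0.2, 1.5, 3.1], [0.4, 0.9, 2.7]])
print(body)
```

The response mirrors the request shape, returning a `"predictions"` list with one entry per instance.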

On-Premises or in the Cloud

Deploy on‑premises on your own hardware or at your preferred cloud provider. Hopsworks will provide the same user experience in the cloud or in the most secure of air‑gapped deployments.

On-premise
AWS
GCP, Azure
OpenStack

Better AI with Hopsworks

Watch this Feature Store presentation to understand how Hopsworks can improve your ML model development and operations.

Try Hopsworks now

Hopsworks at a glance

Efficiency & Performance

Feature Store
Data warehouse for ML
Distributed Deep Learning
Faster with more GPUs
HopsFS
NVMe speed with Big Data
Horizontally Scalable
Ingestion, data prep, training, serving

Development & Operations

Notebooks for Development
First-class Python support
Version Everything
Code, infrastructure, data
Model Serving on Kubernetes
TF Serving, MLeap, Scikit-learn
End-to-End ML Pipelines
Orchestrated by Airflow

Governance & Compliance

Secure Multi-tenancy
Project-based restricted access
Encryption At Rest, In Motion
TLS/SSL everywhere
AI-Asset Governance
Models, experiments, data, GPUs
Data/Model/Feature Lineage
Discover and track dependencies

Book a demo

Get an introduction to Hopsworks and Hopsworks Feature Store for your Machine Learning projects together with one of our engineers.

A comprehensive walk-through
• How Hopsworks can align with your current ML pipelines
• How to manage Features within Hopsworks feature store
• The benefits of Hopsworks Feature Store for your teams

Let us know your specific wishes and prerequisites for your personal demonstration.