Manage your own Feature Store on Databricks with Hopsworks
Wednesday July 15th @8:00 PM CEST

Feature Store

The Hopsworks Feature Store manages your features for training and serving models. It provides scale-out storage for training and batch inference, as well as low-latency storage for online applications that need to build feature vectors for real-time predictions. Python and Java/Scala APIs enable both batch and online applications to manage and use features for machine learning.
AWS | On-Premises | Azure/GCP

Integrates with Sagemaker, Databricks, Kubeflow, Cloudera

The Hopsworks Feature Store integrates with popular Data Science platforms, such as AWS Sagemaker and Databricks. It also integrates with backend data lakes, such as S3 and Hadoop. Whether you deploy on-premises or at your preferred cloud provider, the Hopsworks Feature Store provides the same functionality.


Ensure Consistent Features in both Training and Serving

The starting point for building ML applications is to ingest data from a variety of structured and unstructured sources, then wrangle and validate that data so that it can be used to build the features that train models. Hopsworks supports feature engineering in the following frameworks:

  • Python programs that use Pandas for feature engineering;
  • Spark or PySpark applications;
  • Beam/Flink with TensorFlow Extended (TFX) components.
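
As a rough illustration of why consistency matters, the sketch below defines one feature function and calls it from both a training path and a serving path, so the two cannot drift apart. It uses plain Python rather than Pandas or Spark to stay self-contained; the function name, the record keys (`amount`, `day`) and the feature names are assumptions made for this example, not part of the Hopsworks API.

```python
from datetime import date


def customer_features(transactions, as_of):
    """Compute features for one customer from raw transaction records.

    `transactions` is a list of dicts with the hypothetical keys
    "amount" and "day"; only the 30 days up to `as_of` are used.
    """
    recent = [t for t in transactions if 0 <= (as_of - t["day"]).days <= 30]
    total = sum(t["amount"] for t in recent)
    return {
        "txn_count_30d": len(recent),
        "avg_amount_30d": total / len(recent) if recent else 0.0,
    }


history = [
    {"amount": 20.0, "day": date(2020, 6, 1)},
    {"amount": 40.0, "day": date(2020, 6, 20)},
    {"amount": 99.0, "day": date(2020, 3, 1)},  # older than 30 days, ignored
]

# Training path: build a feature vector as of a given date.
training_row = customer_features(history, as_of=date(2020, 6, 30))

# Serving path: the same function is called at request time.
serving_row = customer_features(history, as_of=date(2020, 6, 30))
```

Because both paths share a single definition, a change to the feature logic reaches training and serving together.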

Govern, Access-Control, and Version

Features, Feature Groups, and Training Datasets support:

  • versioning for easier reproduction of model training jobs and model interpretability;
  • access control to sensitive feature data;
  • extensive data provenance.
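
To make the versioning idea concrete, here is a minimal in-memory sketch in which every registration of a Feature Group schema receives a new version number and older versions remain readable. The class `FeatureGroupRegistry` and its methods are hypothetical and only illustrate the concept; they are not the Hopsworks API.

```python
class FeatureGroupRegistry:
    """Hypothetical in-memory registry of versioned Feature Group schemas."""

    def __init__(self):
        # name -> list of schemas; list index i holds version i + 1
        self._versions = {}

    def register(self, name, schema):
        """Store a new schema version and return its version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(dict(schema))
        return len(versions)

    def get(self, name, version=None):
        """Return the given version, or the latest one if none is given."""
        versions = self._versions[name]
        if version is None:
            version = len(versions)
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]


registry = FeatureGroupRegistry()
v1 = registry.register("customer", {"age": "int"})
v2 = registry.register("customer", {"age": "int", "country": "string"})

# A training job pinned to version 1 still sees the original schema.
pinned_schema = registry.get("customer", version=1)
latest_schema = registry.get("customer")
```

Pinning a training job to an explicit version is what makes it reproducible after the Feature Group has since changed.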

Create Training Data in the File Format of your Choice

Hopsworks Feature Store can materialize feature data as training datasets in a range of file formats:

  • .tfrecords for TensorFlow
  • .npy (NumPy) for PyTorch
  • .csv for scikit-learn and others
  • .petastorm, .hdf5, .parquet, .orc, and others.
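
The snippet below sketches the simplest of these cases, writing feature rows out as .csv using only the Python standard library. The helper `to_csv`, the column names and the sample rows are assumptions made for illustration; the other formats in the list need third-party libraries and are not shown.

```python
import csv
import io


def to_csv(rows, columns):
    """Serialize a list of feature dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical training rows: two features and a label.
rows = [
    {"age": 34, "avg_amount": 30.0, "label": 0},
    {"age": 51, "avg_amount": 12.5, "label": 1},
]
csv_text = to_csv(rows, ["age", "avg_amount", "label"])
```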

Unified Online and Offline Storage

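Hopsworks Feature Store stores feature data in two database engines: an offline store that provides scale-out storage for training and batch inference, and an online store that provides low-latency access for real-time predictions.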
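
One way to picture the split between the two stores is the toy model below: every inserted row is appended to an offline history, while the online side keeps only the latest row per key for fast lookups. The class `DualFeatureStore` and its methods are hypothetical, and plain Python containers stand in for the real database engines.

```python
class DualFeatureStore:
    """Toy model of unified offline and online feature storage."""

    def __init__(self):
        self.offline = []  # append-only history of every inserted row
        self.online = {}   # primary key -> most recent row

    def insert(self, key, features, event_time):
        """Write one row to both stores."""
        row = {"key": key, "event_time": event_time, **features}
        self.offline.append(row)
        current = self.online.get(key)
        # Only overwrite the online value with newer (or equally new) data.
        if current is None or event_time >= current["event_time"]:
            self.online[key] = row

    def get_feature_vector(self, key, names):
        """Build a feature vector for serving from the online store."""
        row = self.online[key]
        return [row[name] for name in names]


store = DualFeatureStore()
store.insert(1, {"avg_amount": 10.0}, event_time=1)
store.insert(1, {"avg_amount": 12.5}, event_time=2)

vector = store.get_feature_vector(1, ["avg_amount"])
```

Training jobs would read the full history from the offline side, while an online application only ever needs the current value per key.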

Feature Data Validation

Feature data can be validated either in ML pipelines or by Citizen Data Scientists from Hopsworks:

  • UI support for declaratively specifying data validation rules for Feature Groups;
  • TFX Data Validation can be included in ML pipelines in Hopsworks;
  • updated feature data scheduled for ingestion can be staged, so that it can be inspected and evaluated before ingestion - a useful capability when debugging a new Feature Pipeline.
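
A declarative rule set of this kind can be sketched as plain data that is checked against staged rows before they are ingested. The rule keys (`min`, `max`, `nullable`), the `validate` function and the sample rows below are assumptions for illustration; they do not reproduce the rule format used by Hopsworks or by TFX Data Validation.

```python
# Hypothetical declarative rules: one entry per feature.
RULES = {
    "age": {"min": 0, "max": 120, "nullable": False},
    "country": {"nullable": False},
}


def validate(rows, rules):
    """Return a list of (row index, feature name, problem) tuples."""
    violations = []
    for index, row in enumerate(rows):
        for name, rule in rules.items():
            value = row.get(name)
            if value is None:
                if not rule.get("nullable", True):
                    violations.append((index, name, "null value"))
                continue
            if "min" in rule and value < rule["min"]:
                violations.append((index, name, "below minimum"))
            if "max" in rule and value > rule["max"]:
                violations.append((index, name, "above maximum"))
    return violations


# Staged rows waiting for ingestion.
staged = [
    {"age": 34, "country": "SE"},
    {"age": -1, "country": "DE"},
    {"age": 51, "country": None},
]
violations = validate(staged, RULES)

# Ingest only if the staged batch is clean.
ready_to_ingest = not violations
```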

Time Travel Queries for Features

Time-travel queries support the querying of historic values for features to generate new training data:

  • when an outcome is observed for an old prediction, use time-travel to query the point-in-time values of the features when the prediction was made - then recreate the original feature vectors now that you have the label (outcome);
  • reproduce experiments that trained models on historic values of features.
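
The point-in-time lookup behind such queries can be sketched as follows: each feature value is stored with its commit time, and a query returns the last value committed at or before the requested time. The class `FeatureHistory`, its methods and the integer timestamps are assumptions for illustration, not how Hopsworks implements time travel.

```python
from bisect import bisect_right


class FeatureHistory:
    """Toy store that keeps every committed value of a feature per key."""

    def __init__(self):
        self._times = {}   # key -> sorted list of commit times
        self._values = {}  # key -> values aligned with the commit times

    def commit(self, key, time, value):
        """Record a value for `key` at `time`, keeping times sorted."""
        times = self._times.setdefault(key, [])
        values = self._values.setdefault(key, [])
        index = bisect_right(times, time)
        times.insert(index, time)
        values.insert(index, value)

    def as_of(self, key, time):
        """Return the value that was current at `time`, or None."""
        times = self._times.get(key, [])
        index = bisect_right(times, time)
        if index == 0:
            return None
        return self._values[key][index - 1]


history = FeatureHistory()
history.commit("customer_1", 100, {"avg_amount": 10.0})
history.commit("customer_1", 200, {"avg_amount": 25.0})

# A prediction was made at time 150; its outcome only arrives later.
features_at_prediction = history.as_of("customer_1", 150)
features_now = history.as_of("customer_1", 250)
```

Joining `features_at_prediction` with the late-arriving label recreates the original feature vector without leaking values that were committed after the prediction.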

Single Sign-On with Sagemaker, Databricks

Hopsworks Feature Store can be used seamlessly from within AWS Sagemaker or Databricks (AWS):

  • uses the Sagemaker or Databricks IAM role to authenticate with the Feature Store and acquire an API key for its REST API;
  • Online Feature Store access with single-digit millisecond latency via a MySQL Server;
  • Offline Feature Store data can be consumed directly from Hive using JDBC in Sagemaker/Databricks notebooks.

Rich Connectors to S3, Hadoop, JDBC/ODBC Sources and Sinks

Hopsworks Feature Store can be connected with a variety of sources and sinks:

  • create training datasets on S3, HDFS, or HopsFS;
  • create on-demand features that pull data from external JDBC/ODBC sources;

Better AI with Hopsworks

Watch this Feature Store presentation to understand how it can improve your ML model development and operations.

Try Hopsworks now

Hopsworks at a glance


Efficiency & Performance

  • Feature Store: data warehouse for ML
  • Distributed Deep Learning: faster with more GPUs
  • HopsFS: NVMe speed with Big Data
  • Horizontally Scalable: ingestion, data prep, training, serving

Development & Operations

  • Notebooks for Development: first-class Python support
  • Version Everything: code, infrastructure, data
  • Model Serving on Kubernetes: TF Serving, MLeap, SkLearn
  • End-to-End ML Pipelines: orchestrated by Airflow

Governance & Compliance

  • Secure Multi-tenancy: project-based restricted access
  • Encryption At-Rest, In-Motion: TLS/SSL everywhere
  • AI-Asset Governance: models, experiments, data, GPUs
  • Data/Model/Feature Lineage: discover/track dependencies

Book a demo

Get an introduction to Hopsworks and Hopsworks Feature Store for your Machine Learning projects together with one of our engineers.

A comprehensive walk-through
We’ll go through:
• How Hopsworks can align with your current ML pipelines
• How to manage Features within the Hopsworks Feature Store
• The benefits of Hopsworks Feature Store for your teams

Let us know your specific wishes and prerequisites for your personal demonstration.