Hopsworks Feature Store

“Data is the hardest part of ML and the most important piece to get right. Modelers spend most of their time selecting and transforming features at training time and then building the pipelines to deliver those features to production models. Broken data is the most common cause of problems in production ML systems” - Uber

The Hopsworks Feature Store manages your feature data for machine learning. It provides scalable data storage, a compute platform to generate training datasets, and Python and Java/Scala APIs to manage and use the features in both batch and online applications.

Integrates with Sagemaker, Databricks, Hadoop

Hopsworks Feature Store integrates with popular data science platforms, such as AWS Sagemaker and Databricks. It also integrates with backend data lakes, such as S3 and Hadoop. Whether you deploy on-premises or at your preferred cloud provider, Hopsworks provides the same user experience.

Ensure Consistent Features in both Training and Serving

The starting point for building ML applications is to ingest data from a variety of structured and unstructured sources, then wrangle and validate that data so that it can be used to build the features on which models are trained. Hopsworks supports feature engineering in the following frameworks:

  • Python programs that use Pandas for feature engineering;
  • Spark or PySpark applications;
  • Beam/Flink with TensorFlow Extended (TFX) components.
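
As an illustration of the first option, here is a minimal Pandas sketch that turns raw events into per-entity features. The table, the column names, and the `build_customer_features` helper are hypothetical and not part of any Hopsworks API; the point is that the same function can be reused for training and serving so that both see identical feature logic.

```python
import pandas as pd

# Hypothetical raw transaction events; column names are illustrative only.
raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 2],
    "amount": [20.0, 35.0, 5.0, 10.0, 15.0],
})


def build_customer_features(events: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw events into one row of features per customer."""
    return (
        events.groupby("customer_id")["amount"]
        .agg(txn_count="count", total_amount="sum", avg_amount="mean")
        .reset_index()
    )


features = build_customer_features(raw)
print(features)
```

Writing the resulting DataFrame into a feature group is left out here, since that step depends on the Hopsworks client version in use.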

Govern, Access-Control, and Version

Features, Feature Groups, and Training Datasets support:

  • versioning for easier reproduction of model training jobs and model interpretability;
  • access control to sensitive feature data;
  • extensive data provenance.

Create Training Data in the File Format of your Choice

Hopsworks Feature Store can transform materialized feature data into training datasets in the following file formats:

  • .tfrecords for TensorFlow
  • .npy (NumPy) for PyTorch
  • .csv for scikit-learn and others
  • .petastorm, .hdf5, .parquet, .orc, and others.
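
To make the idea of one dataset in several formats concrete, the sketch below writes the same small feature matrix as CSV and as a NumPy `.npy` file using only the standard library and NumPy. The `write_training_data` helper, the feature names, and the file names are assumptions made for illustration; Hopsworks performs this materialization itself, and this is not its API.

```python
import csv
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical materialized features: one row per training example.
feature_names = ["txn_count", "total_amount"]
rows = [[2.0, 55.0], [3.0, 30.0]]


def write_training_data(out_dir: Path, fmt: str) -> Path:
    """Write the feature rows to out_dir in the requested format."""
    if fmt == "csv":
        path = out_dir / "train.csv"
        with path.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(feature_names)  # header row
            writer.writerows(rows)
    elif fmt == "npy":
        path = out_dir / "train.npy"
        np.save(path, np.asarray(rows, dtype=np.float32))
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return path


out_dir = Path(tempfile.mkdtemp())
csv_path = write_training_data(out_dir, "csv")
npy_path = write_training_data(out_dir, "npy")

# Read the .npy file back, as a PyTorch data loader might do before
# converting the array to tensors.
loaded = np.load(npy_path)
```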

Unified Online and Offline Storage

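Hopsworks Feature Store uses two database engines to store feature data: an online store, served through a MySQL Server for low-latency lookups, and an offline store, which can be queried through Hive.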

Feature Data Validation

Feature data can be validated either in ML pipelines or by citizen data scientists working in Hopsworks:

  • UI support for declaratively specifying data validation rules for Feature Groups;
  • TFX Data Validation can be included in ML pipelines in Hopsworks;
  • Stage updated feature data scheduled for ingestion, so that it can be inspected and evaluated before ingestion - this is a useful capability when debugging a new Feature Pipeline.
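
The declarative style described above can be sketched in plain Python: rules are data (a column, a description, a predicate), and a staged batch is checked against them before ingestion. The `Rule` class, the example rules, and the `validate` function are hypothetical stand-ins, not the Hopsworks UI or TFX Data Validation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """A declarative validation rule for one feature column."""
    column: str
    description: str
    check: Callable[[float], bool]


# Hypothetical rules for a hypothetical feature group.
RULES = [
    Rule("amount", "must be non-negative", lambda v: v >= 0),
    Rule("age", "must be between 0 and 120", lambda v: 0 <= v <= 120),
]


def validate(rows: List[Dict[str, float]], rules: List[Rule]) -> List[str]:
    """Return one message per rule violation found in the staged rows."""
    violations = []
    for i, row in enumerate(rows):
        for rule in rules:
            if not rule.check(row[rule.column]):
                violations.append(f"row {i}: {rule.column} {rule.description}")
    return violations


# Staged batch waiting for ingestion; the second row breaks both rules.
staged = [{"amount": 10.0, "age": 34}, {"amount": -5.0, "age": 130}]
violations = validate(staged, RULES)
for message in violations:
    print(message)
```

A pipeline built this way would ingest the staged batch only when the list of violations is empty.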

Time Travel Queries for Features

Time-travel queries support the querying of historic values for features to generate new training data:

  • when an outcome is observed for an old prediction, use time-travel to query the point-in-time values of the features when the prediction was made, then recreate the original feature vectors now that you have the label (outcome);
  • reproduce experiments that trained models on historic values of features.
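
The point-in-time lookup behind the first case can be sketched with the standard library alone: for each past prediction, pick the latest feature value recorded at or before the prediction time, then attach the label that arrived later. The `history` layout, the integer timestamps, and the `value_as_of` helper are assumptions for illustration and do not reflect how Hopsworks stores or queries feature history.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical feature history: entity -> (timestamp, value) pairs,
# sorted by timestamp.
history = {
    "cust_1": [(1, 10.0), (5, 12.5), (9, 20.0)],
}


def value_as_of(entity: str, ts: int) -> Optional[float]:
    """Return the latest feature value recorded at or before ts."""
    records = history[entity]
    times = [t for t, _ in records]
    idx = bisect_right(times, ts)  # number of records with timestamp <= ts
    if idx == 0:
        return None  # no value existed yet at that time
    return records[idx - 1][1]


# Predictions made in the past, and the outcomes observed afterwards.
predictions = [("cust_1", 6), ("cust_1", 9)]
labels = [0, 1]

# Rebuild the training rows using the feature values as they were when
# each prediction was made, not as they are now.
training_rows = [
    (value_as_of(entity, ts), label)
    for (entity, ts), label in zip(predictions, labels)
]
print(training_rows)
```

Using the value as of the prediction time, rather than the current value, keeps information from after the prediction out of the training data.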

Single Sign-On with Sagemaker, Databricks

Hopsworks Feature Store can be used seamlessly from within AWS Sagemaker or Databricks (AWS):

  • uses the Sagemaker or Databricks IAM role to authenticate with the Feature Store and acquire an API key for accessing it through its REST API;
  • Online Feature Store access with single-digit millisecond latency via a MySQL Server;
  • Offline Feature Store data can be consumed directly from Hive using JDBC in Sagemaker/Databricks notebooks.

Rich Connectors to S3, Hadoop, JDBC/ODBC Sources and Sinks

Hopsworks Feature Store can be connected with a variety of sources and sinks:

  • create training datasets on S3, HDFS, or HopsFS;
  • create on-demand features that pull data from external JDBC/ODBC sources;
  • Offline Feature Store data can be consumed directly from Hive using JDBC in Sagemaker/Databricks notebooks.