Build your own Feature Store

Welcome to the feature store newsletter, brought to you by Logical Clocks. Each month we highlight the latest news, events, and insights as we help companies succeed in their machine learning transformation journey and become applied-AI, model-driven businesses.

How to Build your own Feature Store

As of May 2020, Logical Clocks are the only vendor of a Feature Store for machine learning (ML) and the only maker of a fully open-source and cloud-native Feature Store for ML. As such, we have many conversations with companies and organisations who are deciding between building their own feature store and buying one.

If you are considering going down the same path, we suggest that you read this blog post to learn from our experience of building the Hopsworks Feature Store. We walk through the most important benefits of feature stores, explain the reasoning behind some of the decisions and choices we made (and did not make), and include a flowchart to help you make tough decisions.

Editor's Picks

Unifying Single-host and Distributed Machine Learning with Maggy

Moritz Meister from Logical Clocks introduces a new feature of Maggy, an open-source framework, that enables write-once training functions that can be reused in both single-host Python programs and cluster-scale PySpark programs.

Why StreamSQL moved from Apache Kafka to Apache Pulsar

Simba Khadder wrote in April about why StreamSQL moved from Apache Kafka to Apache Pulsar and how Pulsar is used to support their online feature store for ML.

Accelerate Your Machine Learning With The StreamSQL Feature Store

Simba Khadder was interviewed on the Data Engineering Podcast about the feature store they are building at StreamSQL. It’s a good listen.

Hopsworks for AWS SageMaker

Fabio Buso from Logical Clocks wrote a step-by-step guide for AWS SageMaker users on how to perform exploratory data analysis with the feature store, discover available features, and join features together to create train/test datasets - all from the comfort of a Jupyter notebook.

Why we need DevOps for ML Data

Kevin Stumpf at tecton.ai wrote about five challenges in providing infrastructure support for ML data (building, combining, monitoring, and serving features, as well as accessing the raw data).

Feature Stores: a guide for CEOs

ScribbleData wrote a blog post that gives a high-level description of the motivation for feature stores, their view of the industry, and their experience with feature stores.

ML Pipelines with a Feature Store

Altexsoft wrote a blog post about ML pipelines and the role the Feature Store plays in them (“if an eCommerce store recommends products that other users with similar tastes and preferences purchased, the feature store will provide the model with features related to that.”).

Machine Learning in Production: Lessons Learned from Deploying Our First ML Model

Alex Post from Clearcover wrote this article on thenewstack.io about how their ML production journey started with fixing the data layer - that is, building a feature store to manage their features.

Manage your own Feature Store on Kubeflow with Hopsworks

Jim Dowling from Logical Clocks explains step by step how to integrate Hopsworks with Kubeflow. As Kubernetes has limited support for Spark, Hopsworks is often used for its feature store as well as its Spark and scale-out deep learning capabilities, and this blog post shows how.

Highlights of Spark AI Summit on Feature Stores

Spark+AI Summit ended last week, and it showed that feature stores have never been a hotter topic! There were six talks on feature stores at this year’s conference - that is a 600% increase from last year! Click below to watch the videos.

Building a Feature Store around Dataframes and Apache Spark

Fabio Buso and Jim Dowling from Logical Clocks talked about the Hopsworks Feature Store and how it integrates with Databricks. They explain how the Hopsworks Feature Store centralises features for ML for easier discovery and governance, enables the reuse of features in different ML projects, and provides a single pipeline for engineering features for both training and inference.

Watch now

From Python to PySpark and Back Again – Unifying Single-host and Distributed Deep Learning with Maggy

Moritz Meister and Jim Dowling from Logical Clocks introduce an open-source framework, Maggy, that enables write-once training functions that can be reused in single-host Python programs and cluster-scale PySpark programs.

Watch now

A Declarative Feature Engineering Framework

Nikhil Simha from Airbnb talked about Zipline, their declarative feature engineering framework and Feature Store.

Watch now

Orchestrating Spark ML Pipelines and MLflow for Production

Nathan Buesgens at Accenture talked about orchestrating Spark ML Pipelines and MLflow for Production in a Feature Store - “break the anti-pattern of ‘clone and own’ ML pipelines”.

Watch now

Assisted-AI Feature Selection for Big Data Modelling

Alvin Henrick from Clarify Health Solutions talked about an AI-assisted way to identify which features improve the accuracy of a model and by how much - using AI to help select the best features for your models.

Watch now

Accelerate Real-Time ML with a Feature Platform

Mike Del Balso of Tecton.ai talked about their Feature Store and its support for centralized management of features for serving and training, along with version control.

Watch now

Job Opportunities

Data Scientist – Machine Learning Platform

Spotify is hiring an ML engineer to work on their feature store (data platform for AI).

Apply here

Software Engineer - ML Serving Platform

Pinterest is hiring a Software Engineer to join their ML Serving Platform team and work on their ML feature store.

Apply here

Upcoming Webinars & Events

Webinar: Manage your own Feature Store on Databricks with Hopsworks

Attend this webinar to learn how to write Python or Scala programs to compute features and register them in Hopsworks, browse and inspect features, and create train/test datasets from within a notebook.
