Welcome to the feature store newsletter, brought to you by Logical Clocks. Each month we highlight the latest news, events, and insights as we help companies succeed in their machine learning transformation and become applied-AI, model-driven businesses.
How to Build your own Feature Store
As of May 2020, Logical Clocks are the only vendor of a Feature Store for machine learning (ML) and the only maker of a fully open-source and cloud-native Feature Store for ML. As such, we have many conversations with companies and organisations who are deciding between building their own feature store and buying one.
If you are considering going down the same path, we suggest that you read this blog post to learn from our experience of building the Hopsworks Feature Store. We walk through the most important benefits of feature stores, explain the reasoning behind some of the decisions and choices we made (and did not make), and include a flowchart to help you make tough decisions.
Unifying Single-host and Distributed Machine Learning with Maggy
Moritz Meister from Logical Clocks introduces a new feature to an open-source framework, Maggy, that enables write-once training functions that can be reused in single-host Python programs and cluster-scale PySpark programs.
Why StreamSQL moved from Apache Kafka to Apache Pulsar
Simba Khadder wrote in April about why StreamSQL moved from Apache Kafka to Apache Pulsar and how Pulsar is used to support their online feature store for ML.
Accelerate Your Machine Learning With The StreamSQL Feature Store
Simba Khadder was interviewed on the Data Engineering Podcast about the feature store they are building at StreamSQL. It’s a good listen.
Hopsworks for AWS SageMaker
Fabio Buso from Logical Clocks wrote a step-by-step guide for AWS SageMaker users on how to perform exploratory data analysis with the feature store, discover available features, and join features together to create train/test datasets - all from the comfort of a Jupyter notebook.
Why we need DevOps for ML Data
Kevin Stumpf at tecton.ai wrote about five challenges in providing infrastructure support for ML data: building, combining, monitoring, and serving features, as well as accessing the raw data.
Feature Stores: a guide for CEOs
ScribbleData wrote a blog post that gives a high-level description of the motivation for feature stores, their view of the industry, and their experience with feature stores.
ML Pipelines with a Feature Store
AltexSoft wrote a blog post about ML pipelines and the role of the Feature Store in them (“if an eCommerce store recommends products that other users with similar tastes and preferences purchased, the feature store will provide the model with features related to that.”).
Machine Learning in Production: Lessons Learned from Deploying Our First ML Model
Alex Post from Clearcover wrote an article on thenewstack.io about how the starting point for their ML production journey was fixing the data layer - that is, building a feature store to manage their features.
Manage your own Feature Store on Kubeflow with Hopsworks
Jim Dowling from Logical Clocks explains step by step how to integrate Hopsworks with Kubeflow. As Kubernetes has limited support for Spark, this blog post shows how Hopsworks can be used for both its feature store and its Spark and scale-out deep learning capabilities.
Highlights of Spark AI Summit on Feature Stores
Spark+AI Summit ended last week, and it showed that feature stores have never been a hotter topic! There were six talks on feature stores at this year’s conference, a sharp increase from last year. Click below to watch the videos.
Building a Feature Store around Dataframes and Apache Spark
Fabio Buso and Jim Dowling from Logical Clocks talked about the Hopsworks Feature Store and how it integrates with Databricks. They explain how the Hopsworks Feature Store centralises features for ML for easier discovery and governance, enables the reuse of features in different ML projects, and provides a single pipeline for engineering features for both training and inference.
From Python to PySpark and Back Again – Unifying Single-host and Distributed Deep Learning with Maggy
Moritz Meister and Jim Dowling from Logical Clocks introduce an open-source framework, Maggy, that enables write-once training functions that can be reused in single-host Python programs and cluster-scale PySpark programs.
A Declarative Feature Engineering Framework
Nikhil Simha from Airbnb talked about Zipline, their declarative feature engineering framework and Feature Store.
Orchestrating Spark ML Pipelines and MLflow for Production
Nathan Buesgens at Accenture talked about orchestrating Spark ML Pipelines and MLflow for Production in a Feature Store - “break the anti-pattern of ‘clone and own’ ML pipelines”.
Assisted-AI Feature Selection for Big Data Modelling
Alvin Henrick from Clarify Health Solutions talked about an AI-assisted way to identify which features improve the accuracy of a model and by how much - using AI to help select the best features for your models.
Accelerate Real-Time ML with a Feature Platform
Mike Del Balso of Tecton.ai talked about their Feature Store and its support for centralized management of features for serving and training, along with version control.
Data Scientist – Machine Learning Platform
Spotify is hiring an ML engineer to work on their feature store (data platform for AI).
Software Engineer - ML Serving Platform
Pinterest is hiring a Software Engineer to join their ML Serving Platform team and work on their ML feature store.
Upcoming Webinars & Events
Webinar: Manage your own Feature Store on Databricks with Hopsworks
Attend this webinar to learn how to write Python or Scala programs to compute features and register them in Hopsworks, browse and inspect features, and create train/test datasets from within a notebook.