In a Nutshell
A feature store is a central vault for storing documented, curated, and access-controlled features. Here we discuss the state of the art in data management for deep learning and present the first open-source and cloud-native feature store, available in Hopsworks.
What is a Feature Store?
The concept of a feature store was introduced by Uber in 2017. The feature store is a central place to store curated features within an organization. A feature is a measurable property of a data sample. It could be, for example, an image pixel, a word from a piece of text, the age of a person, a coordinate emitted from a sensor, or an aggregate value like the average number of purchases within the last hour. Features can be extracted directly from files and database tables, or they can be derived values computed from one or more data sources.
Features are the fuel for AI systems, as we use them to train machine learning models so that we can make predictions for feature values that we have never seen before.
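To make the idea of a derived, aggregate feature concrete, here is a minimal sketch in plain Python. It assumes one possible reading of "average number of purchases within the last hour": the number of purchases in the last hour divided by the number of distinct customers who purchased in that window. The function name, the event layout (customer id, timestamp), and the sample data are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def avg_purchases_last_hour(events, now):
    """Average purchases per active customer in the hour ending at `now`.

    `events` is an iterable of (customer_id, timestamp) pairs (assumed layout).
    Returns 0.0 when no purchase falls inside the window.
    """
    window_start = now - timedelta(hours=1)
    # Keep only the customers behind purchases inside the one-hour window.
    recent = [cid for cid, ts in events if window_start < ts <= now]
    if not recent:
        return 0.0
    # Total purchases divided by the number of distinct purchasing customers.
    return len(recent) / len(set(recent))


# Hypothetical raw data: the first event is older than one hour.
now = datetime(2020, 1, 1, 12, 0)
events = [
    ("alice", datetime(2020, 1, 1, 10, 30)),
    ("alice", datetime(2020, 1, 1, 11, 15)),
    ("alice", datetime(2020, 1, 1, 11, 45)),
    ("bob", datetime(2020, 1, 1, 11, 50)),
]
feature_value = avg_purchases_last_hour(events, now)
```

Here the raw purchase events are the data source, and `feature_value` is the derived feature that a model would consume.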
The need for a Feature Store
Machine learning is an extremely powerful method that has the potential to help us move from a historical understanding of the world to a predictive modeling of the world around us. However, building machine learning systems is hard and requires specialized platforms and tools.
Although ad-hoc feature engineering and training pipelines are a quick way for data scientists to experiment with machine learning models, such pipelines tend to become complex over time. As the number of models increases, they quickly turn into a pipeline jungle that is hard to manage. This motivates the use of standardized methods and tools for the feature engineering process, which helps reduce the cost of developing new predictive models. The feature store is a service designed for this purpose.
Practical benefits & economies of scale
A frequent pitfall for organizations that apply machine learning is to think of data science teams as individual groups that work independently with limited collaboration.
This mindset results in machine learning workflows with no standardized way to share features across teams and machine learning models. Not being able to share features limits data scientists' productivity and makes it harder to build new models.
By using a shared feature store, organizations can achieve economies of scale. As the feature store accumulates more features, it becomes easier and cheaper to build new models, because new models can re-use features that already exist in the feature store.
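The re-use argument can be sketched with a toy in-memory registry. This is not the Hopsworks API; the class `InMemoryFeatureStore`, its methods, and the feature names are hypothetical and exist only to show two models drawing on the same registered features instead of each rebuilding them.

```python
class InMemoryFeatureStore:
    """Toy registry: feature name -> description and per-entity values."""

    def __init__(self):
        self._features = {}

    def register(self, name, description, values):
        # Refuse duplicates so each feature has a single documented definition.
        if name in self._features:
            raise ValueError(f"feature {name!r} is already registered")
        self._features[name] = {"description": description, "values": dict(values)}

    def get_training_rows(self, feature_names, entity_ids):
        # One row per entity, one column per requested feature.
        return [
            [self._features[name]["values"][entity] for name in feature_names]
            for entity in entity_ids
        ]


store = InMemoryFeatureStore()
store.register("age", "Customer age in years", {"c1": 34, "c2": 51})
store.register("purchases_last_hour", "Purchases in the last hour", {"c1": 2, "c2": 0})

# Two hypothetical models select from the same shared features.
churn_rows = store.get_training_rows(["age", "purchases_last_hour"], ["c1", "c2"])
fraud_rows = store.get_training_rows(["purchases_last_hour"], ["c1", "c2"])
```

The second model costs nothing in feature engineering: it only selects a feature that was already computed and documented for the first.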
Integrates with SageMaker, Databricks, Kubeflow, Cloudera
The Hopsworks Feature Store integrates seamlessly with popular platforms for data science, such as AWS SageMaker and Databricks. It also integrates with backend data lakes, such as S3 and Hadoop. Whether you deploy on-premises or at your preferred cloud provider, the Hopsworks Feature Store provides consistent, dependable functionality.