Hopsworks is the first open source MLOps platform with an Enterprise-ready feature store. Its infrastructure has been developed from the ground up to create a cohesive ecosystem, in which all tools and services work together seamlessly.
Hopsworks provides an open ecosystem that connects with a broad range of data storage, data pipeline, and data science platforms, enabling users to work with their preferred tools (notebooks, Kafka, Spark, Airflow, GitHub, Jenkins…) and languages (R, Python, Java, Scala) at every point of the ML journey.
To manage features efficiently across projects and roles, Hopsworks stores them in well-defined feature groups and training datasets.
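As a rough sketch of the idea, a feature group can be modeled as a named, versioned collection of feature rows keyed by an entity id, from which training datasets are assembled. The classes and functions below are illustrative stand-ins, not the actual Hopsworks API:

```python
from dataclasses import dataclass, field

@dataclass
class FeatureGroup:
    """Illustrative stand-in for a feature group: a named, versioned
    set of feature rows keyed by a primary key (not the hsfs API)."""
    name: str
    version: int
    primary_key: str
    rows: dict = field(default_factory=dict)  # key value -> feature dict

    def insert(self, new_rows):
        for row in new_rows:
            self.rows[row[self.primary_key]] = row

def create_training_dataset(group, feature_names):
    """Project a feature group down to the columns a model trains on."""
    return [{name: row[name] for name in feature_names}
            for row in group.rows.values()]

transactions = FeatureGroup("transactions", version=1, primary_key="customer_id")
transactions.insert([
    {"customer_id": 1, "avg_spend": 42.0, "num_purchases": 7},
    {"customer_id": 2, "avg_spend": 13.5, "num_purchases": 2},
])
dataset = create_training_dataset(transactions, ["avg_spend", "num_purchases"])
```

Versioning the group (rather than individual columns) is what lets training and serving agree on exactly which feature definitions were used.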
Hopsworks' expectation-rules system includes warnings and alerts as part of a comprehensive data management solution, improving data validation for both batch and real-time data.
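The gist of expectation-style validation with severity levels can be sketched in a few lines of plain Python; the rule names and severities here are hypothetical, not the Hopsworks expectations API:

```python
# Hypothetical sketch: an expectation is a rule applied to each row,
# and each rule carries a severity ("warning" or "alert").
def expect_range(feature, min_val, max_val, severity="warning"):
    def rule(row):
        ok = min_val <= row[feature] <= max_val
        return ok, severity, f"{feature}={row[feature]} outside [{min_val}, {max_val}]"
    return rule

def validate(rows, rules):
    """Run every rule on every row; collect (severity, message) findings."""
    findings = []
    for row in rows:
        for rule in rules:
            ok, severity, message = rule(row)
            if not ok:
                findings.append((severity, message))
    return findings

rules = [expect_range("age", 0, 120, severity="alert")]
findings = validate([{"age": 34}, {"age": -1}], rules)
```

In a real deployment the findings would feed the alerting system described above, so an out-of-range ingestion triggers a notification rather than silently corrupting the feature store.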
Data and feature previews are available with just a few clicks in feature groups, allowing fast exploration and analysis of the data stored in the feature store.
The activity and provenance log lists events triggered by rules, alerts, ingestions, and jobs, letting you follow the lifecycle of a project and support auditing.
Hopsworks manages and processes your data at scale for AI. The Hopsworks Feature Store manages features to be used in both training and serving models. It builds on HopsFS, the world's most scalable HDFS-compatible hierarchical filesystem.
You can scale out training and hyperparameter optimization with as many GPUs as you can put in your cluster. And we provide framework support (Maggy and PySpark) to make distributed ML as Pythonic as possible.
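As a minimal illustration of what a hyperparameter-optimization trial loop does, here is a sequential random search in plain Python; in Hopsworks this work would be distributed across GPUs via Maggy/PySpark, and the objective function below is a stand-in, not a real training run:

```python
import random

def objective(lr, batch_size):
    # Stand-in for a training run that returns a validation loss;
    # a real objective would train and evaluate a model.
    return (lr - 0.01) ** 2 + 0.001 * abs(batch_size - 64)

def random_search(trials, seed=0):
    """Sample hyperparameters at random and keep the best trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        params = {"lr": rng.uniform(1e-4, 1e-1),
                  "batch_size": rng.choice([16, 32, 64, 128])}
        loss = objective(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best

best_loss, best_params = random_search(trials=50)
```

Because each trial is independent, this loop parallelizes naturally, which is exactly what frameworks like Maggy exploit when fanning trials out over a cluster.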
ML pipelines have become the de facto way to productionize ML models. Hopsworks uses Airflow to orchestrate pipelines consisting of anything from (Py)Spark jobs, to Python programs on K8s, to Jupyter notebooks, to TensorFlow Extended (Beam/Flink).
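At its core, what an orchestrator like Airflow does is run heterogeneous steps in dependency order. The sketch below shows that idea as a topological sort over an example pipeline; the task names are illustrative, and this is not a real Airflow DAG definition:

```python
# Illustrative sketch: order pipeline steps so every task runs only
# after its upstream dependencies (a topological sort).
def topological_order(deps):
    order, visited = [], set()
    def visit(task):
        if task in visited:
            return
        visited.add(task)
        for upstream in deps.get(task, []):
            visit(upstream)
        order.append(task)
    for task in deps:
        visit(task)
    return order

# Hypothetical pipeline mixing a Spark job, a Python program, and a notebook.
pipeline = {
    "feature_engineering_spark_job": [],
    "train_model_python_k8s": ["feature_engineering_spark_job"],
    "validate_model_notebook": ["train_model_python_k8s"],
    "deploy_model": ["validate_model_notebook"],
}
run_order = topological_order(pipeline)
```

An Airflow DAG expresses the same dependencies declaratively; the scheduler then handles retries, backfills, and parallelism on top of this ordering.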
JupyterLab is provided as a collaborative development environment, while jobs can also be deployed as programs: Python, PySpark, Scala/Java Spark, Beam/Flink.
Hopsworks can version all ML artifacts in ML pipelines: features in the feature store, train/test datasets, programs and pipelines in Github, and models in the model repository. Hopsworks also provides industry-leading support for provenance in ML pipelines: debug and explore lineage between processing steps and ML artifacts.
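Conceptually, provenance boils down to recording which artifacts each artifact was derived from, so lineage can be walked upstream from any model. The sketch below is a hypothetical, simplified model of that graph, not Hopsworks' actual implementation (which captures it implicitly via change-data-capture):

```python
# Hypothetical lineage graph: each artifact records its upstream inputs,
# and upstream() walks the graph to find all ancestors of an artifact.
class Lineage:
    def __init__(self):
        self.parents = {}  # artifact name -> list of upstream artifacts

    def record(self, output, inputs):
        self.parents.setdefault(output, []).extend(inputs)

    def upstream(self, artifact):
        seen, stack = set(), [artifact]
        while stack:
            for parent in self.parents.get(stack.pop(), []):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

lineage = Lineage()
lineage.record("training_dataset_v1", ["feature_group_a", "feature_group_b"])
lineage.record("model_v3", ["training_dataset_v1"])
ancestors = lineage.upstream("model_v3")
```

Given such a graph, debugging a bad model becomes a query: walk upstream to find exactly which feature groups and dataset versions produced it.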
Unlike MLflow and TFX, you do not need to rewrite your pipelines to add provenance: it is captured implicitly by our unique change-data-capture technology.