Written by
Ermias Gebremeskel
Software engineer
Theofilos Kakantousis
VP of Product
November 12, 2021


Receiving Alerts in Slack/Email/PagerDuty from Hopsworks - Support for managing your feature store

TL;DR: Hopsworks is a data-intensive AI platform that manages the full AI lifecycle for MLOps, built around the industry-leading Feature Store. Data pipeline tasks, such as data validation and ingestion, tend to be long-lived and typically run as part of a larger orchestrated data pipeline. It is therefore necessary to have a mechanism for customizing alerts and sending them when different events are triggered in the ingestion pipeline. This tutorial goes through the steps needed to set up alerts in the Hopsworks Feature Store for feature validation and ingestion.

Introduction

Hopsworks brings new alerting capabilities that enable users to monitor jobs and feature group validations. As alerting is relatively new to Hopsworks, we keep working on adding alerting support for other services. Currently the scope of alerts in Hopsworks is twofold: notifying users about changes in the status of jobs, and notifying users about the feature validation status of data being inserted into a feature group of the feature store, or of feature validations performed post-insertion.

There are two ways alerts can be configured for jobs and feature group validations:

  1. By creating alerts for all jobs or feature groups in a project
  2. By creating alerts for a specific job or feature group.

In this blog post we will go through the steps necessary to trigger alerts for jobs and feature group validations. Alerts can be sent via Slack, email, and PagerDuty. For the purposes of this blog, Slack will be used as the alert receiver, but the steps described here apply to any of these three methods.

Alerts can be set up at the project level or at the cluster level. Cluster-wide alerts can only be configured by a platform administrator, that is, a user with the HOPS_ADMIN role assigned. Project-level alerts, on the other hand, can be configured by any member of a project.
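The permission rule above can be summarized in a few lines of Python. This is only an illustrative sketch: the `User` class and the `can_configure_alerts` function are hypothetical names invented for this post and are not part of any Hopsworks API. Only the HOPS_ADMIN role name comes from the text.

```python
from dataclasses import dataclass, field

HOPS_ADMIN = "HOPS_ADMIN"  # role name as given in the text


@dataclass
class User:
    # Hypothetical model of a user, for illustration only.
    name: str
    roles: set = field(default_factory=set)     # cluster-level roles
    projects: set = field(default_factory=set)  # projects the user is a member of


def can_configure_alerts(user, scope, project=None):
    """Return True if `user` may configure alerts at the given scope."""
    if scope == "cluster":
        # Cluster-wide alerts require the platform administrator role.
        return HOPS_ADMIN in user.roles
    if scope == "project":
        # Project-level alerts are open to any member of that project.
        return project is not None and project in user.projects
    raise ValueError(f"unknown scope: {scope!r}")
```

For example, a user who is only a member of a project named `demo` could configure alerts for `demo`, but not cluster-wide alerts.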

Prerequisites

To follow this tutorial you should have an instance of Hopsworks version 2.4 or above running on https://hopsworks.ai. You can register for free, without providing credit card information, and receive USD 300 worth of free credits to get started. The only thing you need to do is to connect your cloud account.

Walkthrough guide

Below we provide a step-by-step guide showing how to set up Hopsworks to trigger alerts for a PySpark feature engineering job that, prior to inserting data into the feature group, uses the feature validation SDK in hsfs to ensure the correctness of the newly arrived data. Both the feature validation and the execution of the job will trigger alerts that are then sent to two different engineering groups: one responsible for monitoring jobs and another responsible for the feature data itself.

In particular, by the end of this example the following events will have occurred: 

  1. The feature dataset is automatically generated inside the example program running as a job.
  2. The feature engineering job will validate and then insert synthetic data into a feature group.
  3. Two alerts will be sent to two different Slack channels, one when the feature validation fails due to incorrect data and one when the feature engineering job completes.
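The sequence of events listed above can be sketched in plain Python, without any Hopsworks dependency. Everything here is an assumption made for illustration: the function names, the range rule on `value`, and the status labels (`FAILURE`, `SUCCESS`, `FINISHED`) are hypothetical and do not reproduce the hsfs validation API. Whether data is actually inserted after a failed validation depends on the validation settings, which this sketch does not model.

```python
import random


def generate_rows(n, seed=0):
    """Generate synthetic feature rows; some values fall outside [0, 1]."""
    rng = random.Random(seed)
    return [{"id": i, "value": rng.uniform(-1.0, 2.0)} for i in range(n)]


def validate(rows, min_value=0.0, max_value=1.0):
    """Return the rows that violate the (hypothetical) range rule."""
    return [r for r in rows if not (min_value <= r["value"] <= max_value)]


def run_job(rows):
    """Validate the rows and return the events that could trigger alerts."""
    events = []
    failures = validate(rows)
    events.append({
        "type": "validation",
        "status": "FAILURE" if failures else "SUCCESS",
        "failed_rows": len(failures),
    })
    # The insertion into the feature group would happen here.
    events.append({"type": "job", "status": "FINISHED"})
    return events
```

Calling `run_job(generate_rows(100))` yields one validation event followed by one job event. In the walkthrough these two kinds of events are what reach the two Slack channels.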

Step 1 - Configure the Alert manager 

To send alerts via Slack you first need to configure the global Slack webhook from the cluster settings page, as shown in the animation below. 


In addition to the webhook, you can add global receivers (channels). Global receivers will be available to all users of the cluster.

To send alerts via email or PagerDuty you will need to add their respective configurations. If you prefer, it is also possible to add configurations and receivers by directly editing the alert manager configuration as shown in the image below.


Detailed information on how to configure the alert manager global settings is available here.   
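To give a rough idea of what such a configuration contains, here is a minimal sketch written as a Python dictionary. It assumes that the alert manager follows the Prometheus Alertmanager configuration schema (`global.slack_api_url`, `route`, `receivers`, `slack_configs`); that assumption, the webhook URL placeholder, the receiver name `global-slack` and the `#alerts` channel are all illustrative and not taken from a real cluster. The helper `undefined_receivers` is a hypothetical sanity check, not part of Hopsworks.

```python
# Sketch of an alert manager configuration, assuming the Prometheus
# Alertmanager schema. All values are placeholders.
alertmanager_config = {
    "global": {
        "slack_api_url": "https://hooks.slack.com/services/PLACEHOLDER",
    },
    "route": {"receiver": "global-slack"},
    "receivers": [
        {"name": "global-slack", "slack_configs": [{"channel": "#alerts"}]},
    ],
}


def undefined_receivers(config):
    """Return receiver names used in the routing tree but never defined."""
    defined = {r["name"] for r in config.get("receivers", [])}
    used = set()

    def walk(route):
        if "receiver" in route:
            used.add(route["receiver"])
        for child in route.get("routes", []):
            walk(child)

    walk(config.get("route", {}))
    return sorted(used - defined)
```

A check like this can catch a common editing mistake, a route pointing at a receiver that was renamed or removed, before the configuration is saved.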

Step 2 - Create receivers

After configuring the global Slack webhook you will be able to add project-specific receivers from the alerts section of your project settings. Give the receiver a name that identifies the team that will receive the alert. In this example we will call it ml-team, and it will send alerts to the #ml-team channel and to the user @admin in Slack.


We also created a receiver called op-team that will receive alerts about jobs and feature group validations. 
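Conceptually, a receiver is a name plus a list of Slack targets. The sketch below builds such a structure, assuming Prometheus Alertmanager-style `slack_configs` entries in which `channel` may be a channel or a user. The function `make_slack_receiver` is a hypothetical helper, not a Hopsworks API; in practice the receivers are created through the UI as described above. The `#op-team` channel for the second receiver is likewise assumed.

```python
def make_slack_receiver(name, targets):
    """Build a receiver entry with one Slack config per channel or user."""
    if not targets:
        raise ValueError("a receiver needs at least one Slack target")
    for target in targets:
        # Channels start with '#', users with '@'.
        if not target.startswith(("#", "@")):
            raise ValueError(f"invalid Slack target: {target!r}")
    return {
        "name": name,
        "slack_configs": [{"channel": target} for target in targets],
    }


# The two receivers used in this walkthrough.
ml_team = make_slack_receiver("ml-team", ["#ml-team", "@admin"])
op_team = make_slack_receiver("op-team", ["#op-team"])
```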

Step 3 - Create alerts

Once a receiver is created we can go ahead and create the alerts that will be triggered when feature validation fails and when the validation job finishes.

  1. Create validation alert - project-wide

The validation alert is created in the project settings and is triggered by any matching validation event in the project, which in this demo is a failed validation. When creating an alert we need to specify the trigger, the receiver and the severity. Here we choose a trigger on data validation failure, and the receiver is the one created in the previous section. The severity can be set to any of the available values: info, warning or critical.

  2. Create job alert

Now that we have created the validation alert, we are ready to create the job that will create the feature group and populate it with fresh validated data. When creating the job we choose the advanced option, which lets us add alerts for this particular job. We want an alert to be triggered when the job finishes and sent to the op-team receiver with severity info.

The job’s code is based on the Data Validation with Python notebook from examples.hopsworks.ai.
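The two alerts created in this step, one project-wide and one attached to a specific job, can be modeled with a small data structure. This is a sketch for illustration only: the `Alert` class, the `matching_alerts` function, the status labels and the job name `validation_job` are hypothetical and not part of the Hopsworks API. Only the three severity values and the receiver names come from the text.

```python
from dataclasses import dataclass
from typing import Optional

SEVERITIES = ("info", "warning", "critical")  # values listed in the text


@dataclass(frozen=True)
class Alert:
    service: str                  # "validation" or "job" (hypothetical labels)
    trigger: str                  # status that fires the alert, e.g. "FAILURE"
    receiver: str                 # name of the receiver created in Step 2
    severity: str
    target: Optional[str] = None  # None means project-wide

    def __post_init__(self):
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")


def matching_alerts(alerts, service, status, target):
    """Return the alerts fired by an event on a given job or feature group."""
    return [
        a for a in alerts
        if a.service == service
        and a.trigger == status
        and a.target in (None, target)  # project-wide or exact match
    ]


alerts = [
    # Project-wide: any failed validation in the project.
    Alert("validation", "FAILURE", "ml-team", "warning"),
    # Job-specific: only when the hypothetical job "validation_job" finishes.
    Alert("job", "FINISHED", "op-team", "info", target="validation_job"),
]
```

With these definitions, a failed validation on any feature group matches the ml-team alert, while a finished job matches the op-team alert only if it is the job the alert was attached to.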


Finally, run the job to see whether alerts are sent to both teams. If everything works as expected, #ml-team will receive an alert on data validation failure and #op-team will receive an alert when the job finishes. The video below walks us through the scenario described above.