Logical Clocks

Time Travel and Provenance for Machine Learning Pipelines

Machine learning pipelines have become the de facto paradigm for productionizing machine learning applications, as they clearly abstract the processing steps involved in transforming raw data into engineered features that are then used to train models. In this paper, we use a bottom-up method for capturing provenance information about the processing steps and artifacts produced in ML pipelines. Our approach replaces traditional intrusive hooks in application code (used to capture ML pipeline events) with standardized change-data-capture support in the systems involved in ML pipelines: the distributed file system, feature store, resource manager, and the applications themselves. In particular, we leverage data versioning and time-travel capabilities in our feature store to show how provenance can enable model reproducibility and debugging.
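To make the link between time travel and reproducibility concrete, the following is a minimal, illustrative sketch in Python. It is not the Hopsworks API or the paper's implementation: `VersionedFeatureGroup`, `ProvenanceRecord`, and `train` are hypothetical names, commit times are plain integers, and the "model" is just the mean of the feature values. It only shows the idea that recording which feature commit a model was trained on lets the same training data be read back later, even after the features have changed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class VersionedFeatureGroup:
    """Hypothetical append-only feature table that keeps every commit."""
    # Each commit is (commit_time, changed_rows); rows map entity id -> value.
    commits: List[Tuple[int, Dict[str, float]]] = field(default_factory=list)

    def commit(self, commit_time: int, changes: Dict[str, float]) -> None:
        # Commits must arrive in increasing time order for as_of() to work.
        if self.commits and commit_time <= self.commits[-1][0]:
            raise ValueError("commit_time must be strictly increasing")
        self.commits.append((commit_time, dict(changes)))

    def as_of(self, timestamp: int) -> Dict[str, float]:
        """Time travel: replay all commits up to and including `timestamp`."""
        snapshot: Dict[str, float] = {}
        for commit_time, changes in self.commits:
            if commit_time > timestamp:
                break
            snapshot.update(changes)
        return snapshot


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record linking a model to the feature commit it used."""
    model_name: str
    feature_commit_time: int


def train(model_name: str, features: VersionedFeatureGroup, commit_time: int):
    """Stand-in for training: the 'model' is the mean of the feature values."""
    training_data = features.as_of(commit_time)
    if not training_data:
        raise ValueError("no feature data at the requested commit time")
    model = sum(training_data.values()) / len(training_data)
    return model, ProvenanceRecord(model_name, commit_time)


features = VersionedFeatureGroup()
features.commit(1, {"user_a": 10.0, "user_b": 20.0})
model_v1, record = train("demo_model", features, commit_time=1)

# The feature data changes after the model was trained.
features.commit(2, {"user_b": 40.0})

# Reproduce the original model by reading features as of the recorded commit.
reproduced, _ = train("demo_model", features, record.feature_commit_time)
latest, _ = train("demo_model", features, commit_time=2)
```

Here `reproduced` equals `model_v1` while `latest` differs, which is the debugging signal provenance is meant to provide: a change in model behavior can be traced to a specific change in the feature data.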

Logical Clocks AB are the makers of Hopsworks, a data-intensive AI platform with a Feature Store.

OpML20 - Time Travel and Provenance for Machine Learning Pipelines
