An implicit model for provenance can be used alongside a feature store with versioned data to build reproducible, more easily debugged ML pipelines. We provide development tools and visualization support that help developers navigate and re-run pipelines more easily.

Alexandru A. Ormenisan, Moritz Meister, Fabio Buso, Robin Andersson, Seif Haridi, Jim Dowling

November 19, 2020

Maggy extends Spark’s synchronous processing model to run asynchronous ML trials, enabling end-to-end state-of-the-art ML pipelines to be run fully on Spark. Maggy provides programming support for defining, optimizing, and running parallel ML trials.

Moritz Meister, Sina Sheikholeslami, Amir H. Payberah, Vladimir Vlassov, Jim Dowling

November 19, 2020

HopsFS-S3 is a hybrid cloud-native distributed hierarchical file system that is available across availability zones, has the same cost as S3, but has 100X the performance of S3 for file move/rename operations, and 3.4X the read throughput of S3 (EMRFS) for the DFSIO Benchmark.

Mahmoud Ismail, Salman Niazi, Gautier Berthou, Mikael Ronström, Seif Haridi, Jim Dowling

November 19, 2020

HopsFS-CL is a highly available distributed hierarchical file system with native support for AZ awareness using synchronous replication protocols.

Mahmoud Ismail, Salman Niazi, Mauritz Sundell, Mikael Ronström, Seif Haridi, Jim Dowling

November 19, 2020

The distribution-oblivious training function allows ML developers to reuse the same training function whether they are running a single-host Jupyter notebook or performing scale-out hyperparameter search and distributed training on clusters.

Moritz Meister, Sina Sheikholeslami, Robin Andersson, Alexandru A. Ormenisan, Jim Dowling

February 24, 2020

Implicit provenance allows us to capture full lineage for ML programs by instrumenting only the distributed file system and APIs, with no changes to the ML code.

Alexandru A. Ormenisan, Mahmoud Ismail, Seif Haridi, Jim Dowling

February 24, 2020

A new version of the block reporting protocol for HopsFS that uses up to 1/1000th of the resources of HDFS' block reporting protocol. IEEE BigDataCongress’19.

Mahmoud Ismail, August Bonds, Salman Niazi, Seif Haridi, Jim Dowling.

July 5, 2019

Change Data Capture paper for HopsFS (ePipe). CCGRID’19.

Mahmoud Ismail, Mikael Ronström, Seif Haridi, Jim Dowling.

May 22, 2019

Paper describing a demo of the Hopsworks ML pipeline given at SysML 2019.

Alexandru A. Ormenisan, Mahmoud Ismail, Kim Hammar, Robin Andersson, Ermias Gebremeskel, Theofilos Kakantousis, Antonios Kouzoupis, Fabio Buso, Gautier Berthou, Jim Dowling, Seif Haridi.

March 20, 2019

Describes how HopsFS supports small files in metadata on NVMe disks. Middleware 2018.

Salman Niazi, Seif Haridi, Mikael Ronström, Jim Dowling.

December 18, 2018

IEEE Scale Prize-winning submission, May 2017. Heavy on database optimizations in HopsFS' metadata layer.

Salman Niazi, Mahmoud Ismail, Mikael Ronström, Seif Haridi, Jim Dowling.

May 24, 2017

The first main paper on HopsFS, presented at USENIX FAST 2017.

Salman Niazi, Mahmoud Ismail, Mikael Ronström, Steffen Grohsschmiedt, Seif Haridi, Jim Dowling.

February 7, 2017

HopsFS' leader election protocol that uses NDB as a backend. DAIS 2015: 158-172.

Salman Niazi, Mahmoud Ismail, Gautier Berthou, Jim Dowling.

June 18, 2015
