November 19, 2020
Alexandru A. Ormenisan, Moritz Meister, Fabio Buso, Robin Andersson, Seif Haridi, Jim Dowling
An implicit provenance model can be used alongside a feature store with versioned data to build reproducible, more easily debugged ML pipelines. We provide development tools and visualization support that help developers navigate and re-run pipelines more easily.
November 19, 2020
Mahmoud Ismail, Salman Niazi, Gautier Berthou, Mikael Ronström, Seif Haridi, Jim Dowling
HopsFS-S3 is a hybrid cloud-native distributed hierarchical file system that is available across availability zones, has the same cost as S3, but has 100X the performance of S3 for file move/rename operations, and 3.4X the read throughput of S3 (EMRFS) for the DFSIO Benchmark.
November 19, 2020
Mahmoud Ismail, Salman Niazi, Mauritz Sundell, Mikael Ronström, Seif Haridi, Jim Dowling
HopsFS-CL is a highly available distributed hierarchical file system with native support for AZ awareness using synchronous replication protocols.
November 19, 2020
Moritz Meister, Sina Sheikholeslami, Amir H. Payberah, Vladimir Vlassov, Jim Dowling
Maggy is an extension to Spark’s synchronous processing model to allow it to run asynchronous ML trials, enabling end-to-end state-of-the-art ML pipelines to be run fully on Spark. Maggy provides programming support for defining, optimizing, and running parallel ML trials.
February 24, 2020
Moritz Meister, Sina Sheikholeslami, Robin Andersson, Alexandru A. Ormenisan, Jim Dowling
The distribution-oblivious training function allows ML developers to reuse the same training function whether they are running a single-host Jupyter notebook or performing scale-out hyperparameter search and distributed training on clusters.
February 24, 2020
Alexandru A. Ormenisan, Mahmoud Ismail, Seif Haridi, Jim Dowling
Implicit provenance allows us to capture full lineage for ML programs by instrumenting only the distributed file system and APIs, with no changes to the ML code.
July 5, 2019
Mahmoud Ismail, August Bonds, Salman Niazi, Seif Haridi, Jim Dowling
A new version of the block reporting protocol for HopsFS that uses as little as 1/1000th of the resources of HDFS' block reporting protocol. IEEE BigDataCongress’19.
May 22, 2019
Mahmoud Ismail, Mikael Ronström, Seif Haridi, Jim Dowling
Change Data Capture paper for HopsFS (ePipe). CCGRID’19.
March 20, 2019
Alexandru A. Ormenisan, Mahmoud Ismail, Kim Hammar, Robin Andersson, Ermias Gebremeskel, Theofilos Kakantousis, Antonios Kouzoupis, Fabio Buso, Gautier Berthou, Jim Dowling, Seif Haridi
Paper describing a demo of the Hopsworks ML pipeline given at SysML 2019.
December 18, 2018
Salman Niazi, Seif Haridi, Mikael Ronström, Jim Dowling
Describes how HopsFS supports small files in metadata on NVMe disks. Middleware 2018.
May 24, 2017
Salman Niazi, Mahmoud Ismail, Mikael Ronström, Seif Haridi, Jim Dowling
IEEE Scale Prize-winning submission, May 2017. Focuses on database optimizations in HopsFS' metadata layer.
February 7, 2017
Salman Niazi, Mahmoud Ismail, Mikael Ronström, Steffen Grohsschmiedt, Seif Haridi, Jim Dowling
First main paper on HopsFS at USENIX FAST 2017.
June 18, 2015
Salman Niazi, Mahmoud Ismail, Gautier Berthou, Jim Dowling
HopsFS' leader election protocol that uses NDB as a backend. DAIS 2015: 158-172.