Industrial PhD Position - Scalable ML Pipelines and Deep Anomaly Detection

Application Deadline: 2020/10/31
Job Location: Stockholm, Sweden

Are you interested in distributed systems and machine learning? The EIT Doctoral School offers two industrial doctorate positions at the Swedish company Logical Clocks to investigate (1) scalable machine learning algorithms that meet the increasing need for faster training and larger models and (2) deep anomaly detection. For both projects, you will develop novel techniques to scale data and compute for machine learning and deep learning.

Case studies will be built to assess the performance and systems characteristics (security, stability, programmability, ease of operations, etc.) of the frameworks developed.

Hopsworks

Hopsworks is a full-stack open-source data science platform for developing and operating AI applications at scale. Hopsworks builds on both Hops, a next-generation filesystem and GPU-aware resource scheduler, and a Feature Store, a data warehouse for your ML features. Hopsworks has first-class support for Python, Spark, Flink and Jupyter notebooks. Notebooks are not just for experimentation; they can also be included in production pipelines. You can design end-to-end ML pipelines in Airflow, another component of Hopsworks, and run them directly in the platform. Hopsworks also has a unique security model based on project-based multi-tenancy. Finally, Hopsworks is available both as an Enterprise platform and as a managed platform on AWS and Azure.

Challenge Scalable ML Pipelines

Machine learning (ML) pipelines are becoming the accepted abstraction for putting AI applications in production, by automating the retraining and deployment of models when new training data becomes available. This PhD will address the systems challenges of building self-optimizing ML pipelines that adapt their operation, resource consumption, and data processing to their environment and user/system requirements. The key contribution will be algorithms and tools that provide certain guarantees and manage the data for multi-stage ML pipelines all the way from data ingestion through data preparation, feature extraction, model training and serving, to continuous model monitoring. These tools will be part of Logical Clocks’ enterprise software platform offering, Hopsworks. As a result of this PhD, the Hopsworks platform will strengthen its position as a leading global platform for operationalizing ML applications. ML pipelines on Hopsworks will become more intelligent, resulting in reduced resource consumption and lower management costs through self-management. These innovations will drive down the cost of operationalizing AI with Hopsworks, helping to improve the competitiveness of Logical Clocks as a high-tech company selling a horizontal platform for AI.

Challenge Deep Anomaly Detection 

Anomaly detection is treated here as an unsupervised, and more recently self-supervised, learning problem using Generative Adversarial Networks (GANs). The main challenge we will address in this thesis is to show how scaling systems software to add more compute and data will enable the reduction of structure in deep learning architectures. The expected result is more accurate anomaly detection systems that are more easily retrained to handle a non-stationary environment.

This research will require the development of distributed storage and algorithms to scale out GANs while handling challenges related to scale, fault tolerance and minimizing resource utilization. This PhD work will aim to overcome these challenges, which have so far prevented the development of new storage file formats and indexing for time-series data, and of new distributed algorithms to reduce training time for GANs. The research methodology employed will be systems research, which involves building and experimenting with software artifacts to understand their behaviour in a given environment.

Expected outcomes

The main result of the two projects will be frameworks and systems support for scalable machine learning and deep learning, and for anomaly detection, respectively. The second outcome of each PhD will be a collection of case studies showing, for different machine learning architectures and anomaly detection domains, respectively, how your contributions advance the state of the art.

Location

The doctoral students involved in this programme will share their time between the Co-Location Centre of the EIT Digital Stockholm Node and the premises of Logical Clocks. A three- to six-month mobility period at a European university, research centre or company will also be part of the programme.

Facts

  • Industrial partner: Logical Clocks AB
  • Academic/research partner: KTH Royal Institute of Technology
  • Number of available PhD positions: 2
  • Duration: 4 years
  • This PhD will be funded by EIT Digital and Logical Clocks AB

Apply

Those interested in applying should send an e-mail to davit@logicalclocks.com, including a CV, a motivation letter, and documents showing their academic track record. An MSc in Computer Science or equivalent is a prerequisite.

Please apply before 31 October 2020.
