Site Reliability Engineer

Application Deadline: Open
Job Location: Stockholm, Sweden

Description

Hopsworks is a platform for Data-Intensive AI with a Feature Store, a warehouse for machine learning features. Logical Clocks’ vision is to simplify the process of refining data into intelligence at scale. At Logical Clocks, you will be part of a fast-paced startup environment with the opportunity to contribute to the entire product and to open-source projects.

As a site reliability engineer, you will be working on making every piece of the Hopsworks platform highly available, autoscaling and easily deployable across different cloud and on-premise environments. You will be responsible for the design and implementation of a software architecture and operational infrastructure to fulfill these requirements.

Responsibilities

  • Design and implementation of a highly-available and autoscaling infrastructure
  • Handling deployments across cloud and on-premise environments
  • Building infrastructure for monitoring and alerting
  • Working with customers to adjust deployments to their needs
  • Driving best practices around building production-grade software
  • Being an active member of the community by presenting at conferences and meetups

Who are you? You probably have:

  • Experience in operating large-scale distributed systems in production
  • In-depth experience with Java; Python is a plus
  • Experience with Chef, Puppet or Ansible
  • Experience with Docker and Kubernetes
  • Knowledge of building and operating cloud infrastructure
  • Knowledge of Spark, Kafka and Hadoop is a plus
