Hopsworks is a platform for Data-Intensive AI with a Feature Store, a warehouse for machine learning features. Logical Clocks’ vision is to simplify the process of refining data into intelligence at scale. At Logical Clocks, you will be part of a fast-paced startup environment with the opportunity to contribute to the entire product as well as to open-source projects.
As a site reliability engineer, you will be working on making every piece of the Hopsworks platform highly available, autoscaling and easily deployable across different cloud and on-premise environments. You will be responsible for the design and implementation of a software architecture and operational infrastructure to fulfill these requirements.
- Designing and implementing a highly available, autoscaling infrastructure
- Handling deployments across cloud and on-premise environments
- Building infrastructure for monitoring and alerting
- Working with customers to adjust deployments to their needs
- Driving best practices around building production grade software
- Being an active member of the community by presenting at conferences and meetups
Who are you? You probably have:
- Experience in operating large-scale distributed systems in production
- In-depth experience with Java; Python is a plus
- Experience with Chef, Puppet or Ansible
- Experience with Docker and Kubernetes
- Knowledge of building and operating cloud infrastructure
- Knowledge of Spark, Kafka and Hadoop is a plus