Scania is a leading supplier of vehicle solutions for sustainable transport, and of engines, operating in more than 100 countries.
We introduced the Hopsworks platform into Scania’s infrastructure so that external researchers can share and use GPS position data from connected vehicles without compromising data security.
Scania’s Fleet telematics big data analytics for vehicle Usage Modelling and Analysis (FUMA) project aims to increase the understanding of how heavy vehicles are used by compiling and processing enormous amounts of GPS position data generated by the company’s 300,000 connected trucks and buses worldwide.
For security reasons, Scania requires a high-performance machine learning development and processing environment that gives third parties only restricted access to its data. The goal was to enable development and large-scale processing by external researchers while keeping Scania’s data sandboxed inside the development environment.
Integrated Data Science Platform
Easy collaboration between external consultants and internal data scientists and engineers when managing, sharing, and processing large datasets.
Faster Data Processing
A massively parallel data processing pipeline for very large datasets.
Improved models that generate more revenue.
Scania deployed the Hopsworks scalable data platform to support multiple users working with the same data without compromising data security.
Because Hopsworks’ multi-tenant security model is built around projects, Scania can invite external researchers, data scientists, and data engineers into projects and give them restricted access to datasets. External researchers can analyse the data but cannot download or update it, which keeps the data secure inside the platform.
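The project-based access model described above can be illustrated with a minimal sketch. This is not the Hopsworks API: the `Project` class, the `Role` names, and the permission table are all hypothetical and exist only to show how membership in a project can grant analysis rights without download or update rights.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    DATA_OWNER = "data_owner"          # hypothetical role: full access
    EXTERNAL_RESEARCHER = "external"   # hypothetical role: analysis only


# Hypothetical permission table: what each role may do with a project's datasets.
PERMISSIONS = {
    Role.DATA_OWNER: {"read", "update", "download"},
    Role.EXTERNAL_RESEARCHER: {"read"},
}


@dataclass
class Project:
    name: str
    datasets: set = field(default_factory=set)
    members: dict = field(default_factory=dict)  # user name -> Role

    def invite(self, user: str, role: Role) -> None:
        """Add a user to the project with the given role."""
        self.members[user] = role

    def is_allowed(self, user: str, action: str, dataset: str) -> bool:
        """Non-members and unknown datasets are always denied."""
        role = self.members.get(user)
        if role is None or dataset not in self.datasets:
            return False
        return action in PERMISSIONS[role]


project = Project(name="fuma", datasets={"gps_positions"})
project.invite("alice", Role.DATA_OWNER)
project.invite("bob", Role.EXTERNAL_RESEARCHER)

print(project.is_allowed("bob", "read", "gps_positions"))
print(project.is_allowed("bob", "download", "gps_positions"))
```

In this sketch the external researcher can read the dataset inside the project but is denied download and update, and anyone who was never invited is denied everything.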
Hopsworks’ key capabilities that we used are:
Live betting is the process of placing a bet after a sporting or racing event has started. We help organisations to reduce operational costs, improve accuracy of prediction models, and increase revenue by implementing the Hopsworks Feature Store to serve low latency features.
Like many AI-backed Internet services, companies offering live betting can benefit from online models whose predictions help determine live odds in races and sports betting.
Online models require many input features to make accurate predictions, including low latency access to features that are computed from historical data. These types of features are often too complex to compute inside the online applications themselves and are impossible to reuse if they are embedded in applications.
When you implement a feature computation in the online application, you then have to ensure the consistency of the online feature implementation (in the app) with the feature implementation used to generate the train/test data for the model (the training data pipeline).
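One common way to avoid this consistency problem is to define each feature once and call the same definition from both the training pipeline and the serving path. The sketch below assumes a made-up feature, `win_rate_last_n`, and illustrative data; none of the names come from Hopsworks.

```python
from statistics import mean


def win_rate_last_n(results, n=5):
    """Share of wins in the most recent n results (1 = win, 0 = loss)."""
    recent = results[-n:]
    return mean(recent) if recent else 0.0


def build_training_rows(history):
    """Training pipeline: one feature row per competitor, from historical data."""
    return {
        name: {"win_rate_last_5": win_rate_last_n(results)}
        for name, results in history.items()
    }


def build_serving_vector(name, history):
    """Serving path: calls the same function, so the two cannot drift apart."""
    return {"win_rate_last_5": win_rate_last_n(history[name])}


# Illustrative, made-up results for two hypothetical competitors.
history = {"runner_a": [1, 0, 1, 1, 0, 1], "runner_b": [0, 0, 1]}

training_rows = build_training_rows(history)
serving_vector = build_serving_vector("runner_a", history)
print(training_rows["runner_a"] == serving_vector)
```

If the online application carried its own copy of the calculation instead, any change to the window size or the handling of missing results would have to be made in two places, which is where training and serving typically fall out of sync.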
Consolidated Feature Engineering Pipelines
Feature engineering code is not duplicated in applications; instead, a single pipeline computes features for both serving and training.
Faster Models to Production
Data scientists can concentrate on improving models, and not on complex infrastructure for ensuring training and serving pipelines are kept in sync.
An online feature store can reuse features across training and serving, and provide low-latency access to features by online applications.
The Hopsworks Online Feature Store acts as an enterprise-wide marketplace for different teams with different remits. It serves pre-computed features to operational models used by online applications in single-digit milliseconds through Python, Scala, or Java clients or language-independent JDBC.
It also lets different predictive betting models reuse the same features, alongside ML features that are specific to a single use case.
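The idea of serving pre-computed features and reusing them across models can be sketched with an in-memory stand-in. The `OnlineFeatureStore` class, its method names, the feature names, and the values are hypothetical and are not the Hopsworks client API; a real online store would be a low-latency database rather than a Python dictionary.

```python
class OnlineFeatureStore:
    """In-memory stand-in: pre-computed features keyed by entity id."""

    def __init__(self):
        self._rows = {}

    def write(self, entity_id, features):
        # Called by the feature pipeline, not by the online application.
        self._rows.setdefault(entity_id, {}).update(features)

    def get_feature_vector(self, entity_id, feature_names):
        # A key lookup only; no feature computation happens at request time.
        row = self._rows[entity_id]
        return [row[name] for name in feature_names]


store = OnlineFeatureStore()
store.write(
    "horse_17",  # made-up entity id and values, for illustration only
    {"win_rate_last_5": 0.6, "avg_finish_position": 2.4, "days_since_last_race": 12},
)

# Two hypothetical models reuse overlapping subsets of the same stored features.
ODDS_MODEL_FEATURES = ["win_rate_last_5", "avg_finish_position"]
RISK_MODEL_FEATURES = ["win_rate_last_5", "days_since_last_race"]

odds_input = store.get_feature_vector("horse_17", ODDS_MODEL_FEATURES)
risk_input = store.get_feature_vector("horse_17", RISK_MODEL_FEATURES)
print(odds_input, risk_input)
```

Both models read `win_rate_last_5` from the same stored row, so the feature is computed once by the pipeline and shared rather than reimplemented per model.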
Hopsworks’ key capabilities for developing and operating online models:
Hopsworks is the world’s first horizontally scalable data platform for machine learning to provide a feature store. It aids in the cleaning of data and preparation of features, and it makes features reusable by other teams.
The Hopsworks Feature Store acts as an effective API between team members working on data engineering (pulling data from backend data warehouses and data lakes) and those working on data science (model building, training, and evaluation).
Security by design: Data scientists can be given sandboxed access to sensitive data, complying with GDPR and stronger security requirements.
Scale-out deep learning: Distributed deep learning over tens or hundreds of GPUs for parallel experiments and distributed training.
Provenance support for ML pipelines: Enables fully reproducible models, easier debugging, and comprehensive data governance for pipelines.
Integration with third-party platforms: Seamless integration with data science platforms such as AWS SageMaker, Databricks, and Kubeflow. Hopsworks also integrates with data lakes such as S3, Hadoop, and Delta Lake, and supports single sign-on through Active Directory, LDAP, and OAuth2.