Data Warehousing

Hopsworks supports both Apache Hive (LLAP) and SparkSQL as data warehousing solutions. These SQL-on-Hadoop platforms can efficiently query datasets of up to petabytes in size. Both can store their backing data in HopsFS or in an external object store such as S3. In Hopsworks, Hive’s metadata is kept consistent with HopsFS’ metadata: both are stored in the same database, and transactions and foreign keys ensure the integrity of each. The Parquet file format is also supported.
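
As a rough illustration of the choice between HopsFS and an external object store, the sketch below builds a HiveQL statement for a Parquet-backed external table. The helper name `build_create_table`, the table name, the columns, and both storage paths are hypothetical and only serve the example; they are not taken from Hopsworks itself.

```python
def build_create_table(table, columns, location, external=True):
    """Return a HiveQL CREATE TABLE statement for a Parquet-backed table.

    `columns` is a list of (name, hive_type) pairs; `location` is the URI
    of the directory holding the table's data files.
    """
    if not columns:
        raise ValueError("at least one column is required")
    # Backtick-quote column names so reserved words are accepted.
    cols = ", ".join(f"`{name}` {ctype}" for name, ctype in columns)
    kind = "EXTERNAL TABLE" if external else "TABLE"
    return (
        f"CREATE {kind} IF NOT EXISTS {table} ({cols}) "
        f"STORED AS PARQUET LOCATION '{location}'"
    )


# Hypothetical schema and locations, assumed purely for illustration.
columns = [("id", "BIGINT"), ("amount", "DOUBLE")]
on_hopsfs = build_create_table(
    "sales", columns, "hdfs:///Projects/demo/warehouse/sales"
)
on_s3 = build_create_table(
    "sales", columns, "s3a://example-bucket/warehouse/sales"
)
print(on_hopsfs)
print(on_s3)
```

Only the `LOCATION` clause differs between the two statements. Either string could then be run through a Hive client, or passed to `spark.sql(...)` on a SparkSession created with Hive support enabled.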

SQL-on-Hops with Hive and SparkSQL