A data cleanroom is a secure, hosted repository for data you've purchased.

Our data cleanrooms are access-controlled, hosted Iceberg data lakes. You need a cleanroom to purchase a data subscription, because that is where we deliver your data. Cleanrooms are free and you can create up to 50, although most customers use a single cleanroom for all of their subscriptions. Access to a cleanroom requires an AWS IAM user or role; see Managing Cleanrooms for how to add and remove access.

Cleanroom Structure

Our data cleanrooms are hosted Iceberg data lakes: Parquet files stored in Amazon S3, described by a metadata catalog in AWS Glue.

Iceberg

Apache Iceberg is a popular open-source, high-performance table format for huge analytic tables. We chose it to maximize compatibility with popular data tooling. You can work with your data in place by querying it directly, connecting application tooling such as PySpark, or registering it as an external table in platforms such as Snowflake and Databricks. Alternatively, you can ETL the data into your own stack; Airflow, dbt, Fivetran, and many other tools all work seamlessly.
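As a minimal sketch of the PySpark route, the function below builds the Spark settings that register a Glue-backed Iceberg catalog. The property names come from Iceberg's Spark and AWS integrations; the catalog name, the warehouse URI, and the optional Glue catalog ID (only relevant if the Glue catalog lives in a different AWS account from your Spark job) are hypothetical placeholders, not values defined by the cleanroom service. It also assumes the Iceberg Spark runtime and Iceberg AWS bundle JARs are on Spark's classpath.

```python
"""Sketch: Spark settings for an Iceberg catalog backed by AWS Glue."""

from typing import Dict, Optional


def iceberg_glue_spark_conf(
    catalog_name: str,
    warehouse: str,
    glue_catalog_id: Optional[str] = None,
) -> Dict[str, str]:
    """Return Spark config entries that register a Glue-backed Iceberg catalog.

    catalog_name, warehouse and glue_catalog_id are hypothetical placeholders
    you supply; they are not values defined by the cleanroom service.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    conf = {
        # Enables Iceberg's SQL extensions (e.g. CALL procedures).
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Registers an Iceberg catalog under `catalog_name`.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Uses AWS Glue as that catalog's metastore.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Accesses data files through Iceberg's S3 FileIO implementation.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
    }
    if glue_catalog_id is not None:
        # Points at a Glue catalog owned by another AWS account (assumption:
        # only needed if the cleanroom catalog is not in your account).
        conf[f"{prefix}.glue.id"] = glue_catalog_id
    return conf
```

Each pair would be passed to `SparkSession.builder.config(key, value)` before `getOrCreate()`, after which tables are addressable in SQL as `<catalog_name>.<database>.<table>`. Returning a plain dict keeps the same settings reusable as `spark-submit --conf key=value` flags or in a cluster configuration.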

Data Storage

Purchased data is stored as Parquet files in Amazon S3, laid out according to the Iceberg open table format. If you prefer, you can read the raw Parquet files directly and skip Iceberg's table management entirely.
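If you take the raw-file route, a sketch like the following could collect the Parquet files under a table's S3 prefix. It assumes the conventional Iceberg layout (data files under `data/`, metadata JSON and Avro manifests under `metadata/`) and a boto3-style S3 client passed in as `s3`; the bucket and prefix are whatever your cleanroom exposes, and the function names are hypothetical. A plain listing cannot tell current data files apart from files that only belong to older snapshots, or from Parquet delete files, so reading through Iceberg remains the way to get a table's exact current contents.

```python
"""Sketch: list raw Parquet data files under an Iceberg table's S3 prefix."""

from typing import Iterable, Iterator, List


def parquet_data_keys(keys: Iterable[str]) -> Iterator[str]:
    """Yield keys ending in .parquet that are not inside a metadata/ directory."""
    for key in keys:
        directories = key.split("/")[:-1]
        # Iceberg writers conventionally keep manifests and table metadata
        # under <table>/metadata/; data files live under <table>/data/.
        if "metadata" in directories:
            continue
        if key.endswith(".parquet"):
            yield key


def list_data_files(s3, bucket: str, table_prefix: str) -> List[str]:
    """Collect Parquet keys using a boto3-style client's list_objects_v2 paginator."""
    paginator = s3.get_paginator("list_objects_v2")
    keys = (
        obj["Key"]
        for page in paginator.paginate(Bucket=bucket, Prefix=table_prefix)
        # Pages with no matching objects omit the "Contents" field.
        for obj in page.get("Contents", [])
    )
    return list(parquet_data_keys(keys))
```

The filtering is kept separate from the S3 call so it can be reused on any key listing, for example an S3 Inventory report.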

Metadata Catalog

Each cleanroom comes with a hosted metadata catalog in AWS Glue that describes your data assets and their schemas. The catalog works with most Iceberg integrations and with a wide variety of query engines, including Amazon Athena, Apache Spark, and Apache Flink.
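As an illustration of reading the catalog programmatically, this sketch walks one Glue database with `get_tables`, following `NextToken` pagination, and keeps only tables whose `table_type` parameter is `ICEBERG`, which is how Iceberg's Glue integration tags its tables. It assumes a boto3-style Glue client passed in as `glue`; the function name is hypothetical, and `catalog_id` is an optional placeholder for a catalog owned by another AWS account. The Glue column list mirrors the table schema, while the authoritative schema lives in the Iceberg metadata file referenced by the table's `metadata_location` parameter.

```python
"""Sketch: map each Iceberg table in a Glue database to its columns."""

from typing import Any, Dict, List, Optional, Tuple


def iceberg_table_schemas(
    glue, database: str, catalog_id: Optional[str] = None
) -> Dict[str, List[Tuple[str, str]]]:
    """Return {table name: [(column name, column type), ...]} for Iceberg tables.

    `glue` is assumed to be a boto3-style Glue client; `catalog_id` is a
    hypothetical placeholder for a catalog owned by another AWS account.
    """
    schemas: Dict[str, List[Tuple[str, str]]] = {}
    request: Dict[str, Any] = {"DatabaseName": database}
    if catalog_id is not None:
        request["CatalogId"] = catalog_id
    while True:
        response = glue.get_tables(**request)
        for table in response.get("TableList", []):
            # Iceberg's Glue integration sets the table_type parameter to ICEBERG.
            params = table.get("Parameters", {})
            if params.get("table_type", "").upper() != "ICEBERG":
                continue
            columns = table.get("StorageDescriptor", {}).get("Columns", [])
            # Glue marks a column's Type as optional, so default to "".
            schemas[table["Name"]] = [(c["Name"], c.get("Type", "")) for c in columns]
        # Glue returns NextToken while more tables remain.
        token = response.get("NextToken")
        if not token:
            return schemas
        request["NextToken"] = token
```

With boto3 installed, a call might look like `iceberg_table_schemas(boto3.client("glue"), "my_database")`, where `my_database` stands in for your cleanroom's database name.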