Superset Docker Requirements

Introduction

In this post, we extend our Superset configuration to support four additional databases: Snowflake, Google BigQuery, Google Sheets, and Elasticsearch. The tutorial focuses on these four, but Superset supports many more; the official Superset documentation lists every supported database along with its driver.

Step 1: Create a requirements-local.txt file

First, we need to create a requirements-local.txt file within the ./docker directory. This file lists the additional Python packages, in our case the database drivers, that Superset's Docker setup installs when the containers start.

# Navigate to the docker directory (run this from the root of the Superset repository)
cd ./docker

# Create the requirements-local.txt file
touch requirements-local.txt

Step 2: Add database drivers to the requirements-local.txt file

Open the requirements-local.txt file in your favorite text editor and add the following lines:

snowflake-sqlalchemy<=1.2.4
pybigquery
elasticsearch-dbapi
gsheetsdb

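These lines install the drivers for Snowflake, Google BigQuery, Elasticsearch, and Google Sheets, respectively. Package names change over time: pybigquery has since been renamed sqlalchemy-bigquery, and newer Superset releases document shillelagh for Google Sheets in place of gsheetsdb, so check the official documentation for the names that match your Superset version.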
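Once the containers are up, it can be handy to confirm that the drivers actually got installed. The following is a minimal sketch, assuming you run it with the same Python interpreter that Superset uses inside the container; the helper names requirement_names and missing_drivers are hypothetical and not part of Superset.

```python
import re
from importlib import metadata

# Same entries as the requirements-local.txt file from this step.
REQUIREMENTS = """\
snowflake-sqlalchemy<=1.2.4
pybigquery
elasticsearch-dbapi
gsheetsdb
"""


def requirement_names(text):
    """Return the bare package names from requirements-style text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        # Keep everything before the first version specifier, extra, or marker.
        names.append(re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0])
    return names


def missing_drivers(text):
    """Return the package names that are not installed in this environment."""
    missing = []
    for name in requirement_names(text):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing drivers:", missing_drivers(REQUIREMENTS) or "none")
```

If the script reports a missing package, the most likely causes are a typo in requirements-local.txt or containers that have not been restarted since the file was edited.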

Step 3: Modify the superset_config.py file

To display the preferred databases in the Superset UI, we need to modify the superset_config.py file. Add the following code snippet:

PREFERRED_DATABASES = [
    "Apache Druid",
    "Google BigQuery",
    "Snowflake",
    "Google Sheets",
    "PostgreSQL",
]

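The databases in this list are featured first when you add a new database connection in the UI. Each entry has to match the engine name that Superset uses for that database, so copy the names exactly as shown.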
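To make the effect of the list order concrete, here is a small illustration of how a preferred list can be used to sort a set of available engines. This is only a sketch of the idea and not Superset's own implementation; the function order_for_display and the available list are hypothetical.

```python
PREFERRED_DATABASES = [
    "Apache Druid",
    "Google BigQuery",
    "Snowflake",
    "Google Sheets",
    "PostgreSQL",
]


def order_for_display(available, preferred):
    """Put preferred engine names first (in list order), the rest alphabetically."""
    rank = {name: i for i, name in enumerate(preferred)}
    return sorted(available, key=lambda name: (rank.get(name, len(preferred)), name))


# Hypothetical set of engines whose drivers happen to be installed.
available = ["MySQL", "Snowflake", "PostgreSQL", "Google BigQuery", "SQLite"]
print(order_for_display(available, PREFERRED_DATABASES))
```

Engines that appear in PREFERRED_DATABASES keep the order given in the list, and everything else follows after them.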

Step 4 (Optional): Add database images to the UI

This step is optional, but it gives the UI a more polished look by showing a logo for each preferred database. Note that these changes may not take effect when running with Docker Compose; they do take effect in production, when you build the image.

Create a new YAML file named superset_text.yml with the following content:

DB_IMAGES:
  snowflake: "/static/assets/images/database_logo/snowflake.jpeg"
  postgresql: "/static/assets/images/database_logo/postgres.jpg"
  druid: "/static/assets/images/database_logo/druid.png"
  bigquery: "/static/assets/images/database_logo/bq.png"
  gsheets: "/static/assets/images/database_logo/gsheets.png"
  presto: "/static/assets/images/database_logo/prestodb.png"

Make sure to place the corresponding image files for each database in the specified paths within the Superset static assets directory.
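A missing image is easy to overlook, so a quick check can help. The sketch below mirrors the DB_IMAGES mapping as a Python dictionary and reports entries whose file does not exist. It assumes that the URL prefix /static/ corresponds to a local static directory that you pass in; the function missing_images and the example path in the usage section are hypothetical and should be adapted to your setup.

```python
from pathlib import Path

# Mirror of the DB_IMAGES mapping from superset_text.yml.
DB_IMAGES = {
    "snowflake": "/static/assets/images/database_logo/snowflake.jpeg",
    "postgresql": "/static/assets/images/database_logo/postgres.jpg",
    "druid": "/static/assets/images/database_logo/druid.png",
    "bigquery": "/static/assets/images/database_logo/bq.png",
    "gsheets": "/static/assets/images/database_logo/gsheets.png",
    "presto": "/static/assets/images/database_logo/prestodb.png",
}

URL_PREFIX = "/static/"


def missing_images(db_images, static_root):
    """Return {engine: expected file path} for images not found on disk."""
    root = Path(static_root)
    missing = {}
    for engine, url_path in db_images.items():
        # Assumption: "/static/..." URLs are served from the static_root directory.
        if url_path.startswith(URL_PREFIX):
            relative = url_path[len(URL_PREFIX):]
        else:
            relative = url_path.lstrip("/")
        candidate = root / relative
        if not candidate.is_file():
            missing[engine] = str(candidate)
    return missing


if __name__ == "__main__":
    # Hypothetical location of the static directory; adjust to your checkout or image.
    for engine, path in missing_images(DB_IMAGES, "superset/static").items():
        print(f"{engine}: no image at {path}")
```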

That's it! You've successfully extended your Superset configuration to support Snowflake, Google BigQuery, Google Sheets, and Elasticsearch. You can now use these databases as data sources within your Superset instance.
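When you add one of these databases in the UI, Superset asks for a SQLAlchemy URI. As a starting point, the sketch below assembles URIs in the shapes commonly documented for these drivers. The helper functions and all example values are hypothetical, and the exact URI format can vary with the driver version, so treat this as a template to check against the official documentation.

```python
from urllib.parse import quote_plus


def snowflake_uri(user, password, account, database, warehouse, role):
    """Build a Snowflake URI; `account` includes the region if one is needed."""
    return (
        f"snowflake://{quote_plus(user)}:{quote_plus(password)}"
        f"@{account}/{database}?role={role}&warehouse={warehouse}"
    )


def bigquery_uri(project_id):
    """Build a BigQuery URI; credentials are supplied separately."""
    return f"bigquery://{project_id}"


def elasticsearch_uri(user, password, host, port=9200):
    """Build an Elasticsearch URI over plain HTTP."""
    return (
        f"elasticsearch+http://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/"
    )


# Google Sheets needs no host or database in the URI itself.
GSHEETS_URI = "gsheets://"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(snowflake_uri("user", "secret", "myaccount.us-east-1", "analytics", "wh", "analyst"))
    print(bigquery_uri("my-project"))
    print(elasticsearch_uri("elastic", "p@ss", "localhost"))
    print(GSHEETS_URI)
```

Passwords are URL-encoded with quote_plus so that characters such as @ or / do not break the URI.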