Seamlessly Setting Up Apache Superset with Minikube and PyAthena: A Complete Guide
Unlocking the Power of Data Visualization: A Guide to Integrating Apache Superset, Minikube, and PyAthena for Enhanced Data Insights
Apache Superset is an open-source data exploration and visualization tool that works with many data sources. Here, I'll walk you through setting up Apache Superset locally using Minikube and extending it with PyAthena so it can query AWS Athena databases. It's also worth looking into Apache Iceberg and a Data Lakehouse architecture as an alternative to Athena. This guide covers the use case of visualizing data in the data lake, if your data stack operates that way. If you want to know what this might look like in a production environment, please reach out to me with any questions.
Prerequisites:
Before diving in, ensure you have the following:
Kubernetes Cluster via Minikube
Helm, the Kubernetes package manager
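A quick preflight check can confirm that the command-line tools used in this guide are installed. The sketch below is an illustration only: the require_tools helper name is hypothetical, and the tool list (minikube, helm, kubectl, docker) is simply the set of commands that appear later in this guide.

```shell
# Hypothetical helper: succeeds if every named command is on PATH,
# otherwise prints the missing ones to stderr and returns 1.
require_tools() {
  _missing=""
  for _tool in "$@"; do
    command -v "$_tool" >/dev/null 2>&1 || _missing="$_missing $_tool"
  done
  if [ -n "$_missing" ]; then
    echo "Missing required tools:$_missing" >&2
    return 1
  fi
  return 0
}

# Check the tools this guide relies on; prints a confirmation only if all are found.
if require_tools minikube helm kubectl docker; then
  echo "All prerequisites found."
fi
```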
Getting Started with Kubernetes and Minikube
Kubernetes, or K8s, underpins this setup by orchestrating our containerized applications. If you're new to Kubernetes or need to install Minikube on your machine, find a comprehensive guide here.
Helm: The Kubernetes Package Manager
Helm simplifies Kubernetes application deployment. For an introduction or more details on Helm, their official documentation is an excellent resource.
Deploying Apache Superset Locally with Minikube
Follow these steps to get Apache Superset up and running:
1. Verify Your Kubernetes Cluster:
Check the status of your Minikube cluster with:
minikube status
You should see a response indicating that Minikube is running correctly.
2. Add the Superset Helm Repository:
helm repo add superset https://apache.github.io/superset
3. Search for Superset Charts:
helm search repo superset
4. Configure Your Settings:
Create a directory such as ~/helm-chart-files for your Helm chart values file, ~/helm-chart-files/superset-helm-values.yaml. Be sure to update the SECRET_KEY in the values file:
SECRET_KEY: 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
Generate a secret key with:
openssl rand -base64 42
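Where exactly the key goes in the values file depends on the chart. As far as I know, the Superset chart documentation sets it as a Python assignment under configOverrides rather than as a top-level YAML key, so treat the layout below as an assumption to verify against your chart version. The write_superset_values helper is a hypothetical name, and it overwrites any existing file at the given path.

```shell
# Hypothetical helper: write a minimal Helm values file that overrides SECRET_KEY.
# $1 = output path (overwritten if it exists)
# $2 = secret key (assumed to contain no single quotes, which holds for base64 output)
write_superset_values() {
  mkdir -p "$(dirname "$1")"
  cat > "$1" <<EOF
configOverrides:
  secret: |
    SECRET_KEY = '$2'
EOF
}

# Example, using the path from this step:
# write_superset_values "$HOME/helm-chart-files/superset-helm-values.yaml" "$(openssl rand -base64 42)"
```

Generating the key inline keeps it out of your shell history, but the values file itself still holds the secret, so keep it out of version control.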
5. Install and Run Apache Superset:
Deploy Superset using Helm with your customized values:
helm upgrade --install --values ~/helm-chart-files/superset-helm-values.yaml superset superset/superset
Follow the terminal instructions to access Superset locally.
kubectl port-forward service/superset 8088:8088 --namespace default
Integrating PyAthena
PyAthena extends our setup, allowing Superset to query AWS Athena. Here’s how to integrate it:
Activate Minikube's Docker Environment:
eval $(minikube docker-env)
Build a Custom Docker Image:
First, create a Dockerfile:
FROM apache/superset:latest
USER root
RUN pip install PyAthena
USER superset
Then, build the Docker image:
docker build --platform=linux/amd64 -t superset-pyathena:latest .
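Because the image only exists inside Minikube's Docker daemon, it can be worth confirming two things before moving on: that the shell is still pointed at that daemon, and that PyAthena imports inside the new image. The helper names below are hypothetical, and the check on MINIKUBE_ACTIVE_DOCKERD assumes your Minikube version exports that variable from minikube docker-env.

```shell
# Hypothetical helper: succeeds when this shell has loaded `minikube docker-env`.
# Assumes docker-env exports MINIKUBE_ACTIVE_DOCKERD (check your minikube version).
using_minikube_docker() {
  [ -n "${MINIKUBE_ACTIVE_DOCKERD:-}" ]
}

# Hypothetical helper: start a throwaway container from the given image and
# try to import PyAthena. --entrypoint replaces whatever the image runs by default.
verify_pyathena_image() {
  docker run --rm --entrypoint python "$1" -c "import pyathena"
}

# Example:
# using_minikube_docker || echo "Run: eval \$(minikube docker-env)" >&2
# verify_pyathena_image superset-pyathena:latest && echo "PyAthena is importable"
```

If the image was built against your host's Docker daemon instead, the pod will typically fail to find it, since pullPolicy IfNotPresent only helps when the image is already present on the Minikube node.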
Update Your Helm Values:
Modify your values file (named superset-values.yaml in the project layout below) to use your newly built image:
image:
  repository: superset-pyathena
  tag: latest
  pullPolicy: IfNotPresent
What your directory should look like now:
superset-pyathena-project
.
├── docker
│   └── Dockerfile
└── helm-charts
    └── apache-superset
        ├── Chart.lock
        ├── Chart.yaml
        ├── superset-values.yaml
        └── templates
            ├── NOTES.txt
            ├── _helpers.tpl
            ├── configmap-superset.yaml
            ├── deployment-beat.yaml
            ├── deployment-flower.yaml
            ├── deployment-worker.yaml
            ├── deployment-ws.yaml
            ├── deployment.yaml
            ├── hpa-node.yaml
            ├── hpa-worker.yaml
            ├── ingress.yaml
            ├── init-job.yaml
            ├── secret-env.yaml
            ├── secret-superset-config.yaml
The other files, such as Chart.yaml, Chart.lock, and templates, can be copied from the default superset directory. At this point, run the following from the chart directory (helm-charts/apache-superset):
helm dependency build
This will add your charts for Postgres and Redis.
├── charts
│   ├── postgresql-12.1.6.tgz
│   └── redis-17.9.4.tgz
On to deployment!
Deploying Your Custom Superset:
With your Docker image ready, deploy Superset through Helm:
helm upgrade --install superset . -f superset-values.yaml
This will output:
Release "superset" has been upgraded. Happy Helming!
NAME: superset
LAST DEPLOYED: Thu Feb 1 19:41:48 2024
NAMESPACE: default
STATUS: deployed
REVISION: 12
TEST SUITE: None
NOTES:
1. Get the application URL by running these commands:
echo "Visit http://127.0.0.1:8088 to use your application"
kubectl port-forward service/superset 8088:8088 --namespace default
To access Superset:
kubectl port-forward service/superset 8088:8088 --namespace default
Finally, navigate to http://localhost:8088 in your browser.
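If the port-forward fails because the pod is not ready yet, waiting for the rollout first may help. This is a sketch under assumptions: it presumes the chart created a Deployment named superset in the default namespace (matching the service name used above), and the wait_for_superset helper and its 300-second default timeout are illustrative choices.

```shell
# Hypothetical helper: block until the Superset Deployment finishes rolling out.
# $1 = namespace (default: default), $2 = timeout (default: 300s).
# Assumes the Deployment is named "superset", like the service above.
wait_for_superset() {
  kubectl rollout status deployment/superset \
    --namespace "${1:-default}" \
    --timeout="${2:-300s}"
}

# Example:
# wait_for_superset && kubectl port-forward service/superset 8088:8088 --namespace default
```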
Connecting to AWS Athena
Inside Superset, add a new database connection to Athena by navigating through Settings > Databases and inputting your connection string under SQLALCHEMY URI.
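As an illustration of what that connection string can look like, the PyAthena documentation describes a SQLAlchemy URI of the form awsathena+rest://{aws_access_key_id}:{aws_secret_access_key}@athena.{region_name}.amazonaws.com:443/{schema_name}?s3_staging_dir={s3_staging_dir}. Secret keys and S3 paths often contain characters such as / and + that need percent-encoding. The sketch below builds such a URI; the function names and every sample value are placeholders, and the encoder assumes ASCII input.

```shell
# Hypothetical helper: percent-encode everything except RFC 3986 unreserved
# characters. Assumes ASCII input. The function body is a subshell, so its
# variables do not leak into the caller.
urlencode() (
  s="$1"
  out=""
  while [ -n "$s" ]; do
    rest="${s#?}"          # everything after the first character
    c="${s%"$rest"}"       # the first character itself
    case "$c" in
      [A-Za-z0-9._~-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s="$rest"
  done
  printf '%s' "$out"
)

# Hypothetical helper: assemble an Athena SQLAlchemy URI.
# $1 = access key id, $2 = secret access key, $3 = region,
# $4 = schema name, $5 = S3 staging directory
athena_uri() {
  printf 'awsathena+rest://%s:%s@athena.%s.amazonaws.com:443/%s?s3_staging_dir=%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "$4" "$(urlencode "$5")"
}

# Placeholder values only; substitute your own credentials and bucket.
athena_uri "YOUR_ACCESS_KEY_ID" "YOUR/SECRET+KEY" "us-east-1" "default" "s3://your-bucket/athena-results/"
```

Paste the printed URI into the SQLALCHEMY URI field. Embedding credentials in the URI is convenient for a local test, but other authentication options may suit a shared environment better.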
Wrapping Up
Congratulations! You've set up Apache Superset locally with Minikube and integrated PyAthena for querying AWS Athena. This setup gives you a local environment for data exploration, visualization, and analysis.
For database connection details and more advanced configurations, refer to the respective documentation of Superset and PyAthena.
Happy Data Exploring!