Set Up Cloud Storage for Kirin
This guide shows you how to configure Kirin to work with AWS S3, Google Cloud Storage, and Azure Blob Storage. Each section covers authentication, creating a bucket or container, and creating your first catalog backed by cloud storage.
Prerequisites
- AWS account with S3 access (for S3 setup)
- Google Cloud project with GCS access (for GCS setup)
- Azure account with Blob Storage access (for Azure setup)
- Basic familiarity with Kirin datasets and catalogs
AWS S3 Setup
Configure Kirin to use AWS S3 as your storage backend.
Step 1: Configure AWS Credentials
Set up AWS credentials using one of these methods:
Option A: AWS CLI Configuration (Recommended)
aws configure --profile {{ aws_profile }}
# Enter your AWS Access Key ID
# Enter your AWS Secret Access Key
# Enter your default region (e.g., us-east-1)
# Enter your default output format (e.g., json), or press Enter to skip
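`aws configure --profile` saves the access key pair in the shared credentials file (by default `~/.aws/credentials`), under an INI section named after the profile. As a quick check before passing `aws_profile` to Kirin, the standard-library sketch below confirms that the profile was written. The helper name `profile_exists` is hypothetical, and it only looks at the credentials file, not at `~/.aws/config` or any other credential source.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def profile_exists(profile: str, credentials_path: Optional[str] = None) -> bool:
    """Return True if `profile` has both access-key fields in the credentials file.

    Hypothetical helper; not part of Kirin or the AWS CLI.
    """
    if credentials_path is None:
        # The AWS CLI honors AWS_SHARED_CREDENTIALS_FILE; otherwise it uses ~/.aws/credentials.
        credentials_path = os.environ.get(
            "AWS_SHARED_CREDENTIALS_FILE",
            str(Path.home() / ".aws" / "credentials"),
        )
    # interpolation=None so secrets containing '%' are treated literally.
    parser = configparser.ConfigParser(interpolation=None)
    # read() skips missing files, so a missing file simply yields no sections.
    parser.read(credentials_path)
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return "aws_access_key_id" in section and "aws_secret_access_key" in section


if __name__ == "__main__":
    print(profile_exists("default"))
```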
Option B: Environment Variables
export AWS_ACCESS_KEY_ID="{{ access_key_id }}"
export AWS_SECRET_ACCESS_KEY="{{ secret_access_key }}"
export AWS_DEFAULT_REGION="{{ region }}"
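A forgotten `export`, or exporting in a different shell session than the one running Python, is an easy way to end up without credentials. The sketch below checks with only the standard library that the three variables above are set and non-empty before you create the catalog; `missing_aws_env_vars` is a hypothetical helper name.

```python
import os
from typing import List, Mapping, Optional

# The three variables exported in Option B.
AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_env_vars(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the Option B variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in AWS_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env_vars()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All Option B variables are set.")
```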
Option C: IAM Role (for EC2/ECS/Lambda)
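If Kirin runs on AWS infrastructure with an IAM role attached (an EC2 instance profile, an ECS task role, or a Lambda execution role), the role's credentials are used automatically and no keys need to be configured.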
Step 2: Create S3 Bucket
Create an S3 bucket for your Kirin data:
aws s3 mb s3://{{ bucket_name }} --region {{ region }}
Or use the AWS Console to create the bucket. Either way, make sure the credentials from Step 1 can list, read, and write objects in it.
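Bucket names are global across all AWS accounts and must follow fixed naming rules, so `aws s3 mb` can reject a name that looks reasonable. The sketch below checks the main documented rules locally before you run the command. It cannot check global uniqueness, it omits some reserved prefixes and suffixes, and the helper name `s3_bucket_name_problems` is hypothetical.

```python
import re
from typing import List

# Main S3 general-purpose bucket naming rules (not exhaustive): 3-63 characters,
# lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IPV4_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def s3_bucket_name_problems(name: str) -> List[str]:
    """Return rule violations; an empty list means the name passes these checks."""
    problems = []
    if not _BUCKET_RE.fullmatch(name):
        problems.append("use 3-63 lowercase letters, digits, '.' or '-', starting and ending with a letter or digit")
    if ".." in name:
        problems.append("no two adjacent periods")
    if _IPV4_RE.fullmatch(name):
        problems.append("must not look like an IP address")
    if name.startswith("xn--"):
        problems.append("must not start with 'xn--'")
    if name.endswith("-s3alias"):
        problems.append("must not end with '-s3alias'")
    return problems
```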
Step 3: Create Catalog with S3
Use the S3 URL as your root_dir:
from kirin import Catalog
# Using AWS profile
catalog = Catalog(
    root_dir="s3://{{ bucket_name }}/data",
    aws_profile="{{ aws_profile }}"
)
# Using environment variables (no profile needed)
catalog = Catalog(root_dir="s3://{{ bucket_name }}/data")
# Create a dataset
dataset = catalog.create_dataset(
    name="{{ dataset_name }}",
    description="My cloud dataset"
)
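In each `root_dir` in this guide, the scheme selects the backend (`s3://`, `gs://`, `az://`), the first component names the bucket or container, and the rest is a path prefix inside it (`data` here). If you load `root_dir` from configuration, a small standard-library check like the one below catches typos such as a missing scheme before the catalog is created. `split_root_dir` and `RootDir` are hypothetical helpers, not part of Kirin.

```python
from typing import NamedTuple
from urllib.parse import urlsplit

# Schemes used in this guide; the backend labels are illustrative only.
SUPPORTED_SCHEMES = {"s3": "AWS S3", "gs": "Google Cloud Storage", "az": "Azure Blob Storage"}


class RootDir(NamedTuple):
    scheme: str
    bucket: str  # S3/GCS bucket or Azure container
    prefix: str  # path inside the bucket, without leading/trailing slashes


def split_root_dir(root_dir: str) -> RootDir:
    """Split a root_dir URL into scheme, bucket/container, and prefix."""
    parts = urlsplit(root_dir)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme {parts.scheme!r} in {root_dir!r}")
    if not parts.netloc:
        raise ValueError(f"missing bucket/container name in {root_dir!r}")
    return RootDir(parts.scheme, parts.netloc, parts.path.strip("/"))
```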
Step 4: Verify S3 Setup
Test that your setup works by creating a commit:
from pathlib import Path
# Create a test file
test_file = Path("test.txt")
test_file.write_text("Hello from S3!")
# Commit to dataset
commit_hash = dataset.commit(
    message="Test commit",
    add_files=[str(test_file)]
)
print(f"✅ Successfully committed to S3: {commit_hash}")
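The snippet above leaves `test.txt` in your working directory. A variant that writes the file to a temporary directory and takes any dataset is sketched below, so it could also serve the GCS and Azure verification steps with a different label. It calls `dataset.commit(message=..., add_files=[...])` exactly as the guide does, and it assumes the file has been uploaded by the time `commit` returns, because the temporary directory is deleted right afterwards. `smoke_test_commit` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def smoke_test_commit(dataset, backend_label: str) -> str:
    """Commit a throwaway file to `dataset` and return the commit hash.

    Assumes dataset.commit() has finished uploading before it returns,
    since the temporary directory is removed when the `with` block exits.
    """
    with tempfile.TemporaryDirectory() as tmp:
        test_file = Path(tmp) / "test.txt"
        test_file.write_text(f"Hello from {backend_label}!")
        commit_hash = dataset.commit(
            message=f"Smoke test ({backend_label})",
            add_files=[str(test_file)],
        )
    print(f"Committed to {backend_label}: {commit_hash}")
    return commit_hash
```

Usage: `smoke_test_commit(dataset, "S3")`.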
What just happened? (S3)
- Kirin authenticated with AWS using your credentials
- Created dataset metadata in S3
- Stored your file using content-addressed storage in S3
- All future operations will use S3 as the backend
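Content-addressed storage means a stored object's key is derived from a hash of its bytes, so identical files are stored once and any change to a file produces a new key. This guide does not say which hash or object layout Kirin uses; the sketch below assumes SHA-256 and a hypothetical `objects/<first two hex chars>/<rest>` layout (similar to Git's loose objects) purely to illustrate the idea.

```python
import hashlib


def content_address(data: bytes, prefix: str = "objects") -> str:
    """Return a storage key derived from the SHA-256 of `data` (illustrative layout)."""
    digest = hashlib.sha256(data).hexdigest()
    # Fan out by the first two hex characters so no single "directory" grows too large.
    return f"{prefix}/{digest[:2]}/{digest[2:]}"


if __name__ == "__main__":
    a = content_address(b"Hello from S3!")
    b = content_address(b"Hello from S3!")
    c = content_address(b"Hello from GCS!")
    print(a == b)  # identical content maps to the identical key
    print(a == c)  # different content maps to a different key
```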
Google Cloud Storage Setup
Configure Kirin to use Google Cloud Storage as your storage backend.
Step 1: Create Service Account
Create a service account with Storage Object Admin permissions:
# Create service account
gcloud iam service-accounts create {{ service_account_name }} \
    --display-name="Kirin Storage Service Account"
# Grant Storage Object Admin role
gcloud projects add-iam-policy-binding {{ project_id }} \
    --member="serviceAccount:{{ service_account_name }}@{{ project_id }}.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"
# Create and download key
gcloud iam service-accounts keys create {{ service_account_name }}-key.json \
    --iam-account={{ service_account_name }}@{{ project_id }}.iam.gserviceaccount.com
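The `--member` and `--iam-account` values above are built from the service account ID and the project ID. `gcloud` rejects IDs that are not 6 to 30 characters of lowercase letters, digits, and hyphens, starting with a letter and not ending in a hyphen, so a local check saves a failed command. The helper below is a hypothetical sketch that validates the ID and builds the same email address the commands use.

```python
import re

# Service account IDs: lowercase letters, digits and hyphens,
# starting with a letter and not ending with a hyphen (length checked separately).
_SA_ID_RE = re.compile(r"[a-z]([-a-z0-9]*[a-z0-9])")


def service_account_email(account_id: str, project_id: str) -> str:
    """Validate `account_id` and return the service account email used above."""
    if not (6 <= len(account_id) <= 30 and _SA_ID_RE.fullmatch(account_id)):
        raise ValueError(f"invalid service account ID: {account_id!r}")
    return f"{account_id}@{project_id}.iam.gserviceaccount.com"
```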
Step 2: Create GCS Bucket
Create a GCS bucket for your Kirin data:
gsutil mb -p {{ project_id }} -l {{ region }} gs://{{ bucket_name }}
Or use the Google Cloud Console to create a bucket.
Step 3: Create Catalog with GCS
Use the GCS URL as your root_dir and provide the service account key:
from kirin import Catalog
catalog = Catalog(
    root_dir="gs://{{ bucket_name }}/data",
    gcs_token="/path/to/{{ service_account_name }}-key.json",
    gcs_project="{{ project_id }}"
)
# Create a dataset
dataset = catalog.create_dataset(
    name="{{ dataset_name }}",
    description="My cloud dataset"
)
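Pointing `gcs_token` at the wrong JSON file (for example a user OAuth client file, or a key from another project) typically surfaces only as an authentication error at the first request. The standard-library sketch below sanity-checks the downloaded key before you pass it in. `check_service_account_key` is a hypothetical helper; the project check assumes, as in this guide, that the service account and bucket live in the same project, so drop that check if yours differ.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path: str, expected_project: str) -> str:
    """Sanity-check a service account key file and return its client_email."""
    info = json.loads(Path(path).read_text())
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")
    if info["type"] != "service_account":
        raise ValueError(f"{path} has type {info['type']!r}, expected 'service_account'")
    if info["project_id"] != expected_project:
        raise ValueError(
            f"key belongs to project {info['project_id']!r}, not {expected_project!r}"
        )
    return info["client_email"]
```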
Alternative: Application Default Credentials
If you're running on Google Cloud infrastructure (Compute Engine, Cloud Run, etc.), you can use Application Default Credentials:
# No token needed - uses default credentials
catalog = Catalog(
    root_dir="gs://{{ bucket_name }}/data",
    gcs_project="{{ project_id }}"
)
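Application Default Credentials are resolved in a fixed order: the file named by `GOOGLE_APPLICATION_CREDENTIALS`, then the user credentials file that `gcloud auth application-default login` writes, then (on Google Cloud) the attached service account via the metadata server. The standard-library sketch below reports which of the two local files would be found; it does not contact the metadata server, so `None` is expected on Compute Engine or Cloud Run. The helper name is hypothetical, and it is an assumption that Kirin's GCS access follows every step of this order.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def local_adc_source(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the local ADC file that would be used, or None if there is none."""
    env = os.environ if environ is None else environ
    # Step 1: an explicitly configured key file always wins.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return explicit
    # Step 2: the well-known file written by `gcloud auth application-default login`.
    if env.get("CLOUDSDK_CONFIG"):
        config_dir = Path(env["CLOUDSDK_CONFIG"])
    elif sys.platform == "win32":
        config_dir = Path(env.get("APPDATA", "")) / "gcloud"
    else:
        config_dir = Path.home() / ".config" / "gcloud"
    well_known = config_dir / "application_default_credentials.json"
    return str(well_known) if well_known.is_file() else None
```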
Step 4: Verify GCS Setup
Test that your setup works:
from pathlib import Path
# Create a test file
test_file = Path("test.txt")
test_file.write_text("Hello from GCS!")
# Commit to dataset
commit_hash = dataset.commit(
    message="Test commit",
    add_files=[str(test_file)]
)
print(f"✅ Successfully committed to GCS: {commit_hash}")
What just happened? (GCS)
- Kirin authenticated with GCS using your service account key (or Application Default Credentials)
- Created dataset metadata in GCS
- Stored your file using content-addressed storage in GCS
- All future operations will use GCS as the backend
Azure Blob Storage Setup
Configure Kirin to use Azure Blob Storage as your storage backend.
Step 1: Create Storage Account
Create an Azure Storage Account and container:
# Create resource group
az group create --name {{ resource_group }} --location {{ location }}
# Create storage account
az storage account create \
    --name {{ storage_account_name }} \
    --resource-group {{ resource_group }} \
    --location {{ location }} \
    --sku Standard_LRS
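# Create container (--auth-mode login requires a data-plane role such as
# Storage Blob Data Contributor; --auth-mode key uses the account key instead)
az storage container create \
    --name {{ container_name }} \
    --account-name {{ storage_account_name }} \
    --auth-mode login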
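Both names in the commands above have format rules, and the storage account name must also be globally unique because it becomes part of the account's endpoint URL (`<account>.blob.core.windows.net`). The sketch below checks only the format rules locally; uniqueness can only be confirmed by Azure. The helper names are hypothetical.

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
# Container names: lowercase letters, digits and hyphens; start and end with a
# letter or digit; every hyphen sits between two letters/digits (no "--").
_CONTAINER_RE = re.compile(r"[a-z0-9](?:-?[a-z0-9])*")


def is_valid_storage_account_name(name: str) -> bool:
    return _ACCOUNT_RE.fullmatch(name) is not None


def is_valid_container_name(name: str) -> bool:
    return 3 <= len(name) <= 63 and _CONTAINER_RE.fullmatch(name) is not None
```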
Step 2: Get Connection String
Retrieve the connection string for authentication:
az storage account show-connection-string \
    --name {{ storage_account_name }} \
    --resource-group {{ resource_group }} \
    --output tsv
The connection string looks like:
DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;EndpointSuffix=core.windows.net
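The connection string is a list of `Key=Value` pairs separated by semicolons. Parsing it is a quick way to confirm that `AccountName` matches the account you created without echoing the secret `AccountKey` to your terminal. The sketch below is a hypothetical helper; it splits each pair on the first `=` only, because the base64-encoded `AccountKey` usually ends in `=` padding.

```python
import os
from typing import Dict


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' pairs into a dict."""
    fields = {}
    for pair in conn_str.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        # partition() splits on the first '=' only, keeping base64 padding intact.
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {pair!r}")
        fields[key] = value
    return fields


if __name__ == "__main__":
    conn = os.environ.get("AZURE_STORAGE_CONNECTION_STRING")
    if conn:
        # Print only the account name, never the key.
        print(parse_connection_string(conn).get("AccountName"))
```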
Step 3: Create Catalog with Azure
Use the Azure URL as your root_dir and provide the connection string:
from kirin import Catalog
catalog = Catalog(
    root_dir="az://{{ container_name }}/data",
    azure_connection_string="{{ connection_string }}"
)
# Create a dataset
dataset = catalog.create_dataset(
    name="{{ dataset_name }}",
    description="My cloud dataset"
)
Alternative: Environment Variable
You can also set the connection string as an environment variable:
export AZURE_STORAGE_CONNECTION_STRING="{{ connection_string }}"
Then create the catalog without the connection string parameter:
catalog = Catalog(root_dir="az://{{ container_name }}/data")
Step 4: Verify Azure Setup
Test that your setup works:
from pathlib import Path
# Create a test file
test_file = Path("test.txt")
test_file.write_text("Hello from Azure!")
# Commit to dataset
commit_hash = dataset.commit(
    message="Test commit",
    add_files=[str(test_file)]
)
print(f"✅ Successfully committed to Azure: {commit_hash}")
What just happened? (Azure)
- Kirin authenticated with Azure using your connection string
- Created dataset metadata in Azure Blob Storage
- Stored your file using content-addressed storage in Azure
- All future operations will use Azure as the backend
Next Steps
Now that you have cloud storage configured, you can:
- Create multiple datasets in your catalog
- Commit new files to version your data over time
- Access files from anywhere using the same API
- Use cloud storage for production workflows
See the Cloud Storage Overview tutorial for more details on working with remote files.