Cloud Storage Authentication Guide
When using kirin with cloud storage backends (S3, GCS, Azure, etc.), you need
to provide authentication credentials. This guide shows you how to authenticate
with different cloud providers using the new cloud-agnostic authentication parameters.
New Cloud-Agnostic Authentication (Recommended)
Kirin now supports cloud-agnostic authentication parameters that work across all cloud providers:
AWS/S3 Authentication
from kirin import Catalog, Dataset
# Using AWS profile
catalog = Catalog(
root_dir="s3://my-bucket/data",
aws_profile="my-profile"
)
# Using Dataset with AWS profile
dataset = Dataset(
root_dir="s3://my-bucket/data",
name="my-dataset",
aws_profile="my-profile"
)
GCP/GCS Authentication
from kirin import Catalog, Dataset
# Using service account key file
catalog = Catalog(
root_dir="gs://my-bucket/data",
gcs_token="/path/to/service-account.json",
gcs_project="my-project"
)
# Using Dataset with GCS credentials
dataset = Dataset(
root_dir="gs://my-bucket/data",
name="my-dataset",
gcs_token="/path/to/service-account.json",
gcs_project="my-project"
)
Azure Blob Storage Authentication
from kirin import Catalog, Dataset
# Using connection string
catalog = Catalog(
root_dir="az://my-container/data",
azure_connection_string=os.getenv("AZURE_CONNECTION_STRING")
)
# Using account name and key
dataset = Dataset(
root_dir="az://my-container/data",
name="my-dataset",
azure_account_name="my-account",
azure_account_key="my-key"
)
Web UI Integration
The web UI also supports cloud authentication through the
CatalogConfig.to_catalog() method:
from kirin.web.config import CatalogConfig
# Create catalog config with cloud auth
config = CatalogConfig(
id="my-catalog",
name="My Catalog",
root_dir="s3://my-bucket/data",
aws_profile="my-profile"
)
# Convert to runtime catalog
catalog = config.to_catalog()
Legacy Authentication Methods
The following sections show the legacy authentication methods that still work but are less convenient than the new cloud-agnostic approach.
Google Cloud Storage (GCS)
Error You Might See
gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.list access
to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on
resource (or it may not exist)., 401
Solutions
Option 1: Application Default Credentials (Recommended)
Use gcloud CLI to set up credentials:
# Install gcloud CLI first, then:
gcloud auth application-default login
Then use kirin normally:
from kirin.dataset import Dataset
# Will automatically use your gcloud credentials
ds = Dataset(root_dir="gs://my-bucket/datasets", dataset_name="my_data")
Option 2: Service Account Key File
from kirin.dataset import Dataset
import fsspec
# Create filesystem with service account
fs = fsspec.filesystem(
'gs',
token='/path/to/service-account-key.json'
)
# Pass it to Dataset
ds = Dataset(
root_dir="gs://my-bucket/datasets",
dataset_name="my_data",
fs=fs # Use authenticated filesystem
)
Option 3: Environment Variable
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
from kirin.dataset import Dataset
# Will automatically use the credentials from environment variable
ds = Dataset(root_dir="gs://my-bucket/datasets", dataset_name="my_data")
Option 4: Pass Credentials Directly
import fsspec
from kirin.dataset import Dataset
fs = fsspec.filesystem(
'gs',
project='my-project-id',
token='cloud' # Uses gcloud credentials
)
ds = Dataset(root_dir="gs://my-bucket/datasets", dataset_name="my_data", fs=fs)
Amazon S3
Option 1: AWS CLI Credentials (Recommended)
# Configure AWS credentials
aws configure
Then use normally:
from kirin.dataset import Dataset
ds = Dataset(root_dir="s3://my-bucket/datasets", dataset_name="my_data")
Option 2: Environment Variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
Option 3: Pass Credentials Explicitly
import fsspec
from kirin.dataset import Dataset
fs = fsspec.filesystem(
's3',
key='your-access-key',
secret='your-secret-key',
client_kwargs={'region_name': 'us-east-1'}
)
ds = Dataset(root_dir="s3://my-bucket/datasets", dataset_name="my_data", fs=fs)
S3-Compatible Services (Minio, Backblaze B2, DigitalOcean Spaces)
Minio Example
import fsspec
from kirin.dataset import Dataset
fs = fsspec.filesystem(
's3',
key='your-access-key',
secret='your-secret-key',
client_kwargs={
'endpoint_url': 'http://localhost:9000' # Your Minio endpoint
}
)
ds = Dataset(root_dir="s3://my-bucket/datasets", dataset_name="my_data", fs=fs)
Backblaze B2 Example
import fsspec
from kirin.dataset import Dataset
fs = fsspec.filesystem(
's3',
key='your-application-key-id',
secret='your-application-key',
client_kwargs={
'endpoint_url': 'https://s3.us-west-002.backblazeb2.com'
}
)
ds = Dataset(root_dir="s3://my-bucket/datasets", dataset_name="my_data", fs=fs)
Azure Blob Storage
Option 1: Azure CLI Credentials
az login
Option 2: Connection String
import fsspec
from kirin.dataset import Dataset
fs = fsspec.filesystem(
'az',
connection_string='your-connection-string'
)
ds = Dataset(root_dir="az://container/path", dataset_name="my_data", fs=fs)
Option 3: Account Key
import fsspec
from kirin.dataset import Dataset
fs = fsspec.filesystem(
'az',
account_name='your-account-name',
account_key='your-account-key'
)
ds = Dataset(root_dir="az://container/path", dataset_name="my_data", fs=fs)
General Pattern
For any cloud provider, the recommended pattern is:
- New Cloud-Agnostic Approach (Recommended):
from kirin import Catalog, Dataset
# For AWS/S3
catalog = Catalog(root_dir="s3://bucket/path", aws_profile="my-profile")
dataset = Dataset(root_dir="s3://bucket/path", name="my_data",
aws_profile="my-profile")
# For GCS
catalog = Catalog(root_dir="gs://bucket/path", gcs_token="/path/to/key.json",
gcs_project="my-project")
dataset = Dataset(root_dir="gs://bucket/path", name="my_data",
gcs_token="/path/to/key.json", gcs_project="my-project")
# For Azure
catalog = Catalog(root_dir="az://container/path", azure_connection_string="...")
dataset = Dataset(root_dir="az://container/path", name="my_data", azure_connection_string="...")
- Auto-detect (works if credentials are already configured):
ds = Dataset(root_dir="protocol://bucket/path", name="my_data")
- Legacy explicit authentication (still supported):
import fsspec
from kirin.dataset import Dataset
# Create authenticated filesystem
fs = fsspec.filesystem('protocol', **auth_kwargs)
# Pass to Dataset
ds = Dataset(root_dir="protocol://bucket/path", name="my_data", fs=fs)
Testing Without Credentials
For local development/testing without cloud access, you can use:
# In-memory filesystem (no cloud needed)
from kirin.dataset import Dataset
ds = Dataset(root_dir="memory://test-data", dataset_name="my_data")
Troubleshooting
Issue: "Anonymous caller" or "Access Denied"
- Cause: No credentials provided
- Solution: Set up credentials using one of the methods above
Issue: "Permission denied"
- Cause: Credentials don't have required permissions
- Solution: Ensure your IAM role/service account has read/write permissions on the bucket
Issue: "Bucket does not exist"
- Cause: Bucket name is incorrect or doesn't exist
- Solution: Create the bucket first or check the bucket name
Issue: Import errors like "No module named 's3fs'"
- Cause: Cloud storage package not installed
- Solution: Install required package:
pip install s3fs # For S3
pip install gcsfs # For GCS
pip install adlfs # For Azure
Known Issues
macOS Python 3.13 SSL Certificate Verification
On macOS with Python 3.13, you may encounter SSL certificate verification errors when using cloud storage backends. This is a known Python/macOS issue.
Workaround: The web UI skips connection testing during backend creation. Backends are validated when actually used. If you encounter SSL errors during actual usage, install certificates:
/Applications/Python\ 3.13/Install\ Certificates.command
Or use certifi:
pip install --upgrade certifi
Web UI Auto-Authentication
NEW: The Kirin web UI can automatically execute authentication commands when needed.
Configuring Auto-Authentication
When creating or editing a catalog in the web UI, you can provide an optional authentication command:
AWS S3:
Auth Command: aws sso login --profile {{ aws_profile }}
GCP GCS:
Auth Command: gcloud auth login
Azure Blob Storage:
Auth Command: az login
How Auto-Authentication Works
- Catalog Access: When you click on a catalog, Kirin attempts to connect
- Authentication Check: If authentication fails or times out, Kirin checks for stored auth command
- Auto-Execution: If found, Kirin executes the command automatically (30-second timeout)
- Retry: If authentication succeeds, Kirin retries loading the datasets
- Feedback: Success/failure messages are shown in the UI
Benefits
- No manual CLI: Authentication happens automatically in the web UI
- Faster workflow: No need to switch to terminal and run commands
- Clear feedback: You see exactly what happened (success or failure)
- Manual fallback: You can still authenticate manually if needed
Timeout Protection
To prevent the UI from hanging on slow or failed connections:
- 10-second timeout for catalog operations (listing datasets)
- 5-second timeout for individual dataset loading
- 30-second timeout for authentication commands
- Clear error messages when timeouts occur
Security Considerations
- Commands are stored locally in your catalog configuration file
- No passwords stored: Auth commands only contain profile names or trigger browser-based OAuth
- Subprocess isolation: Commands run in isolated subprocesses with timeout protection