Web UI How-To Guide
Practical guides for common tasks in the Kirin web interface. Each section walks you through a specific workflow with step-by-step instructions.
Daily Workflow: Adding New Data
Goal: Add new files to an existing dataset.
Steps:
- Navigate to your dataset (Home → Catalog → Dataset)
- Click the "Commit" tab
- Upload your files:
  - Drag and drop files into the upload area, or
  - Click "Choose Files" to browse
- Write a descriptive commit message (e.g., "Add Q4 sales data from CRM")
- Click "Create Commit"
Tips:
- Upload multiple files in one commit for related changes
- Use clear commit messages to understand changes later
- Check the summary before committing to verify what will be added
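If you prefer to script this daily workflow, the Python client shown later in "Exporting Python Code" can be used as a starting point. The sketch below is hypothetical: the `commit` method and its `message`/`add_files` parameters are assumptions modeled on the UI form, so verify them against your Kirin version's API reference.

```python
from kirin import Dataset

# Connect using the values from your dataset's "Python Access Code" section.
dataset = Dataset(
    root_dir="s3://my-bucket/data",
    name="my-dataset",
    aws_profile="production",
)

# Hypothetical commit call mirroring the UI's upload-and-commit step;
# the method name and parameters are assumptions, not a confirmed API.
dataset.commit(
    message="Add Q4 sales data from CRM",
    add_files=["q4_sales.csv", "q4_refunds.csv"],
)
```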
Updating Existing Files
Goal: Replace old files with updated versions.
Steps:
- Go to your dataset's "Commit" tab
- Upload files with the same names as existing files
- Write a commit message explaining the update (e.g., "Update customer data with latest export")
- Click "Create Commit"
What happens:
- New files replace old ones with the same names
- Old versions remain in commit history
- You can browse old commits to see previous versions
Removing Files
Goal: Remove files you no longer need.
Steps:
- Go to your dataset's "Commit" tab
- In the "Remove Files" section, check boxes next to files to remove
- Write a commit message explaining why (e.g., "Remove deprecated v1 data files")
- Click "Create Commit"
Important:
- Files are removed from the current commit but remain in history
- You can always browse old commits to see removed files
- Consider whether you really need to remove the files or can simply stop using them
Combining Add and Remove Operations
Goal: Add new files and remove old ones in a single commit.
Steps:
- Go to the "Commit" tab
- Upload new files in the upload section
- Check boxes to remove files in the remove section
- Write a commit message describing all changes
- Click "Create Commit"
Example use cases:
- Adding updated data files while removing outdated ones
- Reorganizing dataset structure
- Cleaning up while adding new content
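Scripted, the combined operation would be a single call in the same hypothetical API sketched under "Daily Workflow" above; the `add_files`/`remove_files` parameters are assumptions modeled on the UI form, not a confirmed interface.

```python
# Continuing the hypothetical sketch from "Daily Workflow" above:
# one commit that both adds and removes files, mirroring the UI form.
dataset.commit(
    message="Replace v1 exports with v2 exports",
    add_files=["exports_v2.csv"],
    remove_files=["exports_v1.csv"],
)
```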
Viewing File Contents
Goal: Quickly check what's in a file without downloading it.
Steps:
- In the "Files" tab, click "Preview" next to any file
- For text files: See syntax-highlighted content
- For images: View the image directly
- For code files: See formatted code with highlighting
Features:
- Syntax highlighting for code files
- Line numbers for text files
- Inline image display
- Source file links for generated files (like plots)
Limitations:
- Very large files (>1000 lines) show only the first 1000 lines
- Binary files show a message instead of content
- Some file types may not preview perfectly
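If you need more than the preview allows, you can reproduce the same truncated view locally after checking out the dataset. This sketch assumes the checked-out file is available on disk at a path you know; only the 1000-line cutoff mirrors the UI.

```python
from itertools import islice

# Placeholder path: wherever your checkout materializes the file.
with open("data/customer-transactions-2024-q4.csv", encoding="utf-8") as f:
    # Print at most the first 1000 lines, matching the UI's preview cutoff.
    for line in islice(f, 1000):
        print(line, end="")
```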
Comparing Versions
Goal: See what changed between two commits.
Steps:
- Go to the "History" tab
- Note the commit hashes of the two versions you want to compare
- Click "Browse Files" on the older commit
- Note the files and their sizes
- Navigate back and click "Browse Files" on the newer commit
- Compare the file lists, sizes, and contents
What to look for:
- Files added (appear in newer but not older)
- Files removed (appear in older but not newer)
- Files changed (same name, different size or hash)
- Total size differences
Tip: Good commit messages make it easier to understand why changes were made.
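For datasets with many files, scripting the comparison beats eyeballing two listings. The sketch below is hypothetical on two counts: it assumes `checkout()` accepts a commit hash and that the client exposes the current file listing (shown here as a `files` attribute with `name` and `size`); neither is a confirmed API, and the hashes are placeholders.

```python
from kirin import Dataset

dataset = Dataset(
    root_dir="s3://my-bucket/data",
    name="my-dataset",
    aws_profile="production",
)

def file_sizes(commit_hash):
    # Assumptions: checkout() takes a hash; dataset.files lists that
    # commit's files with name and size attributes.
    dataset.checkout(commit_hash)
    return {f.name: f.size for f in dataset.files}

old, new = file_sizes("abc123"), file_sizes("def456")
print("Added:  ", sorted(new.keys() - old.keys()))
print("Removed:", sorted(old.keys() - new.keys()))
print("Changed:", sorted(n for n in old.keys() & new.keys() if old[n] != new[n]))
```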
Setting Up a Cloud Catalog
Goal: Connect to an S3 bucket for team collaboration.
Prerequisites:
- AWS account with S3 access
- AWS CLI installed and configured
- Appropriate IAM permissions
Steps:
- Ensure you're authenticated with AWS: run `aws sso login --profile your-profile` or `aws configure`
- In the web UI, click "+ Add Catalog"
- Fill in the form:
  - Catalog Name: Something descriptive (e.g., "Production S3 Data")
  - Root Directory: `s3://your-bucket-name/data`
  - AWS Profile: Select your profile from the dropdown
  - Authentication Command: `aws sso login --profile your-profile` (optional, enables auto-authentication)
- Click "Create Catalog"
What happens:
- The catalog connects to your S3 bucket
- You can now create datasets that store data in S3
- Auto-authentication runs the command when needed
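Before creating the catalog, it can save a round of debugging to confirm that your profile can actually reach the bucket. This boto3 check is independent of Kirin; the profile, bucket, and prefix names are placeholders.

```python
import boto3

# Placeholders: substitute your own profile, bucket, and prefix.
session = boto3.Session(profile_name="your-profile")
s3 = session.client("s3")

# Listing a single key confirms both authentication and read access.
resp = s3.list_objects_v2(Bucket="your-bucket-name", Prefix="data/", MaxKeys=1)
print("OK, found", resp.get("KeyCount", 0), "object(s) under the prefix")
```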
For GCS:
- Use path: `gs://your-bucket-name/data`
- Auth command: `gcloud auth login` or `gcloud auth application-default login`
For Azure:
- Use path: `az://your-container-name/data`
- Auth command: `az login`
Sharing a Dataset with Your Team
Goal: Let teammates access a dataset you created.
Steps:
- Ensure the catalog is accessible:
  - For cloud storage: Ensure teammates have access to the bucket/container
  - For local storage: Use a shared directory or network drive
- Share the catalog configuration:
  - Catalog name
  - Root directory path
  - Authentication requirements (if any)
- Teammates set up the catalog:
  - They create the same catalog in their web UI
  - They configure authentication if needed
  - They can now see all datasets in that catalog
- Share dataset information:
  - Dataset name
  - Catalog it's in
  - Any relevant commit hashes or descriptions
Tips:
- Use the "Python Access Code" section to share exact code for programmatic access
- Document catalog setup in your team's wiki or docs
- Consider using consistent catalog names across the team
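The code a teammate pastes from the "Python Access Code" section will look like the snippet below; all values here are placeholders, and the constructor matches the example shown under "Exporting Python Code".

```python
from kirin import Dataset

# Values shared by the dataset owner: catalog root, dataset name, AWS profile.
dataset = Dataset(
    root_dir="s3://your-bucket-name/data",
    name="shared-dataset",
    aws_profile="your-profile",
)

# Fetch the latest commit.
dataset.checkout()
```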
Organizing Multiple Catalogs
Goal: Structure your catalogs for easy management.
Strategies:
By environment:
- `production-data` → Production S3 bucket
- `staging-data` → Staging S3 bucket
- `local-dev` → Local development directory
By team:
- `analytics-team` → Analytics team's shared storage
- `ml-team` → Machine learning team's storage
- `research-team` → Research team's storage
By project:
- `customer-segmentation` → Project-specific storage
- `price-optimization` → Another project's storage
Best practices:
- Use consistent naming conventions
- Document catalog purposes
- Keep related datasets in the same catalog
- Don't create too many catalogs—group related work together
Finding Files in Large Datasets
Goal: Quickly locate specific files when you have many files.
Steps:
- Go to your dataset's "Files" tab
- Use browser search (Ctrl+F / Cmd+F) to search file names
- Look for file type icons to filter visually
- Use the dataset's search functionality if available
Tips:
- Use descriptive filenames to make files easier to find
- Group related files with consistent naming patterns
- Consider splitting very large datasets into smaller, focused ones
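If browser search falls short, pattern matching over the file listing may help. This sketch reuses the hypothetical `files` attribute from the version-comparison sketch above; it is an assumption, not a confirmed API.

```python
from fnmatch import fnmatch
from kirin import Dataset

dataset = Dataset(
    root_dir="s3://my-bucket/data",
    name="my-dataset",
    aws_profile="production",
)
dataset.checkout()

# Assumption: dataset.files lists the current commit's files (hypothetical API).
matches = [f.name for f in dataset.files if fnmatch(f.name, "customer-*-2024-q4.csv")]
print(matches)
```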
Exporting Python Code
Goal: Get Python code to access your dataset programmatically.
Steps:
- Navigate to your dataset
- Look for the "Python Access Code" section (collapsible, usually at the top)
- Click to expand it
- Click "Copy" to copy the code
- Paste into your Python script or notebook
What you get:
- Complete code to create a Dataset object
- Proper authentication configuration
- Ready to use in your scripts
Example output:
from kirin import Dataset

dataset = Dataset(
    root_dir="s3://my-bucket/data",
    name="my-dataset",
    aws_profile="production"
)

# Check out the latest commit
dataset.checkout()
Troubleshooting Common Issues
Can't Connect to Cloud Storage
Symptoms: Catalog shows "Error" status or can't list datasets.
Solutions:
- Check authentication:
  - Run the auth command manually: `aws sso login --profile your-profile`
  - Verify credentials are valid (a quick check is sketched after this list)
  - Check expiration for temporary credentials
- Verify permissions:
  - Ensure your credentials have read/write access to the bucket
  - Check IAM policies or bucket permissions
- Check network:
  - Verify you can access the cloud storage from your network
  - Check for firewall or VPN issues
- Review error messages:
  - The web UI shows specific error messages
  - Use these to diagnose the issue
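A quick way to separate a Kirin problem from a credential problem is to call AWS STS directly; if this fails, the catalog error is almost certainly authentication. The profile name is a placeholder.

```python
import boto3

# If this raises, your AWS credentials (not Kirin) are the problem.
identity = boto3.Session(profile_name="your-profile").client("sts").get_caller_identity()
print("Authenticated as:", identity["Arn"])
```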
Files Won't Upload
Symptoms: Upload fails or hangs indefinitely.
Solutions:
- Check file size:
  - Very large files (>100MB) may take time
  - Consider splitting large files (see the sketch after this list)
- Verify permissions:
  - Ensure you have write access to the catalog's storage
  - Check cloud storage permissions
- Check network:
  - For cloud storage, verify the connection is stable
  - Try smaller files first to test
- Review the browser console:
  - Check for JavaScript errors
  - Look for network request failures
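If one large file keeps failing, splitting it into smaller pieces before upload is a common workaround. A minimal sketch for a line-oriented file such as a CSV (the filename and the 100,000-line chunk size are arbitrary choices):

```python
from itertools import islice
from pathlib import Path

src = Path("big_export.csv")  # placeholder filename
chunk_lines = 100_000

with src.open(encoding="utf-8") as f:
    header = f.readline()
    for i, first_line in enumerate(f):
        # Each part gets the header plus the next block of data lines.
        part = src.with_name(f"{src.stem}_part{i:03d}{src.suffix}")
        with part.open("w", encoding="utf-8") as out:
            out.write(header)
            out.write(first_line)
            out.writelines(islice(f, chunk_lines - 1))
```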
Can't See Datasets
Symptoms: Catalog shows "Unknown" dataset count or empty list.
Solutions:
- Wait for connection:
  - Cloud catalogs may take a few seconds to connect
  - Refresh the page if it seems stuck
- Verify path:
  - Ensure the catalog path is correct
  - Check that the storage location exists
- Check authentication:
  - Verify you're authenticated with the cloud provider
  - Run auth commands manually if needed
- Verify permissions:
  - Ensure you have read access to list datasets
  - Check IAM policies or storage permissions
Best Practices
Commit Messages
Write clear, descriptive messages:
- ✅ Good: "Add Q4 2024 sales data from CRM export"
- ✅ Good: "Update customer segmentation model with new features"
- ❌ Bad: "update"
- ❌ Bad: "files"
Why it matters: Good commit messages help you and your team understand what changed and why, especially when looking at history weeks or months later.
Dataset Organization
Keep datasets focused:
- ✅ Good: One dataset per use case or analysis
- ❌ Bad: One giant dataset with everything
Why it matters: Focused datasets are easier to understand, navigate, and share with others.
File Naming
Use descriptive, consistent names:
- ✅ Good: `customer-transactions-2024-q4.csv`, `model-training-features-v2.json`
- ❌ Bad: `data.csv`, `file1.json`, `stuff.xlsx`
Why it matters: Clear names make files easier to find and understand, especially when working with multiple files.
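A small helper can keep generated names consistent; the quarterly pattern here is just one possible convention.

```python
from datetime import date

def quarterly_name(topic: str, day: date) -> str:
    # quarterly_name("customer-transactions", date(2024, 11, 3))
    # -> "customer-transactions-2024-q4.csv"
    quarter = (day.month - 1) // 3 + 1
    return f"{topic}-{day.year}-q{quarter}.csv"
```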
Next Steps
- Web UI Overview - Understand the concepts and architecture
- Getting Started Tutorial - Step-by-step first setup
- Catalog Management - Advanced catalog configuration