Kirin Documentation

Welcome to Kirin - simplified "git" for data versioning!

What is Kirin?

Kirin is a simplified tool for version-controlling data using content-addressed storage. It provides linear commit history for datasets without the complexity of branching and merging.

Key Benefits:

Linear versioning: Simple, Git-like commits without branching complexity
Content-addressed storage: Files stored by content hash for integrity and deduplication
Cloud support: Works with S3, GCS, Azure, and more
Ergonomic API: Designed for data science workflows
Zero-copy operations: Efficient handling of large files

Quick Start

from kirin import Catalog, Dataset

# Create a catalog (works with local and cloud storage)
catalog = Catalog(root_dir="/path/to/data")  # Local storage
catalog = Catalog(root_dir="s3://my-bucket")  # S3 storage

# Get or create a dataset
ds = catalog.get_dataset("my_dataset")

# Commit files
commit_hash = ds.commit(message="Initial commit", add_files=["file1.csv"])

# Work with files locally
with ds.local_files() as local_files:
    csv_path = local_files["file1.csv"]
    content = Path(csv_path).read_text()

Documentation

Getting Started

Quickstart - Get up and running in 5 minutes
Installation - Installation options and setup
Core Concepts - Understanding datasets, commits, and content-addressing

User Guides

Basic Usage - Essential workflows for working with datasets
Cloud Storage - Set up and use cloud storage backends
Working with Files - File operations and data science integration
Commit Management - Understanding and working with commit history

Web UI

Web UI Overview - Getting started with the web interface
Catalog Management - Advanced catalog configuration

Reference

API Reference - Complete API documentation
Storage Format - Technical storage details

Architecture

Architecture Overview - System architecture and design principles

Why Kirin Exists

Kirin addresses critical needs in machine learning and data science workflows:

Linear Data Versioning: Track changes to datasets with simple, linear commits
Content-Addressed Storage: Ensure data integrity and enable deduplication
Multi-Backend Support: Work with S3, GCS, Azure, local filesystem, and more
Serverless Architecture: No dedicated servers required
Ergonomic Python API: Focus on ease of use and developer experience
File Versioning: Track changes to individual files over time

Common Use Cases

Experiment tracking: Version your training data and model inputs
Data pipeline versioning: Track changes in ETL processes
Collaborative research: Share datasets with exact version control
Reproducible analysis: Ensure you can recreate your results
MLOps workflows: Deploy models with exact data dependencies

Installation

# Option 1: Using pixi (recommended for development)
git clone git@github.com:nll-ai/kirin
cd kirin
pixi install
pixi run kirin ui

# Option 2: Using uv tool (recommended for production)
uv tool install kirin
uv run kirin ui

# Option 3: Using uvx (one-time use)
uvx kirin ui

Next Steps

Quickstart - Try Kirin with a simple example
Core Concepts - Understand how Kirin works
Basic Usage - Learn common workflows
Cloud Storage - Set up cloud storage