Model support

kvcdn and the hosted service treat the artifact format as a model-agnostic container. The CLI verifies, quantizes, and uploads .kv files without requiring model weights. Model-specific artifact generation is not yet supported; it is on the roadmap.

Current behavior

You produce .kv artifacts with your own inference stack. The artifact metadata includes model_name, dtype, and num_tokens. The actual tensor contents are opaque to the portal; only the SHA-256 checksum is verified on upload.
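
Because only the checksum is verified on upload, it can be useful to compute the same digest locally before uploading. The following is a minimal sketch using Python's standard hashlib module. The helper names sha256_of_file and matches_checksum are hypothetical and are not part of kvcdn, and how the expected digest is supplied to the portal is not specified here.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks.

    Chunked reads keep memory use flat for large .kv artifacts.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """Compare a file's digest with an expected hex string.

    Hypothetical helper: surrounding whitespace and letter case in the
    expected value are ignored.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A digest mismatch here would indicate that the file changed after the expected checksum was recorded.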

Roadmap

Planned work includes:
  • Integrating a local transformer inference pipeline so the CLI can generate artifacts with real key/value tensors.
  • Supporting common decoder-only architectures such as Llama, Mistral, Mixtral, Qwen, and Yi.
  • Dispatching on the model’s config.json architectures field to select the right adapter.
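
The dispatch step in the last item can be sketched as follows. This is an illustration only, since the feature is not implemented. The architecture strings are the names Hugging Face checkpoints commonly list in config.json, while the ADAPTERS table, the adapter identifiers, and select_adapter are hypothetical.

```python
import json

# Hypothetical mapping from config.json architecture names to adapter ids.
ADAPTERS = {
    "LlamaForCausalLM": "llama",
    "MistralForCausalLM": "mistral",
    "MixtralForCausalLM": "mixtral",
    "Qwen2ForCausalLM": "qwen",
}


def select_adapter(config_text):
    """Pick an adapter id from the text of a model's config.json.

    The "architectures" field is a list; the first name with a known
    adapter wins. A missing, null, or unrecognized field is an error.
    """
    config = json.loads(config_text)
    names = config.get("architectures") or []
    for name in names:
        if name in ADAPTERS:
            return ADAPTERS[name]
    raise ValueError(f"no adapter for architectures: {names!r}")
```

Failing loudly on an unknown architecture, rather than falling back to a default adapter, avoids generating tensors with the wrong layout.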

Adding a new family

When real generation is implemented, adding support for a new decoder-only architecture will require:
  1. Implementing a CausalLM trait in the generation module.
  2. Registering the architecture name in a model registry.

For now, you can produce .kv files with your own tooling and use kvcdn verify, kvcdn upload, and the dashboard to manage them.
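
The two steps listed above might look roughly like the sketch below. The word "trait" suggests the real generation module would be written in Rust; a Python abstract base class stands in for it here. Everything in the sketch is an assumption made for illustration: the num_layers method, the register decorator, the build function, and the ExampleForCausalLM name do not come from kvcdn.

```python
from abc import ABC, abstractmethod


class CausalLM(ABC):
    """Stand-in for the planned CausalLM trait (method is hypothetical)."""

    @abstractmethod
    def num_layers(self) -> int:
        """Number of decoder layers that contribute key/value tensors."""


# Hypothetical model registry: architecture name -> implementing class.
_REGISTRY = {}


def register(architecture):
    """Class decorator that records an implementation under a name."""
    def decorator(cls):
        if architecture in _REGISTRY:
            raise ValueError(f"{architecture} is already registered")
        _REGISTRY[architecture] = cls
        return cls
    return decorator


# Step 1: implement the interface. Step 2: register the architecture name.
@register("ExampleForCausalLM")
class ExampleModel(CausalLM):
    def num_layers(self) -> int:
        return 2


def build(architecture):
    """Instantiate the implementation registered for an architecture."""
    try:
        cls = _REGISTRY[architecture]
    except KeyError:
        raise ValueError(f"unsupported architecture: {architecture}") from None
    return cls()
```

Keeping registration next to the class definition means a new family touches one file, which matches the two-step process described above.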

See also