Local development workflow

This repository is the hosted web application: the Rust portal, the Cloudflare Worker, and the dashboard frontend. The customer CLI is maintained separately in kvcachestore/kvcdn-cli so its release cycle can follow inference-tooling changes without coupling to the web service. When you are working on the portal or Worker, you often need a .kv file to exercise upload, metadata parsing, storage routing, and dashboard workflows. The kvcdn-cli repository includes an offline placeholder generator that writes a syntactically valid .kv artifact with the same JSON envelope but no tensor data.

Why a placeholder generator exists

Real KV caches are large, model-specific, and produced by a transformer forward pass. The right place to build them is inside the inference stack that owns the model weights and tokenizer. The customer CLI’s job is to validate, package, and transport artifacts once they exist. The placeholder generator exists only to unblock development and integration testing. It emits a syntactically valid artifact with correct metadata and no tensor data, so you can test presigned uploads, Worker routing, artifact listing, and visibility controls without a GPU.

Generate a placeholder artifact

Build the CLI from the external repository and run the development generator:
git clone https://github.com/kvcachestore/kvcdn-cli
cd kvcdn-cli
cargo build --release
./target/release/kvcdn dev generate \
  --model "Qwen/Qwen3-0.6B" \
  --dtype "F32" \
  --embedding "default" \
  --d 4096 \
  --r 128 \
  --prompt "Once upon a time" \
  --output ./artifacts

Options

Option       Required  Default  Description
--model      yes       -        Model name, used in the artifact metadata
--dtype      yes       -        Data type, e.g. F32, F16, BF16, I8
--embedding  yes       -        Embedding name or variant
--d          yes       -        Model dimension / hidden size
--r          yes       -        Number of KV heads or rank
--prompt     no        prompt   Prompt text used to infer token count for metadata
--output     no        .        Directory where the artifact file is written
The output filename follows the pattern <model>_<dtype>_<N>tok.kv, with /, \, and : replaced by _ in the model name. The token count is derived from the number of whitespace-separated words in --prompt.
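The naming rule can be sketched in a few lines of Rust. This is an illustration of the rule as described here, not code taken from kvcdn-cli; the function names placeholder_token_count, sanitize_model_name, and placeholder_filename are hypothetical, and the real generator may handle edge cases (such as an empty prompt) differently.

```rust
/// Counts tokens as described in this document: one per
/// whitespace-separated word in the prompt. (Hypothetical helper.)
fn placeholder_token_count(prompt: &str) -> usize {
    prompt.split_whitespace().count()
}

/// Replaces `/`, `\`, and `:` in the model name with `_` so the name
/// is safe to use as part of a filename. (Hypothetical helper.)
fn sanitize_model_name(model: &str) -> String {
    model
        .chars()
        .map(|c| if matches!(c, '/' | '\\' | ':') { '_' } else { c })
        .collect()
}

/// Builds a filename following the `<model>_<dtype>_<N>tok.kv` pattern.
/// (Hypothetical helper; assumes dtype needs no sanitizing.)
fn placeholder_filename(model: &str, dtype: &str, prompt: &str) -> String {
    format!(
        "{}_{}_{}tok.kv",
        sanitize_model_name(model),
        dtype,
        placeholder_token_count(prompt)
    )
}
```

Under these assumptions, calling placeholder_filename with the model, dtype, and prompt from the command above yields a name that starts with Qwen_Qwen3-0.6B_F32_ and ends with tok.kv.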

Placeholder format

The generated file is JSON with the following structure:
{
  "format_version": 1,
  "meta": {
    "model_name": "Qwen/Qwen3-0.6B",
    "dtype": "F32",
    "num_tokens": 4
  },
  "prompt": "Once upon a time",
  "kv": []
}
The kv array is empty in the placeholder. To produce real KV-cache tensors, integrate the artifact writer with your local transformer inference pipeline so it serializes key/value tensors in the same JSON envelope.
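For test fixtures that should not depend on the CLI binary, the same envelope can be assembled by hand. The sketch below is an assumption-laden illustration, not the generator's implementation: escape_json and placeholder_envelope are hypothetical names, the field set is limited to the fields shown in the structure above, and the escaping is hand-rolled only to keep the example free of dependencies. A real writer would more likely use a JSON library such as serde_json.

```rust
/// Minimal JSON string escaping for the characters that must be escaped:
/// quotes, backslashes, and control characters. (Hypothetical helper.)
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Renders a placeholder envelope with an empty `kv` array.
/// Assumption: num_tokens is the whitespace-separated word count of the
/// prompt, as this document describes. (Hypothetical helper.)
fn placeholder_envelope(model: &str, dtype: &str, prompt: &str) -> String {
    let num_tokens = prompt.split_whitespace().count();
    format!(
        "{{\n  \"format_version\": 1,\n  \"meta\": {{\n    \"model_name\": \"{}\",\n    \"dtype\": \"{}\",\n    \"num_tokens\": {}\n  }},\n  \"prompt\": \"{}\",\n  \"kv\": []\n}}\n",
        escape_json(model),
        escape_json(dtype),
        num_tokens,
        escape_json(prompt)
    )
}
```

Writing the returned string to a file with std::fs::write gives a fixture with the same shape as the generated placeholder. Whether the portal accepts it depends on validation rules not covered here.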

Upload the artifact

Once you have a .kv file, upload it with the customer CLI:
kvcdn upload ./context.kv --name "..." --visibility private
This requests a presigned URL from the portal, PUTs the file directly to object storage, and confirms the upload. See the upload command reference and the hosted upload flow for details.
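The three-step sequence (request a presigned URL, PUT to storage, confirm) can be modelled without any network access, which is handy when testing error handling around it. Everything in the sketch below is hypothetical: the UploadBackend trait, its method names and signatures, the RecordingBackend stand-in, and the placeholder URL and upload id are illustrative assumptions, not the portal's API or the CLI's internals.

```rust
/// Hypothetical abstraction over the three calls in the upload flow.
/// None of these names come from the portal or CLI source.
trait UploadBackend {
    /// Step 1: ask the portal for a presigned URL; returns (url, upload_id).
    fn request_presigned_url(
        &mut self,
        name: &str,
        visibility: &str,
        size: usize,
    ) -> Result<(String, String), String>;
    /// Step 2: PUT the bytes directly to object storage.
    fn put_object(&mut self, url: &str, bytes: &[u8]) -> Result<(), String>;
    /// Step 3: tell the portal the upload finished.
    fn confirm_upload(&mut self, upload_id: &str) -> Result<(), String>;
}

/// Runs the three steps in order and stops at the first failure,
/// so a failed PUT is never confirmed.
fn upload_artifact<B: UploadBackend>(
    backend: &mut B,
    name: &str,
    visibility: &str,
    bytes: &[u8],
) -> Result<String, String> {
    let (url, upload_id) = backend.request_presigned_url(name, visibility, bytes.len())?;
    backend.put_object(&url, bytes)?;
    backend.confirm_upload(&upload_id)?;
    Ok(upload_id)
}

/// In-memory stand-in that records the order of calls for tests.
#[derive(Default)]
struct RecordingBackend {
    calls: Vec<String>,
    fail_put: bool,
}

impl UploadBackend for RecordingBackend {
    fn request_presigned_url(
        &mut self,
        name: &str,
        visibility: &str,
        size: usize,
    ) -> Result<(String, String), String> {
        self.calls.push(format!("presign {name} {visibility} {size}"));
        // Placeholder values; a real portal response would differ.
        Ok(("https://storage.invalid/put".to_string(), "upload-1".to_string()))
    }

    fn put_object(&mut self, url: &str, bytes: &[u8]) -> Result<(), String> {
        self.calls.push(format!("put {url} {}", bytes.len()));
        if self.fail_put {
            Err("put failed".to_string())
        } else {
            Ok(())
        }
    }

    fn confirm_upload(&mut self, upload_id: &str) -> Result<(), String> {
        self.calls.push(format!("confirm {upload_id}"));
        Ok(())
    }
}
```

Keeping the sequence behind a trait lets a test assert that the confirm step is skipped when the PUT fails, without touching the portal or object storage.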

View uploaded artifacts

Open the dashboard at https://kvcachestore.com/app to see artifacts in your active project.

Next steps