Local development workflow

This repository is the hosted web application: the Rust portal, the Cloudflare Worker, and the dashboard frontend. The customer CLI is maintained separately in kvcachestore/kvcdn-cli so that its release cycle can follow inference-tooling changes without being coupled to the web service. When you work on the portal or Worker, you often need a .kv file to exercise upload, metadata parsing, storage routing, and dashboard workflows. The kvcdn-cli repository includes an offline placeholder generator that writes a syntactically valid .kv artifact: it uses the same JSON envelope as a real artifact but contains no tensor data.

Why a placeholder generator exists

Real KV caches are large, model-specific, and produced by a transformer forward pass. The right place to build them is inside the inference stack that owns the model weights and tokenizer. The customer CLI’s job is to validate, package, and transport artifacts once they exist. The placeholder generator exists only to unblock development and integration testing. It emits a valid artifact with correct metadata so you can test presigned uploads, worker routing, artifact listing, and visibility controls without a GPU.

Generate a placeholder artifact

Build the CLI from the external repository and run the development generator:

git clone https://github.com/kvcachestore/kvcdn-cli
cd kvcdn-cli
cargo build --release
./target/release/kvcdn dev generate \
  --model "Qwen/Qwen3-0.6B" \
  --dtype "F32" \
  --embedding "default" \
  --d 4096 \
  --r 128 \
  --prompt "Once upon a time" \
  --output ./artifacts

Options

| Option | Required | Default | Description |
| --- | --- | --- | --- |
| `--model` | yes | — | Model name, used in the artifact metadata |
| `--dtype` | yes | — | Data type, e.g. `F32`, `F16`, `BF16`, `I8` |
| `--embedding` | yes | — | Embedding name or variant |
| `--d` | yes | — | Model dimension / hidden size |
| `--r` | yes | — | Number of KV heads or rank |
| `--prompt` | no | `prompt` | Prompt text used to infer token count for metadata |
| `--output` | no | `.` | Directory where the artifact file is written |
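If you generate fixtures from a test script, it can help to assemble the command line in one place. The following is a minimal sketch, not part of the CLI: the helper name generate_argv and its binary parameter are assumptions made for illustration, while the subcommand and flags are the ones listed in the table above.

```python
from typing import List, Optional


def generate_argv(
    model: str,
    dtype: str,
    embedding: str,
    d: int,
    r: int,
    prompt: Optional[str] = None,
    output: Optional[str] = None,
    binary: str = "./target/release/kvcdn",  # path from the build step; adjust as needed
) -> List[str]:
    """Build the argument list for `kvcdn dev generate` (hypothetical helper)."""
    argv = [
        binary, "dev", "generate",
        "--model", model,
        "--dtype", dtype,
        "--embedding", embedding,
        "--d", str(d),
        "--r", str(r),
    ]
    # Optional flags are passed only when set, so the CLI defaults apply otherwise.
    if prompt is not None:
        argv += ["--prompt", prompt]
    if output is not None:
        argv += ["--output", output]
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True). Passing a list rather than a shell string avoids quoting problems when the prompt contains spaces.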

The output filename follows the pattern <model>_<dtype>_<N>tok.kv, with /, \, and : replaced by _ in the model name. The token count is derived from the number of whitespace-separated words in --prompt.
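To predict the file name in a test, the naming rule can be mirrored in a few lines. This is a sketch written from the description above, not code taken from kvcdn-cli; the function name placeholder_filename is an assumption, and it assumes the token count is exactly the whitespace-separated word count.

```python
def placeholder_filename(model: str, dtype: str, prompt: str = "prompt") -> str:
    """Mirror the documented naming rule: <model>_<dtype>_<N>tok.kv (illustrative only)."""
    safe_model = model
    # Replace the characters the documentation lists as unsafe in the model name.
    for ch in ("/", "\\", ":"):
        safe_model = safe_model.replace(ch, "_")
    # Assumption: token count equals the number of whitespace-separated words.
    num_tokens = len(prompt.split())
    return f"{safe_model}_{dtype}_{num_tokens}tok.kv"
```

Under these assumptions, the model Qwen/Qwen3-0.6B with dtype F32 and the prompt "Once upon a time" maps to Qwen_Qwen3-0.6B_F32_4tok.kv.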

Placeholder format

The generated file is JSON with the following structure:

{
  "format_version": 1,
  "meta": {
    "model_name": "Qwen/Qwen3-0.6B",
    "dtype": "F32",
    "num_tokens": 4
  },
  "prompt": "Once upon a time",
  "kv": []
}

The kv array is empty in the placeholder. To produce real KV-cache tensors, integrate the artifact writer with your local transformer inference pipeline so it serializes key/value tensors in the same JSON envelope.
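When a test consumes a placeholder, a small structural check can catch a malformed fixture early. The sketch below validates only the fields shown in the structure above; the function name check_placeholder is hypothetical, and the real parser in the portal or CLI may accept or require more than this.

```python
import json

# Keys shown under "meta" in the documented placeholder structure.
REQUIRED_META_KEYS = ("model_name", "dtype", "num_tokens")


def check_placeholder(text: str) -> dict:
    """Parse placeholder .kv JSON and check the documented fields (illustrative only)."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("top-level value must be a JSON object")
    if doc.get("format_version") != 1:
        raise ValueError(f"unexpected format_version: {doc.get('format_version')!r}")
    meta = doc.get("meta")
    if not isinstance(meta, dict):
        raise ValueError("meta must be an object")
    missing = [key for key in REQUIRED_META_KEYS if key not in meta]
    if missing:
        raise ValueError(f"meta is missing keys: {missing}")
    n = meta["num_tokens"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(n, bool) or not isinstance(n, int) or n < 0:
        raise ValueError("meta.num_tokens must be a non-negative integer")
    if not isinstance(doc.get("prompt"), str):
        raise ValueError("prompt must be a string")
    if doc.get("kv") != []:
        raise ValueError("a placeholder is expected to have an empty kv array")
    return doc
```

The check deliberately rejects a non-empty kv array, so it applies to placeholders only and not to artifacts that carry real tensors.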

Upload the artifact

Once you have a .kv file, upload it with the customer CLI:

kvcdn upload ./context.kv --name "..." --visibility private

This requests a presigned URL from the portal, PUTs the file directly to object storage, and confirms the upload. See the upload command reference and the hosted upload flow for details.
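The three steps can be outlined as follows. This is a sketch of the ordering only: the portal's endpoints and request fields are not described here, so the three callables and the values passed to them (name, visibility, size) are hypothetical stand-ins rather than the CLI's actual API.

```python
from pathlib import Path
from typing import Callable


def upload_artifact(
    path: str,
    name: str,
    visibility: str,
    request_presigned_url: Callable[[str, str, int], str],  # hypothetical portal call
    put_bytes: Callable[[str, bytes], None],                # hypothetical HTTP PUT to storage
    confirm_upload: Callable[[str], None],                  # hypothetical portal call
) -> str:
    """Run the presign, PUT, confirm sequence in order (illustrative only)."""
    data = Path(path).read_bytes()
    # Step 1: ask the portal for a presigned URL.
    url = request_presigned_url(name, visibility, len(data))
    # Step 2: send the file directly to object storage, bypassing the portal.
    put_bytes(url, data)
    # Step 3: tell the portal the object is in place.
    confirm_upload(name)
    return url
```

Injecting the three calls keeps the sequence testable without network access: a test can pass fakes that record their arguments and assert on the order.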

View uploaded artifacts

Open the dashboard at https://kvcachestore.com/app to see artifacts in your active project.

Next steps

Read the per-command CLI reference.
Learn how to upload artifacts to the hosted service.
Understand the architecture of the CLI, portal, and Worker.
