Routes every prompt to the right hosted Ollama model. Wraps chat, embeddings, catalog, and the OpenAI-compatible mirror — and ships an advise verb that picks a model by prompt features, session context, budget, and latency.
Ollama Cloud's catalog spans 20+ open-weights models (Qwen3, GPT-OSS, DeepSeek-V3, Kimi-K2, GLM, Llama, Gemma) and is still growing. No built-in picker exists. This CLI is the only one that combines live catalog metadata with prompt-feature extraction and emits a why/alternatives envelope you can pipe into other agents.
Authentication
Bearer auth via OLLAMA_CLOUD_API_KEY (intentionally distinct from any local-Ollama env var). Free tier is rate-limited weekly; the budget command surfaces exhaustion before workflows fail.
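A minimal pre-flight sketch, assuming a POSIX shell: it checks that OLLAMA_CLOUD_API_KEY is set without printing the secret. The helper name require_cloud_key is hypothetical and not part of the CLI; returning 4 simply mirrors the CLI's documented auth-error exit code.

```shell
# Hypothetical helper (not shipped with the CLI): fail fast when the key is missing.
require_cloud_key() {
  if [ -z "${OLLAMA_CLOUD_API_KEY:-}" ]; then
    echo "OLLAMA_CLOUD_API_KEY is not set" >&2
    return 4   # same number the CLI documents for auth errors
  fi
  return 0
}
```

One way to use it is to chain it in front of the health check: require_cloud_key && ollama-cloud-pp-cli doctor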
Quick Start
# Confirm auth and that the catalog is reachable
ollama-cloud-pp-cli doctor
# Live catalog of hosted models
ollama-cloud-pp-cli tags --json
# Inspect drift between live catalog and curated metadata overlay
ollama-cloud-pp-cli advise --validate-catalog --json
# Pick a model for a specific prompt; pass any file path (or - for stdin)
ollama-cloud-pp-cli advise --prompt-file ./prompt.txt --task-hint coding --json
# Probe free-tier quota before launching long sessions
ollama-cloud-pp-cli budget --json
Unique Features
These capabilities aren't available in any other tool for this API.
Routing intelligence
- advise - Picks the right Ollama Cloud model for a prompt by combining live catalog, heuristic prompt-feature extraction, curated cost/latency metadata, and an optional cheap meta-LLM tiebreak.
When an agent needs to pick a hosted Ollama model and the default routing is wrong, reach for advise instead of hardcoding the model name.
ollama-cloud-pp-cli advise --prompt-file ./prompt.txt --task-hint coding --budget-remaining-usd 0.50 --json
- compare - Runs the same prompt against N hosted models in parallel and emits side-by-side response, tokens, and latency.
Use when calibrating advisor recommendations or picking between two close models.
ollama-cloud-pp-cli compare --prompt-file ./p.txt --models qwen3-coder:480b,gpt-oss:120b,deepseek-v3.1:671b --json
- advise --explain - Emits the full scoring trace: feature extraction, per-model scores, filter passes, and tiebreak rationale.
Reach for this when an advise recommendation surprises you and you want to understand why.
ollama-cloud-pp-cli advise --prompt-file ./p.txt --explain --format md
Engagement canary
- advise-replay - Replays advisor recommendations and reports divergence between recommended models and actually chosen models. This is the foundation for the divergence canary; the prompt corpus is not retained, so judge-LLM scoring is out of scope until a corpus sidecar ships.
Run weekly to detect advisor drift.
ollama-cloud-pp-cli advise-replay --since 7d --diverge-only --json --select rows,divergence_count,divergence_pct
Operations
- budget - Probes the free-tier weekly cap with a 1-token chat. Parses Ollama Cloud's 429 prose and emits a structured verdict (ok | exhausted | unknown) with the upgrade URL so agents can pre-flight quota before launching long sessions.
Run before launching a long agent session to confirm quota is available.
ollama-cloud-pp-cli budget --json
- cost-trace - Aggregates advisor-log cost estimates over a time window; compares per-model and per-task-hint spend.
Use to decide whether to upgrade to a paid Ollama Cloud tier.
ollama-cloud-pp-cli cost-trace --since 7d --group-by task-hint --json
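The budget verdict described above (ok | exhausted | unknown) lends itself to a go/no-go gate in front of a long session. The following is a sketch under stated assumptions: gate_on_verdict is a hypothetical helper, its return codes 1 and 2 are arbitrary choices, and the JSON field name used in the commented usage line is a guess that should be checked against the real budget --json output.

```shell
# Hypothetical helper: turn a budget verdict string into a go/no-go status.
gate_on_verdict() {
  case "$1" in
    ok)        return 0 ;;
    exhausted) echo "free-tier quota exhausted" >&2; return 1 ;;
    *)         echo "quota status unknown: '$1'" >&2; return 2 ;;
  esac
}

# Assumed usage (the ".verdict" field name is NOT confirmed by the docs):
#   gate_on_verdict "$(ollama-cloud-pp-cli budget --json | jq -r '.verdict')"
```

Treating anything other than ok as a stop keeps the gate conservative; a caller that prefers to proceed on unknown can branch on the return code instead.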
Usage
Run ollama-cloud-pp-cli --help for the full command reference and flag list.
Commands
chat
Manage chat
ollama-cloud-pp-cli chat chat - Native Ollama chat endpoint. Supports streaming.
ollama-cloud-pp-cli chat completions - OpenAI-compatible chat completions endpoint.
embeddings
Manage embeddings
ollama-cloud-pp-cli embeddings embed - Native Ollama embeddings endpoint.
ollama-cloud-pp-cli embeddings openai-embed - Generate embeddings (OpenAI-compatible)
models
Manage models
ollama-cloud-pp-cli models models - Catalog in OpenAI list-models format.
ps
List loaded models
ollama-cloud-pp-cli ps ps - Shows currently-loaded models. On Ollama Cloud this typically reflects models with active sessions.
show
Show model details
ollama-cloud-pp-cli show show - Returns model metadata, template, modelfile, capabilities.
tags
List the hosted catalog
ollama-cloud-pp-cli tags tags - Returns the live catalog of hosted Ollama Cloud models.
Output Formats
# Human-readable table (default in terminal, JSON when piped)
ollama-cloud-pp-cli chat chat --model example-value
# JSON for scripting and agents
ollama-cloud-pp-cli chat chat --model example-value --json
# Filter to specific fields
ollama-cloud-pp-cli chat chat --model example-value --json --select id,name,status
# Dry run — show the request without sending
ollama-cloud-pp-cli chat chat --model example-value --dry-run
# Agent mode — JSON + compact + no prompts in one flag
ollama-cloud-pp-cli chat chat --model example-value --agent
Agent Usage
This CLI is designed for AI agent consumption:
- Non-interactive - never prompts, every input is a flag
- Pipeable - --json output to stdout, errors to stderr
- Filterable - --select id,name returns only fields you need
- Previewable - --dry-run shows the request without sending
- Explicit retries - add --idempotent to create retries when a no-op success is acceptable
- Confirmable - --yes for explicit confirmation of destructive actions
- Piped input - write commands can accept structured input when their help lists --stdin
- Offline-friendly - sync/search commands can use the local SQLite store when available
- Agent-safe by default - no colors or formatting unless --human-friendly is set
Exit codes: 0 success, 2 usage error, 3 not found, 4 auth error, 5 API error, 7 rate limited, 10 config error.
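A wrapper script can branch on these exit codes instead of parsing stderr. The sketch below maps each documented code to its label; describe_exit is a hypothetical helper name, and any code outside the documented list is reported as unknown rather than guessed.

```shell
# Hypothetical helper: translate the CLI's documented exit codes into labels.
describe_exit() {
  case "$1" in
    0)  echo "success" ;;
    2)  echo "usage error" ;;
    3)  echo "not found" ;;
    4)  echo "auth error" ;;
    5)  echo "API error" ;;
    7)  echo "rate limited" ;;
    10) echo "config error" ;;
    *)  echo "unknown ($1)" ;;
  esac
}

# Example usage after any command:
#   ollama-cloud-pp-cli tags --json; describe_exit $?
```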
Health Check
ollama-cloud-pp-cli doctor
Verifies configuration, credentials, and connectivity to the API.
Configuration
Config file: ~/.config/ollama-cloud-pp-cli/config.toml
Static request headers can be configured under headers; per-command header overrides take precedence.
Environment variables:
| Name | Kind | Required | Description |
|---|---|---|---|
| OLLAMA_CLOUD_API_KEY | per_call | Yes | Set to your API credential. |
Troubleshooting
Authentication errors (exit code 4)
- Run ollama-cloud-pp-cli doctor to check credentials
- Verify the environment variable is set:
echo $OLLAMA_CLOUD_API_KEY
Not found errors (exit code 3)
- Check that the model name is correct
- Run ollama-cloud-pp-cli tags --json to see the models currently available
API-specific
- HTTP 429 with 'you have reached your weekly usage limit' - Free tier exhausted. Run ollama-cloud-pp-cli budget --json to confirm; pick a non-rate-limited model with advise --exclude <exhausted-models>; or upgrade at https://ollama.com/upgrade
- advise returns unexpected model - Run advise --explain --format md to see the scoring trace. Adjust --task-hint, --exclude, or the curated models.json overlay.
- 401 unauthorized - Confirm OLLAMA_CLOUD_API_KEY is set; the CLI does NOT read OLLAMA_API_KEY (intentional, to avoid local-daemon collisions)
- advise picks a model not in /api/tags - Catalog snapshot is stale. Run tags --no-cache to repopulate the local SQLite snapshot.
Generated by CLI Printing Press