Ollama Cloud

ollama-cloud-pp-cli


Printed by @rvdlaar (Rick van de Laar)

Install

npx — recommended

npx -y @mvanhorn/printing-press-library install ollama-cloud

Installs the CLI binary and the agent skill (Claude Code, Codex, Cursor, Gemini CLI, Copilot, and more). Add --cli-only or --skill-only for one half. Requires Node.

Hermes

hermes skills install mvanhorn/printing-press-library/cli-skills/pp-ollama-cloud --force

From the Hermes CLI. Inside a chat session use /skills install with the same path.

OpenClaw

Install the pp-ollama-cloud skill from https://github.com/mvanhorn/printing-press-library/tree/main/cli-skills/pp-ollama-cloud. The skill defines how its required CLI can be installed.

Paste the line above to your OpenClaw agent; the skill describes how to install its CLI.

Claude Desktop

This CLI ships an MCPB bundle. Download the latest release .mcpb and double-click it — Claude Desktop walks you through the install.


Routes every prompt to the right hosted Ollama model. Wraps chat, embeddings, catalog, and the OpenAI-compatible mirror — and ships an advise verb that picks a model by prompt features, session context, budget, and latency.

Ollama Cloud's catalog spans 20+ open-weights models (Qwen3, GPT-OSS, DeepSeek-V3, Kimi-K2, GLM, Llama, Gemma) and growing. No built-in picker exists. This CLI is the only one that combines live catalog metadata with prompt-feature extraction and emits a why/alternatives envelope you can pipe into other agents.

Created by @rvdlaar (Rick van de Laar).

Authentication

Bearer auth via OLLAMA_CLOUD_API_KEY (intentionally distinct from any local-Ollama env var). Free tier is rate-limited weekly; the budget command surfaces exhaustion before workflows fail.
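
A wrapper script may want to fail fast when the key is missing rather than discover it mid-workflow. The sketch below assumes only what is stated above: the key lives in OLLAMA_CLOUD_API_KEY and is sent as a bearer token. The helper name bearer_headers is hypothetical and not part of the CLI.

```python
import os
from typing import Dict, Mapping, Optional

ENV_VAR = "OLLAMA_CLOUD_API_KEY"  # the variable this README documents


def bearer_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build an Authorization header from the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        # Fail early instead of letting a long workflow hit a 401 later.
        raise RuntimeError(f"{ENV_VAR} is not set")
    return {"Authorization": f"Bearer {key}"}
```

Passing a mapping explicitly, instead of reading os.environ directly, keeps the helper easy to exercise in tests.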

Quick Start

# Confirm auth works and the catalog is reachable
ollama-cloud-pp-cli doctor

# Live catalog of hosted models
ollama-cloud-pp-cli tags --json

# Inspect drift between live catalog and curated metadata overlay
ollama-cloud-pp-cli advise --validate-catalog --json

# Pick a model for a specific prompt; pass any file path (or - for stdin)
ollama-cloud-pp-cli advise --prompt-file ./prompt.txt --task-hint coding --json

# Probe free-tier quota before launching long sessions
ollama-cloud-pp-cli budget --json

Unique Features

These capabilities aren't available in any other tool for this API.

Routing intelligence

advise — Picks the right Ollama Cloud model for a prompt by combining live catalog, heuristic prompt-feature extraction, curated cost/latency metadata, and an optional cheap meta-LLM tiebreak.

When an agent needs to pick a hosted Ollama model and the default routing is wrong, reach for advise instead of hardcoding the model name.
```
ollama-cloud-pp-cli advise --prompt-file ./prompt.txt --task-hint coding --budget-remaining-usd 0.50 --json
```
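
For agents that call advise from Python, the following sketch assembles the same invocation programmatically. It uses only the flags shown in this README; the function names advise_argv and run_advise are hypothetical, and the JSON envelope is returned as-is because its schema is not documented here.

```python
import json
import subprocess
from typing import List, Optional

CLI = "ollama-cloud-pp-cli"


def advise_argv(prompt_file: str, task_hint: Optional[str] = None,
                budget_remaining_usd: Optional[float] = None) -> List[str]:
    """Assemble the advise invocation using only flags shown in this README."""
    argv = [CLI, "advise", "--prompt-file", prompt_file]
    if task_hint is not None:
        argv += ["--task-hint", task_hint]
    if budget_remaining_usd is not None:
        argv += ["--budget-remaining-usd", f"{budget_remaining_usd:.2f}"]
    argv.append("--json")
    return argv


def run_advise(prompt_file: str, **kwargs):
    """Run advise and return the parsed JSON without assuming its schema."""
    proc = subprocess.run(advise_argv(prompt_file, **kwargs),
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

Building the argument list separately from running it makes the command easy to log or inspect before anything is sent.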
compare — Runs the same prompt against N hosted models in parallel and emits side-by-side response, tokens, and latency.

Use when calibrating advisor recommendations or picking between two close models.
```
ollama-cloud-pp-cli compare --prompt-file ./p.txt --models qwen3-coder:480b,gpt-oss:120b,deepseek-v3.1:671b --json
```
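
The fan-out pattern behind a side-by-side comparison can be sketched in a few lines. This is an illustration of the general technique, not the CLI's implementation: call_model is a caller-supplied function (hypothetical), and token counts are omitted because their source is not described here.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fan_out(prompt: str, models: List[str],
            call_model: Callable[[str, str], str]) -> List[Dict[str, object]]:
    """Send one prompt to every model concurrently and time each call."""

    def timed(model: str) -> Dict[str, object]:
        start = time.perf_counter()
        response = call_model(model, prompt)  # caller-supplied, e.g. a CLI wrapper
        return {"model": model, "response": response,
                "latency_s": time.perf_counter() - start}

    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        # map() preserves input order, so rows line up with the model list.
        return list(pool.map(timed, models))
```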
advise — With --explain, advise emits the full scoring trace: feature extraction, per-model scores, filter passes, tiebreak rationale.

Reach for this when an advise recommendation surprises you and you want to understand why.
```
ollama-cloud-pp-cli advise --prompt-file ./p.txt --explain --format md
```

Engagement canary

advise-replay — Replays advisor recommendations and reports divergence between recommended models and actually-chosen models. Foundation for the divergence canary; the prompt corpus is not retained so judge-LLM scoring is not in scope until a corpus sidecar ships.

Run weekly to detect advisor drift, that is, growing divergence between recommended and actually chosen models.
```
ollama-cloud-pp-cli advise-replay --since 7d --diverge-only --json --select rows,divergence_count,divergence_pct
```
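
To make the divergence figures concrete, here is a small sketch of how such a summary could be computed from (recommended, chosen) pairs. The field names mirror the --select list above; the formula (diverged pairs as a percentage of all pairs) is an assumption, and the input shape is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def divergence_summary(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Summarize (recommended, chosen) pairs; the percentage formula is assumed."""
    pairs = list(pairs)
    diverged = sum(1 for recommended, chosen in pairs if recommended != chosen)
    pct = 100.0 * diverged / len(pairs) if pairs else 0.0
    return {"divergence_count": diverged, "divergence_pct": pct}
```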

Operations

budget — Probes the free-tier weekly cap with a 1-token chat. Parses Ollama Cloud's 429 prose and emits a structured verdict (ok | exhausted | unknown) with the upgrade URL so agents can pre-flight quota before launching long sessions.

Run before launching a long agent session to confirm quota is available.
```
ollama-cloud-pp-cli budget --json
```
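
The three-way verdict can be illustrated with a small classifier. The rules below are a guess at reasonable behavior based on the 429 message quoted in Troubleshooting; the CLI's actual parsing may differ, and classify_probe is a hypothetical name.

```python
WEEKLY_LIMIT_PHRASE = "you have reached your weekly usage limit"


def classify_probe(status: int, body: str) -> str:
    """Map a probe response to ok | exhausted | unknown (illustrative rules)."""
    if 200 <= status < 300:
        return "ok"
    if status == 429 and WEEKLY_LIMIT_PHRASE in body.lower():
        return "exhausted"
    # Any other outcome (other 429 prose, 5xx, network noise) stays undecided.
    return "unknown"
```

Keeping unknown as a separate outcome lets an agent decide for itself whether to proceed or stop when the probe is inconclusive.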
cost-trace — Aggregates advisor-log cost estimates over a time window; compares per-model and per-task-hint spend.

Use to decide whether to upgrade to a paid Ollama Cloud tier.
```
ollama-cloud-pp-cli cost-trace --since 7d --group-by task-hint --json
```
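
The grouping that cost-trace performs can be pictured as a simple sum per key. The sketch below assumes log records shaped like dictionaries with task_hint, model, and cost_usd fields; those field names are hypothetical, since the advisor-log format is not documented here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def spend_by(records: Iterable[Mapping[str, object]], key: str) -> Dict[str, float]:
    """Sum estimated cost per value of `key` (record fields are hypothetical)."""
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[str(record[key])] += float(record["cost_usd"])
    return dict(totals)
```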

Usage

Run ollama-cloud-pp-cli --help for the full command reference and flag list.

Commands

chat

Manage chat

ollama-cloud-pp-cli chat chat - Native Ollama chat endpoint. Supports streaming.
ollama-cloud-pp-cli chat completions - OpenAI-compatible chat completions endpoint.

embeddings

Manage embeddings

ollama-cloud-pp-cli embeddings embed - Native Ollama embeddings endpoint.
ollama-cloud-pp-cli embeddings openai-embed - Generate embeddings (OpenAI-compatible)

models

Manage models

ollama-cloud-pp-cli models models - Catalog in OpenAI list-models format.

ps

List loaded models

ollama-cloud-pp-cli ps ps - Shows currently-loaded models. On Ollama Cloud this typically reflects models with active sessions.

show

Inspect a model

ollama-cloud-pp-cli show show - Returns model metadata, template, modelfile, capabilities.

Output Formats

# Human-readable table (default in terminal, JSON when piped)
ollama-cloud-pp-cli chat chat --model example-value

# JSON for scripting and agents
ollama-cloud-pp-cli chat chat --model example-value --json

# Filter to specific fields
ollama-cloud-pp-cli chat chat --model example-value --json --select id,name,status

# Dry run — show the request without sending
ollama-cloud-pp-cli chat chat --model example-value --dry-run

# Agent mode — JSON + compact + no prompts in one flag
ollama-cloud-pp-cli chat chat --model example-value --agent

Agent Usage

This CLI is designed for AI agent consumption:

Non-interactive - never prompts, every input is a flag
Pipeable - --json output to stdout, errors to stderr
Filterable - --select id,name returns only fields you need
Previewable - --dry-run shows the request without sending
Explicit retries - add --idempotent to create retries when a no-op success is acceptable
Confirmable - --yes for explicit confirmation of destructive actions
Piped input - write commands can accept structured input when their help lists --stdin
Offline-friendly - sync/search commands can use the local SQLite store when available
Agent-safe by default - no colors or formatting unless --human-friendly is set

Exit codes: 0 success, 2 usage error, 3 not found, 4 auth error, 5 API error, 7 rate limited, 10 config error.
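
An agent driving the CLI through subprocess can branch on these codes. The sketch below encodes the table above as an enum; the retry policy is an assumption (only API errors are retried, since the free-tier rate limit is described as weekly), and the names are hypothetical.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    USAGE_ERROR = 2
    NOT_FOUND = 3
    AUTH_ERROR = 4
    API_ERROR = 5
    RATE_LIMITED = 7
    CONFIG_ERROR = 10


def describe(returncode: int) -> str:
    """Return a lowercase label for a documented exit code, else 'unknown'."""
    try:
        return ExitCode(returncode).name.lower()
    except ValueError:
        return "unknown"


def worth_retrying(returncode: int) -> bool:
    """Assumed policy: retry API errors only; a weekly cap will not clear soon."""
    return returncode == ExitCode.API_ERROR
```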

Health Check

ollama-cloud-pp-cli doctor

Verifies configuration, credentials, and connectivity to the API.

Configuration

Config file: ~/.config/ollama-cloud-pp-cli/config.toml

Static request headers can be configured under headers; per-command header overrides take precedence.
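
The precedence rule can be sketched as a merge in which per-command values replace static ones. This is an illustration under assumptions: the function name is hypothetical, and matching header names case-insensitively reflects HTTP semantics rather than confirmed CLI behavior.

```python
from typing import Dict, Mapping


def merge_headers(static: Mapping[str, str],
                  per_command: Mapping[str, str]) -> Dict[str, str]:
    """Merge headers so per-command values win; names compare case-insensitively."""
    merged: Dict[str, str] = {}
    canonical: Dict[str, str] = {}  # lowercased name -> spelling currently stored
    for source in (static, per_command):
        for name, value in source.items():
            key = name.lower()
            if key in canonical:
                # Drop the earlier spelling so the override fully replaces it.
                del merged[canonical[key]]
            canonical[key] = name
            merged[name] = value
    return merged
```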

Environment variables:

| Name | Kind | Required | Description |
| --- | --- | --- | --- |
| `OLLAMA_CLOUD_API_KEY` | per_call | Yes | Set to your API credential. |

Troubleshooting

Authentication errors (exit code 4)

Run ollama-cloud-pp-cli doctor to check credentials
Verify the environment variable is set: echo $OLLAMA_CLOUD_API_KEY

Not found errors (exit code 3)

Check that the model name is correct
Run ollama-cloud-pp-cli tags to list the available models

API-specific

HTTP 429 with 'you have reached your weekly usage limit' — Free tier exhausted. Run ollama-cloud-pp-cli budget --json to confirm; pick a non-rate-limited model with advise --exclude <exhausted-models>; or upgrade at https://ollama.com/upgrade
advise returns unexpected model — Run advise --explain --format md to see the scoring trace. Adjust --task-hint, --exclude, or the curated models.json overlay.
401 unauthorized — Confirm OLLAMA_CLOUD_API_KEY is set; the CLI does NOT read OLLAMA_API_KEY (intentional, to avoid local-daemon collisions)
advise picks a model not in /api/tags — Catalog snapshot is stale. Run tags --no-cache to repopulate the local SQLite snapshot.

Sources & Inspiration

This CLI was built by studying these projects and resources:

openai-python — Python (22000 stars)
litellm — Python (14000 stars)
ollama-python — Python (5200 stars)
ollama-js — TypeScript (4500 stars)
aichat — Rust (3800 stars)

Generated by CLI Printing Press