tomoul run

One-shot generation. No server, no daemon. Pipe-friendly. Good for scripts, smoke tests, and shell pipelines.

Usage

$ tomoul run phi-4 -i "Write a haiku about a small bird called tomoul."
A tomoul takes flight—
Cloud edges trace its shortcut
Sky-stitched in silence.

Streaming output

Output streams to stdout token by token. Pass --no-stream to buffer the response and print it all at once, for example when piping into a tool that needs the complete output.

tomoul run phi-4 -i "Summarize this PR" --no-stream | jq -Rs .

Reading from stdin

cat README.md | tomoul run phi-4 -i "Summarize in 3 bullets:"
git diff     | tomoul run phi-4 -i "Suggest a commit message:"
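Because `run` reads stdin and exits when it finishes, it fits into ordinary shell loops. The sketch below summarizes each Markdown file in a directory into a sibling `.summary.txt` file. The function name `summarize_docs`, the default `docs` directory, and the output suffix are hypothetical choices for illustration. It also assumes `tomoul` is on `PATH` and that `phi-4` is available locally.

```shell
# Hypothetical helper: summarize every Markdown file in a directory.
# Assumes `tomoul` is on PATH and the phi-4 model is available locally.
summarize_docs() {
  dir="${1:-docs}"                 # directory to scan; defaults to ./docs
  for f in "$dir"/*.md; do
    [ -e "$f" ] || continue        # glob matched nothing; skip the literal pattern
    # The file is fed on stdin, like `cat README.md | tomoul run ...` above;
    # the result goes to e.g. docs/intro.summary.txt
    tomoul run phi-4 -i "Summarize in 3 bullets:" \
      < "$f" > "${f%.md}.summary.txt"
  done
}
```

Usage: `summarize_docs docs`. The glob is expanded once when the loop starts, so the generated `.summary.txt` files are never fed back in as input.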

Flags

Flag	Default	Notes
`-i, --input`	—	Prompt string (positional after the model also works).
`--max-tokens`	512	Generation cap.
`--temperature`	0.7	Sampling temperature.
`--system`	—	Optional system message.
`--no-stream`	off	Buffer full output instead of streaming.
`--cloud`	off	Run against `api.tomoul.ai` (requires auth).
`--json`	off	Emit OpenAI-shape JSON instead of plain text.
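The flags can be combined freely. The sketch below wraps two common patterns in shell functions. The first is a short, low-temperature call with a system message. The second is a `--json` call whose generated text is extracted with `jq`. The function names and prompts are hypothetical. The `jq` filter is an assumption: it reads the OpenAI chat-completion shape (`choices[0].message.content`) and falls back to the legacy completion shape (`choices[0].text`). Check what `--json` actually emits before relying on it.

```shell
# Hypothetical helpers; assumes `tomoul` is on PATH and phi-4 is available locally.

# Lower temperature and a tighter token cap for short, focused output.
release_notes() {
  tomoul run phi-4 \
    --system "You are a terse release-notes writer." \
    --max-tokens 200 \
    --temperature 0.2 \
    --no-stream \
    -i "Draft release notes for $1."
}

# Ask for OpenAI-shape JSON, then print only the generated text.
# ASSUMPTION: chat-completion shape, with the legacy completion shape as fallback.
ask_json() {
  tomoul run phi-4 --json -i "$1" |
    jq -r '.choices[0].message.content // .choices[0].text'
}
```

Usage: `release_notes v1.2 > NOTES.md` or `ask_json "Name three prime numbers."`.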

run vs serve

run exits as soon as generation finishes. Use it for one-shots and shell pipelines.
serve keeps running and exposes the OpenAI-compatible API. Use it when an app or IDE is the consumer.

Internally, both use the same engine; they differ only in their I/O surface.


Last updated 13 May 2026