tomoul run

One-shot generation. No server, no daemon. Pipe-friendly. Good for scripts, smoke tests, and shell pipelines.

Usage

$ tomoul run phi-4 -i "Write a haiku about a small bird called tomoul."
A tomoul takes flight—
Cloud edges trace its shortcut
Sky-stitched in silence.

Streaming output

By default, output prints token-by-token to stdout. Pass --no-stream to buffer the full output and print it once, which suits downstream tools that expect complete input.

tomoul run phi-4 -i "Summarize this PR" --no-stream | jq -Rs .
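
If a script is sometimes run interactively and sometimes piped, the choice between streaming and buffering can be made automatically. The sketch below assumes tomoul is on PATH. run_auto is a hypothetical helper name, not part of tomoul; it uses only the -i and --no-stream flags documented on this page.

```shell
# run_auto MODEL PROMPT: hypothetical wrapper, not a tomoul command.
# Streams when stdout is a terminal, buffers when stdout is piped or redirected.
run_auto() {
  if [ -t 1 ]; then
    # stdout is a terminal: keep the default token-by-token output.
    tomoul run "$1" -i "$2"
  else
    # stdout is a pipe or file: hand the consumer one complete buffer.
    tomoul run "$1" -i "$2" --no-stream
  fi
}
```

Called as run_auto phi-4 "Summarize this PR" | jq -Rs ., the helper takes the buffered branch because its stdout is a pipe.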

Reading from stdin

cat README.md | tomoul run phi-4 -i "Summarize in 3 bullets:"
git diff     | tomoul run phi-4 -i "Suggest a commit message:"
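
In scripts it can help to check the input before spending a generation on it. A minimal sketch, assuming tomoul is on PATH: summarize_file is a hypothetical helper, not part of tomoul, and it feeds the file to stdin with a redirect rather than cat.

```shell
# summarize_file FILE: hypothetical helper, not a tomoul command.
summarize_file() {
  # -s is true only when the file exists and is non-empty.
  if [ ! -s "$1" ]; then
    echo "summarize_file: '$1' is missing or empty" >&2
    return 1
  fi
  # Redirect the file to stdin; the prompt is the one used in the example above.
  tomoul run phi-4 -i "Summarize in 3 bullets:" < "$1"
}
```

The non-zero return on empty input lets a calling script stop early, for example summarize_file README.md || exit 1.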

Flags

Flag            Default   Notes
-i, --input     (none)    Prompt string (positional after the model also works).
--max-tokens    512       Generation cap.
--temperature   0.7       Sampling temperature.
--system        (none)    Optional system message.
--no-stream     off       Buffer full output instead of streaming.
--cloud         off       Run against api.tomoul.ai (requires auth).
--json          off       Emit OpenAI-shape JSON instead of plain text.
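
Several flags can be combined in one invocation. The sketch below wraps a fixed set of options in a shell function. It assumes that valued flags take their value as the next argument (for example --max-tokens 64), which this page does not show explicitly. The function name terse and the chosen values are illustrative only.

```shell
# terse PROMPT: hypothetical wrapper, not a tomoul command.
# Assumption: valued flags are written as "--flag value".
terse() {
  tomoul run phi-4 \
    --system "Answer in one sentence." \
    --max-tokens 64 \
    --temperature 0.2 \
    --no-stream \
    -i "$1"
}
```

A call such as terse "What is a haiku?" then runs with the system message, a lower token cap, and a lower temperature than the defaults listed above.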

vs serve

  • run exits when generation finishes. Use for one-shots and shell pipelines.
  • serve stays up and serves the OpenAI-compat API. Use when an app or IDE is the consumer.

Internally, both commands share the same engine; the only difference is the I/O surface.

Last updated 13 May 2026