--cloud bridge

Same CLI, same model slugs, same OpenAI-compat API — but routes to `api.tomoul.ai` instead of your local GPU. The local-to-cloud handoff is one flag.

Beta.

--cloud is in beta. The exact flag and config shape may shift before GA. Treat this page as the spec we're targeting, not a frozen contract.

Why use it

You wrote your app against tomoul serve on your laptop. Production needs more GPU than your laptop has. With --cloud, the same localhost:8080 endpoint your app already calls becomes a proxy to api.tomoul.ai. Zero code changes in the application.

Setup

tomoul auth login         # opens browser, stores key in OS keychain
tomoul serve microsoft/phi-4 --cloud
# Listening on http://127.0.0.1:8080 (cloud bridge → api.tomoul.ai)

Usage

Identical to local serve. Your app keeps calling http://127.0.0.1:8080/v1/.... The CLI translates requests to https://api.tomoul.ai/v1/..., attaches your key, and streams the response back over your local socket.

Auth

Use tomoul auth login (OAuth, stored in OS keychain) or set TOMOUL_KEY in the environment.

Local fallback

Add --cloud=auto to prefer local when the model is cached and the GPU has headroom, fall back to cloud otherwise. Useful for dev laptops that flip between offline and online.

tomoul serve microsoft/phi-4 --cloud=auto

Pure --cloud always routes to the cloud; --cloud=auto is the hybrid.

Last updated 13 May 2026Edit this page on GitHub