Engine quickstart

Load a model from the registry and run one forward pass, in about 15 lines.

The examples below match the modules in src/models/. Exact APIs may change before 1.0; see Stability & versioning, and treat each model's README as the source of truth.

Voice activity detection

The simplest model in the catalog: about 2.2 MB, and it runs in the browser via WASM.

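const std = @import("std");
const tomoul = @import("tomoul");

pub fn main() !void {
    // General-purpose allocator; deinit checks for leaks on exit.
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const a = gpa.allocator();

    // Load the VAD model with default options.
    var vad = try tomoul.models.silero_vad.load(a, .{});
    defer vad.deinit();

    // loadWav is not defined in this snippet; supply your own WAV-loading helper.
    const audio = try loadWav("clip.wav");
    // Print each detected segment as start-end in milliseconds.
    const segments = try vad.detect(audio);
    for (segments) |s| std.debug.print("{d}-{d}ms\n", .{ s.start_ms, s.end_ms });
}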
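
The snippet above calls loadWav without defining it, and this page does not say which sample format vad.detect expects. If you write that helper yourself, one step is likely to be turning raw PCM bytes into float samples. The sketch below assumes 16-bit little-endian mono PCM with the WAV header already stripped, and assumes the model wants f32 samples in [-1, 1). Both are assumptions, and pcm16ToF32 is a hypothetical name, not part of tomoul. Check the silero-vad README for the real input contract.

```zig
const std = @import("std");

/// Hypothetical helper (not a tomoul API): converts little-endian 16-bit PCM
/// bytes into f32 samples in [-1.0, 1.0). Writes into `out` and returns the
/// number of samples written. A trailing odd byte is ignored.
fn pcm16ToF32(bytes: []const u8, out: []f32) usize {
    const n = @min(bytes.len / 2, out.len);
    var i: usize = 0;
    while (i < n) : (i += 1) {
        // Assemble the two bytes into one unsigned 16-bit word, low byte first.
        const lo: u16 = bytes[2 * i];
        const hi: u16 = bytes[2 * i + 1];
        // Reinterpret the word as a signed sample, then scale by 2^15.
        const sample: i16 = @bitCast(lo | (hi << 8));
        out[i] = @as(f32, @floatFromInt(sample)) / 32768.0;
    }
    return n;
}

test "pcm16ToF32 decodes zero, half scale, and the minimum sample" {
    const bytes = [_]u8{ 0x00, 0x00, 0x00, 0x40, 0x00, 0x80 };
    var out: [3]f32 = undefined;
    const n = pcm16ToF32(&bytes, &out);
    try std.testing.expectEqual(@as(usize, 3), n);
    try std.testing.expectEqual(@as(f32, 0.0), out[0]);
    try std.testing.expectEqual(@as(f32, 0.5), out[1]);
    try std.testing.expectEqual(@as(f32, -1.0), out[2]);
}
```

Writing into a caller-provided buffer keeps the function allocation-free; a real loader would also need to parse the RIFF header to find the sample rate, channel count, and data offset.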

Embed text

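// `a` is the allocator set up in the first example.
var model = try tomoul.models.bge_m3.load(a, .{ .device = .auto });
defer model.deinit();

// The returned vector is caller-owned; free it with the same allocator.
const vec = try model.embed("Habari za asubuhi.");
defer a.free(vec);
std.debug.print("dim={d} first={d:.4}\n", .{ vec.len, vec[0] });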
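
Embeddings are usually compared with each other rather than inspected directly. The following is a minimal sketch of cosine similarity, assuming embed returns a slice of f32 (this page only shows vec.len and vec[0], so the element type is an assumption). cosineSimilarity is a hypothetical helper, not a tomoul API.

```zig
const std = @import("std");

/// Hypothetical helper (not a tomoul API): cosine similarity of two
/// equal-length vectors. Returns 0 if either vector has zero norm.
fn cosineSimilarity(a: []const f32, b: []const f32) f32 {
    std.debug.assert(a.len == b.len);
    var dot: f32 = 0;
    var norm_a: f32 = 0;
    var norm_b: f32 = 0;
    // Accumulate the dot product and both squared norms in one pass.
    for (a, b) |x, y| {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    const denom = @sqrt(norm_a) * @sqrt(norm_b);
    if (denom == 0) return 0;
    return dot / denom;
}

test "cosineSimilarity on parallel and orthogonal vectors" {
    const u = [_]f32{ 1, 2, 3 };
    const v = [_]f32{ 2, 4, 6 };
    try std.testing.expectApproxEqAbs(@as(f32, 1.0), cosineSimilarity(&u, &v), 1e-6);

    const e1 = [_]f32{ 1, 0 };
    const e2 = [_]f32{ 0, 1 };
    try std.testing.expectApproxEqAbs(@as(f32, 0.0), cosineSimilarity(&e1, &e2), 1e-6);
}
```

Given two vectors obtained from model.embed, cosineSimilarity(vec_a, vec_b) yields a score in [-1, 1], with higher values indicating closer direction. If the model already returns unit-length vectors, the dot product alone would suffice; this page does not say whether it does.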

Punctuate a raw transcript

var punct = try tomoul.models.xlm_roberta_punctuation.load(a, .{ .device = .auto });
defer punct.deinit();
 
const fixed = try punct.restore("habari za asubuhi karibu Tomoul");
defer a.free(fixed);
// → "Habari za asubuhi. Karibu, Tomoul."

Generate text (LLM)

var model = try tomoul.models.qwen3_5.load(a, .{ .device = .auto });
defer model.deinit();
 
const out = try model.generate("Write a haiku.", .{ .max_new_tokens = 64 });
defer a.free(out);
std.debug.print("{s}\n", .{out});

Where weights come from

Model loaders read weights from the local cache (default ~/.cache/tomoul/). Pre-fetch with tomoul pull <slug> from the CLI, or pass an explicit path. Pre-converted .tl release artifacts live at huggingface.co/tomoul. To convert your own HF weights, use the Python scripts in tools/.

Worked examples

Six end-to-end examples live in examples/:

  • silero-vad
  • whisper
  • whisper-demo
  • qwen3_5-chat
  • sentence-transformer
  • punctuation-demo

These are the canonical references — consult them before writing your own integration.

Last updated 13 May 2026