Ecosystem
Zerfoo is a family of Go modules that together form a complete ML inference and training stack. Each module has its own go.mod, versioning, and release cycle.
Dependency Graph

    float16 ──┐
              ├──► ztensor ──► zerfoo
    float8 ───┘                  ▲
                                 │
    ztoken ──────────────────────┘

- float16 and float8 provide reduced-precision arithmetic
- ztensor builds tensors, compute engines, and computation graphs on top of them
- zerfoo combines ztensor and ztoken into a full inference, training, and serving framework
- ztoken is independent (zero external dependencies) and plugs directly into zerfoo
Which Module to Import
| You want to… | Import |
|---|---|
| Run transformer inference or serve models | github.com/zerfoo/zerfoo |
| Work with tensors, GPU compute, or computation graphs | github.com/zerfoo/ztensor |
| Tokenize text (BPE, HuggingFace, GGUF) | github.com/zerfoo/ztoken |
| Do Float16 or BFloat16 arithmetic | github.com/zerfoo/float16 |
| Do FP8 E4M3FN arithmetic | github.com/zerfoo/float8 |
| Convert ONNX models to GGUF | github.com/zerfoo/zonnx (CLI) |
Modules
ztensor
GPU-accelerated tensor, compute engine, and computation graph library. Provides the compute.Engine[T] interface that powers all arithmetic in the ecosystem. Supports CUDA, ROCm, and OpenCL backends, which are loaded at runtime via purego, so no cgo is required.
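To illustrate why a generic engine interface lets the same model code run on different backends, here is a minimal sketch. Every name in it (`Numeric`, `Engine`, `cpuEngine`, `residual`) is hypothetical and chosen for illustration only; it is not ztensor's actual compute.Engine[T], whose method set is not described here.

```go
package main

import (
	"errors"
	"fmt"
)

// Numeric is a hypothetical constraint; ztensor's real constraint may differ.
type Numeric interface {
	~float32 | ~float64 | ~int32 | ~int64
}

// Engine is a hypothetical, heavily reduced stand-in for a generic compute
// interface. It is NOT ztensor's compute.Engine[T].
type Engine[T Numeric] interface {
	Add(a, b []T) ([]T, error)
}

// cpuEngine is a plain-Go backend; a GPU backend could satisfy the same interface.
type cpuEngine[T Numeric] struct{}

// Add returns the element-wise sum of two equal-length slices.
func (cpuEngine[T]) Add(a, b []T) ([]T, error) {
	if len(a) != len(b) {
		return nil, errors.New("length mismatch")
	}
	out := make([]T, len(a))
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out, nil
}

// residual depends only on the interface, so it works with any backend.
func residual[T Numeric](e Engine[T], x, fx []T) ([]T, error) {
	return e.Add(x, fx)
}

func main() {
	out, err := residual[float32](cpuEngine[float32]{}, []float32{1, 2}, []float32{0.5, 0.5})
	fmt.Println(out, err)
}
```

Because `residual` never names a concrete backend, swapping the CPU implementation for an accelerator-backed one would not change the calling code.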
ztoken
BPE tokenizer with HuggingFace tokenizer.json and GGUF tokenizer extraction. Handles SentencePiece compatibility for Llama-family models. Zero external dependencies.
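For readers new to BPE, the core idea is to start from single characters and repeatedly merge the adjacent pair with the highest priority in a learned merge table. The sketch below shows that loop using only the standard library. The `pair` type, the `applyBPE` function, and the three-entry merge table are hypothetical illustrations, not ztoken's API or a real vocabulary.

```go
package main

import "fmt"

// pair is an adjacent pair of symbols (hypothetical helper type).
type pair struct{ left, right string }

// applyBPE splits word into single-character symbols, then repeatedly merges
// the leftmost adjacent pair with the lowest rank (highest priority) until no
// pair in the table applies.
func applyBPE(word string, ranks map[pair]int) []string {
	symbols := make([]string, 0, len(word))
	for _, r := range word {
		symbols = append(symbols, string(r))
	}
	for len(symbols) > 1 {
		best, bestRank := -1, 0
		for i := 0; i+1 < len(symbols); i++ {
			r, ok := ranks[pair{symbols[i], symbols[i+1]}]
			if ok && (best == -1 || r < bestRank) {
				best, bestRank = i, r
			}
		}
		if best == -1 {
			break // no mergeable pair left
		}
		merged := symbols[best] + symbols[best+1]
		symbols = append(symbols[:best+1], symbols[best+2:]...)
		symbols[best] = merged
	}
	return symbols
}

func main() {
	// Toy merge table: a lower rank means the merge was learned earlier.
	ranks := map[pair]int{
		{"l", "o"}:  0,
		{"lo", "w"}: 1,
		{"e", "r"}:  2,
	}
	fmt.Println(applyBPE("lower", ranks)) // prints [low er]
}
```

A production tokenizer additionally handles pre-tokenization, byte fallback, and special tokens, and maps the resulting symbols to integer IDs.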
Numeric Types (float16 + float8)
IEEE 754 half-precision (Float16), Brain Floating Point (BFloat16), and FP8 E4M3FN (Float8) arithmetic libraries. Used by ztensor for quantized tensor storage and mixed-precision compute.
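As a rough illustration of what reduced precision means in practice, the sketch below converts a float32 to BFloat16 and back using only the standard library. BFloat16 keeps the sign bit, all 8 exponent bits, and the top 7 mantissa bits of a float32, so the conversion is a rounded 16-bit truncation. The `BF16` type and `FromFloat32` function are hypothetical names for this example and are not the float16 module's API.

```go
package main

import (
	"fmt"
	"math"
)

// BF16 is a hypothetical illustration type, not the float16 module's API.
// It holds the upper 16 bits of an IEEE 754 float32.
type BF16 uint16

// FromFloat32 converts using round-to-nearest-even. NaN inputs stay NaN.
func FromFloat32(f float32) BF16 {
	bits := math.Float32bits(f)
	if f != f { // NaN: set a mantissa bit so truncation cannot produce infinity
		return BF16(bits>>16) | 0x0040
	}
	// Add half of the discarded range, plus the lowest kept bit so that
	// exact ties round to an even result.
	bits += 0x7FFF + ((bits >> 16) & 1)
	return BF16(bits >> 16)
}

// Float32 widens the value back to float32 by restoring 16 zero bits.
func (b BF16) Float32() float32 {
	return math.Float32frombits(uint32(b) << 16)
}

func main() {
	for _, f := range []float32{1.0, 3.14159265, 1e-3} {
		b := FromFloat32(f)
		fmt.Printf("%g -> 0x%04X -> %g\n", f, uint16(b), b.Float32())
	}
}
```

Values that need more than 7 mantissa bits lose precision on the round trip (3.14159265 comes back as 3.140625), which is the trade-off mixed-precision compute accepts in exchange for halving storage.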
zonnx
ONNX-to-GGUF converter CLI. Standalone binary with no runtime dependencies on the other modules. Converts ONNX models into GGUF format for use with zerfoo.