Ecosystem#

Zerfoo is a family of Go modules that together form a complete ML inference and training stack. Each module has its own go.mod, versioning, and release cycle.

Dependency Graph#

float16 ──┐
           ├──► ztensor ──► zerfoo
float8  ──┘                    ▲
ztoken ────────────────────────┘
  • float16 and float8 provide reduced-precision arithmetic
  • ztensor builds tensors, compute engines, and computation graphs on top of them
  • zerfoo combines ztensor and ztoken into a full inference, training, and serving framework
  • ztoken is independent (zero external deps) and plugs directly into zerfoo

Which Module to Import#

You want to…Import
Run transformer inference or serve modelsgithub.com/zerfoo/zerfoo
Work with tensors, GPU compute, or computation graphsgithub.com/zerfoo/ztensor
Tokenize text (BPE, HuggingFace, GGUF)github.com/zerfoo/ztoken
Do Float16 or BFloat16 arithmeticgithub.com/zerfoo/float16
Do FP8 E4M3FN arithmeticgithub.com/zerfoo/float8
Convert ONNX models to GGUFgithub.com/zerfoo/zonnx (CLI)

Modules#

ztensor#

GPU-accelerated tensor, compute engine, and computation graph library. Provides the compute.Engine[T] interface that powers all arithmetic in the ecosystem. Supports CUDA, ROCm, and OpenCL backends loaded at runtime via purego – zero CGo.

ztoken#

BPE tokenizer with HuggingFace tokenizer.json and GGUF tokenizer extraction. Handles SentencePiece compatibility for Llama-family models. Zero external dependencies.

Numeric Types (float16 + float8)#

IEEE 754 half-precision (Float16), Brain Floating Point (BFloat16), and FP8 E4M3FN (Float8) arithmetic libraries. Used by ztensor for quantized tensor storage and mixed-precision compute.

zonnx#

ONNX-to-GGUF converter CLI. Standalone binary with no runtime dependencies on the other modules. Converts ONNX models into GGUF format for use with zerfoo.