Contributing to Zerfoo#
Thank you for your interest in contributing to Zerfoo, the Go-native ML inference and training framework. This guide covers the full Zerfoo ecosystem and applies to all six repositories.
Code of Conduct#
All participants in the Zerfoo community are expected to treat each other with respect and professionalism. We are committed to providing a welcoming and inclusive environment for everyone.
Repository Structure#
Zerfoo is an ecosystem of six independent repositories (each with its own go.mod, CI, and releases):
| Repository | Module | Purpose |
|---|---|---|
| zerfoo | github.com/zerfoo/zerfoo | Core ML framework: inference, training, serving |
| ztensor | github.com/zerfoo/ztensor | GPU-accelerated tensor, compute engine, computation graph |
| ztoken | github.com/zerfoo/ztoken | BPE tokenizer with HuggingFace compatibility |
| zonnx | github.com/zerfoo/zonnx | ONNX-to-GGUF converter CLI |
| float16 | github.com/zerfoo/float16 | IEEE 754 half-precision (Float16/BFloat16) arithmetic |
| float8 | github.com/zerfoo/float8 | FP8 E4M3FN arithmetic for quantized inference |
Dependency graph:
float16 --+
float8  --+--> ztensor --> zerfoo
ztoken  --+
zonnx (standalone)
Each repo is versioned and released independently. Do not treat this as a monorepo – submit PRs to the repository where the change belongs.
Development Setup#
Prerequisites#
- Go 1.26+ (generics with tensor.Numeric constraint)
- Git
- CUDA Toolkit (optional, for GPU-accelerated tests and development)
Clone and Build#
Each repository builds independently:
# Clone whichever repo you want to work on
git clone https://github.com/zerfoo/<repo>.git
cd <repo>
go mod tidy
go test ./...
No CGo is required for CPU-only builds. GPU support is loaded dynamically at runtime via purego/dlopen, so go build ./... works on any platform without a C compiler.
Running Tests#
go test ./... # All CPU tests (no GPU required)
go test -race ./... # Tests with race detector (required before submitting)
go test -tags cuda ./... # GPU tests (requires CUDA toolkit and a GPU)
go test -coverprofile=coverage.out ./... # Coverage report
go tool cover -html=coverage.out -o coverage.html
Testing Requirements#
- All new code must have tests
- Use table-driven tests with t.Run subtests
- Always run with the -race flag before submitting
- CI enforces a 75% coverage gate on new packages
Code Style#
Formatting and Linting#
- gofmt – all code must be formatted with gofmt
- goimports – imports must be organized (stdlib, external, internal)
- golangci-lint – run golangci-lint run before submitting
Go Conventions#
- Prefer the Go standard library over third-party dependencies
- Follow standard Go naming: PascalCase for exported, camelCase for unexported
- Write documentation comments for all exported functions, types, and methods
- Use generics with [T tensor.Numeric] constraints – avoid type-specific code where generics work
- All tensor arithmetic must flow through compute.Engine[T] (see Key Conventions)
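
As a rough illustration of the generics guideline, the sketch below replaces what would otherwise be separate SumFloat32 and SumInt64 helpers with one constrained function. The Numeric interface here is a local stand-in written for this example; the real tensor.Numeric type set may differ. The raw-slice loop only demonstrates the constraint mechanics, since in Zerfoo code the arithmetic itself belongs behind compute.Engine[T].

```go
package main

import "fmt"

// Numeric is a local stand-in for tensor.Numeric (assumption: the real
// constraint lists a different, likely larger, set of types).
type Numeric interface {
	~int | ~int32 | ~int64 | ~float32 | ~float64
}

// Sum works for every type in the constraint, so no per-type copies are needed.
func Sum[T Numeric](xs []T) T {
	var total T
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	fmt.Println(Sum([]float32{1.5, 2.5}))
	fmt.Println(Sum([]int64{1, 2, 3}))
}
```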
Commit Conventions#
We use Conventional Commits for automated versioning with release-please.
<type>(<scope>): <description>
| Type | Description |
|---|---|
| feat | A new feature |
| fix | A bug fix |
| perf | A performance improvement |
| docs | Documentation only changes |
| test | Adding or correcting tests |
| chore | Maintenance tasks, CI, dependencies |
| refactor | Code change that neither fixes a bug nor adds a feature |
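
If you want to check a commit header locally before pushing, the format and type table above can be expressed as a regular expression. This is a minimal sketch, not a tool shipped with Zerfoo: ValidHeader is a hypothetical name, and treating the scope as optional and restricting it to lowercase letters, digits, and a few separators are assumptions of this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// headerRE matches "<type>(<scope>): <description>" for the types in the table.
// Assumption: the scope is optional and limited to [a-z0-9_/-].
var headerRE = regexp.MustCompile(`^(feat|fix|perf|docs|test|chore|refactor)(\([a-z0-9_/-]+\))?: \S.*$`)

// ValidHeader reports whether the first line of a commit message fits the format.
func ValidHeader(line string) bool {
	return headerRE.MatchString(line)
}

func main() {
	headers := []string{
		"feat(inference): add Qwen 2.5 architecture support",
		"docs: describe the release process",
		"Fixed a bug",
	}
	for _, h := range headers {
		fmt.Printf("%-5t %s\n", ValidHeader(h), h)
	}
}
```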
Examples:
feat(inference): add Qwen 2.5 architecture support
fix(generate): correct KV cache eviction for sliding window attention
perf(layers): fuse SiLU and gate projection into single kernel
Pull Request Process#
- Branch from main and keep your branch up to date with rebase
- One logical change per PR – keep PRs focused and reviewable
- All CI checks must pass – tests, linting, formatting
- Rebase and merge – we do not use squash merges or merge commits
- Reference related issues – use Fixes #123 or Closes #123 in the PR description
Before Submitting#
go test -race ./...
go vet ./...
golangci-lint run
Review Process#
- All PRs require at least one maintainer approval
- Maintainers may request changes – address feedback and force-push to update your branch
- Once approved and CI is green, a maintainer will rebase-merge your PR
GPU Development#
purego Bindings#
GPU libraries are loaded at runtime via purego/dlopen – not linked at compile time. This means:
- go build never requires a C compiler or GPU SDK
- GPU availability is detected at runtime
- The same binary runs on CPU-only machines (gracefully falls back)
When writing GPU code, use the compute.Engine[T] interface. Do not call CUDA/ROCm/OpenCL APIs directly outside of internal/gpuapi/.
Release Process#
All six repositories use release-please for automated releases:
- Conventional Commit messages drive version bumps (feat = minor, fix = patch)
- release-please opens a release PR automatically when changes land on main
- Merging the release PR creates a GitHub release and Git tag
- Semantic versioning (vMAJOR.MINOR.PATCH) is enforced across all repos
Breaking changes require a BREAKING CHANGE: footer in the commit message, which triggers a major version bump.
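
To make the bump rules concrete, here is a small worked sketch that applies them to a list of commit messages. It models only the three rules stated above (feat, fix, and the BREAKING CHANGE: footer). Version, Next, and level are hypothetical names, and treating every other commit type as "no bump" is an assumption of this example that may not match how release-please is configured.

```go
package main

import (
	"fmt"
	"strings"
)

// Version is a minimal vMAJOR.MINOR.PATCH triple.
type Version struct{ Major, Minor, Patch int }

func (v Version) String() string {
	return fmt.Sprintf("v%d.%d.%d", v.Major, v.Minor, v.Patch)
}

// Bump levels, ordered so that a larger value wins.
const (
	none = iota
	patch
	minor
	major
)

// level classifies a single commit message.
func level(msg string) int {
	if strings.Contains(msg, "\nBREAKING CHANGE: ") {
		return major
	}
	header, _, _ := strings.Cut(msg, "\n")
	switch {
	case strings.HasPrefix(header, "feat(") || strings.HasPrefix(header, "feat:"):
		return minor
	case strings.HasPrefix(header, "fix(") || strings.HasPrefix(header, "fix:"):
		return patch
	}
	return none // assumption: other types do not trigger a release
}

// Next applies the highest bump level found in msgs to v.
func Next(v Version, msgs []string) Version {
	highest := none
	for _, m := range msgs {
		if l := level(m); l > highest {
			highest = l
		}
	}
	switch highest {
	case major:
		return Version{v.Major + 1, 0, 0}
	case minor:
		return Version{v.Major, v.Minor + 1, 0}
	case patch:
		return Version{v.Major, v.Minor, v.Patch + 1}
	}
	return v
}

func main() {
	cur := Version{1, 4, 2}
	fmt.Println(Next(cur, []string{"fix(generate): correct KV cache eviction"}))
	fmt.Println(Next(cur, []string{"fix(x): a", "feat(inference): add architecture"}))
	fmt.Println(Next(cur, []string{"feat(api): drop old loader\n\nBREAKING CHANGE: loader removed"}))
}
```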
Issue Reporting#
Bug Reports#
Include: clear description, steps to reproduce, expected vs actual behavior, environment (Go version, OS, architecture, GPU), and model details if applicable.
Feature Requests#
Include: problem statement, proposed solution, alternatives considered, and use case.
Good First Issues#
Looking for a place to start? The issues below are small, self-contained tasks across the ecosystem, grouped by difficulty.
Beginner#
| # | Issue | Repo | Effort |
|---|---|---|---|
| 1 | Fix Exp10 returning a constant instead of computing 10^f | float16 | 30 min |
| 2 | Remove doc comment erroneously pasted into Config.EnableFastMath field | float16 | 15 min |
| 3 | Add String() method to FloatClass enum type | float16 | 30 min |
| 4 | Add missing doc comments to GGUF writer AddMetadata* methods | zonnx | 20 min |
| 5 | Add String() methods to ConversionMode and ArithmeticMode enums | float8 | 30 min |
| 6 | Add table-driven tests for BFloat16 comparison functions | float16 | 45 min |
Intermediate#
| # | Issue | Repo | Effort |
|---|---|---|---|
| 7 | Fix Mod(f, Inf) returning NaN instead of f | float16 | 30 min |
| 8 | Add NaN checks to addAlgorithmic and subAlgorithmic in float8 | float8 | 30 min |
| 9 | Add SetNormalizer public method to BPETokenizer | ztoken | 30 min |
| 10 | Convert downloadFile to use defer for resource cleanup | zonnx | 45 min |
| 11 | Add unit tests for Div, Sqrt, and Neg layers | zerfoo | 1 hr |
| 12 | Add unit tests for Softmax activation layer | zerfoo | 45 min |
| 13 | Optimize RecordRequest to avoid per-token counter increment loop | zerfoo | 45 min |
Advanced#
| # | Issue | Repo | Effort |
|---|---|---|---|
| 14 | Implement Backward pass for the Gelu activation’s test coverage | zerfoo | 1.5 hr |
| 15 | Add JSON Schema $ref resolution to grammar-constrained decoding converter | zerfoo | 2 hr |
| 16 | Add a fine-tuning example application | zerfoo | 2 hr |
| 17 | Implement Backward for Div and Sqrt core layers | zerfoo | 2 hr |
| 18 | Add String() method to device.Type enum | ztensor | 20 min |
| 19 | Add R2Score metric to the metrics package | ztensor | 45 min |
| 20 | Add table-driven tests for tensor shape validation | ztensor | 45 min |
Browse issues labeled good first issue on GitHub for the full list with detailed acceptance criteria.
How to claim an issue:
- Comment on the issue to let maintainers know you’re working on it
- Fork the repo and create a feature branch
- Submit a PR referencing the issue
Key Conventions#
Engine[T] is law#
All tensor arithmetic must flow through compute.Engine[T]. Never operate on raw slices outside the engine – this enables transparent CPU/GPU switching and CUDA graph capture.
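
The shape of this rule can be shown with a deliberately tiny sketch. The Engine interface below is hypothetical and far smaller than the real compute.Engine[T]; its method names, the local Numeric constraint, and the use of slices in place of tensors are all assumptions made so the example runs on its own. What it illustrates is that the layer-level function contains no arithmetic, only engine calls.

```go
package main

import "fmt"

// Numeric is a local stand-in for tensor.Numeric (assumption).
type Numeric interface{ ~float32 | ~float64 }

// Engine is a hypothetical, minimal engine interface. Slices stand in for
// tensors; inputs to Add are assumed to have equal length.
type Engine[T Numeric] interface {
	Add(a, b []T) []T
	Scale(a []T, k T) []T
}

// cpuEngine is the only place in this sketch where element-wise math happens.
type cpuEngine[T Numeric] struct{}

func (cpuEngine[T]) Add(a, b []T) []T {
	out := make([]T, len(a))
	for i := range a {
		out[i] = a[i] + b[i]
	}
	return out
}

func (cpuEngine[T]) Scale(a []T, k T) []T {
	out := make([]T, len(a))
	for i := range a {
		out[i] = a[i] * k
	}
	return out
}

// Residual computes x + k*fx using engine calls only, so swapping the engine
// changes where the work runs without touching this function.
func Residual[T Numeric](e Engine[T], x, fx []T, k T) []T {
	return e.Add(x, e.Scale(fx, k))
}

func main() {
	var e Engine[float32] = cpuEngine[float32]{}
	fmt.Println(Residual(e, []float32{1, 2}, []float32{10, 20}, 0.5))
}
```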
No CGo by default#
GPU bindings use purego/dlopen. A plain go build ./... must compile on any platform without a C compiler.
GGUF is the sole model format#
Do not add support for other formats (ONNX, SafeTensors, etc.) in this repo. Use zonnx to convert ONNX models to GGUF.
Fuse, don’t fragment#
Prefer fused operations (FusedAddRMSNorm, FusedSiluGate, etc.) over sequences of primitive ops. Every eliminated kernel launch matters for tok/s.
Getting Help#
- GitHub Discussions – ask questions and share ideas on each repo’s Discussions tab
- GitHub Issues – report bugs or request features