Contributing to Zerfoo#

Thank you for your interest in contributing to Zerfoo, the Go-native ML inference and training framework. This guide covers the full Zerfoo ecosystem and applies to all six repositories.

Code of Conduct#

All participants in the Zerfoo community are expected to treat each other with respect and professionalism. We are committed to providing a welcoming and inclusive environment for everyone.

Repository Structure#

Zerfoo is an ecosystem of six independent repositories (each with its own go.mod, CI, and releases):

Repository	Module	Purpose
zerfoo	`github.com/zerfoo/zerfoo`	Core ML framework: inference, training, serving
ztensor	`github.com/zerfoo/ztensor`	GPU-accelerated tensor, compute engine, computation graph
ztoken	`github.com/zerfoo/ztoken`	BPE tokenizer with HuggingFace compatibility
zonnx	`github.com/zerfoo/zonnx`	ONNX-to-GGUF converter CLI
float16	`github.com/zerfoo/float16`	IEEE 754 half-precision (Float16/BFloat16) arithmetic
float8	`github.com/zerfoo/float8`	FP8 E4M3FN arithmetic for quantized inference

Dependency graph:

float16 --+
float8  --+--> ztensor --> zerfoo
ztoken --+

zonnx (standalone)

Each repo is versioned and released independently. Do not treat this as a monorepo – submit PRs to the repository where the change belongs.

Development Setup#

Prerequisites#

Go 1.26+ (generics with tensor.Numeric constraint)
Git
CUDA Toolkit (optional, for GPU-accelerated tests and development)

Clone and Build#

Each repository builds independently:

# Clone whichever repo you want to work on
git clone https://github.com/zerfoo/<repo>.git
cd <repo>
go mod tidy
go test ./...

No CGo is required for CPU-only builds. GPU support is loaded dynamically at runtime via purego/dlopen, so go build ./... works on any platform without a C compiler.

Running Tests#

go test ./...            # All CPU tests (no GPU required)
go test -race ./...      # Tests with race detector (required before submitting)
go test -tags cuda ./... # GPU tests (requires CUDA toolkit and a GPU)
go test -coverprofile=coverage.out ./...  # Coverage report
go tool cover -html=coverage.out -o coverage.html

Testing Requirements#

All new code must have tests
Use table-driven tests with t.Run subtests
Always run with the -race flag before submitting
CI enforces a 75% coverage gate on new packages

Code Style#

Formatting and Linting#

gofmt – all code must be formatted with gofmt
goimports – imports must be organized (stdlib, external, internal)
golangci-lint – run golangci-lint run before submitting

Go Conventions#

Prefer the Go standard library over third-party dependencies
Follow standard Go naming: PascalCase for exported, camelCase for unexported
Write documentation comments for all exported functions, types, and methods
Use generics with [T tensor.Numeric] constraints – avoid type-specific code where generics work
All tensor arithmetic must flow through compute.Engine[T] (see Key Conventions)

Commit Conventions#

We use Conventional Commits for automated versioning with release-please.

<type>(<scope>): <description>

Type	Description
`feat`	A new feature
`fix`	A bug fix
`perf`	A performance improvement
`docs`	Documentation only changes
`test`	Adding or correcting tests
`chore`	Maintenance tasks, CI, dependencies
`refactor`	Code change that neither fixes a bug nor adds a feature

Examples:

feat(inference): add Qwen 2.5 architecture support
fix(generate): correct KV cache eviction for sliding window attention
perf(layers): fuse SiLU and gate projection into single kernel

Pull Request Process#

Branch from main and keep your branch up to date with rebase
One logical change per PR – keep PRs focused and reviewable
All CI checks must pass – tests, linting, formatting
Rebase and merge – we do not use squash merges or merge commits
Reference related issues – use Fixes #123 or Closes #123 in the PR description

Before Submitting#

go test -race ./...
go vet ./...
golangci-lint run

Review Process#

All PRs require at least one maintainer approval
Maintainers may request changes – address feedback and force-push to update your branch
Once approved and CI is green, a maintainer will rebase-merge your PR

GPU Development#

purego Bindings#

GPU libraries are loaded at runtime via purego/dlopen – not linked at compile time. This means:

go build never requires a C compiler or GPU SDK
GPU availability is detected at runtime
The same binary runs on CPU-only machines (gracefully falls back)

When writing GPU code, use the compute.Engine[T] interface. Do not call CUDA/ROCm/OpenCL APIs directly outside of internal/gpuapi/.

Release Process#

All six repositories use release-please for automated releases:

Conventional Commit messages drive version bumps (feat = minor, fix = patch)
release-please opens a release PR automatically when changes land on main
Merging the release PR creates a GitHub release and Git tag
Semantic versioning (vMAJOR.MINOR.PATCH) is enforced across all repos

Breaking changes require a BREAKING CHANGE: footer in the commit message, which triggers a major version bump.

Issue Reporting#

Bug Reports#

Include: clear description, steps to reproduce, expected vs actual behavior, environment (Go version, OS, architecture, GPU), and model details if applicable.

Feature Requests#

Include: problem statement, proposed solution, alternatives considered, and use case.

Good First Issues#

Looking for a place to start? Here are some beginner-friendly issues across the ecosystem.

Beginner#

#	Issue	Repo	Effort
1	Fix `Exp10` returning a constant instead of computing 10^f	float16	30 min
2	Remove doc comment erroneously pasted into `Config.EnableFastMath` field	float16	15 min
3	Add `String()` method to `FloatClass` enum type	float16	30 min
4	Add missing doc comments to GGUF writer `AddMetadata*` methods	zonnx	20 min
5	Add `String()` methods to `ConversionMode` and `ArithmeticMode` enums	float8	30 min
6	Add table-driven tests for `BFloat16` comparison functions	float16	45 min

Intermediate#

#	Issue	Repo	Effort
7	Fix `Mod(f, Inf)` returning NaN instead of `f`	float16	30 min
8	Add NaN checks to `addAlgorithmic` and `subAlgorithmic` in float8	float8	30 min
9	Add `SetNormalizer` public method to `BPETokenizer`	ztoken	30 min
10	Convert `downloadFile` to use `defer` for resource cleanup	zonnx	45 min
11	Add unit tests for `Div`, `Sqrt`, and `Neg` layers	zerfoo	1 hr
12	Add unit tests for `Softmax` activation layer	zerfoo	45 min
13	Optimize `RecordRequest` to avoid per-token counter increment loop	zerfoo	45 min

Advanced#

#	Issue	Repo	Effort
14	Implement `Backward` pass for the `Gelu` activation’s test coverage	zerfoo	1.5 hr
15	Add JSON Schema `$ref` resolution to grammar-constrained decoding converter	zerfoo	2 hr
16	Add a fine-tuning example application	zerfoo	2 hr
17	Implement `Backward` for `Div` and `Sqrt` core layers	zerfoo	2 hr
18	Add `String()` method to `device.Type` enum	ztensor	20 min
19	Add `R2Score` metric to the metrics package	ztensor	45 min
20	Add table-driven tests for tensor shape validation	ztensor	45 min

Browse issues labeled good first issue on GitHub for the full list with detailed acceptance criteria.

How to claim an issue:

Comment on the issue to let maintainers know you’re working on it
Fork the repo and create a feature branch
Submit a PR referencing the issue

Key Conventions#

Engine[T] is law#

All tensor arithmetic must flow through compute.Engine[T]. Never operate on raw slices outside the engine – this enables transparent CPU/GPU switching and CUDA graph capture.

No CGo by default#

GPU bindings use purego/dlopen. A plain go build ./... must compile on any platform without a C compiler.

GGUF is the sole model format#

Do not add support for other formats (ONNX, SafeTensors, etc.) in this repo. Use zonnx to convert ONNX models to GGUF.

Fuse, don’t fragment#

Prefer fused operations (FusedAddRMSNorm, FusedSiluGate, etc.) over sequences of primitive ops. Every eliminated kernel launch matters for tok/s.

Getting Help#

GitHub Discussions – ask questions and share ideas on each repo’s Discussions tab
GitHub Issues – report bugs or request features