Contributing to Zerfoo#

Thank you for your interest in contributing to Zerfoo, the Go-native ML inference and training framework. This guide covers the full Zerfoo ecosystem and applies to all six repositories.

Code of Conduct#

All participants in the Zerfoo community are expected to treat each other with respect and professionalism. We are committed to providing a welcoming and inclusive environment for everyone.

Repository Structure#

Zerfoo is an ecosystem of six independent repositories (each with its own go.mod, CI, and releases):

RepositoryModulePurpose
zerfoogithub.com/zerfoo/zerfooCore ML framework: inference, training, serving
ztensorgithub.com/zerfoo/ztensorGPU-accelerated tensor, compute engine, computation graph
ztokengithub.com/zerfoo/ztokenBPE tokenizer with HuggingFace compatibility
zonnxgithub.com/zerfoo/zonnxONNX-to-GGUF converter CLI
float16github.com/zerfoo/float16IEEE 754 half-precision (Float16/BFloat16) arithmetic
float8github.com/zerfoo/float8FP8 E4M3FN arithmetic for quantized inference

Dependency graph:

float16 --+
float8  --+--> ztensor --> zerfoo
ztoken --+

zonnx (standalone)

Each repo is versioned and released independently. Do not treat this as a monorepo – submit PRs to the repository where the change belongs.

Development Setup#

Prerequisites#

  • Go 1.26+ (generics with tensor.Numeric constraint)
  • Git
  • CUDA Toolkit (optional, for GPU-accelerated tests and development)

Clone and Build#

Each repository builds independently:

# Clone whichever repo you want to work on
git clone https://github.com/zerfoo/<repo>.git
cd <repo>
go mod tidy
go test ./...

No CGo is required for CPU-only builds. GPU support is loaded dynamically at runtime via purego/dlopen, so go build ./... works on any platform without a C compiler.

Running Tests#

go test ./...            # All CPU tests (no GPU required)
go test -race ./...      # Tests with race detector (required before submitting)
go test -tags cuda ./... # GPU tests (requires CUDA toolkit and a GPU)
go test -coverprofile=coverage.out ./...  # Coverage report
go tool cover -html=coverage.out -o coverage.html

Testing Requirements#

  • All new code must have tests
  • Use table-driven tests with t.Run subtests
  • Always run with the -race flag before submitting
  • CI enforces a 75% coverage gate on new packages

Code Style#

Formatting and Linting#

  • gofmt – all code must be formatted with gofmt
  • goimports – imports must be organized (stdlib, external, internal)
  • golangci-lint – run golangci-lint run before submitting

Go Conventions#

  • Prefer the Go standard library over third-party dependencies
  • Follow standard Go naming: PascalCase for exported, camelCase for unexported
  • Write documentation comments for all exported functions, types, and methods
  • Use generics with [T tensor.Numeric] constraints – avoid type-specific code where generics work
  • All tensor arithmetic must flow through compute.Engine[T] (see Key Conventions)

Commit Conventions#

We use Conventional Commits for automated versioning with release-please.

<type>(<scope>): <description>
TypeDescription
featA new feature
fixA bug fix
perfA performance improvement
docsDocumentation only changes
testAdding or correcting tests
choreMaintenance tasks, CI, dependencies
refactorCode change that neither fixes a bug nor adds a feature

Examples:

feat(inference): add Qwen 2.5 architecture support
fix(generate): correct KV cache eviction for sliding window attention
perf(layers): fuse SiLU and gate projection into single kernel

Pull Request Process#

  1. Branch from main and keep your branch up to date with rebase
  2. One logical change per PR – keep PRs focused and reviewable
  3. All CI checks must pass – tests, linting, formatting
  4. Rebase and merge – we do not use squash merges or merge commits
  5. Reference related issues – use Fixes #123 or Closes #123 in the PR description

Before Submitting#

go test -race ./...
go vet ./...
golangci-lint run

Review Process#

  • All PRs require at least one maintainer approval
  • Maintainers may request changes – address feedback and force-push to update your branch
  • Once approved and CI is green, a maintainer will rebase-merge your PR

GPU Development#

purego Bindings#

GPU libraries are loaded at runtime via purego/dlopen – not linked at compile time. This means:

  • go build never requires a C compiler or GPU SDK
  • GPU availability is detected at runtime
  • The same binary runs on CPU-only machines (gracefully falls back)

When writing GPU code, use the compute.Engine[T] interface. Do not call CUDA/ROCm/OpenCL APIs directly outside of internal/gpuapi/.

Release Process#

All six repositories use release-please for automated releases:

  1. Conventional Commit messages drive version bumps (feat = minor, fix = patch)
  2. release-please opens a release PR automatically when changes land on main
  3. Merging the release PR creates a GitHub release and Git tag
  4. Semantic versioning (vMAJOR.MINOR.PATCH) is enforced across all repos

Breaking changes require a BREAKING CHANGE: footer in the commit message, which triggers a major version bump.

Issue Reporting#

Bug Reports#

Include: clear description, steps to reproduce, expected vs actual behavior, environment (Go version, OS, architecture, GPU), and model details if applicable.

Feature Requests#

Include: problem statement, proposed solution, alternatives considered, and use case.

Good First Issues#

Looking for a place to start? Here are some beginner-friendly issues across the ecosystem.

Beginner#

#IssueRepoEffort
1Fix Exp10 returning a constant instead of computing 10^ffloat1630 min
2Remove doc comment erroneously pasted into Config.EnableFastMath fieldfloat1615 min
3Add String() method to FloatClass enum typefloat1630 min
4Add missing doc comments to GGUF writer AddMetadata* methodszonnx20 min
5Add String() methods to ConversionMode and ArithmeticMode enumsfloat830 min
6Add table-driven tests for BFloat16 comparison functionsfloat1645 min

Intermediate#

#IssueRepoEffort
7Fix Mod(f, Inf) returning NaN instead of ffloat1630 min
8Add NaN checks to addAlgorithmic and subAlgorithmic in float8float830 min
9Add SetNormalizer public method to BPETokenizerztoken30 min
10Convert downloadFile to use defer for resource cleanupzonnx45 min
11Add unit tests for Div, Sqrt, and Neg layerszerfoo1 hr
12Add unit tests for Softmax activation layerzerfoo45 min
13Optimize RecordRequest to avoid per-token counter increment loopzerfoo45 min

Advanced#

#IssueRepoEffort
14Implement Backward pass for the Gelu activation’s test coveragezerfoo1.5 hr
15Add JSON Schema $ref resolution to grammar-constrained decoding converterzerfoo2 hr
16Add a fine-tuning example applicationzerfoo2 hr
17Implement Backward for Div and Sqrt core layerszerfoo2 hr
18Add String() method to device.Type enumztensor20 min
19Add R2Score metric to the metrics packageztensor45 min
20Add table-driven tests for tensor shape validationztensor45 min

Browse issues labeled good first issue on GitHub for the full list with detailed acceptance criteria.

How to claim an issue:

  1. Comment on the issue to let maintainers know you’re working on it
  2. Fork the repo and create a feature branch
  3. Submit a PR referencing the issue

Key Conventions#

Engine[T] is law#

All tensor arithmetic must flow through compute.Engine[T]. Never operate on raw slices outside the engine – this enables transparent CPU/GPU switching and CUDA graph capture.

No CGo by default#

GPU bindings use purego/dlopen. A plain go build ./... must compile on any platform without a C compiler.

GGUF is the sole model format#

Do not add support for other formats (ONNX, SafeTensors, etc.) in this repo. Use zonnx to convert ONNX models to GGUF.

Fuse, don’t fragment#

Prefer fused operations (FusedAddRMSNorm, FusedSiluGate, etc.) over sequences of primitive ops. Every eliminated kernel launch matters for tok/s.

Getting Help#

  • GitHub Discussions – ask questions and share ideas on each repo’s Discussions tab
  • GitHub Issues – report bugs or request features