Zerfoo

      • Installation
      • Quick Start
      • Your First Inference
      • Model Loading
      • Text Generation
      • API Server
      • Tabular and Time-Series
      • Generate API
      • Inference API
      • Serve API
      • Basic Text Generation
      • Streaming Chat
      • Embedding Similarity
      • OpenAI Server
      • Custom Sampling
      • Structured JSON Output
      • LoRA Fine-Tuning
      • Batch Inference
      • Speculative Decoding
      • Tool Calling
      • Retrieval-Augmented Generation (RAG)
      • Vision / Multimodal
      • Architecture Overview
      • GPU Setup
      • Production Deployment
      • Enterprise Deployment
      • zonnx Overview
      • ONNX to GGUF
      • SafeTensors to GGUF
      • ztensor
      • ztoken
      • Numeric Types
      • Contributing
      • Benchmarks
      • Extensions
      • Granite Time Series
      • API Stability
      • Migration to v1
      • Granite Guardian
      • Introducing Zerfoo: A Production-Grade ML Inference Framework for Go
      • Zerfoo vs Ollama vs llama.cpp: A Performance Comparison
      • Inside Zerfoo: An Architecture Deep Dive
      • Why Go for ML? Making the Case for Go in Machine Learning
      • Migrating from Ollama to Zerfoo
      • GGUF: Why We Standardized on the Industry Format
      • How We Beat Ollama by 18.8%: CUDA Graph Capture in Pure Go
      • Add ML Inference to Your Go Service in 10 Lines
      • Zero CGo: Why We Chose Pure Go for ML Inference