Installation¶
Polyglot SDKs plus a standalone CLI. Most packages ship prebuilt binaries for Linux (x86_64/aarch64), macOS, and Windows — no compile step needed.
CLI / Docker¶
No SDK, no code — just your terminal.
MCP Server included
Prebuilt binaries (Homebrew, install.sh, Docker full) include the MCP server. If building from source with cargo install kreuzberg-cli, add --features mcp (or --features mcp-http for HTTP transport) to include it.
x86_64 CPU — AVX/AVX2 instruction set required
The bundled ONNX Runtime binaries require AVX/AVX2 CPU instructions. CPUs without AVX support (e.g. Intel Atom, Celeron N5105/Jasper Lake, older pre-2011 processors) will crash with an invalid opcode trap when using ONNX-dependent features. The affected features are PaddleOCR, layout detection, embeddings, reranking, auto-rotate, and transcription. All other Kreuzberg functionality (text extraction, Tesseract OCR, chunking, metadata, etc.) works normally on any x86_64 CPU. ARM platforms (aarch64) are unaffected.
Windows — ONNX Runtime required for Go, Elixir, and C/C++
Go, Elixir, and C/C++ bindings on Windows link against ONNX Runtime dynamically. You must have onnxruntime.dll on your PATH at runtime. Download it from the ONNX Runtime releases (for example onnxruntime-win-x64-1.24.1.zip). Python, TypeScript, Java, C#, Ruby, PHP, and Wasm are unaffected.
Choose your language¶
- Python
- TypeScript (Node.js / Bun)
- TypeScript (Browser / Edge)
- Rust
- Go
- Java
- :fontawesome-brands-kotlin:{ .lg .middle } Kotlin Android
- Ruby
- Swift
- C# / .NET
- PHP
- Elixir
- R
- C / C++
- :material-language-dart:{ .lg .middle } Dart / Flutter
- :material-language-zig:{ .lg .middle } Zig
System requirements¶
Only relevant if building from source or enabling OCR:
| Dependency | When you need it |
|---|---|
| AVX/AVX2 CPU instructions | Required for ONNX Runtime features (PaddleOCR, layout detection, embeddings, reranking, auto-rotate, transcription) on x86_64 |
Rust toolchain (rustup) |
Building any native binding from source |
| C/C++ compiler | Building native bindings (Xcode command-line tools / build-essential / MSVC) |
| Tesseract OCR | Optional — brew install tesseract / apt install tesseract-ocr |
| libheif (HEIC / HEIF / AVIF) | Optional — brew install libheif / apt install libheif-dev / dnf install libheif-devel |
PDF extraction uses pdf_oxide and has no external PDF runtime dependency.
The Wasm package (@kreuzberg/wasm) has zero system dependencies.
HEIF / HEIC / AVIF support¶
Pixel decoding for Apple HEIC photos, HEIF still images, AVIF, HEIC sequences
(.heics), and AVCS requires the heic Cargo feature plus the system
libheif library (with libde265 for HEVC and libaom for AV1):
- macOS:
brew install libheif - Debian / Ubuntu:
apt install libheif-dev - Fedora:
dnf install libheif-devel - Windows (vcpkg):
vcpkg install libheif[hevc,aom]:x64-windows
Enable the feature when building from source:
heic is included in the full aggregate feature. HEIC pixel decoding is
not available on wasm-target or android-target (libheif is a C library
with no working WASM/Android build story). EXIF metadata extraction from HEIC
/ HEIF / AVIF works on every target via the pure-Rust nom-exif
integration.
GPU Acceleration¶
Kreuzberg bundles a CPU-only ONNX Runtime — ML features (PaddleOCR, layout detection, embeddings, reranking, auto-rotate, transcription) work out of the box on CPU.
For GPU acceleration, install a GPU-enabled ONNX Runtime and set ORT_DYLIB_PATH:
| Platform | Install | Set ORT_DYLIB_PATH |
|---|---|---|
| Linux (CUDA) | Download from ONNX Runtime releases | export ORT_DYLIB_PATH=/path/to/libonnxruntime.so |
| Python (any OS) | pip install onnxruntime-gpu |
Point at the pip package's capi/ directory |
| macOS (CoreML) | Works with bundled ORT — no extra setup needed | — |
See AccelerationConfig and ORT_DYLIB_PATH for details.
Language-specific notes¶
Edge cases and alternative install methods where they come up.
TypeScript¶
Two npm packages target different runtimes:
| Package | Best for | Performance |
|---|---|---|
@kreuzberg/node |
Node.js, Bun — server-side apps | Native (100%) |
@kreuzberg/wasm |
Browsers, Deno, Cloudflare Workers | Wasm (~60-80%) |
Both work with pnpm (pnpm add) and Yarn (yarn add) as well.
pnpm workspaces
In monorepos, add this to your root .npmrc so platform-specific optional deps resolve correctly:
Wasm — Browser usage
<script type="module">
import { initWasm, extractFromFile } from "@kreuzberg/wasm";
await initWasm();
const input = document.getElementById("file");
input.addEventListener("change", async (e) => {
const result = await extractFromFile(e.target.files[0]);
console.log(result.content);
});
</script>
<input type="file" id="file" />
Wasm — Deno
Wasm — Cloudflare Workers
import { initWasm, extractBytes } from "@kreuzberg/wasm";
export default {
async fetch(request: Request): Promise<Response> {
await initWasm();
const bytes = new Uint8Array(await request.arrayBuffer());
const result = await extractBytes(bytes, "application/pdf");
return Response.json({ content: result.content });
},
};
Supported runtimes: Chrome 74+, Firefox 79+, Safari 14+, Edge 79+, Node.js 22+, Deno 1.35+, Cloudflare Workers.
Wasm Platform Limitations
The Wasm binding does not support:
- Layout detection (RT-DETR model inference requires ONNX Runtime unavailable in WebAssembly)
- PaddleOCR, embeddings, reranking, auto-rotate, and transcription inference (all require ONNX Runtime)
- LLM/VLM features (liter-llm is not part of the
wasm-targetfeature set) - Hardware acceleration config (single-threaded WASM, no GPU access)
- Native server features (
api,mcp, CLI binary) - Browser filesystem paths (
extractBytes/extractFromFileare portable; path APIs require Node/Deno/Bun filesystem access) - Email codepage config (EmailConfig not available)
Pure-Rust extraction formats, OCR via Tesseract WASM, chunking, metadata, tables, language detection, SVG handling, redaction, summarization, QR-code detection, and image extraction work in WASM. See the WASM API Reference for details.
Java¶
Requires Java 25+ (FFM/Panama API). Native libraries are bundled in the JAR.
Elixir¶
Add to mix.exs:
Ships prebuilt NIF binaries via RustlerPrecompiled. Falls back to compiling from source if no prebuilt matches your platform (requires Rust).
Windows
The Windows NIF links against ONNX Runtime dynamically. onnxruntime.dll must be on your PATH at runtime — see the note at the top of this page.
Go¶
Windows
The Go binding links against ONNX Runtime dynamically on Windows. onnxruntime.dll must be on your PATH at runtime — see the note at the top of this page.
Windows feature limitations
The Go and C/C++ bindings on Windows (MinGW/GNU target) do not include ORT-dependent inference features: PaddleOCR, layout detection, embeddings, reranking, auto-rotate, or transcription. Tesseract OCR and non-ORT features work normally. These limitations apply only to Windows; Linux and macOS builds include the full feature set.
Rust¶
Enable features selectively in Cargo.toml:
[dependencies]
kreuzberg = { version = "5", features = ["pdf", "ocr", "chunking"] }
# Default features are tokio-runtime + simd-utf8; format and analysis features are opt-in.
C / C++¶
Build the FFI library from source:
This produces libkreuzberg_ffi.a and a header at crates/kreuzberg-ffi/kreuzberg.h. Link into your project:
HEADER_DIR = path/to/crates/kreuzberg-ffi
LIBDIR = path/to/target/release
CFLAGS = -Wall -Wextra -I$(HEADER_DIR)
LDFLAGS = -L$(LIBDIR) -lkreuzberg_ffi -lpthread -ldl -lm
my_app: my_app.c
$(CC) $(CFLAGS) -o $@ $< $(LDFLAGS)
Platform-specific linker flags
macOS: add -framework CoreFoundation -framework Security
Windows: add -lws2_32 -luserenv -lbcrypt
Windows
The Windows FFI library links against ONNX Runtime dynamically. onnxruntime.dll must be on your PATH at runtime — see the note at the top of this page.
Dart / Flutter¶
Pure-Dart and Flutter consumers share the same package. Dart SDK 3.0 or higher is required. Flutter is supported on macOS, iOS, Android, Linux, and Windows; Flutter Web is not supported because the runtime is a native dynamic library delivered via flutter_rust_bridge. For Flutter projects use flutter pub add kreuzberg instead of dart pub add kreuzberg.
Kotlin¶
Kotlin/JVM consumers use the Java artifact (dev.kreuzberg:kreuzberg) directly; Kotlin interoperates with the generated Java records and static facade.
Kotlin Android uses the Android AAR (dev.kreuzberg:kreuzberg-android). It embeds JNI libraries for arm64-v8a and x86_64, targets Android API 21+, and uses the android-target feature set, which excludes ORT-dependent inference features.
Swift¶
Swift Package Manager from swift-tools-version: 6.0 upward. Targets macOS 13+ and iOS 16+; Linux is not currently declared in Package.swift. Once the package ships its binaryTarget, no manual cargo build is needed; in the interim, building the library locally requires cargo build -p kreuzberg-swift against the workspace.
Zig¶
Requires Zig 0.16.0 or higher (declared via minimum_zig_version in build.zig.zon). The Zig binding consumes the C FFI surface from kreuzberg-ffi via linkSystemLibrary; the build expects the consumer to provide a search path to the prebuilt libkreuzberg_ffi and the C header kreuzberg.h. The zig fetch command above pins the source archive in build.zig.zon; wire it into build.zig via b.dependency("kreuzberg", ...).
Development setup¶
For working on the Kreuzberg repository itself:
task setup # installs all language toolchains
task lint # linters across all languages
task dev:test # full test suite
See Contributing for conventions and expectations.