You're viewing docs for v1.9. If you are cloning the repository, make sure to check out this release: git checkout v1.9
Quickstart
Install Cactus and run your first on-device AI completion.
Installation
Platform Integration
Homebrew (macOS):
From Source (macOS):
From Source (Linux):
Include the Cactus header in your project:
See the Cactus repository for CMake build instructions.
Your First Completion
React Native:
import { useEffect } from 'react';
import { Button, Text } from 'react-native';
import { useCactusLM } from 'cactus-react-native';

const App = () => {
  const cactusLM = useCactusLM();

  // Download the model on first mount if it is not already on the device.
  useEffect(() => {
    if (!cactusLM.isDownloaded) {
      cactusLM.download();
    }
  }, []);

  // Send a single-turn chat request; the result is exposed as cactusLM.completion.
  const handleGenerate = () => {
    cactusLM.complete({
      messages: [{ role: 'user', content: 'What is the capital of France?' }],
    });
  };

  if (cactusLM.isDownloading) {
    return <Text>Downloading: {Math.round(cactusLM.downloadProgress * 100)}%</Text>;
  }

  return (
    <>
      <Button onPress={handleGenerate} title="Generate" />
      <Text>{cactusLM.completion}</Text>
    </>
  );
};
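Rust:
use cactus_sys::*;
use std::ffi::CString;
use std::os::raw::c_char;

fn main() {
    unsafe {
        let model_path = CString::new("path/to/weight/folder").unwrap();
        let model = cactus_init(model_path.as_ptr(), std::ptr::null(), false);

        // Chat messages are passed as a JSON array string.
        let messages = CString::new(
            r#"[{"role": "user", "content": "What is the capital of France?"}]"#
        ).unwrap();

        // The response is written into a caller-provided, zero-initialized buffer.
        let mut response = vec![0u8; 4096];
        cactus_complete(
            model, messages.as_ptr(),
            response.as_mut_ptr() as *mut c_char, 4096,
            std::ptr::null(), std::ptr::null(),
            None, std::ptr::null_mut(),
        );

        // Print only the bytes before the first NUL, not the whole 4096-byte buffer.
        let len = response.iter().position(|&b| b == 0).unwrap_or(response.len());
        println!("{}", String::from_utf8_lossy(&response[..len]));

        cactus_destroy(model);
    }
}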
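The Rust example passes the chat history as a hand-written JSON string, which breaks as soon as the user text contains a quote or a newline. The sketch below shows one way to build that string safely using only the standard library. `escape_json` and `messages_json` are hypothetical helper names, not part of `cactus_sys`; a JSON crate such as `serde_json` would serve the same purpose.

```rust
/// Escape a string so it can be placed inside a JSON string literal.
/// (Hypothetical helper, not part of the Cactus bindings.)
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \uXXXX escapes.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the JSON array of chat messages from (role, content) pairs,
/// in the same shape as the literal used in the example above.
fn messages_json(messages: &[(&str, &str)]) -> String {
    let items: Vec<String> = messages
        .iter()
        .map(|(role, content)| {
            format!(
                r#"{{"role": "{}", "content": "{}"}}"#,
                escape_json(role),
                escape_json(content)
            )
        })
        .collect();
    format!("[{}]", items.join(", "))
}
```

The returned `String` can be wrapped with `CString::new(...)` and passed to `cactus_complete` in place of the raw string literal.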
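C++:
#include <cactus.h>

// Second argument: folder of documents for RAG (the Rust example passes null here).
cactus_model_t model = cactus_init(
    "path/to/weight/folder",
    "path/to/rag/documents",
    false
);

// Chat messages are passed as a JSON array string.
const char* messages = R"([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
])";

// The response is written into this caller-provided buffer.
char response[4096];
int result = cactus_complete(
    model, messages, response, sizeof(response),
    nullptr, nullptr, nullptr, nullptr
);

// Release the model when done, as in the Rust example.
cactus_destroy(model);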
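Both native examples receive the completion in a fixed-size, caller-provided buffer. As a minimal sketch, and assuming the engine writes a NUL-terminated string into that buffer (the source does not state this explicitly), the text could be extracted on the Rust side as follows. `response_text` is a hypothetical helper name; `CStr::from_bytes_until_nul` requires Rust 1.69 or later.

```rust
use std::ffi::CStr;

/// Decode the NUL-terminated text at the start of a response buffer.
/// Returns None when the buffer contains no NUL byte, which may mean
/// the output filled the buffer completely.
/// (Hypothetical helper, not part of the Cactus bindings.)
fn response_text(buffer: &[u8]) -> Option<String> {
    CStr::from_bytes_until_nul(buffer)
        .ok()
        .map(|text| text.to_string_lossy().into_owned())
}
```

Returning `Option` rather than an empty string lets the caller tell an empty reply apart from a buffer with no terminator and, for instance, retry with a larger buffer.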
Supported Models
- LLMs: Gemma-3 (270M, FunctionGemma-270M, 1B), LiquidAI LFM2 (350M, 2.6B) / LFM2.5 (1.2B-Instruct, 1.2B-Thinking) / LFM2-8B-A1B, Qwen3 (0.6B, 1.7B) (completion, tools, embeddings)
- Vision: LFM2-VL, LFM2.5-VL (with Apple NPU), Qwen3.5 (0.8B, 2B)
- Transcription: Whisper (Tiny/Base/Small/Medium with Apple NPU), Parakeet (CTC-0.6B/CTC-1.1B/TDT-0.6B-v3 with Apple NPU), Moonshine-Base
- VAD: Silero VAD for voice activity detection
- Embeddings: Nomic-Embed, Qwen3-Embedding
See the full list on HuggingFace.
Next Steps
- Engine API -- Full inference API reference
- Graph API -- Zero-copy computation graph for custom models
- Fine-tuning & Deployment -- Convert and deploy custom fine-tunes
- Choose Your SDK -- Help picking the right SDK for your project