Cactus for Android & Kotlin Multiplatform

Run AI models on-device with a simple Kotlin API.

Model weights: Pre-converted weights for all supported models at huggingface.co/Cactus-Compute.

Building

git clone https://github.com/cactus-compute/cactus && cd cactus && source ./setup
cactus build --android

Build output: android/libcactus.so (and android/libcactus.a)

See the main README.md for CLI usage and how to download weights.

Vendored libcurl (device builds)

To bundle libcurl locally for Android device testing, place artifacts using:

libs/curl/android/arm64-v8a/libcurl.a and libs/curl/include/curl/*.h

The build auto-detects libs/curl. You can override with:

CACTUS_CURL_ROOT=/absolute/path/to/curl cactus build --android

Integration

Android-only

  1. Copy libcactus.so to app/src/main/jniLibs/arm64-v8a/
  2. Copy Cactus.kt to app/src/main/java/com/cactus/

Kotlin Multiplatform

Source files:

File               Copy to
Cactus.common.kt   shared/src/commonMain/kotlin/com/cactus/
Cactus.android.kt  shared/src/androidMain/kotlin/com/cactus/
Cactus.ios.kt      shared/src/iosMain/kotlin/com/cactus/
cactus.def         shared/src/nativeInterop/cinterop/

Binary files:

Platform  Location
Android   libcactus.so → app/src/main/jniLibs/arm64-v8a/
iOS       libcactus-device.a → link via cinterop

build.gradle.kts:

kotlin {
    androidTarget()

    listOf(iosArm64(), iosSimulatorArm64()).forEach {
        it.compilations.getByName("main") {
            cinterops {
                create("cactus") {
                    defFile("src/nativeInterop/cinterop/cactus.def")
                    includeDirs("/path/to/cactus/ffi")
                }
            }
        }
        it.binaries.framework {
            linkerOpts("-L/path/to/apple", "-lcactus-device")
        }
    }

    sourceSets {
        commonMain.dependencies {
            implementation("org.jetbrains.kotlinx:kotlinx-serialization-json:1.6.0")
        }
    }
}

Usage

Handles are plain Long values (C pointers). All functions are top-level.

Basic Completion

import com.cactus.*

val model = cactusInit("/path/to/model", null, false)
val messages = """[{"role":"user","content":"What is the capital of France?"}]"""
val resultJson = cactusComplete(model, messages, null, null, null)
println(resultJson)
cactusDestroy(model)

For vision models (LFM2-VL, LFM2.5-VL), add "images": ["path/to/image.png"] to any message. See Engine API for details.
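A minimal sketch of a vision request, assuming the same `cactusComplete` call as above; the image path is a placeholder for any readable image file:

```kotlin
// A user message with an attached image; "images" takes a JSON array of paths.
val visionMessages = """[
    {
        "role": "user",
        "content": "Describe this picture.",
        "images": ["/path/to/image.png"]
    }
]"""

// Then complete as usual:
// val resultJson = cactusComplete(model, visionMessages, null, null, null)
```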

Completion with Options and Streaming

import com.cactus.*

val options = """{"max_tokens":256,"temperature":0.7}"""

// messages as defined in Basic Completion above
val resultJson = cactusComplete(model, messages, options, null) { token, _ ->
    print(token)
}
println(resultJson)

Prefill

Pre-processes input text and populates the KV cache without generating output tokens. This reduces latency for subsequent calls to cactusComplete.

fun cactusPrefill(
    model: Long,
    messagesJson: String,
    optionsJson: String?,
    toolsJson: String?
): String

val tools = """[
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City, State, Country"}
                },
                "required": ["location"]
            }
        }
    }
]"""

val messages = """[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the weather in Paris?"},
    {"role": "assistant", "content": "<|tool_call_start|>get_weather(location=\"Paris\")<|tool_call_end|>"},
    {"role": "tool", "content": "{\"name\": \"get_weather\", \"content\": \"Sunny, 72°F\"}"},
    {"role": "assistant", "content": "It's sunny and 72°F in Paris!"}
]"""

val resultJson = cactusPrefill(model, messages, null, tools)

val completionMessages = """[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the weather in Paris?"},
    {"role": "assistant", "content": "<|tool_call_start|>get_weather(location=\"Paris\")<|tool_call_end|>"},
    {"role": "tool", "content": "{\"name\": \"get_weather\", \"content\": \"Sunny, 72°F\"}"},
    {"role": "assistant", "content": "It's sunny and 72°F in Paris!"},
    {"role": "user", "content": "What about SF?"}
]"""

val completion = cactusComplete(model, completionMessages, null, tools, null)

Response format:

{
    "success": true,
    "error": null,
    "prefill_tokens": 25,
    "prefill_tps": 166.1,
    "total_time_ms": 150.5,
    "ram_usage_mb": 245.67
}
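To read individual fields out of the response, you can use org.json on Android (as shown in the transcription section) or, as a stdlib-only sketch, pull numeric fields with a regex. `jsonNumber` is a hypothetical helper, not part of the Cactus API:

```kotlin
// Extract a top-level numeric field from a flat JSON response string.
fun jsonNumber(json: String, key: String): Double? =
    Regex("\"$key\"\\s*:\\s*([0-9.]+)").find(json)?.groupValues?.get(1)?.toDouble()

val prefillJson = """{"success":true,"error":null,"prefill_tokens":25,"prefill_tps":166.1,"total_time_ms":150.5,"ram_usage_mb":245.67}"""
val prefillTokens = jsonNumber(prefillJson, "prefill_tokens")  // 25.0
val prefillTps = jsonNumber(prefillJson, "prefill_tps")        // 166.1
```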

Audio Transcription

import com.cactus.*

// From file
val resultJson = cactusTranscribe(model, "/path/to/audio.wav", null, null, null, null)
println(resultJson)

// From PCM data (16 kHz mono)
val pcmData: ByteArray = ...
val resultJson2 = cactusTranscribe(model, null, null, null, null, pcmData)
println(resultJson2)

The segments array contains timestamps (in seconds). Granularity depends on the model: phrase-level for Whisper, word-level for Parakeet TDT, and one segment per transcription window for Parakeet CTC and Moonshine (consecutive VAD speech regions of up to 30 s).

import org.json.JSONObject

val result = JSONObject(resultJson)
val segments = result.getJSONArray("segments")
for (i in 0 until segments.length()) {
    val seg = segments.getJSONObject(i)
    println("[${seg.getDouble("start")}s - ${seg.getDouble("end")}s] ${seg.getString("text")}")
}

Custom vocabulary biases the decoder toward domain-specific words (supported for Whisper and Moonshine models). Pass custom_vocabulary and vocabulary_boost in the options JSON:

val options = """{"custom_vocabulary": ["Omeprazole", "HIPAA", "Cactus"], "vocabulary_boost": 3.0}"""
val result = cactusTranscribe(model, "/path/to/audio.wav", "", options, null, null)

Streaming Transcription

val stream = cactusStreamTranscribeStart(model, null)
val partial = cactusStreamTranscribeProcess(stream, audioChunk)
val final_  = cactusStreamTranscribeStop(stream)

Streaming also accepts custom_vocabulary in the options passed to cactusStreamTranscribeStart. The bias is applied for the lifetime of the stream session.
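A sketch of feeding captured 16 kHz mono PCM to the stream in fixed-size chunks. `chunkPcm` and the 3200-byte chunk size (100 ms at 16 kHz, 16-bit) are illustrative assumptions; use whatever chunking your audio source provides:

```kotlin
// Split a PCM buffer into fixed-size chunks (last chunk may be shorter).
fun chunkPcm(pcm: ByteArray, chunkBytes: Int): List<ByteArray> =
    (pcm.indices step chunkBytes).map { start ->
        pcm.copyOfRange(start, minOf(start + chunkBytes, pcm.size))
    }

// val stream = cactusStreamTranscribeStart(model, null)
// for (chunk in chunkPcm(pcmData, 3200)) {  // 100 ms at 16 kHz, 16-bit mono
//     val partial = cactusStreamTranscribeProcess(stream, chunk)
//     println(partial)
// }
// val finalJson = cactusStreamTranscribeStop(stream)
```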

Embeddings

val embedding      = cactusEmbed(model, "Hello, world!", true)   // FloatArray
val imageEmbedding = cactusImageEmbed(model, "/path/to/image.jpg")
val audioEmbedding = cactusAudioEmbed(model, "/path/to/audio.wav")
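Embeddings can be compared with cosine similarity. A stdlib-only sketch (`cosine` is a hypothetical helper); note that with `normalize = true` in `cactusEmbed`, the dot product alone already gives the cosine similarity:

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two equal-length embedding vectors.
fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
    return dot / (sqrt(na) * sqrt(nb))
}

// val e1 = cactusEmbed(model, "Hello, world!", true)
// val e2 = cactusEmbed(model, "Hi there!", true)
// println(cosine(e1, e2))
```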

Tokenization

val tokens = cactusTokenize(model, "Hello, world!")  // IntArray
val scores = cactusScoreWindow(model, tokens, 0, tokens.size, 512)
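For sequences longer than the context, one option is to score overlapping windows. `windows`, and the 512/256 window/stride sizes, are illustrative assumptions; the `(start, end, context)` arguments mirror the `cactusScoreWindow` signature in the API reference:

```kotlin
// Generate (start, end) spans covering `total` tokens with the given stride.
fun windows(total: Int, windowSize: Int, stride: Int): List<Pair<Int, Int>> =
    (0 until total step stride).map { s -> s to minOf(s + windowSize, total) }

// val tokens = cactusTokenize(model, longText)
// for ((start, end) in windows(tokens.size, 512, 256)) {
//     val scoreJson = cactusScoreWindow(model, tokens, start, end, 512)
// }
```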

VAD

val result = cactusVad(model, "/path/to/audio.wav", null, null)

Diarize

val result = cactusDiarize(model, "/path/to/audio.wav", null, null)

Embed Speaker

val result = cactusEmbedSpeaker(model, "/path/to/audio.wav", null, null)

RAG

val result = cactusRagQuery(model, "What is machine learning?", 5)

Vector Index

val index = cactusIndexInit("/path/to/index", 3)

cactusIndexAdd(
    index,
    intArrayOf(1, 2),
    arrayOf("Document 1", "Document 2"),
    arrayOf(floatArrayOf(0.1f, 0.2f, 0.3f), floatArrayOf(0.4f, 0.5f, 0.6f)),
    null
)

val resultsJson = cactusIndexQuery(index, floatArrayOf(0.1f, 0.2f, 0.3f), null)
cactusIndexDelete(index, intArrayOf(2))
cactusIndexCompact(index)
cactusIndexDestroy(index)
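When populating the index from `cactusEmbed` output, it can help to L2-normalize vectors up front so that inner-product and cosine ranking agree. Whether the index normalizes internally is an assumption to verify; `l2Normalize` is a hypothetical helper and normalizing twice is harmless:

```kotlin
import kotlin.math.sqrt

// Scale a vector to unit length; zero vectors are returned unchanged.
fun l2Normalize(v: FloatArray): FloatArray {
    val norm = sqrt(v.fold(0f) { acc, x -> acc + x * x })
    return if (norm == 0f) v else FloatArray(v.size) { v[it] / norm }
}

// val index = cactusIndexInit("/path/to/index", 3)
// cactusIndexAdd(index, intArrayOf(1), arrayOf("Document 1"),
//     arrayOf(l2Normalize(cactusEmbed(model, "Document 1", false))), null)
```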

API Reference

All functions are top-level and mirror the C FFI directly. Handles are Long values.

Init / Lifecycle

fun cactusInit(modelPath: String, corpusDir: String?, cacheIndex: Boolean): Long  // throws RuntimeException
fun cactusDestroy(model: Long)
fun cactusReset(model: Long)
fun cactusStop(model: Long)
fun cactusGetLastError(): String

Prefill

fun cactusPrefill(
    model: Long,
    messagesJson: String,
    optionsJson: String?,
    toolsJson: String?
): String

Completion

fun cactusComplete(
    model: Long,
    messagesJson: String,
    optionsJson: String?,
    toolsJson: String?,
    callback: CactusTokenCallback?
): String

Transcription

fun cactusTranscribe(
    model: Long,
    audioPath: String?,
    prompt: String?,
    optionsJson: String?,
    callback: CactusTokenCallback?,
    pcmData: ByteArray?
): String

fun cactusStreamTranscribeStart(model: Long, optionsJson: String?): Long  // throws RuntimeException
fun cactusStreamTranscribeProcess(stream: Long, pcmData: ByteArray): String
fun cactusStreamTranscribeStop(stream: Long): String

Embeddings

fun cactusEmbed(model: Long, text: String, normalize: Boolean): FloatArray
fun cactusImageEmbed(model: Long, imagePath: String): FloatArray
fun cactusAudioEmbed(model: Long, audioPath: String): FloatArray

Tokenization / Scoring

fun cactusTokenize(model: Long, text: String): IntArray
fun cactusScoreWindow(model: Long, tokens: IntArray, start: Int, end: Int, context: Int): String

Detect Language

fun cactusDetectLanguage(model: Long, audioPath: String?, optionsJson: String?, pcmData: ByteArray?): String

VAD / RAG

fun cactusVad(model: Long, audioPath: String?, optionsJson: String?, pcmData: ByteArray?): String
fun cactusRagQuery(model: Long, query: String, topK: Int): String

Vector Index

fun cactusIndexInit(indexDir: String, embeddingDim: Int): Long  // throws RuntimeException
fun cactusIndexDestroy(index: Long)
fun cactusIndexAdd(index: Long, ids: IntArray, documents: Array<String>, embeddings: Array<FloatArray>, metadatas: Array<String>?): Int
fun cactusIndexDelete(index: Long, ids: IntArray): Int
fun cactusIndexGet(index: Long, ids: IntArray): String
fun cactusIndexQuery(index: Long, embedding: FloatArray, optionsJson: String?): String
fun cactusIndexCompact(index: Long): Int

Logging

fun cactusLogSetLevel(level: Int)  // 0=DEBUG 1=INFO 2=WARN 3=ERROR 4=NONE
fun cactusLogSetCallback(callback: CactusLogCallback?)

Telemetry

fun cactusSetTelemetryEnvironment(cacheDir: String)
fun cactusSetAppId(appId: String)
fun cactusTelemetryFlush()
fun cactusTelemetryShutdown()

Types

fun interface CactusTokenCallback {
    fun onToken(token: String, tokenId: Int)
}

fun interface CactusLogCallback {
    fun onLog(level: Int, component: String, message: String)
}

Requirements

  • Android API 21+ / arm64-v8a
  • iOS 13+ / arm64 (KMP only)

See Also