Cactus for Flutter

Run AI models on-device from Flutter through direct dart:ffi bindings on iOS, macOS, and Android.

Model weights: pre-converted weights for all supported models are available at huggingface.co/Cactus-Compute.

Building

git clone https://github.com/cactus-compute/cactus && cd cactus && source ./setup
cactus build --flutter

Build output:

File	Platform
`libcactus.so`	Android (arm64-v8a)
`cactus-ios.xcframework`	iOS
`cactus-macos.xcframework`	macOS

See the main README.md for how to use the CLI and download weights.

Integration

Android

1. Copy libcactus.so to android/app/src/main/jniLibs/arm64-v8a/
2. Copy cactus.dart to your lib/ folder

iOS

1. Copy cactus-ios.xcframework to your ios/ folder
2. Open ios/Runner.xcworkspace in Xcode
3. Drag the xcframework into the project
4. In the Runner target, under General > "Frameworks, Libraries, and Embedded Content", set the framework to "Embed & Sign"
5. Copy cactus.dart to your lib/ folder

macOS

1. Copy cactus-macos.xcframework to your macos/ folder
2. Open macos/Runner.xcworkspace in Xcode
3. Drag the xcframework into the project
4. In the Runner target, under General > "Frameworks, Libraries, and Embedded Content", set the framework to "Embed & Sign"
5. Copy cactus.dart to your lib/ folder

Usage

Handles are typed as CactusModelT, CactusIndexT, and CactusStreamTranscribeT (all Pointer<Void> aliases). All functions are top-level.

Basic Completion

import 'cactus.dart';

final model = cactusInit('/path/to/model', null, false);
final messages = '[{"role":"user","content":"What is the capital of France?"}]';
final resultJson = cactusComplete(model, messages, null, null, null);
print(resultJson);
cactusDestroy(model);

For vision models (LFM2-VL-450M, LFM2.5-VL-1.6B), add an "images" field listing image file paths, e.g. "images": ["path/to/image.png"], to any message. See Engine API for details.
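Writing the messages JSON by hand becomes error-prone once the content contains quotes or newlines. The sketch below builds a single user message with dart:convert's jsonEncode and attaches the "images" field only when paths are given. The helper name userMessageJson is hypothetical and not part of cactus.dart.

```dart
import 'dart:convert';

/// Hypothetical helper (not part of cactus.dart): builds a one-message JSON
/// array, attaching "images" only when at least one path is supplied.
String userMessageJson(String content, {List<String> imagePaths = const []}) {
  final message = <String, Object>{'role': 'user', 'content': content};
  if (imagePaths.isNotEmpty) {
    message['images'] = imagePaths;
  }
  // jsonEncode escapes quotes and newlines in the content for us.
  return jsonEncode([message]);
}
```

The result can be passed directly as the messagesJson argument, for example cactusComplete(model, userMessageJson('What is in this picture?', imagePaths: ['/path/to/image.png']), null, null, null).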

Completion with Options and Streaming

import 'cactus.dart';
import 'dart:io';

final options = '{"max_tokens":256,"temperature":0.7}';

final resultJson = cactusComplete(model, messages, options, null, (token, tokenId) {
  stdout.write(token);
});
print(resultJson);
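In an app you usually want the streamed text accumulated rather than written to stdout. A minimal sketch: a small hypothetical class whose onToken method matches the callback type void Function(String token, int tokenId), so its tear-off can be passed as the callback argument.

```dart
/// Hypothetical accumulator for tokens delivered by the streaming callback.
class StreamedText {
  final StringBuffer _buffer = StringBuffer();
  int tokenCount = 0;

  // Same shape as cactusComplete's callback: void Function(String, int).
  void onToken(String token, int tokenId) {
    _buffer.write(token);
    tokenCount++;
  }

  /// Text received so far.
  String get text => _buffer.toString();
}
```

Usage: create final collected = StreamedText(); then call cactusComplete(model, messages, options, null, collected.onToken) and read collected.text afterwards.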

Prefill

Pre-processes input text and populates the KV cache without generating output tokens. This reduces latency for subsequent calls to cactusComplete.

String cactusPrefill(
  CactusModelT model,
  String messagesJson,
  String? optionsJson,
  String? toolsJson,
)

final tools = '[{"type":"function","function":{"name":"get_weather","description":"Get weather for a location","parameters":{"type":"object","properties":{"location":{"type":"string"}},"required":["location"]}}}]';

final messages = '[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the weather in Paris?"}]';

final resultJson = cactusPrefill(model, messages, null, tools);

final completionMessages = '[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the weather in Paris?"},{"role":"user","content":"What about SF?"}]';

final completion = cactusComplete(model, completionMessages, null, tools, null);

Response format:

{
    "success": true,
    "error": null,
    "prefill_tokens": 25,
    "prefill_tps": 166.1,
    "total_time_ms": 150.5,
    "ram_usage_mb": 245.67
}
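The response is plain JSON, so dart:convert is enough to read it. The sketch below pulls out the fields shown in the response format above; the class PrefillStats and the function parsePrefill are hypothetical names. Since functions that return a value already throw on failure, the success check is only a defensive extra.

```dart
import 'dart:convert';

/// Hypothetical holder for a few fields of the prefill response.
class PrefillStats {
  final int tokens;
  final double tokensPerSecond;
  final double totalTimeMs;
  PrefillStats(this.tokens, this.tokensPerSecond, this.totalTimeMs);
}

/// Parses the JSON returned by cactusPrefill, using the field names above.
PrefillStats parsePrefill(String resultJson) {
  final r = jsonDecode(resultJson) as Map<String, dynamic>;
  if (r['success'] != true) {
    throw StateError('prefill failed: ${r['error']}');
  }
  return PrefillStats(
    (r['prefill_tokens'] as num).toInt(),
    (r['prefill_tps'] as num).toDouble(),
    (r['total_time_ms'] as num).toDouble(),
  );
}
```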

Audio Transcription

import 'cactus.dart';
import 'dart:typed_data';

// From file
final resultJson = cactusTranscribe(model, '/path/to/audio.wav', null, null, null, null);
print(resultJson);

// From PCM data (16 kHz mono)
final pcmData = Uint8List.fromList([...]);
final resultJson2 = cactusTranscribe(model, null, null, null, null, pcmData);
print(resultJson2);
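Audio captured in Dart often arrives as integer or floating-point samples rather than bytes. The sketch below packs samples into a Uint8List, assuming the pcmData buffer holds 16-bit signed little-endian samples at 16 kHz mono; the sample format is an assumption, not something this page states. Under that assumption, one second of audio is 16000 samples, i.e. 32000 bytes. Both helper names are hypothetical.

```dart
import 'dart:typed_data';

/// Hypothetical helper: packs 16-bit samples as little-endian int16 bytes
/// (assumed layout of the pcmData argument).
Uint8List pcm16ToBytes(List<int> samples) {
  final bytes = ByteData(samples.length * 2);
  for (var i = 0; i < samples.length; i++) {
    bytes.setInt16(i * 2, samples[i], Endian.little);
  }
  return bytes.buffer.asUint8List();
}

/// Hypothetical helper: converts float samples in [-1.0, 1.0] to int16 bytes,
/// clamping out-of-range values first.
Uint8List floatToPcm16Bytes(List<double> samples) => pcm16ToBytes(
    [for (final s in samples) (s.clamp(-1.0, 1.0) * 32767).round()]);
```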

The segments array in the result holds timestamps in seconds. Granularity depends on the model: phrase-level for Whisper, word-level for Parakeet TDT, and one segment per transcription window for Parakeet CTC and Moonshine, where a window is a run of consecutive VAD speech regions up to 30 s long.

import 'dart:convert';

final result = jsonDecode(resultJson) as Map<String, dynamic>;
for (final seg in result['segments'] as List) {
  print('[${seg['start']}s - ${seg['end']}s] ${seg['text']}');
}

Custom vocabulary biases the decoder toward domain-specific words (supported for Whisper and Moonshine models). Pass custom_vocabulary and vocabulary_boost in the options JSON:

final options = '{"custom_vocabulary": ["Omeprazole", "HIPAA", "Cactus"], "vocabulary_boost": 3.0}';
final result = cactusTranscribe(model, '/path/to/audio.wav', '', options, null, null);
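Building the options with jsonEncode avoids escaping mistakes when a term contains quotes or backslashes. A minimal sketch; the helper vocabularyOptions is hypothetical, and boost is a required parameter so that no library default is implied.

```dart
import 'dart:convert';

/// Hypothetical helper: builds the transcription options JSON carrying
/// custom_vocabulary and vocabulary_boost.
String vocabularyOptions(List<String> terms, {required double boost}) =>
    jsonEncode({'custom_vocabulary': terms, 'vocabulary_boost': boost});
```

The same string can be passed as optionsJson to cactusTranscribe or to cactusStreamTranscribeStart.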

Streaming Transcription

import 'cactus.dart';
import 'dart:typed_data';

final stream = cactusStreamTranscribeStart(model, null);

final Uint8List audioChunk = ...;
final partialJson = cactusStreamTranscribeProcess(stream, audioChunk);
print(partialJson);

final finalJson = cactusStreamTranscribeStop(stream);
print(finalJson);

Streaming also accepts custom_vocabulary in the options passed to cactusStreamTranscribeStart. The bias is applied for the lifetime of the stream session.
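When the audio is already in memory, it can be fed to the stream in fixed-size pieces. The sketch below yields zero-copy views over a PCM buffer. The helper name pcmChunks and the chunk size are hypothetical; the even-size check assumes 16-bit samples, so that no sample is split across two chunks.

```dart
import 'dart:typed_data';

/// Hypothetical helper: yields views of [pcm] of [chunkBytes] bytes each;
/// the final chunk may be shorter.
Iterable<Uint8List> pcmChunks(Uint8List pcm, int chunkBytes) sync* {
  // Even sizes keep assumed 16-bit samples intact across chunk boundaries.
  assert(chunkBytes > 0 && chunkBytes.isEven);
  for (var offset = 0; offset < pcm.length; offset += chunkBytes) {
    final end =
        offset + chunkBytes < pcm.length ? offset + chunkBytes : pcm.length;
    yield Uint8List.sublistView(pcm, offset, end);
  }
}
```

Usage: for (final chunk in pcmChunks(pcm, 16000)) { print(cactusStreamTranscribeProcess(stream, chunk)); }. With 16-bit samples at 16 kHz, 16000 bytes is half a second of audio.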

Embeddings

import 'cactus.dart';
import 'dart:typed_data';

final Float32List embedding      = cactusEmbed(model, 'Hello, world!', true);
final Float32List imageEmbedding = cactusImageEmbed(model, '/path/to/image.jpg');
final Float32List audioEmbedding = cactusAudioEmbed(model, '/path/to/audio.wav');
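A common next step is comparing two embeddings. Below is a minimal cosine-similarity sketch over the returned Float32List values; the function name is hypothetical. If normalize is true and the vectors come back unit-length, as the parameter name suggests, the dot product alone gives the same value.

```dart
import 'dart:math';
import 'dart:typed_data';

/// Hypothetical helper: cosine similarity of two equal-length embeddings.
/// Returns 0.0 when either vector is all zeros.
double cosineSimilarity(Float32List a, Float32List b) {
  if (a.length != b.length) {
    throw ArgumentError('dimension mismatch: ${a.length} vs ${b.length}');
  }
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA == 0 || normB == 0) return 0.0;
  return dot / (sqrt(normA) * sqrt(normB));
}
```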

Tokenization

import 'cactus.dart';

final List<int> tokens = cactusTokenize(model, 'Hello, world!');
final String scores = cactusScoreWindow(model, tokens, 0, tokens.length, 512);

Language Detection

import 'cactus.dart';
import 'dart:typed_data';

// From file
final resultJson = cactusDetectLanguage(model, '/path/to/audio.wav', null, null);
print(resultJson);

// From PCM data (16 kHz mono)
final Uint8List pcmData = ...;
final resultJson2 = cactusDetectLanguage(model, null, null, pcmData);
print(resultJson2);

VAD

import 'cactus.dart';

final String vadJson = cactusVad(model, '/path/to/audio.wav', null, null);
print(vadJson);

Diarize

import 'cactus.dart';

final String diarizeJson = cactusDiarize(model, '/path/to/audio.wav', null, null);
print(diarizeJson);

Embed Speaker

import 'cactus.dart';

final String embedJson = cactusEmbedSpeaker(model, '/path/to/audio.wav', null, null);
print(embedJson);

RAG

import 'cactus.dart';

final String result = cactusRagQuery(model, 'What is machine learning?', 5);
print(result);

Vector Index

import 'cactus.dart';

final embDim = 4;
final index = cactusIndexInit('/path/to/index', embDim);

cactusIndexAdd(
  index,
  [1, 2],
  ['Document 1', 'Document 2'],
  [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]],
  null,
);

final resultsJson = cactusIndexQuery(index, [0.1, 0.2, 0.3, 0.4], null);
final getJson = cactusIndexGet(index, [1, 2]);

cactusIndexDelete(index, [2]);
cactusIndexCompact(index);
cactusIndexDestroy(index);
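cactusIndexAdd takes parallel lists, and the index is created with a fixed embeddingDim. The sketch below is a hypothetical pre-flight check, assuming each id pairs with one document and one embedding of exactly embeddingDim values, as in the example above. Running it before the FFI call turns a mismatch into a readable Dart error.

```dart
/// Hypothetical pre-flight check before cactusIndexAdd: the parallel lists
/// must line up, and every embedding must have [embeddingDim] values.
void checkIndexBatch(List<int> ids, List<String> documents,
    List<List<double>> embeddings, int embeddingDim) {
  if (ids.length != documents.length || ids.length != embeddings.length) {
    throw ArgumentError(
        'ids, documents and embeddings must have the same length');
  }
  for (var i = 0; i < embeddings.length; i++) {
    if (embeddings[i].length != embeddingDim) {
      throw ArgumentError('embedding for id ${ids[i]} has '
          '${embeddings[i].length} values, expected $embeddingDim');
    }
  }
}
```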

API Reference

All functions are top-level and mirror the C FFI directly. Functions that return a value throw an Exception on failure.

Types

typedef CactusModelT            = Pointer<Void>;
typedef CactusIndexT            = Pointer<Void>;
typedef CactusStreamTranscribeT = Pointer<Void>;

Init / Lifecycle

CactusModelT cactusInit(String modelPath, String? corpusDir, bool cacheIndex)
void cactusDestroy(CactusModelT model)
void cactusReset(CactusModelT model)
void cactusStop(CactusModelT model)
String cactusGetLastError()

Prefill

String cactusPrefill(
  CactusModelT model,
  String messagesJson,
  String? optionsJson,
  String? toolsJson,
)

Completion

String cactusComplete(
  CactusModelT model,
  String messagesJson,
  String? optionsJson,
  String? toolsJson,
  void Function(String token, int tokenId)? callback,
)

Transcription

String cactusTranscribe(
  CactusModelT model,
  String? audioPath,
  String? prompt,
  String? optionsJson,
  void Function(String token, int tokenId)? callback,
  Uint8List? pcmData,
)

CactusStreamTranscribeT cactusStreamTranscribeStart(CactusModelT model, String? optionsJson)
String cactusStreamTranscribeProcess(CactusStreamTranscribeT stream, Uint8List pcmData)
String cactusStreamTranscribeStop(CactusStreamTranscribeT stream)

Embeddings

Float32List cactusEmbed(CactusModelT model, String text, bool normalize)
Float32List cactusImageEmbed(CactusModelT model, String imagePath)
Float32List cactusAudioEmbed(CactusModelT model, String audioPath)

Tokenization / Scoring

List<int> cactusTokenize(CactusModelT model, String text)
String cactusScoreWindow(CactusModelT model, List<int> tokens, int start, int end, int context)

Detect Language

String cactusDetectLanguage(CactusModelT model, String? audioPath, String? optionsJson, Uint8List? pcmData)

VAD / RAG

String cactusVad(CactusModelT model, String? audioPath, String? optionsJson, Uint8List? pcmData)
String cactusRagQuery(CactusModelT model, String query, int topK)

Vector Index

CactusIndexT cactusIndexInit(String indexDir, int embeddingDim)
void cactusIndexDestroy(CactusIndexT index)
int cactusIndexAdd(CactusIndexT index, List<int> ids, List<String> documents, List<List<double>> embeddings, List<String>? metadatas)
int cactusIndexDelete(CactusIndexT index, List<int> ids)
String cactusIndexGet(CactusIndexT index, List<int> ids)
String cactusIndexQuery(CactusIndexT index, List<double> embedding, String? optionsJson)
int cactusIndexCompact(CactusIndexT index)

Logging

void cactusLogSetLevel(int level)  // 0=DEBUG 1=INFO 2=WARN 3=ERROR 4=NONE
void cactusLogSetCallback(void Function(int level, String component, String message)? onLog)
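The callback receives the numeric level listed for cactusLogSetLevel. A minimal formatter sketch that maps those numbers to names; the constant and function names are hypothetical, and unknown levels are printed as-is.

```dart
/// Level names in the order documented for cactusLogSetLevel.
const cactusLogLevelNames = ['DEBUG', 'INFO', 'WARN', 'ERROR', 'NONE'];

/// Hypothetical formatter with the same parameters as the log callback.
String formatCactusLog(int level, String component, String message) {
  final name = level >= 0 && level < cactusLogLevelNames.length
      ? cactusLogLevelNames[level]
      : 'L$level';
  return '[$name] $component: $message';
}
```

Usage: cactusLogSetLevel(1); cactusLogSetCallback((level, component, message) => print(formatCactusLog(level, component, message)));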

Telemetry

void cactusSetTelemetryEnvironment(String cacheLocation)
void cactusSetAppId(String appId)
void cactusTelemetryFlush()
void cactusTelemetryShutdown()

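Bundling Model Weights

Model weights must be reachable through a file-system path at runtime, since cactusInit takes a path rather than in-memory data.

Android

Copy the model from the Flutter asset bundle to internal storage on first launch. The asset (assets/model here) must also be declared under flutter: assets: in pubspec.yaml: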

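import 'dart:io';
import 'package:flutter/services.dart';
import 'package:path_provider/path_provider.dart';

/// Copies the bundled model out of the asset bundle once and returns its path.
Future<String> getModelPath() async {
  final dir = await getApplicationDocumentsDirectory();
  final modelFile = File('${dir.path}/model');

  if (!await modelFile.exists()) {
    final data = await rootBundle.load('assets/model');
    // Write to a temporary file, then rename, so an interrupted copy is
    // never mistaken for a complete model on the next launch.
    final tmpFile = File('${modelFile.path}.tmp');
    await tmpFile.writeAsBytes(
      // Respect the view's offset and length within the underlying buffer.
      data.buffer.asUint8List(data.offsetInBytes, data.lengthInBytes),
      flush: true,
    );
    await tmpFile.rename(modelFile.path);
  }

  return modelFile.path;
}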

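iOS/macOS

Add the model to the app bundle in Xcode (it should appear under Build Phases > Copy Bundle Resources) and resolve its path relative to the running executable. Directory.current does not point at the app bundle on iOS or macOS, so it cannot be used to locate bundled files:

import 'dart:io';

// iOS: resources sit next to the executable (Runner.app/Runner).
// macOS: resources sit in Runner.app/Contents/Resources.
final exeDir = File(Platform.resolvedExecutable).parent;
final path = Platform.isMacOS ? '${exeDir.parent.path}/Resources/model' : '${exeDir.path}/model';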

Requirements

Flutter 3.0+
Dart 2.17+
iOS 13.0+ / macOS 13.0+
Android API 21+ / arm64-v8a
