On-Device AI Models

Every model runs entirely on your iPhone. Choose what you need, download only what you want, and use them offline forever.

LiberaGPT uses compact, open-source AI models, so inference never leaves your device. Download only the models you need; each is optimised for the Neural Engine with quantisation that balances quality and speed.

Language Models (GGUF)

Seven quantised models (Q4_K) optimised for on-device inference via llama.cpp. Download what you need in Settings. All models run with GPU acceleration on iPhone 17 Pro-class devices.
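Q4_K-style quantisation shrinks weights to roughly four bits each by grouping them into blocks that share a scale factor. The actual llama.cpp Q4_K format uses 256-element super-blocks with quantised scales; the sketch below is a deliberately simplified per-block scheme, just to illustrate where the size savings and precision loss come from.

```python
import numpy as np

def quantise_4bit_blocks(weights, block_size=32):
    """Simplified 4-bit block quantisation: each block of weights shares
    one float32 scale, and values are rounded to integers in [-8, 7]."""
    w = weights.reshape(-1, block_size)
    # Per-block scale maps the block's largest magnitude onto the 4-bit range.
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantise(q, scales):
    return (q.astype(np.float32) * scales).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)
q, scales = quantise_4bit_blocks(w)
mean_err = float(np.abs(w - dequantise(q, scales)).mean())
# 4 bits per weight plus one 32-bit scale per 32-weight block = 5 bits/weight,
# versus 32 bits/weight for float32: a ~6.4x size reduction.
bits_per_weight = (q.size * 4 + scales.size * 32) / w.size
```

The per-block scale is why quality degrades gracefully: only weights sharing a block with an outlier lose precision. Variants such as Q4_K_M additionally keep a few sensitive tensors at higher precision, which is why Q4_K_S and Q4_K_M files of the same model differ slightly in size.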

Gemma 3 1B Instruct

1 billion parameters · 32K context · Q4_K_M quantisation · 806 MB download · ~50 tok/sec

Google's latest compact instruction model. Fast inference with minimal battery impact. Ideal for quick answers, summarization, and everyday tasks. GGUF format with GPU acceleration support.

Use Case: Fast responses, summarization, general chat
Licence: Gemma Terms of Use
Gemma Docs →
© 2025 Google

SmolLM3 3B Instruct

3 billion parameters · 128K context · Q4_K_M quantisation · 1.9 GB download · ~25 tok/sec

Hugging Face's compact yet capable model with a massive context window. Extended conversations, long document analysis, and detailed reasoning. Balanced performance for most tasks.

Use Case: Long context, extended conversations, document analysis
Licence: Apache 2.0
HuggingFace Model →
© 2025 Hugging Face

Phi-4 Mini 3.8B Instruct

3.8 billion parameters · 128K context · Q4_K_M quantisation · 2.5 GB download · ~18 tok/sec

Microsoft's latest Phi generation. Strong reasoning capabilities, instruction following, and language understanding. Best-in-class quality for its size. Deep Mode recommended.

Use Case: Deep reasoning, complex instructions, high-quality responses
Licence: MIT
HuggingFace Model →
© 2025 Microsoft

StableLM Zephyr 1.6B

1.6 billion parameters · 4K context · Q4_K_S quantisation · 989 MB download · ~28 tok/sec

Stability AI's lightweight instruction model. Excellent speed-to-quality ratio for everyday tasks. Minimal memory footprint, ideal for older devices or battery-sensitive workflows.

Use Case: Fast Mode, battery efficiency, older devices
Licence: Apache 2.0
HuggingFace Model →
© 2024 Stability AI

EXAONE Deep 2.4B

2.4 billion parameters · 32K context · Q4_K_M quantisation · 1.6 GB download · ~22 tok/sec

LG AI Research's bilingual model (English/Korean). Strong multilingual capabilities and reasoning. Optimised for technical and analytical tasks with extended context support.

Use Case: Multilingual (Korean), technical tasks, analytical reasoning
Licence: Apache 2.0
HuggingFace Model →
© 2024 LG AI Research

EXAONE 4.0 1.2B Instruct

1.2 billion parameters · 65K context · Q4_K_M quantisation · 851 MB download · ~30 tok/sec

LG AI's newest lightweight model with massive 65K context window. Excellent for document-heavy workflows and extended conversations. Fast inference with minimal storage footprint.

Use Case: Very long context, document analysis, extended sessions
Licence: Apache 2.0
HuggingFace Model →
© 2025 LG AI Research

AceInstruct 1.5B

1.5 billion parameters · 128K context · Q4_K_M quantisation · 1.2 GB download · ~26 tok/sec

Instruction-tuned model with strong instruction-following capabilities. Reliable for structured outputs, task execution, and complex multi-step instructions. Good balance of size and capability.

Use Case: Instruction following, structured tasks, multi-step workflows
Licence: Apache 2.0
HuggingFace Model →
© 2025 NVIDIA

Semantic Search (Planned)

Document retrieval system for RAG. The current build uses a deterministic hash-based fallback; semantic-embedding integration is planned for a future release.
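As an illustration of what a hash-based fallback can and cannot do (a sketch of the general technique, not LiberaGPT's actual code): each document gets a deterministic pseudo-vector built from its token hashes, so repeated wording matches exactly, but no semantic similarity is captured.

```python
import hashlib
import math

DIM = 256  # illustrative size; the planned MiniLM embeddings use 384 dimensions

def hash_vector(text, dim=DIM):
    """Deterministic bag-of-hashed-tokens vector (captures wording, not meaning)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # SHA-256 makes the bucket choice stable across runs and devices.
        h = int.from_bytes(hashlib.sha256(token.encode()).digest()[:8], "big")
        vec[h % dim] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

docs = ["battery saving tips", "conserve battery life", "chocolate cake recipe"]
query = "battery saving tips"
scores = [cosine(hash_vector(query), hash_vector(d)) for d in docs]
```

The exact phrase scores ~1.0 against itself, but "conserve battery life" gets credit only for the shared token "battery". Recognising the paraphrase itself is exactly the gap a semantic embedding model closes.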

all-MiniLM-L6-v2 (Planned)

Sentence embeddings · 384 dimensions · 22 million parameters · 90 MB

Compact sentence transformer for generating dense vector embeddings. Will convert imported documents into semantically searchable vectors. Planned integration with CoreML and local SQLite-vec storage.
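Once real embeddings exist, retrieval reduces to nearest-neighbour search over the stored vectors. A minimal sketch of that ranking step, using made-up 4-dimensional vectors in place of MiniLM's 384-dimensional output (a vector store such as SQLite-vec would replace the in-memory array):

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Rank stored document vectors by cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                   # one cosine similarity per document
    best = np.argsort(-sims)[:k]   # indices of the k closest documents
    return [(int(i), float(sims[i])) for i in best]

# Toy 4-d vectors standing in for real 384-d sentence embeddings.
doc_vectors = np.array([
    [0.9, 0.1, 0.0, 0.0],   # doc 0: nearly parallel to the query
    [0.0, 1.0, 0.0, 0.0],   # doc 1: orthogonal, unrelated
    [0.8, 0.2, 0.1, 0.0],   # doc 2: also close
], dtype=np.float32)
query = np.array([1.0, 0.0, 0.0, 0.0], dtype=np.float32)
results = top_k(query, doc_vectors)  # doc 0 ranks first, then doc 2
```

Because the vectors are normalised once up front, ranking is a single matrix-vector product, which is why even a few thousand document chunks search quickly on-device.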

⚠️ Current Implementation: Document retrieval uses deterministic hashing for development. Semantic search capabilities are coming in a future update.
Planned Use: Document search, context retrieval, RAG grounding
Licence: Apache 2.0
Sentence-BERT Docs →
HuggingFace Model →
© 2019-2024 UKP Lab

Model Selection Philosophy

Every model in LiberaGPT is chosen for a specific purpose: speed, accuracy, context length, or specialisation. You control which models to download and use. Lightweight models provide fast, efficient responses for everyday tasks. Larger models offer more power when you need it. Voice models enable natural interaction. Embedding models will make your documents searchable.

All models run entirely offline, all data stays local, and you're never locked into a single provider. This is honest AI: you see what's running, you choose what to install, and you understand the trade-offs.