LLMs available
Each of these models is provided in a quantized format (Q5_K_M), which trades a small amount of precision for a smaller memory footprint and faster inference, making them GPU-friendly.
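As a rough illustration of the memory savings, here is a back-of-the-envelope size estimate. The bits-per-weight figures are approximations (Q5_K_M averages roughly 5.7 bits per weight in llama.cpp, versus 16 for an unquantized FP16 model):

```python
def approx_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Rough model-file size: parameters x bits per weight, in GB (10^9 bytes)."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 7B model as FP16 vs. Q5_K_M (~5.7 bits/weight is an approximation):
fp16_gb = approx_size_gb(7, 16)    # 14.0 GB
q5km_gb = approx_size_gb(7, 5.7)   # ~5.0 GB
print(f"FP16: {fp16_gb:.1f} GB, Q5_K_M: {q5km_gb:.1f} GB")
```

Real GGUF files are slightly larger than this estimate because of metadata and per-block scales, but the ratio (roughly a 3x reduction versus FP16) holds.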
- EXAONE-3.5-24B-Instruct-Q5_K_M
  - Large-scale 24B-parameter instruct model
  - Optimized quantization (Q5_K_M) for efficiency
  - General-purpose reasoning and task completion
- Qwen2.5-7B-Instruct-Q5_K_M
  - 7B-parameter instruct model from Alibaba’s Qwen series
  - Balanced for chat, summarization, and Q&A
  - Quantized for faster inference with moderate memory usage
- Llama-3.2-3B-Instruct-Q5_K_M
  - Meta’s Llama 3.2 model (3B parameters)
  - Lightweight, efficient for instruction following
  - Good choice for smaller tasks and resource-limited runs
- Qwen2.5-Coder-3B-Instruct-Q5_K_M
  - Specialized 3B model for code understanding & generation
  - Optimized for programming tasks (Python, JS, etc.)
  - Small footprint for cost-effective code inference
- DeepSeek-R1-Distill-Llama-8B-Q5_K_M
  - Distilled 8B-parameter model derived from Llama
  - Balanced between speed and quality
  - Great for chatbots and general AI applications
- Qwen2-0.5B-Instruct-Q5_K_M
  - Tiny 0.5B instruct model
  - Ultra-lightweight, very fast and cheap to run
  - Suitable for simple Q&A or rule-based inference
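A minimal sketch of running one of these quantized models locally with llama-cpp-python. The model path, context size, and prompt are assumptions for illustration; any of the Q5_K_M GGUF files above can be loaded the same way:

```python
from pathlib import Path

# Hypothetical local path -- adjust to wherever the GGUF files were downloaded.
MODEL_PATH = Path("models/Qwen2.5-7B-Instruct-Q5_K_M.gguf")

if MODEL_PATH.exists():
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path=str(MODEL_PATH),
        n_ctx=4096,       # context window size
        n_gpu_layers=-1,  # offload all layers to the GPU
    )
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain Q5_K_M quantization in one sentence."}]
    )
    print(result["choices"][0]["message"]["content"])
```

Setting `n_gpu_layers=-1` offloads every layer to the GPU; on machines with limited VRAM, a smaller positive number splits the model between GPU and CPU.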
Available from: