LLMs available

Each of these models is provided in a quantized format (Q5_K_M), which trades some precision for a smaller memory footprint and faster inference, making them GPU-friendly.

  • EXAONE-3.5-24B-Instruct-Q5_K_M:

    • Large-scale 24B parameter instruct model
    • Optimized quantization (Q5_K_M) for efficiency
    • General-purpose reasoning and task completion
  • Qwen2.5-7B-Instruct-Q5_K_M:

    • 7B parameter instruct model from Alibaba’s Qwen series
    • Balanced for chat, summarization, and Q&A
    • Quantized for faster inference with moderate memory usage
  • Llama-3.2-3B-Instruct-Q5_K_M:

    • Meta’s Llama 3.2 model (3B parameters)
    • Lightweight, efficient for instruction following
    • Good choice for smaller tasks and resource-limited runs
  • Qwen2.5-Coder-3B-Instruct-Q5_K_M:

    • Specialized 3B model for code understanding & generation
    • Optimized for programming tasks (Python, JS, etc.)
    • Small footprint for cost-effective code inference
  • DeepSeek-R1-Distill-Llama-8B-Q5_K_M:

    • Distilled 8B parameter model, derived from Llama
    • Balanced between speed and quality
    • Great for chatbots and general AI applications
  • Qwen2-0.5B-Instruct-Q5_K_M:

    • Tiny 0.5B instruct model
    • Ultra-lightweight, very fast and cheap to run
    • Suitable for simple Q&A or rule-based inference
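To illustrate the size/precision trade-off that quantization makes, here is a toy block-wise 5-bit quantizer in Python. This is a simplified sketch, not the actual Q5_K_M algorithm (which uses super-blocks with multiple scales and mins), but it shows the core idea: store one scale per block plus small integers instead of full 32-bit floats.

```python
import numpy as np

def quantize_5bit(weights, block_size=32):
    # Toy block-wise 5-bit quantization (illustrative only, not the
    # real Q5_K_M scheme): each block keeps one fp32 scale and maps
    # its weights to signed 5-bit integers in [-16, 15].
    blocks = weights.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 15.0
    scales[scales == 0] = 1.0  # avoid division by zero in empty blocks
    q = np.clip(np.round(blocks / scales), -16, 15).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    # Reconstruct approximate fp32 weights from the 5-bit codes.
    return (q * scales).reshape(-1).astype(np.float32)

rng = np.random.default_rng(0)
w = rng.normal(size=1024).astype(np.float32)
q, s = quantize_5bit(w)
w_hat = dequantize(q, s)

# fp32 costs 4 bytes per weight; the quantized form costs 5 bits per
# weight plus one fp32 scale per 32-weight block, i.e. roughly
# 0.625 + 0.125 = 0.75 bytes per weight (about 5x smaller).
fp32_bytes = w.nbytes
packed_bytes = len(w) * 5 / 8 + s.size * 4
err = np.abs(w - w_hat).max()
```

The per-weight error stays bounded by half a quantization step per block, which is why 5-bit k-quants like Q5_K_M lose relatively little quality while cutting memory use by roughly 5x versus fp32.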

Available from: