Run LLMs natively in Ruby with Rust + GPU support
Red Candle is a Ruby gem that lets you run Llama 2, Llama 3, Mistral, and Gemma large language models directly inside your Ruby process using Rust.
It uses Magnus to bridge Ruby and Hugging Face's Candle crate via a compiled native extension, giving your Ruby app direct, in-process access to the model, with no Python runtime or separate inference server needed.
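As a rough sketch of what that looks like in practice (the method and parameter names below are assumptions based on the gem's style, not the confirmed API; check the Red Candle README for exact names):

```ruby
require "candle" # the red-candle gem is typically required as "candle"

# Hypothetical sketch: load a quantized Mistral model from Hugging Face
# and run a chat completion entirely inside the Ruby process.
llm = Candle::LLM.from_pretrained(
  "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
  gguf_file: "mistral-7b-instruct-v0.2.Q4_K_M.gguf"
)

messages = [
  { role: "user", content: "Summarize what Red Candle does in one sentence." }
]

puts llm.chat(messages)
```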
Red Candle supports:
- Chat completions, including streaming (see the sketch after this list)
- Embeddings
- Reranking
- Named entity recognition
- Hardware acceleration (Metal and CUDA)
- Both safetensors and quantized GGUF model formats
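Streaming and embeddings from the list above might look roughly like this; again, method names (`chat_stream`, `EmbeddingModel.from_pretrained`, `embedding`) and the commented-out device option are assumptions for illustration, not the confirmed API:

```ruby
require "candle"

llm = Candle::LLM.from_pretrained(
  "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
  gguf_file: "mistral-7b-instruct-v0.2.Q4_K_M.gguf"
)

# Streaming: tokens are yielded to the block as they are generated,
# so you can print or forward them immediately.
llm.chat_stream([{ role: "user", content: "Tell me a short story." }]) do |token|
  print token
end

# Embeddings via a dedicated embedding model (hypothetical class name).
embedder = Candle::EmbeddingModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
vector = embedder.embedding("Red Candle runs LLMs inside Ruby.")

# Metal/CUDA acceleration is presumably selected at load time; the
# keyword below is an assumption, not the confirmed API:
# llm = Candle::LLM.from_pretrained(model_id, device: Candle::Device.metal)
```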