ORA MODELS
Compressed Models, Ready to Deploy.
All models are compressed using OraCompress and verified against baseline benchmarks. Available on Hugging Face.
ORA-Llama-47B
From: Meta Llama 3.1 70B
Language · Instruction-tuned
47B parameters
4.1× throughput vs. original
72% lower cost per token
Runs on 1 GPU (vs. 4 GPUs for the original)
Compatible with: vLLM, llama.cpp
ORA-Qwen-3.5 9B
From: Qwen 3.5 9B
Language · Multilingual
5.7 GB memory footprint
70% smaller than original
3.9-bit mixed quantization
5% accuracy drop vs. original
Compatible with: vLLM, llama.cpp
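A "3.9-bit" mixed-quantization figure is most naturally read as an average bit width across weights stored at different precisions. The sketch below shows how such an average and a rough weight-only size estimate can be computed. The helper names, the 90%/10% split between 4-bit and 3-bit weights, the nominal 9B parameter count, and the 16-bit (BF16) baseline are all illustrative assumptions, not OraCompress's actual allocation. Because it counts weights only, its output is not expected to match the measured footprint figures above.

```python
# Hypothetical illustration only: the mix below is NOT OraCompress's real allocation.

def average_bits(mix):
    """Parameter-weighted average bit width.

    mix: list of (fraction_of_weights, bits) pairs; fractions must sum to 1.
    """
    total = sum(fraction for fraction, _ in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(fraction * bits for fraction, bits in mix)


def weight_gb(num_params, bits_per_weight):
    """Approximate storage for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    mix = [(0.9, 4), (0.1, 3)]          # assumed: 90% of weights at 4-bit, 10% at 3-bit
    avg = average_bits(mix)             # about 3.9 bits per weight
    params = 9e9                        # nominal 9B, not the exact parameter count
    baseline = weight_gb(params, 16)    # assumed BF16 original checkpoint
    compressed = weight_gb(params, avg)
    print(f"average bits: {avg:.1f}")
    print(f"BF16 weights: {baseline:.1f} GB, compressed weights: {compressed:.2f} GB")
    print(f"weight-only reduction: {1 - compressed / baseline:.0%}")
```

Activations, the KV cache, and any components kept at higher precision add to the real footprint, which is why a weight-only estimate like this one is only a lower bound.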