ORA MODELS

Compressed Models, Ready to Deploy.

All models are compressed using OraCompress and verified against baseline benchmarks. Available on Hugging Face.

ORA-Llama-47B

From: Meta Llama 3.1 70B

Available
Language, Instruction-tuned

47B parameters

4.1× throughput vs. original

72% lower cost per token

1 GPU vs. 4 GPUs

Compatible with: vLLM, llama.cpp

ORA-Qwen-3.5 9B

From: Qwen 3.5 9B

Available
Language, Multilingual

5.7 GB memory footprint

70% smaller than original

3.9-bit mixed quantization

5% accuracy drop

Compatible with: vLLM, llama.cpp