Machine Learning

Advance the Science of Model Compression.

Machine Learning · Full-time · Vienna / Remote

THE ROLE

About the Role

This is an applied research role. You'll develop new compression methods and ship them, not just write papers about them. The cycle is short: read the literature, prototype, benchmark on real models, integrate into our pipeline, and iterate with customers running compressed models in production.

You'll own significant technical scope from day one. Expect to work across the stack: pruning algorithms, quantization, evaluation infrastructure, and the production code that customers actually use.

WHAT YOU'LL DO

Your impact

  • Improve and extend our structural pruning algorithm to new architectures (MoE, multimodal, vision-language).
  • Combine pruning with quantization (NVFP4/FP8/INT4, sub-4-bit mixed precision) in our compression pipeline.
  • Expand and improve our model retraining pipeline (SFT, GKD, DPO, GRPO).
  • Compress customer models (Llama, Qwen, Gemma, and proprietary fine-tunes) for cloud and edge deployment.
  • Apply hardware-aware optimization for different accelerator targets (A100/H100/B300 and edge hardware).

WHAT YOU BRING

What we're looking for

  • PhD in computer science, machine learning, or equivalent.
  • Published work on quantization, pruning, or LLM training.
  • Production-grade Python code (not just Jupyter notebooks). You write code others can read and run.
  • Experience taking a method from paper to a working system on real models.
  • Comfort working with LLMs and GPUs and evaluating models on benchmarks.
  • You ship. You finish things.

NICE TO HAVE

  • Open-source contributions to ML infrastructure (vLLM, llama.cpp, transformers, TensorRT-LLM, bitsandbytes, GPTQ/AWQ implementations).
  • Experience with MoE architectures or multimodal models (e.g. Qwen Omni).
  • Background in kernel optimization.

PRACTICAL

  • Vienna-based. Hybrid or fully remote.
  • Working language is English.
  • We sponsor visas and support relocation.
  • Compensation: €70–120k base + equity. Minimum salary under the applicable Austrian Kollektivvertrag: €43,456/year.
  • We don't require you to publish, but we support presenting work at external venues when it fits the company and the project.

Interested? Let's talk.

Send your resume and a brief note about your work to info@oracomputing.com
