LoRA and QLoRA on 70B-class models, no quantization tax.
Llama 3.3 70B, Mistral Large, Qwen 2.5 72B — load them in fp16 on a single Apple Silicon Studio and train adapters in MLX. A 10K-example LoRA finishes in an evening.
Talk to an engineer →Austin, Texas · Apple Silicon Training Cluster
Per-hour pricing, US data residency, no spot-market roulette.
Frameworks we train with — and the models we train every week
What it's for
Llama 3.3 70B, Mistral Large, Qwen 2.5 72B — load them in fp16 on a single Apple Silicon Studio and train adapters in MLX. A 10K-example LoRA finishes in an evening.
Talk to an engineer →Mistral 7B, Qwen 2.5 7B, Phi-4, Gemma 2 9B — full supervised fine-tuning in MLX, including every parameter, every optimizer state, every gradient. 50K examples in well under an afternoon.
Talk to an engineer →Preference-tuning loops on the same machine your SFT lived on. We maintain the MLX training recipes and the eval glue. You bring the preference pairs.
Send us a brief →Why Deliany
Memory-bound runs above 32B don't have to be quantized or sharded. Apple Silicon's unified memory architecture lets the whole model live in one coherent address space — and the math gets very friendly very quickly.
MLX ships with mixed precision, gradient checkpointing, distributed training, and PEFT helpers. We maintain the production recipes for every flow we run.
Every account gets a shared Slack Connect channel that routes to the engineers who maintain the recipes and own the fleet. No tier-1 scripts. No bots.
Hourly, monthly, or per-run. Quoted in plain English. No spot market. No commit minimums. No surprise invoices.
How it works
Spin up an Apple Silicon node or a Studio. SSH in, run your MLX script, pull the adapter down when it's done. Resumable checkpoints and HuggingFace cache mounts standard.
Talk to an engineer →Hand us a JSONL and a one-line training config (we have a starter for every flow). We run, monitor, eval on a held-out slice, and hand back a tested adapter with a brief writeup. Flat per-run pricing.
Send us a brief →Start a fine-tune today
Pick a tier or send us a dataset. Self-serve nodes spin up in under an hour. Managed runs typically finish the same day.