mlxoptiq 0.0.6
News Source: PyPI.org
News Summary
- OptIQ turns "uniform 4-bit" into a data-driven, per-layer budget.
- Sensitive layers stay at 8-bit; the rest get 4-bit.
- This works for both model weights and the KV cache at serving time.
- Comes with an OpenAI-compatible server (optiq serve) that wraps mlx_lm.server with the mixed-precision KV-cache path built in.
- Pre-built OptIQ-quantized models are available on Hugging Face.
- Use a pre-built model (stock mlx-lm, no OptIQ code required).
- Serve with mixed-precision KV (new in v0.0.5).
- Convert a new model.
- Weight quantization: GSM8K vs. uniform 4-bit.
- KV-cache serving: decode tok/s at 64k context (Apple M3 Max 36GB).
- Full tables, methodology, and per-layer configs are on the Results page.
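
To make the "per-layer budget" idea above concrete, here is a minimal sketch of one way a mixed 4-bit/8-bit assignment could be chosen. It is not OptIQ's actual algorithm or API: the Layer class, the sensitivity scores, the allocate_bits function, and the greedy strategy are all hypothetical, chosen only to illustrate keeping the most sensitive layers at 8-bit while staying under a target average bit width.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    num_params: int
    sensitivity: float  # hypothetical score: higher means 4-bit hurts more


def allocate_bits(layers, target_avg_bits, low=4, high=8):
    """Greedy sketch: every layer starts at `low` bits, then the most
    sensitive layers are promoted to `high` bits while the total stays
    within the bit budget implied by `target_avg_bits`."""
    total_params = sum(layer.num_params for layer in layers)
    budget = target_avg_bits * total_params  # total bits allowed
    bits = {layer.name: low for layer in layers}
    used = low * total_params

    for layer in sorted(layers, key=lambda l: l.sensitivity, reverse=True):
        extra = (high - low) * layer.num_params
        if used + extra <= budget:  # promote only if it still fits
            bits[layer.name] = high
            used += extra

    return bits, used / total_params


# Illustrative numbers only; these are not measurements from OptIQ.
layers = [
    Layer("embed", 100, 0.9),
    Layer("attn.0", 200, 0.2),
    Layer("mlp.0", 400, 0.1),
    Layer("lm_head", 100, 0.8),
]
bits, avg = allocate_bits(layers, target_avg_bits=5.0)
print(bits)  # {'embed': 8, 'attn.0': 4, 'mlp.0': 4, 'lm_head': 8}
print(avg)   # 5.0
```

In this toy run the two most sensitive layers use up the whole budget, so the larger, less sensitive layers stay at 4-bit. A real tool would derive the sensitivity scores from calibration data rather than hard-coding them.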