mlxoptiq 0.0.6

News Source: pypi.org

News Summary

  • OptIQ turns "uniform 4-bit" into a data-driven, per-layer budget.
  • Sensitive layers stay at 8-bit; the rest get 4-bit.
  • This works for both model weights and the KV cache at serving time.
  • Comes with an OpenAI-compatible server (optiq serve) that wraps mlx_lm.server and has the mixed-precision KV path built in.
  • Pre-built OptIQ-quantized models are available on Hugging Face.
  • The release notes cover three workflows: using a pre-built model (stock mlx-lm, no OptIQ code required), serving with mixed-precision KV (new in v0.0.5), and converting a new model.
  • Reported results compare weight quantization on GSM8K against uniform 4-bit, and KV-cache serving by decode tok/s at 64k context (Apple M3 Max 36GB).
  • Full tables, methodology, and per-layer configs are on the Results page.
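
To make the idea of a data-driven, per-layer bit budget concrete, here is a minimal Python sketch. It is not OptIQ's actual algorithm or API: the function allocate_bits, the sensitivity scores, the layer names, and the layer sizes are all hypothetical, and the greedy rule (promote the most sensitive layers to 8-bit while the average stays within a target) is only one plausible way to spend such a budget.

```python
from typing import Dict


def allocate_bits(
    sensitivity: Dict[str, float],
    sizes: Dict[str, int],
    target_avg_bits: float,
    low: int = 4,
    high: int = 8,
) -> Dict[str, int]:
    """Hypothetical greedy allocator (illustration only, not OptIQ's method).

    Every layer starts at `low` bits. Layers are then visited from most to
    least sensitive, and a layer is promoted to `high` bits only if the
    parameter-weighted average bit width stays within `target_avg_bits`.
    """
    total_params = sum(sizes.values())
    budget = target_avg_bits * total_params  # total bits we may spend
    bits = {name: low for name in sensitivity}
    used = low * total_params  # bits spent with everything at `low`

    for name in sorted(sensitivity, key=sensitivity.get, reverse=True):
        extra = (high - low) * sizes[name]  # cost of promoting this layer
        if used + extra <= budget:
            bits[name] = high
            used += extra
    return bits


# Made-up sensitivity scores and parameter counts for four layers.
sensitivity = {
    "layers.0.attn": 0.9,
    "layers.0.mlp": 0.2,
    "layers.1.attn": 0.5,
    "layers.1.mlp": 0.1,
}
sizes = {
    "layers.0.attn": 100,
    "layers.0.mlp": 300,
    "layers.1.attn": 100,
    "layers.1.mlp": 300,
}

plan = allocate_bits(sensitivity, sizes, target_avg_bits=4.5)
avg_bits = sum(plan[n] * sizes[n] for n in plan) / sum(sizes.values())
print(plan, avg_bits)
```

With these made-up numbers, only the most sensitive layer fits in the 4.5-bit average budget, so it is kept at 8-bit and the other three layers fall back to 4-bit. The same kind of allocation could in principle be applied to KV-cache layers instead of weights, which is the second use the summary mentions.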