embedding-shape 2 days ago, on: Mistral releases Devstral2 and Mistral Vibe CLI

RTX Pro 6000. As an example, it ends up taking ~66GB when running the MXFP4 native quant with llama-server/llama.cpp at max context. Guess you could do it with two 5090s and slightly less context, or with different software aimed at memory efficiency.

That has 96GB GDDR7 ECC, to save people looking it up.
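
For anyone who wants the arithmetic spelled out, here is a rough sketch of the headroom math. The ~66GB and 96GB figures come from the comments above; the 32GB per RTX 5090 is an assumption not stated in this thread, and `headroom_gb` is just an illustrative helper. It ignores the overhead of splitting a model across two cards, so treat the result as a ballpark only.

```python
# Rough VRAM headroom check. Ignores multi-GPU split overhead and fragmentation.
USAGE_GB = 66  # reported usage for the MXFP4 quant at max context (from the thread)


def headroom_gb(total_vram_gb, usage_gb=USAGE_GB):
    """Return free VRAM in GB; a negative value means the workload does not fit."""
    return total_vram_gb - usage_gb


rtx_pro_6000 = headroom_gb(96)   # 96GB, per the reply above
dual_5090 = headroom_gb(2 * 32)  # ASSUMPTION: 32GB per RTX 5090, not stated in the thread

print(f"RTX Pro 6000: {rtx_pro_6000:+d} GB")
print(f"2x RTX 5090:  {dual_5090:+d} GB")
```

Under those assumptions the two-card setup comes up a couple of GB short, which lines up with the "slightly less context" caveat.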