

cpldcpu 9 months ago | on: Magistral — the first reasoning model by Mistral A...

But this is just the SFT - "distilled" model, not the one optimized with RL, right?

danielhanchen 9 months ago

Oh, I think it's SFT + RL, as mentioned in the paper. They said combining both actually performs better than RL alone.
