I don't follow all these benchmarks closely, but I would love to have some idea of where the models stand for these specific use cases. Average intelligence is close across the mainstream models, but on writing, design, coding, and search, there are still some gaps.
Even if it's not a benchmark, a vibe test from a trusted professional with a use case close to mine would suffice.
Your point about the ecosystem is true. I just switched my main provider from OpenAI to Anthropic because they continue to prove they have a good, concrete vision for AI.
It would be nice to include similarly sized open (source/weights) ones.
Just tried Devstral 2 (123B, from Mistral): it scored 76% ... Disappointing.
That's true until you try to use them for a real task. Then the differences become clear as day.