| Metric | DeepSeek-Coder-V2-0724 | DeepSeek-V2.5 |
|---|---|---|
| AlpacaEval 2.0 | 44.5 | 50.5 |
| ArenaHard | 66.3 | 76.2 |
| AlignBench | 7.91 | 8.04 |
| MT-Bench | 8.91 | 9.02 |
| HumanEval Python | 87.2 | 89.0 |
| HumanEval Multi | 74.8 | 73.8 |
| LiveCodeBench (01-09) | 39.7 | 41.8 |
| Aider | 72.9 | 72.2 |
| SWE-verified | 19.0 | 16.8 |
| DS-FIM-Eval | 73.2 | 78.3 |
| DS-Arena-Code | 49.5 | 63.1 |
Press Option+O, write your instructions, and see the output in diff style.
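For readers who want to try the model directly rather than through an editor integration, below is a minimal sketch of calling DeepSeek-V2.5 via DeepSeek's OpenAI-compatible chat completions API. The base URL and the `deepseek-chat` model id follow DeepSeek's public API documentation; the API key placeholder and the example prompt are illustrative assumptions, not part of these release notes.

```python
# Minimal sketch: query DeepSeek-V2.5 through the OpenAI-compatible endpoint.
# Requires `pip install openai`. The key below is a placeholder (assumption).
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",   # placeholder; supply your own key
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",             # model id per DeepSeek's API docs
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
    temperature=0.0,                   # deterministic output for a quick check
)
print(response.choices[0].message.content)
```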