This is an automated archive made by the Lemmit Bot.
The original was posted on /r/singularity by /u/UnknownEssence on 2024-09-24 18:19:24+00:00.
GEMINI 1.5 PRO:
Capability | Benchmark | May 2024 | Sep 2024 |
---|---|---|---|
General | MMLU-Pro | 69.0% | 75.8% |
Code | Natural2Code | 82.6% | 85.4% |
Math | MATH | 67.7% | 86.5% |
Math | HiddenMath | 28.0% | 52.0% |
Reasoning | GPQA (diamond) | 46.0% | 59.1% |
Multilingual | WMT23 | 75.3 | 75.1 |
Long Context | MRCR (1M) | 70.5% | 82.6% |
Image | MMMU | 62.2% | 65.9% |
Image | Vibe-Eval (Reka) | 48.9% | 53.9% |
Image | MathVista | 63.9% | 68.1% |
Audio | FLEURS (55 lang) | 6.5% | 6.7% |
Video | Video-MME | 77.9% | 78.6% |
Safety | XSTest | 88.4% | 98.8% |
GEMINI 1.5 FLASH:
Capability | Benchmark | May 2024 | Sep 2024 |
---|---|---|---|
General | MMLU-Pro | 59.1% | 67.3% |
Code | Natural2Code | 77.2% | 79.8% |
Math | MATH | 54.9% | 77.9% |
Math | HiddenMath | 20.3% | 47.2% |
Reasoning | GPQA (diamond) | 41.4% | 51.0% |
Multilingual | WMT23 | 74.1 | 73.9 |
Long Context | MRCR (1M) | 70.1% | 71.9% |
Image | MMMU | 56.1% | 62.3% |
Image | Vibe-Eval (Reka) | 44.8% | 48.9% |
Image | MathVista | 58.4% | 65.8% |
Audio | FLEURS (55 lang) | 9.8% | 9.6% |
Video | Video-MME | 74.7% | 76.1% |
Safety | XSTest | 86.9% | 97.0% |