Chatbot Arena is an open platform for public AI benchmarking. Over the past two years, OpenAI's models have remained at the top of most artificial intelligence benchmarks. In some categories, Google's Gemini models and Anthropic's Claude models scored better, but overall OpenAI's models stayed comfortably on top.

Now, however, Chatbot Arena has revealed a new experimental model from Google called Gemini-Exp-1114 that, after collecting over 6,000 community votes over the past week, is tied for first place with OpenAI's ChatGPT-4o-latest (2024-09-03). Compared to the previous Gemini model, its overall Arena score increased from 1301 to 1344. Notably, the new model's score even surpasses OpenAI's o1-preview.
According to Chatbot Arena, Gemini-Exp-1114 is now ranked No. 1 on the Vision leaderboard. It also ranks No. 1 in the following categories:
- Mathematics
- Creative writing
- Longer queries
- Instruction following
- Multi-turn
- Hard prompts

The new model ranks No. 3 in Coding and in Hard Prompts with Style Control; OpenAI's o1-preview leads both of those categories. In the overall win-rate table against comparable models, Gemini-Exp-1114 posts a 50% win rate against GPT-4o-latest, 56% against o1-preview, and 62% against Claude-3.5-Sonnet.

In September, Google released the refreshed Gemini 1.5 series models, which offer a ~7% increase on MMLU-Pro, ~20% improvement on the MATH and HiddenMath benchmarks, and ~2-7% improvements in vision and code use cases. Google also says the overall helpfulness of model responses has improved and that the updated models respond in a more concise style, with default output length ~5-20% shorter than previous models.
Developers can test this model in Google AI Studio right now, and it will soon be available through the API.
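For developers who want to try it programmatically once API access opens up, a minimal sketch using the google-generativeai Python SDK might look like the following. The model identifier `gemini-exp-1114` is an assumption based on the model's Arena name; check Google AI Studio for the ID actually exposed to your key.

```python
# Minimal sketch: calling the experimental model through the
# google-generativeai Python SDK. The model ID "gemini-exp-1114" is
# assumed from the Arena listing; verify it via genai.list_models()
# or in Google AI Studio before relying on it.
import os

import google.generativeai as genai

# Authenticate with an API key generated in Google AI Studio.
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-exp-1114")
response = model.generate_content(
    "Summarize what the Chatbot Arena leaderboard measures in two sentences."
)
print(response.text)
```

The same SDK works for the current Gemini 1.5 models, so swapping the model ID is the only change needed once the experimental model reaches the API.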

