Google has released three new models at once: Gemini 2.0 Pro is free, scores outstandingly, ranks first, and excels at coding and complex prompts!

The Gemini 2.0 story is accelerating.

In December, the experimental version of 2.0 Flash gave developers a workhorse model with low latency and high performance.

Earlier this year, 2.0 Flash Thinking Experimental was updated in Google AI Studio, improving performance by combining Flash's speed with stronger reasoning capabilities.

Last week, the updated 2.0 Flash was rolled out to all users of the Gemini desktop and mobile apps.

Today, three new models were unveiled at once: an experimental version of Gemini 2.0 Pro, the best performer so far on coding and complex prompts; the cost-efficient 2.0 Flash-Lite; and 2.0 Flash Thinking, the reasoning-enhanced version.

Gemini 2.0 Pro ranks first in every category. Gemini 2.0 Flash ranks in the top three in coding, math, and puzzles, and Flash-Lite ranks in the top ten in every category.

Figure: capability comparison of the three models.

All three models accept multimodal input and produce text output.

Support for more modalities is on the way.

Figure: model strength chart from the coding arena.

Figure: win-rate heat map.

Google treats free users better than OpenAI treats its Plus subscribers: Gemini 2.0 Pro Experimental is free to use in Google AI Studio.


DeepSeek's service keeps showing errors and leaving users waiting… Remember that the first free reasoning model was also 2.0 Flash Thinking, available in Google AI Studio.

It is also available in the Gemini web app.

There is also an internet-connected reasoning model (so why keep it separate…)

Google's experimental Gemini 2.0 Pro posts eye-catching gains on the official benchmarks.

It offers the strongest coding performance and the best handling of complex prompts of any Google model so far, along with better understanding of, and reasoning about, world knowledge.

It has Google's largest context window yet, 2 million tokens (long context has been a relative strength of Gemini models), which lets it analyze and understand large amounts of information comprehensively and call tools such as Google Search and code execution.

On MATH it scores 91.8%, about 5 percentage points higher than Gemini 1.5. It reaches 64.7% on GPQA (reasoning) and 44.3% on SimpleQA (world knowledge).

Programming ability stands out most: it scores 36.0% on LiveCodeBench and 59.3% accuracy on Bird-SQL (text-to-SQL conversion). Combined with its 2-million-token context window, that makes it capable of handling even very complex code-analysis tasks.
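A quick way to judge whether a codebase fits in a 2-million-token window is a character-count estimate. The sketch below assumes roughly 4 characters per token (a common rule of thumb, not a figure from Gemini's tokenizer) and reserves a hypothetical 8,192-token budget for the model's reply; `estimate_tokens`, `fits_in_context`, and `read_sources` are illustrative helpers, not part of any Google SDK. For exact numbers, use the token-counting feature of the API you actually call.

```python
import math
from collections.abc import Iterable
from pathlib import Path

# Assumption: ~4 characters per token is a rough heuristic for English text and
# source code; real tokenizers differ, so treat results as ballpark figures.
CHARS_PER_TOKEN = 4
# Context window size quoted in the article for Gemini 2.0 Pro Experimental.
CONTEXT_WINDOW = 2_000_000


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token count for one piece of text, rounded up."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(texts: Iterable[str], reserve_for_output: int = 8_192,
                    window: int = CONTEXT_WINDOW) -> tuple[int, bool]:
    """Sum estimated tokens over all texts and check whether they fit,
    leaving `reserve_for_output` tokens (a hypothetical budget) for the reply."""
    total = sum(estimate_tokens(t) for t in texts)
    return total, total + reserve_for_output <= window


def read_sources(root: str, suffixes: tuple[str, ...] = (".py", ".js", ".ts")) -> list[str]:
    """Hypothetical helper: read every file under `root` with a matching suffix."""
    return [p.read_text(errors="ignore") for p in Path(root).rglob("*")
            if p.is_file() and p.suffix in suffixes]


if __name__ == "__main__":
    # A made-up repo of 1,500 files averaging 4,000 characters each.
    fake_repo = ["x" * 4_000] * 1_500
    total, ok = fits_in_context(fake_repo)
    print(total, ok)  # 1500000 True
```

On a real project, `fits_in_context(read_sources("path/to/repo"))` (the path is a placeholder) gives a rough go/no-go before pasting code into a prompt; because the ratio is only a heuristic, leave a comfortable margin.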

You can try it out in Cursor.

Its multilingual understanding is also impressive, with 86.5% on Global MMLU. It scores 72.7% on MMMU for image understanding and 71.9% on video analysis.

Gemini 2.0 Flash-Lite strikes an interesting balance.

It keeps the speed and cost of 1.5 Flash while delivering better performance, and its 1-million-token context window lets it process more information.

Its most practical selling point is price-performance: generating captions for 40,000 photos costs less than $1. That makes AI far more affordable for everyday use.
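The sub-$1 figure is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes Flash-Lite rates of $0.075 per million input tokens and $0.30 per million output tokens, 258 tokens per small image, a 10-token instruction, and a 10-token caption per photo. None of these numbers come from the article, and `caption_batch_cost` and `Pricing` are hypothetical names, so substitute current values from Google's pricing and token-counting documentation.

```python
from dataclasses import dataclass


# All rates are assumptions for illustration; check the current pricing page.
@dataclass(frozen=True)
class Pricing:
    input_per_million: float = 0.075   # USD per 1M input tokens (assumed)
    output_per_million: float = 0.30   # USD per 1M output tokens (assumed)


def caption_batch_cost(num_photos: int,
                       tokens_per_image: int = 258,  # assumed cost of one small image
                       prompt_tokens: int = 10,      # assumed short instruction per request
                       output_tokens: int = 10,      # assumed one-line caption length
                       pricing: Pricing = Pricing()) -> float:
    """Estimate the USD cost of captioning `num_photos` images, one request each."""
    input_total = num_photos * (tokens_per_image + prompt_tokens)
    output_total = num_photos * output_tokens
    return (input_total * pricing.input_per_million
            + output_total * pricing.output_per_million) / 1_000_000


if __name__ == "__main__":
    cost = caption_batch_cost(40_000)
    print(f"${cost:.2f}")  # $0.92
```

Under these assumptions the estimate lands at roughly $0.92; longer captions or larger images (which may consume more tokens) push it up, so the margin below $1 is thin.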

Blogger Shrivastava said: Gemini 2.0 Pro's coding is crazy!

Prompt: Use Three.js to create a solar system simulation. Add a time scale, a focus drop-down menu, and options to show orbits and labels. Put everything in one file so I can paste it into an online editor and view the output.

In addition, one user reported that Gemini 2.0 Flash produced better results on one of his own paradox tests.

Finally, Google stressed that safety was built into Gemini 2.0's design from the start rather than patched on afterward.

The model learns to critique itself: reinforcement learning is used to have Gemini evaluate its own answers and produce more accurate feedback, which makes it more robust when handling sensitive topics.

The automated red teaming is interesting. It is specifically designed to defend against indirect prompt injection, acting like an immune system for the AI that stops attackers from hiding malicious instructions in the data it processes.
