data:image/s3,"s3://crabby-images/5e615/5e61501bffcf13626078c91ddea3e2518a17614e" alt=""
The story of Gemini 2.0 is accelerating.
The Flash Thinking Experimental version in December brought developers a working model with low latency and high performance.
Earlier this year, 2.0 Flash Thinking Experimental was updated in the Google AI Studio to further improve performance by combining the speed of Flash with enhanced inference capabilities.
Last week, the updated version 2.0 Flash was fully launched on the Gemini desktop and mobile apps.
Today, three new members have been unveiled at the same time: the experimental version of Gemini 2.0 Pro, which has so far performed best in coding and complex prompts, the cost-effective 2.0 Flash-Lite, and the thinking-enhanced version 2.0 Flash Thinking.
Gemini 2.0 Pro ranks first in all categories. Gemini-2.0-Flash ranks in the top three in coding, math, and puzzles. Flash-lite ranks in the top ten in all categories.
data:image/s3,"s3://crabby-images/b0e20/b0e201296a51a00600174c05d7bcb79b2d8feb48" alt=""
data:image/s3,"s3://crabby-images/374c1/374c186bde4ce61e1a19a0fa941d4acb2c2fe82e" alt=""
A comparison chart of the three models’ abilities:
data:image/s3,"s3://crabby-images/9b810/9b8105fb13046c0599d7b671a653983c592cc57b" alt=""
All models support multimodal input and output text.
More modal abilities are on the way. Model strength chart in the coding arena
data:image/s3,"s3://crabby-images/29b3d/29b3d3a11614c8b3d376ecf57a5ba4fcb720b858" alt=""
Win rate heat map
data:image/s3,"s3://crabby-images/af509/af509e280afd7d3559440451e38bc4189c11d99a" alt=""
Google treats free users better than OpenAI treats Plus users. Free access to Gemini 2.0 Pro Experimental in AI Studio:
data:image/s3,"s3://crabby-images/0cea2/0cea25c6d752acf820a963c9283ceeaf1eb0e0ee" alt=""
Deepseek service always displays an error waiting… Remember that the first inference-free model was also 2.0 Flash Thinking, which was used in Google aistudio.
data:image/s3,"s3://crabby-images/6f2e9/6f2e9a8be45c0ac509e7c57950565479a50af91c" alt=""
In addition, there is the web version of Gemini:
There is also a connected inference model (so why separate it…)
data:image/s3,"s3://crabby-images/b5a8a/b5a8a58d6bfd7c5633554e935a45b89561410f27" alt=""
Google released the experimental version of Gemini 2.0 Pro, and the improvement in official benchmark tests is quite eye-catching.
data:image/s3,"s3://crabby-images/f250f/f250fe27c682d628c1b0519d692b98c873a1207d" alt=""
It has the most powerful coding capabilities and the ability to process complex prompts, and has a better ability to understand and reason about world knowledge than any model released by Google so far.
It has the largest context window (200k, and my long context is a relatively big advantage of the Gemini model), which enables it to comprehensively analyze and understand a large amount of information, and to call tools such as Google search and code execution.
In the MATH test, it achieved 91.8%, an increase of about 5 percentage points over version 1.5. GPQA reasoning ability reached 64.7%, and SimpleQA world knowledge test even reached 44.3%.
The most notable is the programming ability. It achieved 36.0% in the LiveCodeBench test, and the Bird-SQL conversion accuracy exceeded 59.3%. Coupled with the super-large context window of 2 million tokens, it is enough to handle the most complex code analysis tasks.
data:image/s3,"s3://crabby-images/da21d/da21d216f2040003f8634f420dfe4bc791a59c6b" alt=""
You can try it out in the cursor.
The multi-language understanding ability is also impressive, with a Global MMLU test score of 86.5%. Image understanding MMMU is 72.7%, and video analysis ability is 71.9%.
Gemini 2.0 Flash-Lite is an interesting balance.
It maintains the speed and cost of 1.5 Flash, but brings better performance. The context window with 1 million tokens allows it to process more information.
The most practical thing is its price/performance ratio: caption generation for 40,000 photos costs less than $1. This makes AI more down-to-earth.
data:image/s3,"s3://crabby-images/57e90/57e9059af75451e400d7c2f9116edab6c085b5fb" alt=""
Blogger Shrivastava mentioned: Gemini 2.0 Pro encoding is crazy!
Tip: use Three.js to create a solar system simulation. Add a time scale, a focus drop-down menu, show orbits and show labels. Create everything in one file so I can paste it into an online editor and view the output.
data:image/s3,"s3://crabby-images/d91ba/d91badcf081d30d97ba7f2be4683e89e99242a50" alt=""
In addition, some users mentioned that Gemini 2.0 Flash produced better results in one of his own paradox tests:
data:image/s3,"s3://crabby-images/5d8d2/5d8d2894f24e38d22eeb1505a0ce2e257f2f09e5" alt=""
Finally, Google mentioned that the security of Gemini 2.0, not just the patch, is at the core of the design from the beginning.
Let the model learn to be self-critical. Use reinforcement learning to let Gemini evaluate its own answers and provide more accurate feedback. This makes it more robust when dealing with sensitive topics.
The automated red team testing is interesting. It is specifically designed to prevent the injection of indirect prompt words, which is like equipping the AI with an immune system to prevent someone from hiding malicious commands in the data.