Welcome to the new age of AI. Wait, you’re probably assuming we were already basking in the glow of artificial intelligence? Yes and no, at least according to Google. The tech monolith behind Gemini is ready to show us Gemini 2.0, which Google said will power agents to operate your phone or device and all its apps on your behalf with “agentic” AI.
What makes 2.0 a full upgrade over the first instance of Gemini introduced in December last year (though it took several months to come to Android and iPhone with Gemini 1.5)? For one, it’s supposed to offer faster responses based on your prompts. While Gemini 1.5 can generate AI images with the company’s Imagen 3 model, the new version will also have AI audio output. The most important aspect of 2.0, Google said, is that it’s made for AI agents.
If you haven’t heard the buzzword before, think of agents (or “agentification”) as multiple AI models communicating with each other. The idea is that the AI can essentially take over your phone at your request. If you want the AI to look through your emails, pull out the reservation for your date, and then put it into your calendar, an AI agent should be able to handle that.
This “universal assistant,” as Google calls it, starts with Gemini 2.0 Flash, which should be available starting Wednesday to all Gemini users. If you have Gemini Advanced, you get a new tool called Deep Research, a kind of AI agent that does all your internet-based research for you and then generates a large book report. Google’s tool is supposed to allow users to “do their own research” and generate reports in a few minutes. We’ll have to wait and see how many students try to pass these reports off as their own work.
Deep Research is technically an agent, but Google said more developers are also working on getting agentic AI working in their own apps. As far as examples go, the Mountain View, California company showed off how a Gemini 2.0-based agentic AI could create, evaluate, merge, and execute code on the fly. It’s a new “Jules” tool similar to Microsoft’s ongoing GitHub Copilot.
The company also promoted a video of Gemini 2.0 interacting with several Supercell mobile games like Clash of Clans. The AI could read the screen and offer advice about the current meta for Squad Busters. It could also remind players to complete their daily challenges to earn that sweet, sweet in-game currency. Is that all too exciting? No, not necessarily. The AI’s coaching seemed surface-level, not offering any advice or strategy beyond building picks you could look up yourself between matches.
The real “agentic” AI may be the long-awaited Project Astra from Google DeepMind. Gizmodo used it in a past iteration at Google I/O earlier this year. The tool is akin to Gemini Live but with far more vision and interpretation capabilities through your phone’s camera. The new iteration is supposed to result in better, more conversational dialogue. It’s also supposed to remember your conversations, and now it can operate with Google Search, Google Lens, and Google Maps.
There’s still no word on when Astra will be available to more users. For now, the feature is simply in the testing phase, and parts of it will likely be molded into various Gemini products in the future.
Google’s ‘Deep Research’ Model Is Supposed to Make You Feel Like a Google Search Pro with Gemini 2.0
The internet rabbit hole is deep, but Google says it has a new tool for digging for you, if you can trust everything it rips from the net. The company’s Deep Research tool first creates a “research plan” that’s essentially an outline of an overall report. Then, it runs through a list of websites it finds applicable before laying it all out in a multi-page report, complete with some tables and graphs. It displays where it got its information at the very bottom.
The tool is available to all Gemini Advanced users in English starting today. It’s currently only available on desktop devices like Chromebooks or through the browser, though the mobile version should be available sometime next year.
This entire process can take some time. For instance, I asked Gemini Advanced to research DeepMind’s history before Google acquired it in 2014. Gemini laid out its research plan, including DeepMind’s early stages and funding through its academic publications. Google says you need to select “1.5 Pro with Deep Research” from the drop-down menu, though it was not available on my account as of this writing. The AI chatbot claimed it knew what it was supposed to do, so I prompted it to finish the report; I started waiting… and waiting.
Google’s blog mentions that the tool should take “a few minutes” to refine its analysis. I prompted Gemini Advanced and asked how long the research would take. Gemini said it would take “6-12 hours to complete the report.”
The tool is obviously new, and the AI may not be accurate when it tells me the timetable, but I can’t help laughing. The AI takes about as long as it would take your average undergrad to create a last-minute report the night before it’s due.
Google’s Deep Research runs on the larger Gemini model with a 1 million token context window, but that doesn’t mean it’s necessarily accurate. Google’s AI Search can often pick out some websites that may not be the most accurate or may present sanitized information from untrustworthy sites. That’s why the Gemini tools remind users to “double-check” all their answers underneath the text prompt.
2024-12-11 19:24:54