11 annoying tasks Google Gemini will soon handle for you

Key Takeaways

Google’s Gemini update improves AI-powered search, generates audio content and images with readable text, and handles large inputs such as hour-long videos.
Gemini will simplify Gmail by automating tasks and answering questions about your inbox, with a beta rolling out to Google Labs users in September.
Android users will be able to use Gemini in more apps for live video searches, near real-time scam call detection and multimedia AI tasks.

From the time Alphabet CEO Sundar Pichai walked onto the annual Google I/O stage to the time the two-hour event wrapped up, the team mentioned AI more than 120 times. That count, of course, is according to Gemini itself. The annual event held in California on May 14 was heavily focused on Gemini 1.5 Pro, Google’s latest update to the AI platform formerly known as Bard.

The updates coming to Google Gemini focus on “making AI helpful for everyone,” as Pichai described. Key to the newest AI skills are the ability to mix and match text with audio, photos and video, as well as the ability to handle one million tokens (or two million, for developers). That will soon let Gemini use your phone’s camera to answer questions about your surroundings, return that online order you didn’t like, or recognize scam calls on Android in real time, to name just a few of the on-stage demonstrations.

The one million token capability and the faster Gemini 1.5 Pro are rolling out beginning today for Gemini Advanced subscribers, while other AI tricks from the I/O stage were just teasers of what’s currently under development.

If you missed the biggest announcements coming from Google’s largest developers conference, or perhaps tuned out after the first Taylor Swift joke, we’ve rounded up the biggest problems that Google’s AI will soon attempt to solve.

1 Searching the web when you don’t know exactly what to search for

You could soon search with video

With the latest updates, Pichai says Gemini will even do the Googling for you. In a feature rolling out today, searchers will be able to ask Google a question and have Gemini answer right in Search.

But perhaps the more powerful tool is the ability to search when you don’t have the right words to explain what you are looking for. In the coming weeks, Google is rolling out video capabilities in Search. In the demonstration, the company showed how you could use video to fix a record player or a film camera when you don’t even know the name of the broken part or why it’s not working.

Google’s AI will soon drive a more capable web search that lets you ask multiple questions in one. Multistep reasoning capabilities allow Search to answer multi-part questions. For example, the company demoed searching not just for a nearby yoga studio, but for studios with specific characteristics, like being beginner-friendly and within walking distance.

If you don’t know what to ask, Google says Search will soon get AI organization, rolling out to dining first. This means you can search for a place to spend your anniversary dinner, and Search will organize the results into different categories to give you more ideas, like rooftop dining or historic places. While the organization is heading first to dining, it will soon also roll out to books, music, shopping, hotels and more.

2 Ask about real world objects in real time

Give Gemini a live camera view and get real-time data

Alphabet’s AI will soon help users search in the world around them, much like Google Search helps find things on the web. During I/O, the company demonstrated Project Astra, which uses live video to search the surroundings in real time, tackling things from finding a specific book on your physical bookshelf to asking where you left your glasses.

During the demonstration, the feature worked both on a smartphone and on AR glasses. The demo also showed the AI answering questions in real time, from locating a specific object to being shown code and asked what it does.

The beginnings of these video features will be rolling out to the Gemini app later this year.

3 Consolidate long-form content, even across multiple apps

Subscribers can feed the AI up to 1,500 PDF pages

One of the biggest features arriving with Gemini 1.5 is the ability to handle long-form content, thanks to support for one million tokens for Gemini Advanced subscribers. (Developers will now be able to use up to two million tokens.) Tokens are the small chunks of data that the AI processes, and the token limit determines how much data it can handle at once. The one million token limit means Gemini could summarize a PDF up to 1,500 pages or a video up to one hour long.
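
For a rough sense of scale, the figures above can be turned into per-page and per-second rates. The short Python sketch below does only that arithmetic. It assumes the whole one million token limit is spread evenly across the 1,500 pages or the one-hour video, which is a simplification. The resulting rates are derived from the article's numbers and are not figures published by Google, and the function name is purely illustrative.

```python
# Figures quoted in the article. The per-page and per-second rates are
# derived ratios under an even-spread assumption, not official numbers.
TOKEN_LIMIT = 1_000_000   # Gemini Advanced token limit
PDF_PAGES = 1_500         # longest PDF the article says fits
VIDEO_SECONDS = 60 * 60   # one hour of video


def implied_rate(token_limit: int, units: int) -> float:
    """Tokens available per unit if the limit is spread evenly."""
    return token_limit / units


tokens_per_page = implied_rate(TOKEN_LIMIT, PDF_PAGES)
tokens_per_second = implied_rate(TOKEN_LIMIT, VIDEO_SECONDS)

print(f"~{tokens_per_page:.0f} tokens per PDF page")
print(f"~{tokens_per_second:.0f} tokens per second of video")
```

By the same arithmetic, the two million token limit for developers would roughly double both capacities, again assuming usage scales evenly.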

The update doesn’t just bring the ability to handle large amounts of data, but also the ability to work across multiple apps. For example, you can ask Gemini to summarize all the emails from your child’s school in Gmail, and it can also go through the board meeting held on Google Meet and summarize that as well.

4 Transform large data into a new format

Turn your study notes into an auditory lecture

Gemini’s large data summarization capabilities sound impressive, but Gemini will also be able to change the format of that data. It isn’t limited to summarizing text and then spitting out more text — it can tell you about those documents audibly.

According to the demo, you can even interrupt this summary to ask more questions. In the demo, this capability was used to consolidate a student’s multiple resources into a study guide, practice tests, or an audio lecture on the topic.

5 Search your photos for answers

Gemini can use your photos to answer personalized questions

Gemini’s enhanced search capabilities also extend to Photos. Yes, Google Photos already has a search box. But when you ask for your license plate number, Gemini can soon jump straight to the answer and list the number itself, instead of returning a hundred photos of your car that might contain the correct information.

You can also soon ask it milestone questions, like when your child first learned to swim, and it will simply tell you the answer rather than displaying all photos of a swimming pool.

6 Generate more detailed photos, even with text

Generative photos, video and music also get a major boost

The Gemini updates also extend to its generative capabilities for images, video and music. A key update for images is the ability to handle text. AI typically can’t place text on an image without creating nonsensical, misspelled words. Google’s Senior Research Director Doug Eck says that the new Imagen 3 creates more detailed generative images with fewer distortions and is also better at rendering text. (OpenAI similarly announced enhanced capabilities with text on images during its event yesterday.)

Video generation also gets a boost with Veo, the new generative video model. It adds capabilities such as creating aerial shots and timelapses, along with extending the length of an existing video.

The photo and video capabilities, along with enhanced music AI, don’t yet have a launch date but are available to select creators through Google Labs, with a waitlist open now.

7 Summarize tasks in Gmail

Gemini can soon automate tasks for you

Gmail’s AI integration is about to get a lot more advanced than simple reply suggestions. Rolling out to Google Labs users this September, Gemini will soon power tasks like asking your Gmail questions. It can also create rules for future emails, like adding a receipt sent to your email to an expense tracker in Sheets, then continuing to update that document as new receipts arrive.

Those features begin rolling out to Google Labs in September.

8 Answer questions or flag scammers inside Android apps

Android users can use Gemini within more key apps

Gemini on Android builds the AI directly into the operating system, which allows Android users to work with the AI without leaving the app that they are in. The Gemini overlay will soon work in more Android apps. That enables tasks like asking a question in YouTube to get an answer generated from the video that you are watching. Gemini Advanced subscribers will also have access to “Ask this PDF,” a rollout coming in the next few months.

Part of this integrated Android AI experience is scam detection, where the AI listens to your calls and immediately alerts you if it suspects the caller is a scammer. Google says that this feature is currently in testing.

9 Let AI Agents do the work for you

Gemini can handle more tasks like filling out forms with less input from you

Gemini can already write your emails for you, but with Agents, Gemini can take more actions on your behalf. During I/O, the company demonstrated how Gemini could help you return a pair of shoes by locating a receipt in your Gmail, filling out the return form for you, and even scheduling a package pickup. Or, after you move, it could help update your address across all the different services that you use. The company says that the Agents work under your supervision but are able to reason, plan and think multiple steps ahead.

10 Aid in learning with LearnLM

LearnLM is a new version of Gemini built specifically for education

Many of the demonstrations centered on how a student (or a parent of a student) can use AI for learning. LearnLM is an educational model of Gemini that’s designed specifically to help with homework, like creating a study guide or practice tests, or using the camera to help solve a math problem.

11 Customize the AI interaction with Gems

Like GPTs, Gemini can soon customize your interactions

Another key I/O update will change the way that users can interact with Gemini. Gems are personalized forms of Gemini that are designed for specific interactions. Users can tell the program how they want it to act, say, as a writing tutor or as a peer reviewer for software code. Creating a Gem is as simple as typing out how you want Gemini to act for you. But Google will also create some pre-made Gems for common tasks, a feature that feels similar to ChatGPT’s range of custom GPTs.

The update is the latest in Google’s heavy commitment to AI this year. In 2024 alone, Google has renamed Bard to Gemini, launched the Gemini Advanced subscription, created the first smartphone with AI built in, the Pixel 8 Pro, and added image generation. The latest announcements at Google I/O make good on the company’s previous promises to bring the AI into Search.

Google Gemini, formerly Bard, is the company’s artificial intelligence platform that includes not just a browser chatbot but integration into various Google tools, from helping write emails to working in Sheets. Gemini is multimodal, which means the AI can understand written text as well as images, video, code and audio.