Gemini API I/O updates – Google Developers Blog


The Gemini API gives developers a streamlined way to build innovative applications with state-of-the-art generative AI models. Google AI Studio simplifies testing all of the API's capabilities, allowing rapid prototyping and experimentation with text, image, and even video prompts. When developers want to test and build at scale, they can leverage all the capabilities available through the Gemini API.


New models available through the API

Gemini 2.5 Flash Preview – We’ve added a new 2.5 Flash preview (gemini-2.5-flash-preview-05-20), which is better than the previous preview at reasoning, code, and long context. This version of 2.5 Flash is currently #2 on the LMarena leaderboard, behind only 2.5 Pro. We’ve also improved Flash cost-efficiency with this latest update, reducing the number of tokens needed for the same performance, resulting in 22% efficiency gains on our evals. Our goal is to keep improving based on your feedback, and make both generally available soon.

Gemini 2.5 Pro and Flash text-to-speech (TTS) – We also announced 2.5 Pro and Flash previews for text-to-speech (TTS) that support native audio output for both single and multiple speakers, across 24 languages. With these models, you can control TTS expression and style, creating rich audio output. With multispeaker, you can generate conversations with multiple distinct voices for dynamic interactions.

Gemini 2.5 Flash native audio dialog – In preview, this model is available via the Live API to generate natural-sounding voices for conversation, in over 30 distinct voices and 24+ languages. We’ve also added proactive audio so the model can distinguish between the speaker and background conversations, so it knows when to respond. In addition, the model responds appropriately to a user’s emotional expression and tone. A separate thinking model enables more complex queries. This now makes it possible for you to build conversational AI agents and experiences that feel more intuitive and natural, like enhancing call center interactions, developing dynamic personas, crafting unique voice characters, and more.

Lyria RealTime – Live music generation is now available in the Gemini API and Google AI Studio to create a continuous stream of instrumental music using text prompts. With Lyria RealTime, we use WebSockets to establish a persistent, real-time communication channel. The model continuously produces music in small, flowing chunks and adapts based on inputs. Imagine adding a responsive soundtrack to your app or designing a new type of musical instrument! Try out Lyria RealTime with the PromptDJ-MIDI app in Google AI Studio.

Gemini 2.5 Pro Deep Think – We are also testing an experimental reasoning mode for 2.5 Pro. We’ve seen incredible performance with these Deep Thinking capabilities for highly complex math and coding prompts. We look forward to making it broadly available for you to experiment with soon.

Gemma 3n – Gemma 3n is a generative AI open model optimized for use in everyday devices, such as phones, laptops, and tablets. It can handle text, audio, and vision inputs. This model includes innovations in parameter-efficient processing, including Per-Layer Embedding (PLE) parameter caching and a MatFormer model architecture that provides the flexibility to reduce compute and memory requirements.


New capabilities in the API

Thought summaries

To help developers understand and debug model responses, we’ve added thought summaries for 2.5 Pro and Flash in the Gemini API. We take the model’s raw thoughts and synthesize them into a helpful summary with headers, relevant details, and tool calls. The raw chain-of-thoughts in Google AI Studio has also been updated with the new thought summaries.


Thinking budgets

We launched 2.5 Flash with thinking budgets to give developers control over how much models think, so they can balance performance, latency, and cost for the apps they’re building. We will be extending this capability to 2.5 Pro soon.

from google import genai
from google.genai import types

# Replace the placeholder with your own API key.
client = genai.Client(api_key="GOOGLE_API_KEY")
prompt = "What is the sum of the first 50 prime numbers?"
response = client.models.generate_content(
  model="gemini-2.5-flash-preview-05-20",
  contents=prompt,
  config=types.GenerateContentConfig(
    # Cap thinking at 1024 tokens and ask for thought summaries.
    thinking_config=types.ThinkingConfig(
      thinking_budget=1024,
      include_thoughts=True
    )
  )
)

# Parts flagged as thoughts carry the summary; the rest is the answer.
for part in response.candidates[0].content.parts:
  if not part.text:
    continue
  if part.thought:
    print("Thought summary:")
    print(part.text)
    print()
  else:
    print("Answer:")
    print(part.text)
    print()

Python sample code to enable and retrieve thought summaries without streaming, returning a final thought summary with the response.
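When a response mixes thought summaries with answer text, it can be handy to separate the two before logging or displaying them. The sketch below shows one possible way to do that. `split_thoughts` is a hypothetical helper, not part of the SDK, and it assumes only that each part exposes the `text` and `thought` attributes used in the sample above. The demo parts are stand-ins so the snippet runs without an API key.

```python
from types import SimpleNamespace


def split_thoughts(parts):
    """Split response parts into (thought_summaries, answers).

    Hypothetical helper: relies only on the `text` and `thought`
    attributes that the sample above reads from each part.
    """
    thoughts, answers = [], []
    for part in parts:
        text = getattr(part, "text", None)
        if not text:
            continue  # skip parts that carry no text
        if getattr(part, "thought", False):
            thoughts.append(text)
        else:
            answers.append(text)
    return thoughts, answers


# Stand-in parts for illustration; real ones would come from
# response.candidates[0].content.parts.
demo_parts = [
    SimpleNamespace(text="Listing the primes...", thought=True),
    SimpleNamespace(text=None, thought=False),
    SimpleNamespace(text="Final answer text.", thought=False),
]
thoughts, answers = split_thoughts(demo_parts)
```

Keeping the two lists apart makes it easier to show only the answer to end users while still logging the thought summaries for debugging.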

New URL Context tool

We added a new experimental tool, URL context, to retrieve more context from links that you provide. It can be used by itself or in conjunction with other tools such as Grounding with Google Search. This tool is a key building block for developers looking to build their own version of research agents with the Gemini API.

from google import genai
from google.genai.types import GenerateContentConfig, GoogleSearch, Tool, UrlContext

# With no api_key argument, the client reads the key from the environment.
client = genai.Client()
model_id = "gemini-2.5-flash-preview-05-20"

# Enable URL context alongside Grounding with Google Search.
tools = []
tools.append(Tool(url_context=UrlContext()))
tools.append(Tool(google_search=GoogleSearch()))

response = client.models.generate_content(
    model=model_id,
    contents="Give me a three-day events schedule based on YOUR_URL. Also let me know what needs to be taken care of considering weather and commute.",
    config=GenerateContentConfig(
        tools=tools,
        response_modalities=["TEXT"],
    )
)

for each in response.candidates[0].content.parts:
    print(each.text)
# get URLs retrieved for context
print(response.candidates[0].url_context_metadata)

Python sample code for Grounding with Google Search and URL Context

Computer use tool

We are bringing Project Mariner’s browser control capabilities to the Gemini API via a new computer use tool. To make it easier for developers to use this tool, we’re enabling the creation of Cloud Run instances optimally configured for running browser control agents via one click from Google AI Studio. We’ve begun early testing with companies like Automation Anywhere, UiPath, and Browserbase. Their valuable feedback will be instrumental in refining its capabilities for a broader experimental developer release this summer.


Improvements to structured outputs

The Gemini API now has broader support for JSON Schema, including much-requested keywords such as “$ref” (for references) and those enabling the definition of tuple-like structures (e.g., prefixItems).
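To make the two keywords concrete, here is a small sketch of a JSON Schema written as a Python dict. It uses `$ref` to reuse a definition and `prefixItems` to describe a two-element tuple. The property names (`origin`, `destination`), the `Point` definition, and the `resolve_ref` helper are all made up for illustration, and the snippet does not show how the schema is passed to the Gemini API, since the post does not specify that.

```python
import json

# Illustrative schema; "Point", "origin", and "destination" are made-up names.
schema = {
    "type": "object",
    "$defs": {
        "Point": {
            "type": "array",
            # prefixItems describes a tuple positionally:
            # item 0 is a latitude, item 1 is a longitude.
            "prefixItems": [
                {"type": "number"},
                {"type": "number"},
            ],
        }
    },
    "properties": {
        # $ref reuses the Point definition instead of repeating it.
        "origin": {"$ref": "#/$defs/Point"},
        "destination": {"$ref": "#/$defs/Point"},
    },
    "required": ["origin", "destination"],
}


def resolve_ref(root, ref):
    """Follow a local reference such as '#/$defs/Point' inside `root`."""
    node = root
    for key in ref.lstrip("#/").split("/"):
        node = node[key]
    return node


point = resolve_ref(schema, schema["properties"]["origin"]["$ref"])
print(json.dumps(point, indent=2))
```

Defining `Point` once and referencing it twice keeps the schema short and ensures both fields stay in sync if the tuple shape changes.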


Video understanding improvements

The Gemini API now allows YouTube video URLs or video uploads to be added to a prompt, enabling users to summarize, translate, or analyze the video content. With this recent update, the API supports video clipping, enabling flexibility in analyzing specific parts of a video. This is particularly useful for videos longer than 8 hours. We have also added support for dynamic frames per second (FPS), allowing 60 FPS for videos like games or sports where speed is critical, and 0.1 FPS for videos where speed is less of a priority. To help users save tokens, we have also introduced support for 3 different video resolutions: high (720p), standard (480p), and low (360p).
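As a rough illustration of why clipping and the FPS setting matter, the sketch below counts how many frames would be sampled from a clip at the two rates mentioned above. This is plain arithmetic, not an API call: `sampled_frames` is a hypothetical helper, the clip offsets are invented, and the post does not describe how the service actually samples frames or converts them into tokens.

```python
def sampled_frames(start_s: float, end_s: float, fps: float) -> int:
    """Approximate number of frames sampled from a clip at a given FPS."""
    if end_s < start_s or fps <= 0:
        raise ValueError("need end_s >= start_s and fps > 0")
    return round((end_s - start_s) * fps)


# A hypothetical 5-minute clip, from 10:00 to 15:00 of a longer video.
clip_start, clip_end = 600, 900
fast = sampled_frames(clip_start, clip_end, fps=60)   # games or sports
slow = sampled_frames(clip_start, clip_end, fps=0.1)  # slow-moving content

print(f"60 FPS: {fast} frames, 0.1 FPS: {slow} frames")
```

The same clip yields several hundred times more frames at the high rate, which is why a lower FPS or a shorter clip is worth considering when fine-grained motion is not important.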


Async function calling

The cascaded architecture in the Live API now supports asynchronous function calling, ensuring user conversations remain smooth and uninterrupted. This means your Live agent can continue generating responses even while it is busy executing functions in the background, by simply adding the behavior field to the function definition and setting it to NON-BLOCKING. Read more about this in the Gemini API developer documentation.
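A function definition with the behavior field might look roughly like the sketch below, written as plain Python dicts. Several things here are assumptions rather than facts from this post: the function name `lookup_order` and its parameters are invented, the enum value is spelled `NON_BLOCKING` with an underscore, and the `function_declarations` wrapper follows the shape commonly used for tool definitions. Confirm the exact field names and values in the Gemini API developer documentation before relying on them.

```python
# Hypothetical tool definition; the function name and parameters are made up.
lookup_order = {
    "name": "lookup_order",
    "description": "Fetch the status of a customer order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
    # Assumed spelling of the value the post refers to as NON-BLOCKING.
    "behavior": "NON_BLOCKING",
}

# Assumed wrapper shape for passing function definitions as a tool.
tools = [{"function_declarations": [lookup_order]}]
```

With the field set this way, the intent is that the agent keeps the conversation going while `lookup_order` runs, instead of pausing until the result comes back.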


Batch API

We are also testing a new API, which lets you easily batch up your requests and get them back within a maximum 24-hour turnaround time. The API will come at half the price of the interactive API and with much higher rate limits. We hope to roll that out more widely later this summer.


Get started building

That’s a wrap on I/O for this year! With the Gemini API and Google AI Studio, you can turn your ideas into reality, whether you’re building conversational AI agents with natural-sounding audio or developing tools to analyze and generate code. As always, check out the Gemini API developer docs for all the latest code samples and more.

Discover this announcement and all Google I/O 2025 updates on io.google.
