Google Cloud 2026 Developer Keynote: Get real! Agents in the Autonomous Era
Meeting Summary
Google's Gemini Enterprise Agent Platform enables developers to create autonomous AI agents for complex tasks, such as marathon planning. The platform offers tools for building, scaling, and optimizing agents, including development kits, scalable serverless agent runtime, and agent observability. It supports agent collaboration, context management, and debugging. The platform integrates with Google Docs and GIS skills, enhancing route planning and geographic data processing. It emphasizes security, governance, and innovation, encouraging developers to explore open-source resources and build with the platform.
Meeting Overview
The keynote introduces the Gemini Enterprise Agent Platform, enabling the creation of autonomous agents using state-of-the-art models. It highlights tools for development, scaling, and governance, featuring a marathon planning demo. The platform supports collaboration among agents with shared context, enhancing ecosystem integration.
The dialogue highlights the transformative impact of Google's AI and cloud technologies, showcasing applications from data visualization to healthcare and language learning, emphasizing the acceleration of development processes and the potential for saving lives through innovative solutions.
The dialogue highlights the transformative impact of Genie 3 on development, showcasing its use in simulating complex scenarios like traffic situations for Waymo and planning a marathon. The focus is on demonstrating how Google enables the creation and production-grade operation of advanced agents, emphasizing the practical application of cutting-edge AI models.
Agent systems are utilized to plan, simulate, and assess the impact of a marathon on a city, considering traffic, emergency services, and economic benefits, by creating dynamic route simulations and evaluating outcomes against business and community needs.
The dialogue showcases a sophisticated simulation app designed to plan and optimize a nighttime marathon route in Las Vegas. Utilizing Angular and Flutter technologies, the app simulates traffic, hydration stations, and medical tents, demonstrating its capability to handle non-deterministic conditions. The presenter highlights the app's features, including its ability to generate detailed reports based on various simulations, showcasing its potential for enhancing event planning and execution.
Demonstrates creating a marathon planning agent using ADK and Google Cloud tools, enhancing with map skills, and simulating route generation, highlighting open-source deployment and collaborative agent design.
Dialogue highlights two key components enhancing agent collaboration: a universal protocol for capability advertisement and communication, and an agent registry for discovering compatible agents, streamlining task completion.
Dive into transforming unreliable agentic loops into a meticulously assessed network of experts capable of autonomously creating their own user interfaces.
A new system integrates real-time route evaluation, dynamic user interface creation, and seamless agent collaboration to enhance marathon planning. By leveraging Google's A2UI standard for expressive interfaces and the A2A protocol for agent connectivity, the system executes plans without requiring custom code or API contracts. This streamlines the planning process, allowing immediate feedback on plans and simulation of them.
A simulation showcasing the configuration of environment parameters and the spawning of runners as independent sessions, highlighting traffic problem monitoring and result reporting. It delves into the real-world running behaviors learned through Gemini Deep Research, noting that 78% of marathon runners decelerate slightly in the latter half.
The dialogue highlights the implementation of real-time evaluation, a dynamic UI that agents render themselves, and automatic connections using A2A and the agent registry. The focus is on the efficiency and self-management capabilities of agents, concluding with a casual transition to discussing non-work topics.
The dialogue emphasizes the evolution from stateless to stateful agents, leveraging context windows and memory banks for improved task performance. It highlights the use of architectural approaches and database tools like Spark or AlloyDB for efficient data retrieval, enhancing agent capabilities through retrieval-augmented generation. The focus is on managing and optimizing the information agents access to elevate user experience.
A method to enhance agent capabilities by incorporating session and memory management, along with integrating unstructured data through document processing and retrieval-augmented generation, is discussed. This approach enables agents to retain historical data, learn from past actions, and access relevant information for better decision-making.
The dialogue explores leveraging advanced AI technologies like auto embeddings and vector functions to optimize marathon route planning. It highlights the significance of managing context and incorporating local rules, demonstrating how these improvements lead to more efficient and rule-compliant route designs.
The dialogue highlights advancements in agent intelligence through context engineering and simulation technologies. It emphasizes the importance of managing context in LLM environments and introduces two key tools: Agent Observability for monitoring agent metrics and Gemini Cloud Assist for managing agents throughout their lifecycle, from design to optimization. This approach aims to streamline debugging and app development processes, making them faster and more efficient.
The dialogue discusses the process of debugging and optimizing autonomous systems using Agent Observability and Cloud Assist Investigations. It highlights how to identify and fix issues in simulator agents, particularly focusing on unexpected errors and high latency. The use of Gemini Cloud Assist for gathering logs and pinpointing code errors is emphasized, showcasing the integration of AI tools in troubleshooting and optimizing cloud-based applications.
Discusses using advanced agent tools to identify and resolve complex issues, such as exceeding token limits, through proactive code adjustments, highlighting the effectiveness of integrated debugging solutions in enhancing system reliability and performance.
Discusses the importance of starting small in AI system development, evolving architectures based on needs, and using iterative improvements akin to a marathon, rather than greenfield sprints, to build robust systems over time.
Demonstrates upgrading Cloud Run services to GKE for enhanced control and customization, integrating a fine-tuned Gemma 4 model within the same cluster, leveraging Cloud Assist for infrastructure modifications.
The dialogue highlights the seamless deployment of AI applications into Kubernetes, showcasing the converted app and Gemma inference server operating within the same GKE cluster, achieving a significant reduction in deployment time.
AI workload scaling issues, identified by Cloud Assist, led to the recommendation of deploying Lustre storage for faster model loading, enhancing simulation performance.
The dialogue highlights a vision for an autonomous cloud environment where systems transform intent into action swiftly, proactively address issues with specific resolutions, and prioritize outcomes over fluctuating features. It envisions a collaborative approach where agents partner with users to implement change safely, showcasing a future of advanced cloud technology and agent interaction.
Discusses how Gemini Cloud Assist aids developers in focusing on app use cases by automating specific tasks. Highlights the capabilities of Gemini Enterprise in creating insights from data sources, enabling rich agentic experiences through various coding interfaces, and sharing agents securely at scale with governance controls, showcasing the versatility of agent platforms in modern development practices.
The dialogue demonstrates how developers can use no-code agents in Gemini Enterprise to streamline planning tasks, such as organizing a marathon. By creating and sharing agents, users can automate logistics, like ordering supplies, and improve efficiency across various applications.
The dialogue showcases how Gemini Enterprise facilitates marathon planning by creating a team of agents, enabling seamless integration of planning and ordering tasks. It highlights the capability of non-technical users to contribute effectively, emphasizing the tool's accessibility and collaborative potential across teams.
Discusses the importance of secure agent interactions, emphasizing the need for smarter platforms that enable conditional access. Highlights the role of agent identity in enforcing policies, ensuring agents access only necessary resources. Advocates for a collaborative approach in platform engineering to build more secure applications, particularly in the era of MCP and agents. Introduces the concept of an agent gateway for centralized policy management.
The dialogue covers implementing role-based access control for agents using identities and policies, emphasizing the importance of Zero Trust architecture. It also introduces unified cloud security platforms that scan code and infrastructure, using AI to understand risks, and presents tools like red and green agents for testing and fixing vulnerabilities, enhancing security in app development.
A demonstration showcases how AI agents, such as the red and green agents, proactively identify and mitigate real risks in deployed applications and APIs, guiding developers through actionable code fixes directly within their development tools, ensuring security without compromising developer velocity.
Highlights the open-source initiative and Google Cloud's tools enabling developers to build production-ready agents, emphasizing ease of use and comprehensive resources for practical implementation.
Key Q&A
Q:What is the Gemini Enterprise Agent Platform and what are its capabilities?
A:The Gemini Enterprise Agent Platform is a tool that allows developers to build autonomous agents that can proactively assist users and complete tasks independently. It is powered by Gemini models, including Pro and Flash models, and supports the integration of other models like Claude from the Model Garden. The platform provides capabilities for building, scaling, and optimizing agents, including an agent development kit (ADK) for skills and tools, communication with Google Cloud services, scalable serverless runtime, unique agent identity, policies for connecting and collaborating with other agents, and agent observability for monitoring and management.
Q:How does the agent platform support the construction and operation of agents?
A:The agent platform supports the construction and operation of agents through several features: the agent development kit (ADK) to build agents with necessary skills and tools; scalable serverless runtime to keep agents connected and personalize context with sessions; unique agent identity and policies for secure governance; an agent registry and collaboration protocol for agents to discover and work with each other; and observability for managing, monitoring, and optimizing agents based on production activity.
Q:What is the significance of the shared context across Gemini Enterprise Workspace and third-party agents?
A:The shared context across Gemini Enterprise Workspace and third-party agents is significant as it allows the entire ecosystem of agents to collaborate with each other using a common shared context. This context is provided by Google and can be used in both Gemini Enterprise Workspace and third-party agents, enhancing coordination and efficiency in their operations.
Q:What marathon planning example is used to demonstrate the capabilities of the agent platform?
A:The marathon planning example used to demonstrate the capabilities of the agent platform involves planning a marathon in Las Vegas, simulating its impact on the city, and assessing different scenarios to determine the best course of action. This involves creating simulations to understand potential impacts such as rerouting emergency services, boosting the local economy, and showcasing iconic landmarks. Three main agents are used: the planner to determine race routes, the evaluator to assess routes based on business and community requirements, and the simulator to take the route and create actors with randomized behaviors to see the net impact to the city.
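The division of labor among the three agents can be sketched as a plain Python pipeline. This is only an illustration of the planner, evaluator, and simulator roles described above; the function names, the hard-coded candidate routes, the closure budget, and the pace range are all hypothetical and are not taken from the keynote demo.

```python
import random
from dataclasses import dataclass

MARATHON_KM = 42.195  # standard marathon distance


@dataclass
class Route:
    name: str
    distance_km: float
    road_closures: int


def plan_routes():
    """Hypothetical planner: proposes candidate routes (hard-coded here)."""
    return [
        Route("strip-loop", 42.195, road_closures=12),
        Route("short-cut", 38.0, road_closures=5),
    ]


def evaluate(route, max_closures=15):
    """Hypothetical evaluator: checks distance and an assumed closure budget."""
    right_length = abs(route.distance_km - MARATHON_KM) < 0.1
    return right_length and route.road_closures <= max_closures


def simulate(route, runners=100, seed=0):
    """Hypothetical simulator: runners with randomized paces.

    Returns the mean finish time in minutes.
    """
    rng = random.Random(seed)
    paces = [rng.uniform(4.5, 7.5) for _ in range(runners)]  # minutes per km
    return sum(pace * route.distance_km for pace in paces) / runners


# Planner proposes, evaluator filters, simulator measures impact.
approved = [route for route in plan_routes() if evaluate(route)]
results = {route.name: simulate(route) for route in approved}
```

In the demo each role is a separate agent; plain functions are used here only to keep the control flow visible.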
Q:What is the role of the planner agent in the marathon simulation?
A:In the marathon simulation, the role of the planner agent is to determine the optimal routes for the marathon. It uses skills for handling geospatial data, mapping, and calculating viable routes to plan the path that the race will take. The planner agent's instructions and capabilities are designed to understand its role as the marathon planner and to coordinate with other agents involved in the simulation, such as those managing hydration stations, medical tents, and traffic flow.
Q:How is the simulation app showcasing the marathon planning powered?
A:The simulation app showcasing the marathon planning is powered by different technologies depending on the interface: Angular for the base and 3D animation, and Flutter for the sidebar and UI interactions. The app uses various Google products including Gemini, Firebase, and AI to render the simulation and provide a realistic race environment.
Q:What instructions did Gemini help create for the agents regarding marathon planning?
A:The agents were given instructions on marathon planning that were crafted with Gemini's help.
Q:How is the plan for a marathon grounded in real data?
A:The plan is grounded in real data by turning the agent into an expert on maps and races: tools and skills are added to it, and the pre-populated Python code produced by the agent designer is downloaded.
Q:What tools and skills are added to the agent for map data and related tasks?
A:The agent is given access to mapping tools through the Google Cloud MCP server for Google Maps and is provided with skills to answer map-related questions and find viable routes with starting and ending locations.
Q:What capabilities do the mapping skill and GIS skill provide to the agent?
A:The mapping skill allows the agent to selectively call Maps tools based on the skill being loaded, while the GIS skill handles geographic information and includes access to Python scripts for processing GeoJSON data and finding viable routes.
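The kind of script a GIS skill might bundle can be sketched as follows. This is a minimal example, assuming the route is a GeoJSON `LineString` feature with `[longitude, latitude]` coordinates; the function names, the tolerance, and the sample feature are hypothetical and do not come from the demo.

```python
import json
import math

MARATHON_KM = 42.195


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points, in kilometers."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def route_length_km(feature):
    """Sum the segment lengths of a GeoJSON LineString feature."""
    coords = feature["geometry"]["coordinates"]
    return sum(haversine_km(*a[:2], *b[:2]) for a, b in zip(coords, coords[1:]))


def is_viable(feature, target_km=MARATHON_KM, tol_km=0.5):
    """Assumed viability rule: length within a tolerance of the target."""
    return abs(route_length_km(feature) - target_km) <= tol_km


# Synthetic feature: one degree of longitude along the equator.
SAMPLE = json.loads(
    '{"type": "Feature", "properties": {},'
    ' "geometry": {"type": "LineString",'
    ' "coordinates": [[0.0, 0.0], [1.0, 0.0]]}}'
)
length = route_length_km(SAMPLE)
```

A real viability check would also account for road networks, closures, and elevation, none of which this sketch models.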
Q:What does the race director skill involve, and how was it created?
A:The race director skill is based on a Google Doc created by the race planning committee, which was converted to a skill using Gemini. This skill allows agents to plan and check marathon routes based on historical requirements.
Q:How long does it take to deploy the agent, and what does it include?
A:It takes 4 to 5 minutes to deploy the agent, which includes the skills necessary to create route plans for marathon participants.
Q:What is the purpose of the team of agents described in the speech?
A:The purpose of the team of agents is to distribute different responsibilities based on use case, access required, and capabilities needed, just like a team working together.
Q:What are the two components that facilitate agents to communicate with each other?
A:The two components that facilitate agents to communicate with each other are the A2A (agent to agent) protocol, which is a universal protocol for agents to advertise their capabilities and communicate, and the agent registry, which serves as a central directory resolving every agent's identity and their skill set.
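The registry idea can be illustrated with a small in-memory directory. This is a simplified sketch, not the A2A schema or the platform's registry API: the `AgentCard` fields, the `AgentRegistry` class, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCard:
    """Simplified capability advertisement for one agent (assumed fields)."""
    name: str
    url: str
    skills: tuple


class AgentRegistry:
    """In-memory directory that resolves agent names and skills."""

    def __init__(self):
        self._cards = {}

    def register(self, card):
        self._cards[card.name] = card

    def resolve(self, name):
        """Look up an agent's card by name."""
        return self._cards[name]

    def find_by_skill(self, skill):
        """Return the names of all agents advertising a skill, sorted."""
        return sorted(c.name for c in self._cards.values() if skill in c.skills)


registry = AgentRegistry()
registry.register(AgentCard("planner", "https://planner.example.com", ("route-planning", "gis")))
registry.register(AgentCard("evaluator", "https://evaluator.example.com", ("route-evaluation",)))
registry.register(AgentCard("simulator", "https://simulator.example.com", ("race-simulation",)))

planners = registry.find_by_skill("route-planning")
```

Because callers look agents up by skill rather than by a hard-coded endpoint, a new agent becomes usable as soon as it registers its card.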
Q:How does the dynamic UI in the system work, and what does it allow agents to do?
A:The dynamic UI in the system is built by the agents using an open standard created by Google, which allows them to create dynamic, expressive UI that moves beyond walls of text. This enables the agents to design and build the exact interface they need using a common design language.
Q:What is the function of the simulation agent in the marathon planning process?
A:The simulation agent leads the process of running actual plans, working with the planner to get approved routes, and showing results. It involves connecting the simulator with the planner, and uses the A2A protocol and agent registry for coordination.
Q:What is the significance of the agent registry and A2A protocol in the context of the agents' collaboration?
A:The significance of the agent registry and A2A protocol in the agents' collaboration is to eliminate brittle API code and provide a system where agents can read each other's capabilities. The registry serves as the DNS of the internet of agents, resolving every agent's identity and mapping their specific skill set across the entire network, while the A2A protocol allows agents to connect and communicate without the need for custom coding or dashboards.
Q:What are the new features that agents have been utilizing to improve user experience?
A:Agents have been utilizing real-time evaluation, dynamic UI for self-rendering, automatic connections using A2A, and the agent registry to improve user experience.
Q:What has been the approach to optimize agents for the next run in the past and how has it evolved?
A:In the past, the approach to optimize agents for the next run involved stashing the complete history of prior runs into each new request. Now, the process has evolved to using architectural approaches and database tools for more efficient and thoughtful management of data.
Q:What is the significance of stateful agents and context windows in agent engineering?
A:Stateful agents and context windows are significant in agent engineering as they enable agents to become stateful over time and modify their context as needed, improving their ability to manage information available to them and their own generalized knowledge to complete tasks.
Q:How does the memory bank contribute to an agent's performance?
A:The memory bank contributes to an agent's performance by enabling it to pull learnings from prior events into the context window relevant for specific tasks and store memories for future sessions. It also allows the use of tools like Spark or AlloyDB to provide additional context to agents using a differentiated data approach.
Q:What role does the planner agent play in demonstrating context and memory improvement?
A:The planner agent plays a role in demonstrating context and memory improvement by managing sessions, adding memory, and accessing other data sources. By integrating with agent platform sessions and memory bank, the planner agent can remember details from previous plans and recall them for future planning.
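The difference between a short-lived session and long-term memory can be sketched in a few lines. This is only a toy model of the idea; the `Session` and `MemoryBank` classes and their methods are hypothetical and are not the platform's sessions or memory bank API.

```python
class MemoryBank:
    """Long-term store that outlives individual sessions (toy model)."""

    def __init__(self):
        self._memories = {}  # user_id -> list of remembered facts

    def remember(self, user_id, fact):
        self._memories.setdefault(user_id, []).append(fact)

    def recall(self, user_id, keyword):
        """Return stored facts for a user that mention the keyword."""
        facts = self._memories.get(user_id, [])
        return [f for f in facts if keyword.lower() in f.lower()]


class Session:
    """Short-lived conversation state tied to one user (toy model)."""

    def __init__(self, user_id, bank):
        self.user_id = user_id
        self.bank = bank
        self.events = []

    def add_event(self, text, persist=False):
        """Record an event; optionally promote it to long-term memory."""
        self.events.append(text)
        if persist:
            self.bank.remember(self.user_id, text)


bank = MemoryBank()

first = Session("race-director", bank)
first.add_event("Drafted route A")
first.add_event("Route A was rejected: too many road closures", persist=True)

# A later session starts with no events but can still recall the lesson.
second = Session("race-director", bank)
lessons = second.bank.recall(second.user_id, "road closures")
```

Only the event marked `persist=True` survives into the second session, which mirrors the idea of recalling details from previous plans without replaying the full history.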
Q:How is unstructured data, such as local rules, utilized by the planner agent?
A:Unstructured data, such as local rules, is utilized by the planner agent through data engineering processes, including chunking the documents, transforming them into embeddings, and using retrieval augmented generation (RAG) to query and integrate this data into the planning process.
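The chunk, embed, and retrieve steps can be sketched without any external service. This example substitutes a bag-of-words vector for a learned embedding model so that it stays self-contained; the sample rules are invented for illustration, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Invented sample rules, used only to exercise the code.
RULES = (
    "Road closures on the main boulevard must end by 6 am. "
    "Amplified music is not allowed near hospitals after 10 pm. "
    "Hydration stations need a permit from the city events office."
)


def chunk(text):
    """Split a document into sentence-sized chunks."""
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]


def embed(text):
    """Stand-in embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = chunk(RULES)
top = retrieve("When must road closures end?", chunks)
```

In a production setup the retrieved chunk would be added to the planner's prompt, and the bag-of-words vectors would be replaced by model embeddings stored in a vector-capable database.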
Q:What is the purpose of managing context and adding memory to the planner agent?
A:The purpose of managing context and adding memory to the planner agent is to enable the simulation to adjust its route based on the latest context and data, and to allow the planner agent to recall details from its long-term memory of past simulations, improving the intelligence and adaptability of the system.
Q:How does the speaker illustrate the benefit of using context engineering in making the agents more intelligent?
A:The speaker illustrates the benefit of context engineering by demonstrating how a simulation adjusted its route based on new data and context, which reflects the intelligence gained from managing context and adding memory to the planner agent.
Q:What tools and techniques are being used to optimize and debug agent behavior?
A:To optimize and debug agent behavior, tools and techniques being used include agent observability for visibility into operational metrics, Gemini Cloud Assist for managing agents from design to optimization, and the Cloud Monitoring console to identify and investigate issues such as high latency or tool calls that lead to system failures.
Q:What issue was encountered with the simulator agent and Gemini Model API?
A:The simulator agent failed to call the Gemini Model API due to a request error, which indicated that something was wrong with the payloads sent to the model.
Q:How did Cloud Assist aid in debugging the issue with the simulator agent?
A:Cloud Assist pointed to the specific line of the agent simulator code where the issue occurred, allowed the user to continue the investigation started in the console, and provided explanations and suggestions for resolving the 400 error caused by exceeding the Gemini API's context token limit.
Q:What is the significance of using agents during the debugging process?
A:Agents not only find the root cause of issues but also suggest proactive code fixes. In this case, the agent proposed a fix by suggesting an adjustment to the event compaction configuration.
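The idea behind compacting event history to stay under a context limit can be sketched as follows. This is a generic illustration, not the configuration the agent actually changed: token counts are approximated by word counts, the summary is a placeholder where a real system would call a model, and the function names and budget are hypothetical.

```python
def approx_tokens(text):
    """Rough token estimate: one token per whitespace-separated word."""
    return len(text.split())


def compact(events, budget, keep_recent=2):
    """Collapse older events into one summary line when over budget.

    The most recent `keep_recent` events are kept verbatim. The result
    can still exceed the budget if the recent events alone are too long.
    """
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    if sum(approx_tokens(e) for e in events) <= budget:
        return list(events)
    older, recent = events[:-keep_recent], events[-keep_recent:]
    # Placeholder summary; a real system would summarize with a model.
    summary = f"[summary of {len(older)} earlier events]"
    return [summary] + list(recent)


events = [
    f"runner {i} passed checkpoint and reported pace heart rate status ok"
    for i in range(5)
]
compacted = compact(events, budget=30)
total = sum(approx_tokens(e) for e in compacted)
```

With many runner sessions emitting events, sending the full history on every call grows the payload without bound, which is the failure mode behind the 400 error described above.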
Q:How did the debugging experience with Gemini Cloud Assist and the agent help in resolving the issue?
A:The debugging experience with Gemini Cloud Assist and the agent helped in tracing the root cause of the failure, identifying the issue with the storage not loading the model fast enough, and proactively fixing it with the help of Cloud Assist.
Q:What is the broader architectural evolution for applications and AI systems as mentioned in the speech?
A:The architecture for applications and AI systems evolves based on learning from AI's performance and needs. It may involve changes like swapping out runtimes, databases, or programming languages as the application changes.
Q:How did Gemini Cloud Assist assist in upgrading an existing system in the simulation?
A:Gemini Cloud Assist helped upgrade an existing system in the simulation by converting a Cloud Run service to a Google Kubernetes Engine (GKE) deployment and introducing a fine-tuned or customized model for the same cluster.
Q:What is the envisioned relationship between agents and the implementation of change in the autonomous cloud?
A:The envisioned relationship between agents and the implementation of change in the autonomous cloud is one where agents work with humans not just to answer questions but also to partner with them in safely implementing changes.
Q:How can users create their own agents in Gemini Enterprise?
A:Users can create their own no-code agents in Gemini Enterprise by utilizing the agent runtime, which allows them to make work easier, including tasks like logistics, ordering supplies, and managing various aspects of events.
Q:What capabilities do the agents in Gemini Enterprise have?
A:Agents in Gemini Enterprise have the capability to plan and order for races, work together to create comprehensive plans, and manage logistics including water, food, and entertainment. They also have access to dynamic interfaces and can be given documents as context to aid in their functionality.
Q:What is the role of the agent registry in Gemini Enterprise?
A:The role of the agent registry in Gemini Enterprise is to keep a record of all agents that have been created, making them discoverable by other agents and apps included in the Gemini Enterprise platform.
Q:What is the purpose of the marathon planner agent in Gemini Enterprise?
A:The marathon planner agent in Gemini Enterprise is used to facilitate the planning and ordering of events, such as marathons, by automating tasks like ordering supplies, managing logistics, and ensuring necessary items like water and portable toilets are available.
Q:How does Gemini Enterprise handle security concerns?
A:Agents give users and other agents new ways to expose data, intentionally or unintentionally. Gemini Enterprise addresses this with agent identity and policies, which ensure agents can access only what they need, when they need it, and with an agent gateway for managing access control policies centrally. The platform can also set conditions on agents' access to resources such as financial databases, and it follows a Zero Trust architecture that enforces role-based access control using agent identities and policies.
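Conditional, identity-based access can be sketched as a default-deny policy check. This is a minimal illustration of the concept only; the `Policy` and `AgentGateway` classes, the agent names, and the spending condition are hypothetical and do not reflect the platform's actual policy model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Policy:
    """Grants one agent identity a set of actions on one resource."""
    agent_id: str
    resource: str
    actions: frozenset
    condition: Optional[Callable[[dict], bool]] = None  # extra runtime check


class AgentGateway:
    """Central policy check with default deny (toy model)."""

    def __init__(self, policies):
        self._policies = list(policies)

    def is_allowed(self, agent_id, resource, action, context=None):
        context = context or {}
        for p in self._policies:
            matches = (
                p.agent_id == agent_id
                and p.resource == resource
                and action in p.actions
            )
            if matches and (p.condition is None or p.condition(context)):
                return True
        return False  # nothing matched: deny


gateway = AgentGateway([
    # Assumed rule: the ordering agent may read the financial database
    # only for orders under a spending cap.
    Policy(
        agent_id="ordering-agent",
        resource="financial-db",
        actions=frozenset({"read"}),
        condition=lambda ctx: ctx.get("order_total", 0) < 500,
    ),
])

small_order = gateway.is_allowed("ordering-agent", "financial-db", "read", {"order_total": 120})
large_order = gateway.is_allowed("ordering-agent", "financial-db", "read", {"order_total": 900})
```

Denying by default means an agent with no matching policy gets no access, which is the behavior a Zero Trust setup expects.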
Q:How can code be made secure using the tools presented?
A:The tools presented, such as the Wiz platform, can scan code and infrastructure to build a security graph, identify risks and vulnerabilities, and suggest remediations directly within developer tools. This allows developers to ensure their code is secure and address potential threats early in the development process.

Alphabet, Inc.