From 2673cda79a609b1f8c896cee9f2b97a8aab2254b Mon Sep 17 00:00:00 2001 From: Dominic Farolino Date: Thu, 16 Apr 2026 07:59:46 -0400 Subject: [PATCH 1/5] Add basic page observation infrastructure --- index.bs | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 85 insertions(+) diff --git a/index.bs b/index.bs index 5a56b76..21db571 100644 --- a/index.bs +++ b/index.bs @@ -83,6 +83,8 @@ p + dl.props { margin-top: -0.5em; }

Introduction

@@ -439,6 +441,89 @@ The synthesize a declarative JSON Schema object algorithm, given a <{ } +

Interaction with agents

+ +

Event loop integration

+ +A web site's functionality, as registered with the APIs in this specification, is exposed. + +The [=user agent=]'s [=browser's agent=] runs [=in parallel=] to any [=event loops=] associated +with a {{ModelContext}} [=relevant global object=]. Steps running on the [=browser's agent=] get +queued on its AI agent queue, which is the result of [=starting a new parallel queue=]. + +Conversely, steps queued *from* the [=browser agent=] onto the [=event loop=] of a given +{{ModelContext}} object (i.e., the "main thread" where JavaScript runs) are queued on its [=relevant +global object=]'s [=AI task source=]. + +

Page observations

+ +In-page [=agents=] implemented in JavaScript can "observe" the tools that a page offers by using the +{{ModelContext}} APIs directly, and any other platform APIs to obtain necessary context about the +page in order to actuate it appropriately. + +The [=browser agent=], on the other hand, does not run JavaScript on the page. Instead, it obtains a +view of the page's tools and any other relevant context by getting an [=observation=]. An +observation is an [=implementation-defined=] data structure containing at least a tool map, which is a [=map=] whose [=map/keys=] are [=Document/unique ID=]s, +and whose [=map/values=] are [=tool definitions=]. + +Note: An [=observation=] is usually a "snapshot" distillation of a page being presented to the user, +along with any other state the [=user agent=] believes is relevant for the [=browser agent=]. See +[Annotated Page Content +(APC)](https://chromium.googlesource.com/chromium/src.git/+/main/third_party/blink/renderer/modules/content_extraction/readme.md) +in the Chromium project. + +
+ +
+To perform an observation given a [=top-level traversable=] |traversable|, run these +steps: + +1. [=Assert=]: This algorithm is running in the [=browser agent=]'s [=AI agent queue=]. + +1. [=Assert=]: |traversable|'s [=navigable/active document=] is not [=Document/fully active=]. + +1. Let |observation| be a new [=observation=]. + +1. Let |flat descendants| be the [=Document/inclusive descendant navigables=] of |traversable|'s + [=navigable/active document=]. + +1. [=list/For each=] [=navigable=] |descendant| of |flat descendants|: + + 1. Let |document| be |descendant|'s [=navigable/active document=]'s. + + 1. Let |id| be |document|'s [=Document/unique ID=]. + + 1. Set |observation|'s [=observation/tool map=][|id|] = |document|'s [=relevant global + object=]'s {{Navigator}}'s [=Navigator/modelContext=]'s [=ModelContext/internal context=]'s + [=model context/tool map=]'s [=map/values=], which are [=tool definitions=]. + +1. Perform any [=implementation-defined=] steps with |observation| and the [=browser agent=], to + expose the |observation|'s [=observation/tool map=] to the [=browser agent=] in whatever way it + accepts. + + Note: Despite the name of this API (i., Web*MCP*), this specification does not prescribe the + format in which tools are exposed to the [=browser agent=]. Browsers are free to distill and + expose tools via Model Context Protocol, other proprietary "function calling" methods, or any + other way it deems appropriate. + + Advisement: Implementations are expected to convey to the [=browser agent=] any relevant + security information associated with [=tool definitions=], such as the originating [=origin=], + among other things, so that the backing model has an idea of the different parties at play, and + can most safely carry out the end user's intent. + +
+ +Each {{Document}} object has a unique ID, which is a [=unique internal +value=]. + +The times at which a [=browser agent=] [=performs an observation=] are [=implementation-defined=]. +A [=browser agent=] may [=parallel queue/enqueue steps=] to the [=AI agent queue=] to [=perform an +observation=] given any [=top-level browsing context=] in the [=user agent=] [=browsing context +group set=], at any time, although implementations typically reserve this operation for when the +user is interacting with a [=browser agent=] while web content is in view. + +

Security and privacy considerations