A simple, fast, token-efficient browser for AI agents β CLI + Go Library.
δΈζζζ‘£ β’ Quick Start β’ Agent Notes β’ Commands β’ Library β’ Snapshot Format Spec
ko-browser is a browser automation tool built in Go for AI agents. It exposes the same core model in two forms:
- a CLI for shell-driven agent workflows
- a Go library for embedding browser control into agent runtimes and tools
Its custom accessibility-tree snapshot format reduces prompt footprint by 46%+ compared with more verbose alternatives.
- π Single binary β no Node.js, no Playwright runtime needed
- π€ AI-optimized snapshot format β
id: role "name" statessaves 46%+ tokens - π¦ Dual-use β works as a CLI tool AND a Go library (
go get) - β‘ Fast startup β ~50ms (Go binary) vs ~500ms (Node.js-based tools)
- π’ Simple element references β
click 5 - π Optional OCR β Tesseract integration via
-tags=ocrbuild flag for image-heavy pages - π ~86 commands β broad coverage for browser automation workflows
ββ kbr (46% fewer tokens) βββββββββββ ββ verbose snapshot output βββββββββββ
β Page: "Example" β β - document "Example" β
β β β - navigation "main": β
β 1: link "Home" β β - link "Home" [ref=@e1] β
β 2: link "About" β β - link "About" [ref=@e2] β
β 3: textbox "Search" focused β β - search: β
β 4: button "Go" β β - textbox "Search" [ref=@e3] β
β β β - button "Go" [ref=@e4] β
ββββββββββββββββββββββββββββββββββββββ ββββββββββββββββββββββββββββββββββββββ
π Read the full Snapshot Format Specification for detailed design decisions, BNF grammar, and examples. (δΈζη)
| Use case | Recommended path | Notes |
|---|---|---|
| macOS local usage | Homebrew | Installs an OCR-enabled build and pulls in tesseract |
| Linux/macOS manual deployment | GitHub Releases | Download the prebuilt archive and make sure Tesseract runtime libraries are present |
| Go-based integration or custom builds | From source | Best when embedding kbr into your own toolchain |
brew tap libi/tap
brew install ko-browserHomebrew installs
kbrwith OCR enabled. It pulls intesseractautomatically and builds from source.
Download from GitHub Releases:
Release binaries are also built with OCR enabled. Install Tesseract first so the runtime OCR libraries are available:
brew install tesseracton macOS,apt install libtesseract-devon Linux.
# macOS (Apple Silicon)
curl -LO https://github.com/libi/ko-browser/releases/latest/download/ko-browser-darwin-arm64.tar.gz
tar xzf ko-browser-darwin-arm64.tar.gz
mv kbr /usr/local/bin/kbr
# macOS (Intel)
curl -LO https://github.com/libi/ko-browser/releases/latest/download/ko-browser-darwin-amd64.tar.gz
tar xzf ko-browser-darwin-amd64.tar.gz
mv kbr /usr/local/bin/kbr
# Linux (amd64)
curl -LO https://github.com/libi/ko-browser/releases/latest/download/ko-browser-linux-amd64.tar.gz
tar xzf ko-browser-linux-amd64.tar.gz
mv kbr /usr/local/bin/kbr# Install kbr with OCR support (requires Tesseract to be installed)
CGO_ENABLED=1 go install -tags=ocr github.com/libi/ko-browser/cmd/kbr@latestOCR is required for the published packages. This requires Tesseract:
brew install tesseract(macOS) /apt install libtesseract-dev(Linux).
kbr install # check & download Chromium
kbr install --with-deps # also install system dependencies (Linux)- Chrome or Chromium is required. Run
kbr installif you do not already have a compatible browser. - OCR-enabled builds expect Tesseract runtime libraries on the host.
- On Linux CI or sandboxed environments, Chrome may require
--no-sandbox-style fallbacks depending on the runner configuration.
This is the most common shell-driven agent workflow:
kbr open https://example.com
kbr snapshot
kbr click 5
kbr type 8 "hello world"
kbr press Enter
kbr wait load
kbr get text 12Use snapshot to get the current page structure, act on numeric refs, then re-snapshot after the page changes.
# Open a page and take a snapshot
kbr open https://www.google.com
kbr snapshot
# Output:
# Page: "Google"
#
# 1: link "Gmail"
# 2: link "Images"
# 3: textbox "Search" focused
# 4: button "Google Search"
# 5: button "I'm Feeling Lucky"
# Interact with elements by ID
kbr click 3
kbr type 3 "ko-browser github"
kbr press Enter
# Take a screenshot
kbr screenshot result.png
# Close the browser
kbr closepackage main
import (
"fmt"
"log"
"github.com/libi/ko-browser/browser"
)
func main() {
b, err := browser.New(browser.Options{Headless: true})
if err != nil {
log.Fatal(err)
}
defer b.Close()
b.Open("https://www.google.com")
snap, _ := b.Snapshot()
fmt.Println(snap.Text)
// 1: link "Gmail"
// 2: link "Images"
// 3: textbox "Search" focused
// ...
b.Click(3)
b.Type(3, "ko-browser github")
b.Press("Enter")
}| Goal | Commands |
|---|---|
| Navigate and inspect | open, snapshot, get title, get url |
| Interact with forms | click, type, fill, select, check, press |
| Wait for state changes | wait load, wait selector, wait text, wait url |
| Debug and verify | screenshot, console messages, errors list, highlight |
| Work with sessions | tab *, cookies *, storage *, state * |
| Command | CLI Usage | Library API |
|---|---|---|
| open | kbr open <url> |
b.Open(url) |
| click | kbr click <id> |
b.Click(id) |
| dblclick | kbr dblclick <id> |
b.DblClick(id) |
| type | kbr type <id> "text" |
b.Type(id, text) |
| fill | kbr fill <id> "text" |
b.Fill(id, text) |
| press | kbr press <key> |
b.Press(key) |
| hover | kbr hover <id> |
b.Hover(id) |
| focus | kbr focus <id> |
b.Focus(id) |
| check | kbr check <id> |
b.Check(id) |
| uncheck | kbr uncheck <id> |
b.Uncheck(id) |
| select | kbr select <id> "val" |
b.Select(id, vals...) |
| scroll | kbr scroll down 500 |
b.Scroll("down", 500) |
| drag | kbr drag <src> <dst> |
b.Drag(srcID, dstID) |
| close | kbr close |
b.Close() |
kbr press Enter
kbr press Control+a
kbr keyboard type "Hello, World!"
kbr keyboard inserttext "pasted content"kbr snapshot # full accessibility tree
kbr snapshot -i # interactive elements only
kbr snapshot -c # compact mode
kbr snapshot -d 5 # max depth 5
kbr snapshot -C # include cursor elements
kbr snapshot -s "#main" # scope to CSS selector
kbr snapshot --ocr # with OCR for images
kbr screenshot out.png # viewport screenshot
kbr screenshot --full out.png # full page
kbr screenshot --annotate out.png # annotated with element IDskbr back
kbr forward
kbr reloadkbr get title # page title
kbr get url # current URL
kbr get text <id> # element inner text
kbr get html <id> # element innerHTML
kbr get value <id> # input value
kbr get attr <id> href # element attribute
kbr get count ".item" # count matching elements
kbr get box <id> # bounding box
kbr get styles <id> # computed styles
kbr get cdp-url # CDP WebSocket URLkbr is visible <id> # β true/false
kbr is enabled <id>
kbr is checked <id>kbr find role button --name Submit
kbr find text "Sign In"
kbr find label "Email"
kbr find placeholder "Search..."
kbr find alt "Logo"
kbr find title "tooltip"
kbr find testid "login-form"
kbr find first ".card"
kbr find last ".card"
kbr find nth 2 ".card"
# Add --exact for exact text matching
kbr find text "Sign In" --exactkbr wait time 2s # wait 2 seconds
kbr wait selector "#loading" # wait for element
kbr wait url "**/dashboard" # wait for URL match
kbr wait load # wait for networkidle
kbr wait text "Welcome" # wait for text
kbr wait fn "window.ready" # wait for JS expression
kbr wait hidden "#spinner" # wait for element to hide
kbr wait download ./file.pdf # wait for downloadkbr mouse move 100 200
kbr mouse click 100 200
kbr mouse down 100 200
kbr mouse up 100 200
kbr mouse wheel 100 200 0 300 # x y deltaX deltaYkbr tab list
kbr tab new https://example.com
kbr tab switch 2
kbr tab close 1kbr network route "**/api/*" --action block
kbr network unroute "**/api/*"
kbr network requests
kbr network start-logging
kbr network clearkbr cookies get
kbr cookies set session_id "abc123" --domain example.com
kbr cookies delete session_id
kbr cookies clear
kbr storage get theme --type local
kbr storage set theme "dark" --type local
kbr storage delete theme --type local
kbr storage clear --type local
kbr storage list --type sessionkbr set viewport 1920 1080
kbr set viewport 1920 1080 2 # with 2x scale
kbr set device "iPhone 12"
kbr set geo 37.7749 -122.4194
kbr set offline true
kbr set headers '{"X-Custom":"value"}'
kbr set credentials admin secret
kbr set media dark
kbr set colorscheme darkkbr upload <id> ./document.pdf
kbr download <id> ./output/kbr eval "document.title"
kbr eval -b "ZG9jdW1lbnQudGl0bGU=" # base64 encoded
cat script.js | kbr eval --stdinkbr diff snapshot # compare with last snapshot
kbr diff snapshot --baseline before.txt
kbr diff screenshot --baseline before.png
kbr diff url https://v1.example.com https://v2.example.comkbr console messages
kbr console clear
kbr errors list
kbr highlight <id>
kbr inspect # open DevTools
kbr clipboard read
kbr clipboard write "text"
kbr clipboard copy # Ctrl+C
kbr clipboard paste # Ctrl+Vkbr trace start
kbr trace stop ./trace.zip
kbr profiler start
kbr profiler stop ./profile.json
kbr record start ./recording
kbr record stopkbr auth save github --url https://github.com/login --username user --password pass
kbr auth login github
kbr auth list
kbr auth show github
kbr auth delete githubkbr session # show current session
kbr session list # list all active sessions
kbr --session test open example.com # use named sessionkbr supports three selector formats, auto-detected:
| Input | Type | Example |
|---|---|---|
| Number | Snapshot ID | kbr click 5 |
| CSS | CSS Selector | kbr click "#submit" |
| XPath | XPath | kbr click "//button[@type='submit']" |
| Flag | Description |
|---|---|
--session <name> |
Isolated session name (default: "default") |
--headed |
Show browser window |
--json |
JSON output |
--timeout <duration> |
Operation timeout (default: 30s) |
--profile <path> |
Persistent Chrome user data directory |
--state <path> |
Load saved browser state (cookies + localStorage) |
--config <path> |
Config file path |
--user-agent <ua> |
Custom User-Agent |
--proxy <url> |
Proxy server URL |
--proxy-bypass <hosts> |
Hosts to bypass proxy |
--ignore-https-errors |
Ignore HTTPS certificate errors |
--allow-file-access |
Allow file:// URLs |
--extension <path> |
Load Chrome extension (repeatable) |
--args <args> |
Extra Chrome arguments |
--download-path <path> |
Default download directory |
--screenshot-dir <path> |
Default screenshot output directory |
--screenshot-format <fmt> |
Screenshot format: png, jpeg |
--content-boundaries |
Wrap output with boundary markers |
--debug |
Debug output |
go get github.com/libi/ko-browser@latestpackage main
import (
"fmt"
"time"
"github.com/libi/ko-browser/browser"
)
func main() {
b, _ := browser.New(browser.Options{
Headless: true,
Timeout: 30 * time.Second,
})
defer b.Close()
// Navigate
b.Open("https://www.baidu.com")
// Snapshot β interact
snap, _ := b.Snapshot()
fmt.Println(snap.Text)
// Page: "ηΎεΊ¦δΈδΈοΌδ½ ε°±η₯ι"
//
// 1: link "ζ°ι»"
// 2: link "hao123"
// 3: textbox "ζη΄’" focused
// 4: button "ηΎεΊ¦δΈδΈ"
b.Click(3)
b.Type(3, "hello world")
b.Press("Enter")
// Wait & screenshot
b.WaitLoad()
b.Screenshot("result.png")
}import (
"github.com/libi/ko-browser/axtree"
"github.com/libi/ko-browser/browser"
"github.com/libi/ko-browser/ocr"
)
b, _ := browser.New(browser.Options{Headless: true})
defer b.Close()
b.Open("https://example.com")
// Low-level AX Tree API
rawNodes, _ := axtree.Extract(b.Context())
tree := axtree.BuildAndFilter(rawNodes)
text := axtree.Format(tree)
idMap := axtree.BuildIDMap(tree)
// Snapshot with OCR
snap, _ := b.Snapshot(browser.SnapshotOptions{
EnableOCR: true,
OCRLanguages: []string{"eng", "chi_sim"},
})// Connect via CDP WebSocket
b, _ := browser.Connect("ws://localhost:9222/devtools/browser/...", browser.Options{})
defer b.Close()
snap, _ := b.Snapshot()
fmt.Println(snap.Text)Click to expand all Browser methods
Navigation
Open(url string) errorBack() errorForward() errorReload() error
Snapshot
Snapshot(opts ...SnapshotOptions) (*SnapshotResult, error)
Interaction
Click(id int) errorClickNewTab(id int) errorDblClick(id int) errorType(id int, text string) errorFill(id int, text string) errorPress(key string) errorKeyboardType(text string) errorKeyboardInsertText(text string) errorHover(id int) errorFocus(id int) errorCheck(id int) errorUncheck(id int) errorSelect(id int, values ...string) errorScroll(direction string, pixels int) errorScrollIntoView(id int) error
Mouse
MouseMove(x, y float64) errorMouseClick(x, y float64, opts ...MouseOptions) errorMouseDown(x, y float64, opts ...MouseOptions) errorMouseUp(x, y float64, opts ...MouseOptions) errorMouseWheel(x, y, deltaX, deltaY float64) errorDrag(srcID, dstID int) errorDragCoords(srcX, srcY, dstX, dstY float64) error
Query
GetTitle() (string, error)GetURL() (string, error)GetText(id int) (string, error)GetHTML(id int) (string, error)GetValue(id int) (string, error)GetAttr(id int, name string) (string, error)GetCount(cssSelector string) (int, error)GetBox(id int) (*BoxResult, error)GetStyles(id int) (string, error)GetCDPURL() (string, error)
State
IsVisible(id int) (bool, error)IsEnabled(id int) (bool, error)IsChecked(id int) (bool, error)
Find
FindRole(role, name string, opts ...FindOption) (*FindResults, error)FindText(text string, opts ...FindOption) (*FindResults, error)FindLabel(label string, opts ...FindOption) (*FindResults, error)FindPlaceholder(text string, opts ...FindOption) (*FindResults, error)FindAlt(text string, opts ...FindOption) (*FindResults, error)FindTitle(text string, opts ...FindOption) (*FindResults, error)FindTestID(testID string) (*FindResults, error)FindFirst(css string) (*FindResults, error)FindLast(css string) (*FindResults, error)FindNth(css string, n int) (*FindResults, error)
Wait
Wait(d time.Duration) errorWaitSelector(css string, timeout ...time.Duration) errorWaitURL(pattern string, timeout ...time.Duration) errorWaitLoad(timeout ...time.Duration) errorWaitText(text string, timeout ...time.Duration) errorWaitFunc(expression string, timeout ...time.Duration) errorWaitHidden(css string, timeout ...time.Duration) errorWaitDownload(savePath string, timeout ...time.Duration) (string, error)
Screenshot & PDF
Screenshot(path string, opts ...ScreenshotOptions) errorScreenshotToBytes(opts ...ScreenshotOptions) ([]byte, error)ScreenshotAnnotated(path string, opts ...ScreenshotOptions) errorPDF(path string, opts ...PDFOptions) error
Eval
Eval(expression string) (string, error)
File
Upload(id int, files ...string) errorUploadCSS(css string, files ...string) errorDownload(id int, saveDir string, opts ...DownloadOptions) (string, error)
Tabs
TabList() ([]TabInfo, error)TabNew(url string) errorTabClose(index int) errorTabSwitch(index int) error
Network
NetworkRoute(pattern string, action RouteAction) errorNetworkUnroute(pattern string) errorNetworkRequests() ([]NetworkRequest, error)NetworkStartLogging() errorNetworkClearRequests()
Storage & Cookies
CookiesGet() ([]CookieInfo, error)CookieSet(cookie CookieInfo) errorCookieDelete(name string) errorCookiesClear() errorStorageGet(storageType, key string) (string, error)StorageSet(storageType, key, value string) errorStorageDelete(storageType, key string) errorStorageClear(storageType string) errorStorageGetAll(storageType string) (map[string]string, error)
Settings
SetViewport(width, height int, scale ...float64) errorSetDevice(name string) errorSetGeo(lat, lon float64) errorClearGeo() errorSetOffline(offline bool) errorSetHeaders(headers map[string]string) errorSetCredentials(user, pass string) errorSetMedia(features ...MediaFeature) errorSetColorScheme(scheme string) error
Debug
ConsoleStart() errorConsoleMessages() ([]ConsoleMessage, error)ConsoleMessagesByLevel(level string) ([]ConsoleMessage, error)ConsoleClear()PageErrors() ([]PageError, error)PageErrorsClear()Highlight(id int) errorOpenDevTools() error
Clipboard
ClipboardRead() (string, error)ClipboardWrite(text string) errorClipboardCopy() errorClipboardPaste() error
Diff
DiffSnapshot(opts ...DiffSnapshotOptions) (*DiffSnapshotResult, error)DiffScreenshot(opts DiffScreenshotOptions) (*DiffScreenshotResult, error)DiffURL(url1, url2 string, opts ...DiffURLOptions) (*DiffURLResult, error)
Trace & Record
TraceStart(categories ...string) errorTraceStop(outputPath string) errorProfilerStart() errorProfilerStop(outputPath string) errorRecordStart(outputPath string) errorRecordStop() (int, error)
State
ExportState(outputPath string) errorImportState(inputPath string) errorApplyState(state *BrowserState) error
kbr loads configuration in this priority (low β high):
~/.ko-browser/config.jsonβ user-level defaults./ko-browser.jsonβ project-level overrides- Environment variables (
KO_BROWSER_*) - CLI flags β override everything
{
"headed": true,
"proxy": "http://localhost:8080",
"profile": "./browser-data",
"screenshotFormat": "jpeg",
"downloadPath": "./downloads"
}kbr
βββ cmd/kbr/ β
CLI entry point β `go install .../cmd/kbr@latest`
βββ browser/ β
Public Go library β core browser API
βββ axtree/ β
Public β AX Tree extraction, filtering, formatting
βββ selector/ β
Public β element selector parsing (ID/CSS/XPath)
βββ ocr/ β
Public β Tesseract OCR engine (build tag: ocr)
βββ cmd/ CLI β cobra command definitions
βββ internal/ CLI-only β daemon, session management
The browser/, axtree/, selector/, and ocr/ packages are all public and importable via go get. The internal/ package is only used by the CLI daemon. OCR requires -tags=ocr at build time.
If you are an agent reading this README, use kbr as an interactive browser with a compact text interface.
- Prefer
kbr snapshotas the primary page representation. It is the most token-efficient output and gives stable numeric refs. - Use numeric refs like
kbr click 5,kbr type 8 "hello", andkbr get text 12after taking a fresh snapshot. - Re-snapshot after navigation or DOM mutations. Element IDs are snapshot-local, not permanent selectors.
- Use
kbr get,kbr find, andkbr waitfor targeted reads instead of repeatedly taking full screenshots. - Use OCR only when the page content is image-heavy or inaccessible through the DOM/accessibility tree.
- Prefer CSS/XPath selectors only when you already know the target precisely or when numeric refs are unavailable.
- The published Homebrew package and release binaries are built with OCR enabled, so Tesseract runtime libraries are expected on the host.
If your agent framework supports local skills, install kbr first, then copy the repository skill directory into your agent's skills directory as ko-browser/.
<agent-skills-dir>/
ko-browser/
SKILL.mdThe source skill lives here:
Example:
mkdir -p <agent-skills-dir>/ko-browser
cp skills/ko-browser/SKILL.md <agent-skills-dir>/ko-browser/SKILL.mdAfter that, make sure the kbr binary is available in the agent runtime PATH. For agent hosts that also support executable-tool registration, you can expose kbr directly in addition to the skill file.
Contributions are welcome! Please feel free to submit a Pull Request.
git clone https://github.com/libi/ko-browser.git
cd ko-browser
go build -o kbr ./cmd/kbr/ # without OCR
go build -tags=ocr -o kbr ./cmd/kbr/ # with OCR
go test ./tests/ -v -timeout 180sMIT