Show HN: Cua-Bench – a benchmark for AI agents in GUI environments

https://news.ycombinator.com/rss Hits: 2
Summary

Build, benchmark, and deploy agents that use computers

Cua is an open-source platform for building, benchmarking, and deploying agents that can use any computer, with isolated, self-hostable sandboxes (Docker, QEMU, Apple Virtualization).

Choose Your Path

Cua - Agentic UI Automation & Code Execution

Build agents that see screens, click buttons, and complete tasks autonomously. Run isolated code execution environments for AI coding assistants like Claude Code, Codex CLI, or OpenCode.

    # Requires Python 3.12 or 3.13
    import asyncio

    from computer import Computer
    from agent import ComputerAgent

    async def main():
        computer = Computer(os_type="linux", provider_type="cloud")
        agent = ComputerAgent(model="anthropic/claude-sonnet-4-5-20250929", computer=computer)
        # agent.run is consumed with `async for`, so it has to run inside a coroutine
        async for result in agent.run([{"role": "user", "content": "Open Firefox and search for Cua"}]):
            print(result)

    asyncio.run(main())

Cua-Bench - Benchmarks & RL Environments

Evaluate computer-use agents on OSWorld, ScreenSpot, Windows Arena, and custom tasks. Export trajectories for training.

    # Install and create base image
    cd cua-bench
    uv tool install -e . && cb image create linux-docker

    # Run benchmark with agent
    cb run dataset datasets/cua-bench-basic --agent cua-agent --max-parallel 4

Lume - macOS Virtualization

Create and manage macOS/Linux VMs with near-native performance on Apple Silicon using Apple's Virtualization.Framework.

    # Install Lume
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/lume/scripts/install.sh)"

    # Pull & start a macOS VM
    lume run macos-sequoia-vanilla:latest

Packages

Package              Description
cua-agent            AI agent framework for computer-use tasks
cua-computer         SDK for controlling desktop environments
cua-computer-server  Driver for UI interactions and code execution in sandboxes
cua-bench            Benchmarks and RL environments for computer-use
lume                 macOS/Linux VM management
...
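
For scripted runs, the `cb run dataset` invocation from the Cua-Bench section can be assembled programmatically. The following is a minimal sketch, not part of Cua itself: the helper name `build_cb_run_command` is hypothetical, and it uses only the subcommand and flags that appear in the summary above (`--agent`, `--max-parallel`). It builds the argument list and prints it without executing anything.

```python
import shlex


def build_cb_run_command(dataset_path, agent, max_parallel):
    """Return the `cb run dataset` invocation as an argv list.

    Hypothetical helper for illustration; only flags shown in the
    summary above are used.
    """
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    return [
        "cb", "run", "dataset", dataset_path,
        "--agent", agent,
        "--max-parallel", str(max_parallel),
    ]


# Reproduce the example command from the summary.
argv = build_cb_run_command("datasets/cua-bench-basic", "cua-agent", 4)
print(shlex.join(argv))
```

Keeping the command as a list rather than a single string means it could be handed to `subprocess.run(argv)` without shell quoting concerns, assuming `cb` is installed and on the PATH.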

First seen: 2026-01-28 16:27

Last seen: 2026-01-28 17:27