🌐 Browser-use is the easiest way to connect your AI agents with the browser.
💡 See what others are building and share your projects in our Discord! Want Swag? Check out our Merch store.
🌤️ Skip the setup - try our hosted version for instant browser automation! Try the cloud ☁︎.
With pip (Python>=3.11):
pip install browser-use
Install the browser:
playwright install chromium --with-deps --no-shell
Spin up your agent:
import asyncio
from dotenv import load_dotenv
load_dotenv()
from browser_use import Agent
from browser_use.llm import ChatOpenAI
async def main():
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=ChatOpenAI(model="o4-mini", temperature=1.0),
    )
    await agent.run()
asyncio.run(main())
Add the API keys for the provider you want to use to your .env file:
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_KEY=
GOOGLE_API_KEY=
DEEPSEEK_API_KEY=
GROK_API_KEY=
NOVITA_API_KEY=
For other settings, models, and more, check out the documentation 📕.
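Before running an agent, it can help to confirm which provider keys actually made it into the environment. The sketch below is a hypothetical helper (`configured_providers` and `PROVIDER_VARS` are not part of browser-use) that checks the variable names from the template above. It reads `os.environ`, so call it after `load_dotenv()`. It assumes Azure OpenAI needs both its endpoint and key set to be usable.

```python
import os
from typing import Mapping

# Variable names come from the .env template above; the provider labels are
# illustrative. Azure OpenAI is treated as configured only if both are set.
PROVIDER_VARS: dict[str, tuple[str, ...]] = {
    "OpenAI": ("OPENAI_API_KEY",),
    "Anthropic": ("ANTHROPIC_API_KEY",),
    "Azure OpenAI": ("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_KEY"),
    "Google": ("GOOGLE_API_KEY",),
    "DeepSeek": ("DEEPSEEK_API_KEY",),
    "Grok": ("GROK_API_KEY",),
    "Novita": ("NOVITA_API_KEY",),
}


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return providers whose required variables are all set and non-blank."""
    return [
        provider
        for provider, names in PROVIDER_VARS.items()
        if all(env.get(name, "").strip() for name in names)
    ]


if __name__ == "__main__":
    # Hypothetical usage: run after load_dotenv() so .env values are visible.
    print(configured_providers() or "No provider keys found")
```

Passing a plain dict instead of `os.environ` makes the check easy to test without touching your real environment.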
You can test browser-use using its Web UI or Desktop App.
You can also use our interactive browser-use CLI (similar to Claude Code):
pip install "browser-use[cli]"
browser-use
Browser-use supports the Model Context Protocol (MCP), enabling integration with Claude Desktop and other MCP-compatible clients.
Add browser-use to your Claude Desktop configuration:
{
  "mcpServers": {
    "browser-use": {
      "command": "uvx",
      "args": ["browser-use", "--mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
This gives Claude Desktop access to browser automation tools for web scraping, form filling, and more.
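If your Claude Desktop configuration already lists other MCP servers, the `browser-use` entry should be merged into the existing `mcpServers` object rather than overwriting the file. A minimal sketch of that merge on already-parsed JSON follows; `add_browser_use_server` is a hypothetical helper name, and reading or writing the actual config file (whose location depends on your OS) is left out.

```python
import json
from typing import Any


def add_browser_use_server(config: dict[str, Any], openai_api_key: str) -> dict[str, Any]:
    """Return a copy of `config` with a "browser-use" entry under "mcpServers".

    Other servers and top-level settings are preserved; an existing
    "browser-use" entry is replaced with the one shown above.
    """
    merged = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    servers = merged.setdefault("mcpServers", {})
    servers["browser-use"] = {
        "command": "uvx",
        "args": ["browser-use", "--mcp"],
        "env": {"OPENAI_API_KEY": openai_api_key},
    }
    return merged


if __name__ == "__main__":
    existing = {"mcpServers": {"filesystem": {"command": "npx", "args": []}}}
    print(json.dumps(add_browser_use_server(existing, "sk-..."), indent=2))
```

Working on a copy keeps the original dict untouched, so you can diff old and new configs before saving.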
Browser-use agents can connect to multiple external MCP servers to extend their capabilities:
import asyncio
from browser_use import Agent, Controller
from browser_use.mcp.client import MCPClient
from browser_use.llm import ChatOpenAI
async def main():
    # Initialize controller
    controller = Controller()
    
    # Connect to multiple MCP servers
    filesystem_client = MCPClient(
        server_name="filesystem",
        command="npx",
        args=["-y", "@modelcontextprotocol/server-filesystem", "/Users/me/documents"]
    )
    
    github_client = MCPClient(
        server_name="github", 
        command="npx",
        args=["-y", "@modelcontextprotocol/server-github"],
        env={"GITHUB_TOKEN": "your-github-token"}
    )
    
    # Connect and register tools from both servers
    await filesystem_client.connect()
    await filesystem_client.register_to_controller(controller)
    
    await github_client.connect()
    await github_client.register_to_controller(controller)
    
    # Create agent with MCP-enabled controller
    agent = Agent(
        task="Find the latest report.pdf in my documents and create a GitHub issue about it",
        llm=ChatOpenAI(model="gpt-4o"),
        controller=controller  # Controller has tools from both MCP servers
    )
    
    # Run the agent
    await agent.run()
    
    # Cleanup
    await filesystem_client.disconnect()
    await github_client.disconnect()
asyncio.run(main())
See the MCP documentation for more details.
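In the multi-server example, the `disconnect()` calls are skipped if `agent.run()` raises. One standard-library way to guarantee cleanup is `contextlib.AsyncExitStack`. The sketch below uses a hypothetical `StubMCPClient` in place of `MCPClient` so it runs without browser-use or any MCP server; with real clients, you would also register each one with the controller after `connect()`.

```python
import asyncio
from contextlib import AsyncExitStack


class StubMCPClient:
    """Hypothetical stand-in exposing the connect()/disconnect() pair used above."""

    def __init__(self, server_name: str) -> None:
        self.server_name = server_name
        self.connected = False

    async def connect(self) -> None:
        self.connected = True

    async def disconnect(self) -> None:
        self.connected = False


async def run_with_clients(clients, task):
    """Connect every client, await `task()`, and always disconnect, even on error."""
    async with AsyncExitStack() as stack:
        for client in clients:
            await client.connect()
            # Callbacks run in reverse registration order when the block exits.
            stack.push_async_callback(client.disconnect)
        return await task()


async def main() -> None:
    fs, gh = StubMCPClient("filesystem"), StubMCPClient("github")

    async def failing_agent_run():
        raise RuntimeError("agent failed")

    try:
        await run_with_clients([fs, gh], failing_agent_run)
    except RuntimeError:
        pass
    print(fs.connected, gh.connected)  # False False


if __name__ == "__main__":
    asyncio.run(main())
```

Because each `disconnect` is registered right after its `connect`, a failure while connecting the second server still disconnects the first.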
Vision
Tell your computer what to do, and it gets it done.
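Roadmap
Agent
- Improve agent memory to handle 100+ steps
- Enhance planning capabilities (load website-specific context)
- Reduce token consumption (system prompt, DOM state)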
 
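DOM Extraction
- Enable detection of all possible UI elements
- Improve the state representation of UI elements so that all LLMs can understand what's on the page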
 
Workflows
- Let users record a workflow that we can rerun, with browser-use as a fallback
- Make rerunning workflows work even if pages change
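One way the record-and-rerun idea could work: replay each recorded step directly, and hand a step to the agent only when replay fails, for example because the page changed. This is a hypothetical sketch of the roadmap item, not an existing browser-use feature; `Step`, `rerun_workflow`, and the `fallback` callable are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One recorded action; `description` is what a fallback agent would be told."""
    description: str
    replay: Callable[[], None]  # deterministic replay, e.g. clicking a saved selector


def rerun_workflow(steps: list[Step], fallback: Callable[[str], None]) -> list[str]:
    """Replay each step; on failure, pass the step's description to `fallback`.

    Returns a log of how each step was completed.
    """
    log = []
    for step in steps:
        try:
            step.replay()
            log.append(f"replayed: {step.description}")
        except Exception:
            # The page probably changed; let the (hypothetical) agent redo this step.
            fallback(step.description)
            log.append(f"fallback: {step.description}")
    return log


if __name__ == "__main__":
    def broken() -> None:
        raise LookupError("selector not found")

    steps = [Step("open login page", lambda: None), Step("click the sign-in button", broken)]
    print(rerun_workflow(steps, fallback=lambda description: None))
```

Replaying first keeps reruns cheap and deterministic; the agent is only invoked for the steps that no longer match the page.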
 
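User Experience
- Create various templates for tutorial execution, job applications, QA testing, social media, etc., which users can just copy and paste
- Improve docs
- Make it faster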
 
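Parallelization
- Human work is sequential. The real power of a browser agent will come when we can parallelize similar tasks. For example, if you want to find contact information for 100 companies, this can all be done in parallel and reported back to a main agent, which processes the results and kicks off parallel subtasks again.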
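The fan-out/fan-in pattern described above can be sketched with `asyncio`. The main agent starts one subtask per company, caps how many run at once, and collects the results in input order. `find_contact` is a hypothetical placeholder for a sub-agent (it returns made-up addresses), and the cap of 10 concurrent subtasks is an arbitrary assumption.

```python
import asyncio


async def find_contact(company: str) -> dict[str, str]:
    """Hypothetical sub-agent task; a real version would drive a browser."""
    await asyncio.sleep(0)  # stand-in for the browsing work
    return {"company": company, "contact": f"info@{company.lower()}.example"}


async def main_agent(companies: list[str], max_parallel: int = 10) -> list[dict[str, str]]:
    """Fan out one subtask per company, at most `max_parallel` at a time,
    then gather the results for the main agent to process."""
    limit = asyncio.Semaphore(max_parallel)

    async def bounded(company: str) -> dict[str, str]:
        async with limit:
            return await find_contact(company)

    # gather() returns results in input order, regardless of completion order.
    return list(await asyncio.gather(*(bounded(c) for c in companies)))


if __name__ == "__main__":
    results = asyncio.run(main_agent([f"Company{i}" for i in range(100)]))
    print(len(results), results[0])
```

A further round of parallel subtasks would simply call `main_agent` again with follow-up inputs derived from these results.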
 