Introducing GoBot, an Autonomous AI Agent That Actually Controls Your Desktop

GoBot is a persistent AI agent that remembers conversations, controls your desktop, runs scheduled tasks, and works across multiple channels. Tonight I watched it check my Gmail, compose an email, and click Send. All by itself.

I really love Go for how small and portable the binaries are.

As a nod to Clawd (now Molt), I’ve been working on GoBot. It’s an open-source (MIT) AI agent that’s always running and can actually interact with your desktop, not just chat.

What Makes It Different?

For starters, it’s a single binary and supports multiple sub-agents.

GoBot is a persistent agent that:

Remembers everything. Conversations, facts, preferences survive restarts. It’s backed by SQLite so nothing is lost.

Controls your desktop. Tonight I asked it to check my Gmail. It opened Brave, took screenshots to “see” my inbox, composed an email, and clicked Send. All by itself. Very cool to see.

Runs scheduled tasks. Cron jobs that can execute bash commands OR spawn sub-agents.

Works across channels. Today the agent answers on the web UI, and the same agent will answer everywhere else: I’m working on connecting Discord, Telegram, Slack, and more.

What I Demoed Tonight

“Check my Gmail” → It activated Brave, navigated to Gmail, and took a screenshot to read the unread count.

“Send Ben an email saying I’ll be late” → Composed the email, typed it out, clicked Send.

I also built 7 macOS automation plugins in one session: clipboard, screen capture, accessibility API access, mouse/keyboard control, window management, app launching and menu bar interaction, and notifications.

The Desktop Control Plugins

Plugin          What it does
clipboard       Read/write clipboard, history
screen          Screenshots + OCR via Vision framework
accessibility   Read UI trees, click buttons by label
desktop         Mouse clicks, keyboard input, scrolling
window          List/focus/move/resize windows
app             Launch/quit apps, menu bar interaction
notification    System alerts, TTS

These plugins give GoBot the ability to see and interact with your desktop the way you do. It’s not just parsing APIs or running CLI commands. It’s literally clicking buttons and reading what’s on screen.
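The see-then-act behavior described above can be sketched as a loop that alternates observing the screen and performing one action. This is an illustration under assumptions, not GoBot’s code: Desktop, Planner, Action, and RunTask are hypothetical names, the planner is a scripted stand-in for the model call, and the fake desktop replaces real screenshots and clicks.

```go
package main

import (
	"errors"
	"fmt"
)

// Action is a hypothetical instruction chosen by the model.
type Action struct {
	Kind   string // "click", "type", or "done"
	Target string // UI label to click, or text to type
}

// Desktop abstracts the "see" and "act" halves of desktop control.
type Desktop interface {
	Observe() (string, error) // e.g. screenshot + OCR, returned as text
	Perform(a Action) error   // e.g. a mouse click or keystrokes
}

// Planner stands in for the LLM call that picks the next action.
type Planner interface {
	Next(goal, observation string) (Action, error)
}

// RunTask alternates observing and acting until the planner reports "done"
// or the step budget runs out. It returns the number of actions performed.
func RunTask(goal string, d Desktop, p Planner, maxSteps int) (int, error) {
	for step := 0; step < maxSteps; step++ {
		obs, err := d.Observe()
		if err != nil {
			return step, err
		}
		act, err := p.Next(goal, obs)
		if err != nil {
			return step, err
		}
		if act.Kind == "done" {
			return step, nil
		}
		if err := d.Perform(act); err != nil {
			return step, err
		}
	}
	return maxSteps, errors.New("step budget exhausted")
}

// fakeDesktop simulates a mail client with a handful of screens.
type fakeDesktop struct {
	screen string
	log    []string
}

func (f *fakeDesktop) Observe() (string, error) { return f.screen, nil }

func (f *fakeDesktop) Perform(a Action) error {
	f.log = append(f.log, a.Kind+":"+a.Target)
	switch {
	case a.Kind == "click" && a.Target == "Compose":
		f.screen = "compose"
	case a.Kind == "type":
		f.screen = "draft"
	case a.Kind == "click" && a.Target == "Send":
		f.screen = "sent"
	default:
		return fmt.Errorf("unsupported action %+v", a)
	}
	return nil
}

// scriptedPlanner picks the next action from the current screen alone.
type scriptedPlanner struct{}

func (scriptedPlanner) Next(goal, obs string) (Action, error) {
	switch obs {
	case "inbox":
		return Action{Kind: "click", Target: "Compose"}, nil
	case "compose":
		return Action{Kind: "type", Target: goal}, nil
	case "draft":
		return Action{Kind: "click", Target: "Send"}, nil
	case "sent":
		return Action{Kind: "done"}, nil
	}
	return Action{}, fmt.Errorf("unexpected screen %q", obs)
}

func main() {
	d := &fakeDesktop{screen: "inbox"}
	steps, err := RunTask("I'll be late", d, scriptedPlanner{}, 10)
	fmt.Println(steps, err, d.log)
}
```

The step budget is the important design choice here: an agent that clicks real buttons should always have a hard stop.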

Tech Stack

  • Go backend (go-zero framework)
  • SvelteKit 2 + Svelte 5 frontend
  • Multiple AI providers (Anthropic, OpenAI, Gemini, Ollama)
  • hashicorp/go-plugin for extensibility
  • SQLite for persistence

The plugin architecture means you can extend GoBot with new capabilities without touching the core. Write a plugin, drop it in, and the agent can use it.
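As a rough picture of what “without touching the core” means, here is a small in-process registry sketch. It deliberately does not use the hashicorp/go-plugin API, which runs plugins as separate processes; the Plugin interface, Registry, and shoutPlugin below are hypothetical names for illustration, and GoBot’s real plugin contract may look different.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Plugin is a hypothetical capability interface (an assumption, not GoBot's API).
type Plugin interface {
	Name() string
	Call(args map[string]string) (string, error)
}

// Registry maps plugin names to implementations so the core never
// needs to know about a specific plugin at compile time.
type Registry struct {
	plugins map[string]Plugin
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{plugins: make(map[string]Plugin)}
}

// Register adds a plugin and rejects duplicate names.
func (r *Registry) Register(p Plugin) error {
	if _, dup := r.plugins[p.Name()]; dup {
		return fmt.Errorf("plugin %q already registered", p.Name())
	}
	r.plugins[p.Name()] = p
	return nil
}

// Names lists registered plugins in sorted order, e.g. to describe
// the available tools to the model.
func (r *Registry) Names() []string {
	names := make([]string, 0, len(r.plugins))
	for n := range r.plugins {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// Dispatch routes a call to the named plugin.
func (r *Registry) Dispatch(name string, args map[string]string) (string, error) {
	p, ok := r.plugins[name]
	if !ok {
		return "", fmt.Errorf("unknown plugin %q", name)
	}
	return p.Call(args)
}

// shoutPlugin is a toy plugin that upper-cases its "text" argument.
type shoutPlugin struct{}

func (shoutPlugin) Name() string { return "shout" }

func (shoutPlugin) Call(args map[string]string) (string, error) {
	return strings.ToUpper(args["text"]), nil
}

func main() {
	reg := NewRegistry()
	if err := reg.Register(shoutPlugin{}); err != nil {
		fmt.Println("register:", err)
		return
	}
	out, err := reg.Dispatch("shout", map[string]string{"text": "hello"})
	fmt.Println(reg.Names(), out, err)
}
```

Adding a capability is then a matter of implementing the interface and registering it; the dispatch path stays the same.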

Why This Matters

I believe fully autonomous agents are the future of AI.

Right now, most AI assistants are chat-only. You ask a question, get an answer, copy-paste something, repeat. The human is still the executor.

GoBot flips that. You tell it what you want done, and it does it. It navigates apps, fills forms, clicks buttons, reads screens. It’s the difference between a consultant who gives advice and an employee who gets things done.

This is where we’re headed. Not AI that tells you how to do things, but AI that does things for you.

Get Involved

GoBot is open source under MIT. I’m actively working on it and welcome anyone who wants to help.

The alpha release is available now: github.com/localrivet/gobot

It’s early. It’s rough around the edges. But it works, and I think the core idea of a persistent, desktop-controlling agent is something worth building in public.


X: x.com/almatuck

LinkedIn: linkedin.com/in/almatuck

Written by

Alma Tuck

Back to all articles