I Built a Telegram Bot That Manages My Production Server with AI


The Problem: 7 Production Apps, 1 Dev

I manage a server running 7 production applications. Laravel, Next.js, Python. MySQL, Redis, Nginx, Supervisor, PM2. Deploys, backups, monitoring, SSL certificates, job queues.

Any of these can break at any time. And when something goes down on a Saturday at 11 PM, I need to fix it -- no matter where I am.

Before, the workflow was: open the laptop, SSH in, investigate logs, run commands, hope the bar's WiFi doesn't drop mid-process. It works, but it's slow and requires having the laptop nearby.

So the obvious question came up: what if I could do all of this from my phone, talking to an AI with direct access to the server?

"But OpenClaw Already Does This"

Yes. And I started with it.

OpenClaw is an open-source project that does exactly this: a personal AI assistant that runs on your machine, connected to Telegram, WhatsApp, Discord. It has persistent memory, system access, 50+ integrations. It's really good.

I used it for a while and it helped me a lot. It was my main tool for managing the server -- I'd ask it to check logs, verify services, that kind of thing.

But then the dev itch hit.

I looked at my real pain points -- deploying 7 different apps, monitoring specific services, backups with R2 verification, SSL, Horizon queues, Supervisor -- and realized I wanted something molded exactly to my scenario. Every server has its quirks, every project has its scripts, every stack has its tricks.

So I decided to build my own. And in the process, I learned infinitely more than if I had just kept using a ready-made tool.

Today it has completely replaced what I used before. It became my 24/7 SRE that knows all my projects, my runbooks, my deploys. Responds in seconds via Telegram, deploys from my phone, monitors everything without spending a single AI token.

An important disclaimer: this is a personal project. I use it to manage my side projects, my tests, my development server. It's not a product with real paying customers. These are my internal projects where I can experiment, break things and rebuild without serious consequences. That's exactly why it was the perfect environment for this kind of experiment.

If you have simple needs or OpenClaw already solves your case -- use it. No need to reinvent the wheel. I reinvented it because I wanted to, because I learned a ton in the process, and because my pain points were specific enough to justify it.

Why Telegram

The first decision was the platform. Slack? Discord? Custom app?

Telegram. For practical reasons:

  • Works everywhere -- phone, tablet, desktop, web
  • Native bot support with inline buttons, commands, groups, forums
  • Voice messages -- I can literally talk to the bot while driving
  • Native Markdown -- logs and outputs formatted properly
  • Configurable notifications -- silent for routine monitoring, max alert for incidents

I didn't need to invent anything. Telegram's infra already handles communication, authentication, push notifications, and message history.
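To give a sense of how little glue this takes: a routine monitoring message and an incident alert differ by a single flag in the Bot API's sendMessage payload. Here's a minimal sketch (`build_alert` is an illustrative helper of mine; the payload fields are real sendMessage parameters):

```python
def build_alert(chat_id: int, text: str, *, urgent: bool) -> dict:
    """Build a Telegram sendMessage payload: Markdown formatting,
    and silent delivery for routine monitoring messages."""
    return {
        "chat_id": chat_id,
        "text": text,
        "parse_mode": "Markdown",
        # Routine checks arrive silently; incidents ring the phone.
        "disable_notification": not urgent,
    }

routine = build_alert(12345, "`mysql` is active", urgent=False)
incident = build_alert(12345, "*nginx is DOWN*", urgent=True)
```

The dict is POSTed to `https://api.telegram.org/bot<token>/sendMessage`; Telegram handles delivery, history, and push from there.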

The Evolution: From Wrapper to SDK

The first commit was simple: a Telegram bot that received messages and piped them to the Claude CLI via subprocess. It worked, but it was limited. No streaming, no granular control, no hooks.

Over one month of development (21 commits, to be exact), the project evolved significantly:

Phase 1 -- Subprocess wrapper

The basics. Message comes in through Telegram, goes to Claude CLI, response comes back. It worked, but each interaction was a new session. No memory, no persistent context.

Phase 2 -- Interactive dashboard

Inline buttons for quick actions: server status, service restarts, health checks. The bot became a control panel inside Telegram.

Phase 3 -- Zero-cost SRE commands

This was a turning point. I realized not everything needs to go through AI. Checking if MySQL is running? systemctl is-active mysql. No need to spend API tokens on that.

I created commands like /sre, /deploy, /logs, /backup that execute directly on the server, without calling the AI. Zero-cost monitoring.
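The pattern behind these commands is trivial, which is the point. A minimal sketch with illustrative names: a lookup table of shell checks, executed directly and formatted for a Telegram reply.

```python
import subprocess

# Illustrative mapping from service name to its zero-cost health check.
HEALTH_CHECKS = {
    "mysql": ["systemctl", "is-active", "mysql"],
    "redis": ["systemctl", "is-active", "redis"],
    "nginx": ["systemctl", "is-active", "nginx"],
}

def run_check(cmd: list[str]) -> str:
    """Execute one check on the server -- no AI call, no tokens spent."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return (result.stdout or result.stderr).strip()

def sre_report() -> str:
    """The kind of summary a /sre command can reply with instantly."""
    lines = [f"{name}: {run_check(cmd)}" for name, cmd in HEALTH_CHECKS.items()]
    return "\n".join(lines)
```

Anything that fits this shape never touches the model; the AI only gets involved when a check fails and something actually needs reasoning.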

Phase 4 -- Claude Agent SDK

Migrating from the CLI subprocess to the Agent SDK was the biggest change. Instead of spawning a CLI process, I now have full control: response streaming, hooks to intercept actions, a budget guard (limiting cost per interaction), session checkpointing.

```python
# Before: raw subprocess -- one shot, no streaming, no control
import subprocess

process = subprocess.Popen(
    ["claude", "--print", message],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)
output, _ = process.communicate()

# After: Agent SDK with full control
async with AgentSession(model="claude-sonnet-4-20250514") as session:
    response = await session.send(
        message,
        tools=tools,
        max_tokens=budget_limit,
        system=sre_context,
    )
```

The difference is like going from a bash script to a proper framework.
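The budget guard is conceptually simple. A hedged sketch of the idea (the class and thresholds are illustrative, not the SDK's API): accumulate the cost of each model call and refuse to continue once the per-interaction cap is hit.

```python
class BudgetGuard:
    """Illustrative per-interaction cost cap: track spend across
    model calls and stop the interaction once the limit is exceeded."""

    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd
        if self.spent_usd > self.limit_usd:
            raise RuntimeError(
                f"budget exceeded: ${self.spent_usd:.4f} > ${self.limit_usd:.4f}"
            )

guard = BudgetGuard(limit_usd=0.10)
guard.charge(0.04)  # fine
guard.charge(0.05)  # fine, $0.09 total
# a further charge(0.02) would raise: $0.11 > $0.10
```

A runaway tool loop on a phone-sized screen is exactly the failure mode you don't want at 11 PM on a Saturday; the guard turns it into one clean error message.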

Real Technical Challenges

The Context Window Problem

Claude has a ~200k token context window. Sounds like a lot, but when you're investigating a production incident -- reading logs, checking configs, analyzing code -- it runs out fast.

The solution: subagent delegation. The main bot acts as a thin orchestrator. Tasks involving more than 3 tool calls go to a subagent that receives focused context, executes, and returns a summary.

```python
# Orchestrator decides: direct or subagent?
if estimated_tool_calls > 3:
    # Subagent gets a focused context, executes, returns a compact summary
    result = await delegate_to_subagent(task, focused_context)
else:
    result = await execute_directly(task)
```

The main orchestrator's context stays light. When a session approaches the limit (150k tokens), it automatically rotates, and the new agent reads a state file to resume where it left off.
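A minimal sketch of the rotation idea, assuming a JSON state file (the file name and fields here are illustrative, not the bot's actual format): when the session nears the limit, persist a compact summary plus open tasks; the fresh session reads it back on startup.

```python
import json
from pathlib import Path

TOKEN_LIMIT = 150_000                    # rotate well before the ~200k hard limit
STATE_FILE = Path("session_state.json")  # illustrative path

def maybe_rotate(tokens_used: int, summary: str, open_tasks: list[str]) -> bool:
    """If the session is near the limit, write a compact state file
    so the next agent can resume where this one left off."""
    if tokens_used < TOKEN_LIMIT:
        return False
    STATE_FILE.write_text(json.dumps({
        "summary": summary,        # what has been done so far
        "open_tasks": open_tasks,  # what the next session must pick up
    }))
    return True

def resume_context() -> dict:
    """Called by the fresh session to rebuild its working context."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"summary": "", "open_tasks": []}
```

The state file costs a few hundred tokens to read back, versus the 150k of raw history it replaces.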

Semantic Skills: Bilingual TF-IDF

The bot learns "skills" -- instructions on how to execute specific tasks. But how do you find the right skill when the user asks for something?

I implemented semantic matching with TF-IDF and cosine similarity, with a twist: PT-BR to EN synonym expansion. When the user types "reiniciar o banco de dados" (restart the database), the system understands that "banco de dados" maps to "database" and "reiniciar" maps to "restart", finding the right skill even if it was written in English.

```python
# Bilingual synonyms for query expansion
SYNONYMS = {
    "banco de dados": ["database", "mysql", "db"],
    "reiniciar": ["restart", "reload", "reboot"],
    "servidor": ["server", "host", "machine"],
    # ...
}
```

It's not an LLM doing 768-dimension embeddings. It's classic TF-IDF with a lexical expansion layer. Simple, fast, no API cost.
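To make the matching concrete, here's a stripped-down sketch of the idea: expand the query through the synonym table, then rank skills by cosine similarity over word counts. (The real system adds IDF weighting; `best_skill` and the sample skills below are illustrative.)

```python
import math
from collections import Counter

SYNONYMS = {
    "banco de dados": ["database", "mysql", "db"],
    "reiniciar": ["restart", "reload", "reboot"],
}

def expand(query: str) -> list[str]:
    """Append English synonyms for any PT-BR phrase found in the query."""
    terms = query.lower().split()
    for phrase, translations in SYNONYMS.items():
        if phrase in query.lower():
            terms += translations
    return terms

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_skill(query: str, skills: dict[str, str]) -> str:
    """Pick the skill whose description best matches the expanded query."""
    q = Counter(expand(query))
    return max(skills, key=lambda name: cosine(q, Counter(skills[name].lower().split())))

skills = {
    "restart-db": "restart the mysql database service safely",
    "deploy-app": "deploy the laravel application to production",
}
best_skill("reiniciar o banco de dados", skills)  # -> "restart-db"
```

Without the expansion step, the Portuguese query shares zero tokens with the English skill descriptions and the match fails; the synonym layer is what makes the bilingual part work.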

Security: Forced Open Source

I decided to open-source the project. This forced me to do something I should have done from the start: separate ALL personal information from the code.

I rewrote the Git history. Extracted tokens, IPs, and project names into external config files. Created a proper .gitignore. Audited every commit.

Open-sourcing is the best security audit there is. When you know the world will read your code, you write better code.

SDK Bugs: The Reality of Using New Tools

The Claude Agent SDK (v0.1.46 at the time) had bugs. TaskGroups that failed silently. Transport issues in long sessions. Hooks that didn't fire in the expected order.

I'm not complaining -- it's a new SDK and the Anthropic team is iterating fast. But it's the reality of early adoption: you will find bugs and need to know how to work around them.

I documented every workaround. Some became issues on the repo. The experience reinforced something I already knew: reading the source code of the tools you use is as important as reading the documentation.

The Self-Improvement Loop

The most interesting feature: the bot reviews and suggests improvements to its own code.

Here's how it works: with a command, the bot uses Claude with deep analysis to read its own source code, identify problems, suggest refactorings, and generate a prioritized improvement plan.

Is it meta? Yes. But it works. The bot has already suggested:

  • Simplifying modules I had over-engineered
  • Removing 1500+ lines of dead code (when I dropped multi-provider support that no longer made sense)
  • Error handling improvements I wouldn't have thought of
  • SQLite query optimizations for message history

The rule is simple: the bot suggests, I review and approve. Nothing goes to production without human eyes. But the suggestions are surprisingly good.
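A rough sketch of the loop's two halves, with illustrative names (`build_self_review_prompt`, `apply_suggestion`); the real bot drives this through the Agent SDK rather than a hand-built prompt:

```python
from pathlib import Path

def build_self_review_prompt(src_dir: str = ".") -> str:
    """Assemble the bot's own source into one review prompt."""
    parts = ["Review this codebase. List problems and a prioritized improvement plan.\n"]
    for path in sorted(Path(src_dir).glob("*.py")):
        parts.append(f"\n--- {path.name} ---\n{path.read_text()}")
    return "".join(parts)

def apply_suggestion(suggestion: str, approved: bool) -> str:
    """The rule: the bot suggests, a human approves. Nothing ships otherwise."""
    return "applied" if approved else "held for review"
```

The approval gate is the whole safety story here: the model can read and propose anything, but the write path always goes through me.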

Final Architecture

After 21 commits, here's what the project looks like:

  • 6000+ lines of Python
  • Telegram bot (python-telegram-bot) as the interface
  • Claude Agent SDK for AI processing
  • SQLite for sessions, costs, message history
  • Semantic skills with TF-IDF and bilingual expansion
  • SRE commands at zero cost (health checks without AI)
  • Voice: Whisper for audio transcription, ElevenLabs for TTS
  • MCP: GitHub, Cloudflare, Playwright, Context7
  • Self-improvement: code review by the bot itself

What I Learned

1. AI agents are only as good as their context and tools. An LLM without tool access is a glorified chatbot. The magic happens when it can read logs, execute commands, access APIs. The Agent SDK understood this -- and that's why it works better than simply calling the chat API.

2. Not everything needs AI. This was the most counter-intuitive lesson. Checking if a service is running doesn't need AI. Listing backups doesn't need AI. Creating zero-cost commands for simple operations saves money and is faster.

3. Open source forces you to write better code. When you know someone might read your code, you stop cutting corners. You extract configs, document decisions, organize modules. The code got objectively better after I decided to open it up.

4. New tools have bugs. And that's okay. Early adoption has a cost. You'll find problems nobody documented. But you also learn faster than everyone else, and your contributions (issues, workarounds) help the entire community.

The Project Is Open Source

The code is open on GitHub. If you want to build something similar -- a Telegram bot that connects AI to any infrastructure -- the repo is a good starting point.

github.com/billyfranklim1/telegram-claude-bot

If you use it, let me know how it turned out. If you find bugs or have suggestions, open an issue. Contributions are welcome.

The server keeps running. The bot keeps managing. And I keep deploying from my phone in the middle of lunch.
