How it works
Telegram Search Engine discovers, analyzes, ranks, and maps public Telegram channels. This page covers the design; the full reference lives in the repo.
# Overview
Most Telegram tools are dumb scrapers. This one understands and maps communities: it discovers channels, reads what they post, classifies and summarizes them with a local LLM, ranks them by usefulness and network influence, and builds an interactive graph of how they reference each other.
It's open source and self-hosted — one docker compose up runs the whole stack. Use it as a data layer for any app that needs structured Telegram data, or run the web UI as a discovery product.
This live site is a read-only demo over a frozen snapshot of tech-community channels.
# Architecture
Two deliberately separated halves:
- The pipeline (crawl → analyze → graph) runs on your own machine — it needs a residential IP and a GPU for the local LLM. It writes to Postgres.
- The serving layer (web + read-only API + search) runs anywhere via Docker and only reads the database.
ingestion (Telethon) ─┐
analysis  (Ollama)   ─┼─► Postgres ──► API (FastAPI, read-only) ──► web (Next.js)
graph     (networkx) ─┘      └─────► Meilisearch ──────────────┘
# Pipeline
The pipeline is a set of CLI commands run on your machine against your own Telegram account.
discover + sample
# keyword discovery (with DB-driven expansion)
python -m app.ingestion.crawl --from-db --max-queries 20
# seed a known channel, then snowball via the link graph
python -m app.ingestion.add_channel https://t.me/SomeChannel
python -m app.ingestion.crawl --link-graph --max-depth 2 --limit 30
analyze + rank + index
python -m app.analysis.run --limit 200 # LLM classify + score
python -m app.graph.metrics # pagerank, clusters, rescore
python -m app.search.reindex # sync into Meilisearch
# Search
Search runs through Meilisearch — typo-tolerant, with word and proximity ranking over each channel's title, handle, summary, and category. Results are ranked by text relevance first, then by the channel's quality score.
If Meilisearch is unavailable, search falls back transparently to Postgres full-text search, so queries keep working while Meilisearch is down.
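The fallback described above can be sketched as a small wrapper. This is a minimal illustration, not the project's actual code: `search_with_fallback`, `meili_down`, and `postgres_fts` are hypothetical names, the backends are plain callables, and it assumes the primary backend signals an outage with an `OSError` subclass such as `ConnectionError` (a real Meilisearch client may raise its own exception types).

```python
from typing import Callable, Dict, List

# A search backend takes (query, limit) and returns a list of channel dicts.
SearchFn = Callable[[str, int], List[Dict]]


def search_with_fallback(query: str, limit: int,
                         primary: SearchFn, fallback: SearchFn) -> List[Dict]:
    """Try the primary backend; if it is unreachable, use the fallback."""
    try:
        return primary(query, limit)
    except OSError:
        # ConnectionError and TimeoutError are both OSError subclasses.
        return fallback(query, limit)


# Stand-in backends for illustration only.
def meili_down(query: str, limit: int) -> List[Dict]:
    raise ConnectionError("Meilisearch unreachable")


def postgres_fts(query: str, limit: int) -> List[Dict]:
    return [{"handle": "example_channel", "source": "postgres"}][:limit]


results = search_with_fallback("rust", 10, meili_down, postgres_fts)
print(results)
```

Passing the backends in as callables keeps the fallback logic independent of either client library, which also makes it easy to exercise in tests.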
# Graph
Every reference a channel makes — t.me links, @mentions, and forwarded-from channels — becomes a weighted edge. From that graph we compute, per channel:
- PageRank — influence / hubs
- Betweenness — bridges between communities
- Louvain clusters — community detection
These power the interactive network visualization on the graph page.
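To make the edge-building and influence step concrete, here is a dependency-free sketch. The architecture diagram indicates the project uses networkx for this; the code below is only a stand-in that computes weighted PageRank by power iteration, and it does not cover betweenness or Louvain clustering. The regular expressions, `extract_edges`, and `pagerank` are illustrative assumptions, and the link pattern is deliberately naive (it does not special-case invite links or other non-channel t.me paths).

```python
import re
from collections import Counter, defaultdict

# Simplified, hypothetical patterns for channel references in post text.
LINK_RE = re.compile(r"(?:https?://)?t\.me/([A-Za-z0-9_]+)")
MENTION_RE = re.compile(r"(?<![\w/])@([A-Za-z0-9_]+)")


def extract_edges(source, texts, forwards=()):
    """Count references from `source` to other channels as weighted edges."""
    source = source.lower()
    edges = Counter()
    targets = list(forwards)
    for text in texts:
        targets += LINK_RE.findall(text) + MENTION_RE.findall(text)
    for target in targets:
        target = target.lower()
        if target != source:  # ignore self-references
            edges[(source, target)] += 1
    return edges


def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration over {(src, dst): weight}."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_weight = defaultdict(float)
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0)
        new = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for (src, dst), weight in edges.items():
            new[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new
    return rank


edges = extract_edges(
    "alpha",
    ["see https://t.me/BetaChan and @gamma_news"],
    forwards=["betachan"],
)
ranks = pagerank(edges)
print(dict(edges))
print(ranks)
```

In this toy input, `betachan` is referenced twice (one link, one forward) and `gamma_news` once, so `betachan` ends up with the larger share of rank.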
# Scoring
Each channel gets a 0–100 score blending four signals:
final = quality·40% + activity·30% + influence·20% + freshness·10%
- quality: LLM usefulness (spam penalized)
- activity: volume, images, low repetition
- influence: normalized PageRank (network importance)
- freshness: recency of the latest post
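The weighted blend above can be written out as a short function. This is a sketch under one assumption that the page does not state: each of the four signals is already normalized to the 0–100 range before blending. `WEIGHTS` and `final_score` are illustrative names.

```python
# Percent weights taken from the formula above.
WEIGHTS = {"quality": 40, "activity": 30, "influence": 20, "freshness": 10}


def final_score(signals):
    """Blend four 0-100 signals into a single 0-100 score."""
    for name in WEIGHTS:
        if not 0 <= signals[name] <= 100:
            raise ValueError(f"{name} must be in 0..100, got {signals[name]}")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS) / 100


score = final_score(
    {"quality": 80, "activity": 50, "influence": 20, "freshness": 100}
)
print(score)  # 61.0  (32 + 15 + 4 + 10)
```

Because the weights sum to 100%, the result stays within 0–100 whenever the inputs do.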
# API
All endpoints are read-only GETs:
GET /search?q=&limit=   ranked channels
GET /channel/{id}       full profile + analytics
GET /categories         categories + counts
GET /stats              pipeline metrics
GET /graph              nodes + edges
GET /graph/hubs         most influential
GET /graph/bridges      cluster connectors
GET /graph/clusters     community summaries
# Self-host
Clone the repo, bring your own Telegram account and a machine with a GPU (for the local LLM), and run the stack:
git clone https://github.com/your/telegram-search-engine
cd telegram-search-engine
cp .env.prod.example .env.prod # fill in values
docker compose --env-file .env.prod up -d --build
Full setup, deployment, and data-migration guides are in the repository (README, DOCS, DEPLOY, CONTRIBUTING).
# Safety / ToS
The crawler is intentionally read-only and throttled. It never joins channels, honors Telegram's rate limits in full, and is meant to run on a dedicated, aged account on a residential IP.
Bans come from behavior (mass-joining, ignoring rate limits), not from reading public channels carefully. Keep crawl volume modest.
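One simple way to keep crawl volume modest is a client-side throttle that enforces a minimum gap between requests. The sketch below is a generic illustration, not the crawler's actual mechanism: `Throttle`, `crawl`, and the `fetch` callable are hypothetical, and handling of server-imposed waits is out of scope here.

```python
import time


class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable for testing
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def crawl(handles, fetch, throttle):
    """Fetch each channel in turn, pausing between requests."""
    results = []
    for handle in handles:
        throttle.wait()
        results.append(fetch(handle))
    return results
```

A caller would construct something like `Throttle(min_interval=2.0)` and pass its own read-only fetch function; the interval value here is an arbitrary example, not a recommendation from the project.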