
How it works

Telegram Search Engine discovers, analyzes, ranks, and maps public Telegram channels. This page covers the design; the full reference lives in the repo.

# Overview

Most Telegram tools stop at scraping. This one maps communities: it discovers channels, reads what they post, classifies and summarizes them with a local LLM, ranks them by usefulness and network influence, and builds an interactive graph of how they reference each other.

It's open source and self-hosted — one docker compose up runs the whole stack. Use it as a data layer for any app that needs structured Telegram data, or run the web UI as a discovery product.

This live site is a read-only demo over a frozen snapshot of tech-community channels.

# Architecture

Two deliberately separated halves:

The pipeline (crawl → analyze → graph) runs on your own machine — it needs a residential IP and a GPU for the local LLM. It writes to Postgres.
The serving layer (web + read-only API + search) runs anywhere via Docker and only reads the database.

ingestion (Telethon) ─┐
analysis  (Ollama)   ─┼─► Postgres ──► API (FastAPI, read-only) ──► web (Next.js)
graph     (networkx) ─┘        └─────► Meilisearch ──────────────┘

# Pipeline

The pipeline is a set of CLI commands run on your machine against your own Telegram account.

discover + sample

# keyword discovery (with DB-driven expansion)
python -m app.ingestion.crawl --from-db --max-queries 20

# seed a known channel, then snowball via the link graph
python -m app.ingestion.add_channel https://t.me/SomeChannel
python -m app.ingestion.crawl --link-graph --max-depth 2 --limit 30
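The link-graph snowball can be pictured as a bounded breadth-first search from the seed channels. The sketch below is illustrative only, not the crawler's actual code: `snowball` and the `neighbors` callback are hypothetical names, and reading `--max-depth` as a hop count and `--limit` as a cap on newly discovered channels is an assumption.

```python
from collections import deque


def snowball(seeds, neighbors, max_depth=2, limit=30):
    """Breadth-first discovery from seed channels (hypothetical helper).

    neighbors(channel) returns the channels that `channel` references.
    Assumption: max_depth counts hops from a seed, limit caps new channels.
    """
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    discovered = []
    while queue and len(discovered) < limit:
        channel, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand channels at the depth boundary
        for nxt in neighbors(channel):
            if nxt in seen:
                continue
            seen.add(nxt)
            discovered.append(nxt)
            queue.append((nxt, depth + 1))
            if len(discovered) >= limit:
                break
    return discovered
```

With a toy graph such as `{"a": ["b", "c"], "b": ["d"]}` and seed `"a"`, the search returns channels in order of distance from the seed, so nearby channels are sampled before distant ones.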

analyze + rank + index

python -m app.analysis.run --limit 200   # LLM classify + score
python -m app.graph.metrics              # pagerank, clusters, rescore
python -m app.search.reindex             # sync into Meilisearch
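If you prefer to run the three post-crawl stages as one job, a small wrapper can run them in order and stop at the first failure. This is a convenience sketch, not part of the project: `build_steps` and `run_pipeline` are hypothetical names, and only the module paths and the `--limit` flag shown above are used.

```python
import subprocess
import sys


def build_steps(limit=200):
    """Return the argv lists for analyze -> graph metrics -> reindex."""
    py = sys.executable  # use the same interpreter that runs this script
    return [
        [py, "-m", "app.analysis.run", "--limit", str(limit)],
        [py, "-m", "app.graph.metrics"],
        [py, "-m", "app.search.reindex"],
    ]


def run_pipeline(limit=200):
    """Run each stage in order; check=True raises on a non-zero exit."""
    for argv in build_steps(limit):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_pipeline()
```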

# Search

Search runs through Meilisearch — typo-tolerant, with word and proximity ranking over each channel's title, handle, summary, and category. Results are ranked by text relevance first, then by the channel's quality score.

If Meilisearch is unavailable, search transparently falls back to Postgres full-text search, so queries keep working while the search index is down.
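The fallback pattern can be sketched as a thin wrapper around two search backends. Everything here is hypothetical: `search_with_fallback`, `primary`, and `fallback` are illustrative names standing in for the Meilisearch and Postgres queries, and which exceptions signal "unavailable" depends on the client library, so they are a parameter.

```python
def search_with_fallback(query, primary, fallback, limit=20,
                         unavailable=(OSError,)):
    """Try the primary backend; use the fallback if it is unreachable.

    primary and fallback are callables taking (query, limit).
    Assumption: connection failures surface as OSError subclasses
    (ConnectionError, TimeoutError); pass other types if your client differs.
    """
    try:
        return primary(query, limit)
    except unavailable:
        return fallback(query, limit)
```

Catching only connectivity errors, rather than every exception, keeps genuine bugs in the primary path visible instead of silently masking them behind the fallback.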

# Graph

Every reference a channel makes — t.me links, @mentions, and forwarded-from channels — becomes a weighted edge. From that graph we compute, per channel:

PageRank — influence / hubs
Betweenness — bridges between communities
Louvain clusters — community detection

These power the interactive network visualization on the graph page.
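As a rough illustration of how references turn into weighted edges, the sketch below extracts t.me links, @mentions, and forwarded-from handles from a channel's posts and counts them. It is not the project's extractor: `extract_edges`, the regular expressions, and the choice of "one reference adds 1 to the edge weight" are all assumptions, and the handle pattern is a simplification.

```python
import re
from collections import Counter

# Simplified handle pattern: a letter followed by 4-31 word characters.
HANDLE = r"([A-Za-z][A-Za-z0-9_]{4,31})"
TME_LINK = re.compile(r"t\.me/" + HANDLE, re.IGNORECASE)
MENTION = re.compile(r"(?<![\w/])@" + HANDLE)
# t.me path prefixes that are not channel handles.
RESERVED = {"joinchat", "addstickers", "share"}


def extract_edges(source, posts):
    """Count references from `source` to other channels.

    posts is an iterable of (text, forwarded_from) pairs, where
    forwarded_from is a handle or None. Returns a Counter keyed by
    (source, target); the count is used as the edge weight (assumption).
    """
    src = source.lower()
    edges = Counter()
    for text, forwarded_from in posts:
        targets = [h.lower() for h in TME_LINK.findall(text)]
        targets += [h.lower() for h in MENTION.findall(text)]
        if forwarded_from:
            targets.append(forwarded_from.lower())
        for target in targets:
            if target != src and target not in RESERVED:
                edges[(src, target)] += 1  # self-references are skipped
    return edges
```

The resulting `(source, target, weight)` triples are the kind of input a graph library such as networkx accepts when building a weighted directed graph for PageRank, betweenness, and Louvain clustering.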

# Scoring

Each channel gets a 0–100 score blending four signals:

final = quality·40%  +  activity·30%  +  influence·20%  +  freshness·10%

quality — LLM usefulness (spam penalized)
activity — volume, images, low repetition
influence — normalized PageRank (network importance)
freshness — recency of the latest post
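The blend above is a plain weighted sum. A minimal sketch, assuming each of the four signals is already normalized to the 0 to 100 range (the source states only the weights and the range of the final score; `final_score` is a hypothetical name):

```python
# Weights in percent, taken from the formula above.
WEIGHTS = {"quality": 40, "activity": 30, "influence": 20, "freshness": 10}


def final_score(quality, activity, influence, freshness):
    """Blend four 0-100 signals into a 0-100 score."""
    signals = {
        "quality": quality,
        "activity": activity,
        "influence": influence,
        "freshness": freshness,
    }
    for name, value in signals.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in [0, 100], got {value}")
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS) / 100


print(final_score(quality=80, activity=50, influence=30, freshness=90))  # 62.0
```

Because quality carries 40% of the weight, a channel that posts often but scores low on usefulness cannot climb far on activity alone.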

# API

All endpoints are read-only GETs:

GET /search?q=&limit=     ranked channels
GET /channel/{id}         full profile + analytics
GET /categories           categories + counts
GET /stats                pipeline metrics
GET /graph                nodes + edges
GET /graph/hubs           most influential
GET /graph/bridges        cluster connectors
GET /graph/clusters       community summaries
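A client call against the search endpoint might look like the sketch below, using only the standard library. The base URL is an assumption (adjust it to your deployment), `search_url` and `search` are hypothetical helper names, and the response is assumed to be JSON; the response schema is not documented here, so the result is returned as parsed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: where your API is served


def search_url(query, limit=10, base_url=BASE_URL):
    """Build the GET /search URL with properly escaped parameters."""
    return f"{base_url}/search?{urlencode({'q': query, 'limit': limit})}"


def search(query, limit=10, base_url=BASE_URL):
    """Fetch ranked channels; assumes the endpoint returns JSON."""
    with urlopen(search_url(query, limit, base_url), timeout=10) as resp:
        return json.load(resp)


print(search_url("rust lang", limit=5))
# http://localhost:8000/search?q=rust+lang&limit=5
```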

# Self-host

Clone the repo, bring your own Telegram account and a machine with a GPU (for the local LLM), and run the stack:

git clone https://github.com/your/telegram-search-engine
cd telegram-search-engine
cp .env.prod.example .env.prod   # fill in values
docker compose --env-file .env.prod up -d --build

Full setup, deployment, and data-migration guides are in the repository (README, DOCS, DEPLOY, CONTRIBUTING).

# Safety / ToS

The crawler is intentionally read-only and throttled. It never joins channels, honors Telegram's rate limits in full, and is meant to run on a dedicated, aged account on a residential IP.

Bans come from behavior (mass-joining, ignoring rate limits), not from reading public channels carefully. Keep crawl volume modest.
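One simple way to keep request volume modest is to enforce a minimum gap between requests. The class below is a generic sketch, not the crawler's throttle: `Throttle` is a hypothetical helper, and the interval you choose is your own judgment call. It does not replace honoring the wait times Telegram itself reports.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable for testing
        self._sleep = sleep
        self._last = None     # time at which the previous call was released

    def wait(self):
        """Block until min_interval has passed since the last call.

        Returns the number of seconds slept.
        """
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay
```

Calling `throttle.wait()` before every request spaces requests at least `min_interval` seconds apart, regardless of how quickly the surrounding loop runs.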
