Proof of concept: this site explores a frozen snapshot of public channels that have already been crawled. The full crawler and analysis pipeline is open source, so you can run your own.

How it works

Telegram Search Engine discovers, analyzes, ranks, and maps public Telegram channels. This page covers the design; the full reference lives in the repo.

# Overview

Most Telegram tools stop at scraping messages. This one maps communities: it discovers channels, reads what they post, classifies and summarizes them with a local LLM, ranks them by usefulness and network influence, and builds an interactive graph of how they reference each other.

It's open source and self-hosted: a single docker compose up command runs the whole stack. Use it as a data layer for any app that needs structured Telegram data, or run the web UI as a discovery product.

This live site is a read-only demo over a frozen snapshot of tech-community channels.

# Architecture

Two deliberately separated halves:

  • The pipeline (crawl → analyze → graph) runs on your own machine — it needs a residential IP and a GPU for the local LLM. It writes to Postgres.
  • The serving layer (web + read-only API + search) runs anywhere via Docker and only reads the database.

ingestion (Telethon) ─┐
analysis  (Ollama)   ─┼─► Postgres ──► API (FastAPI, read-only) ──► web (Next.js)
graph     (networkx) ─┘        └─────► Meilisearch ──────────────┘

# Pipeline

The pipeline is a set of CLI commands run on your machine against your own Telegram account.

discover + sample

# keyword discovery (with DB-driven expansion)
python -m app.ingestion.crawl --from-db --max-queries 20

# seed a known channel, then snowball via the link graph
python -m app.ingestion.add_channel https://t.me/SomeChannel
python -m app.ingestion.crawl --link-graph --max-depth 2 --limit 30

analyze + rank + index

python -m app.analysis.run --limit 200   # LLM classify + score
python -m app.graph.metrics              # pagerank, clusters, rescore
python -m app.search.reindex             # sync into Meilisearch
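
The commands above are run one after another, so a small wrapper can keep the order fixed and stop at the first failure. The sketch below is an illustration, not part of the repository: `PIPELINE_STEPS` and `run_pipeline` are hypothetical names, and only the module names and flags are taken from the commands listed above. Seeding a channel with app.ingestion.add_channel is left out because it needs a channel URL of your choosing.

```python
import subprocess
import sys

# Steps copied from the commands above, in the order they are listed.
PIPELINE_STEPS = [
    ["-m", "app.ingestion.crawl", "--from-db", "--max-queries", "20"],
    ["-m", "app.ingestion.crawl", "--link-graph", "--max-depth", "2", "--limit", "30"],
    ["-m", "app.analysis.run", "--limit", "200"],
    ["-m", "app.graph.metrics"],
    ["-m", "app.search.reindex"],
]


def run_pipeline(steps=PIPELINE_STEPS, runner=subprocess.run):
    """Run each step with the current interpreter and stop at the first failure."""
    for args in steps:
        cmd = [sys.executable, *args]
        print("running:", " ".join(cmd))
        # With the default runner, check=True raises CalledProcessError
        # when a step exits with a non-zero status.
        runner(cmd, check=True)
```

Calling `run_pipeline()` from the repository root would execute the five steps in sequence; the `runner` parameter exists only so the wrapper can be exercised without launching real processes.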

# Graph

Every reference a channel makes — t.me links, @mentions, and forwarded-from channels — becomes a weighted edge. From that graph we compute, per channel:

  • PageRank — influence / hubs
  • Betweenness — bridges between communities
  • Louvain clusters — community detection

These power the interactive network visualization on the graph page.

# Scoring

Each channel gets a 0–100 score blending four signals:

final = 0.40·quality + 0.30·activity + 0.20·influence + 0.10·freshness

  • quality — LLM usefulness (spam penalized)
  • activity — volume, images, low repetition
  • influence — normalized PageRank (network importance)
  • freshness — recency of the latest post
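
As a worked example of the blend, the sketch below applies the four weights stated above. It assumes each signal has already been normalized to the same 0–100 scale as the final score, which this page does not spell out; `WEIGHTS` and `final_score` are hypothetical names.

```python
# Weights in percent, taken from the formula above.
WEIGHTS = {"quality": 40, "activity": 30, "influence": 20, "freshness": 10}


def final_score(quality, activity, influence, freshness):
    """Blend four subscores (assumed to be on a 0-100 scale) into one 0-100 score."""
    subscores = {
        "quality": quality,
        "activity": activity,
        "influence": influence,
        "freshness": freshness,
    }
    for name, value in subscores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return sum(WEIGHTS[name] * value for name, value in subscores.items()) / 100


# 0.40*80 + 0.30*50 + 0.20*30 + 0.10*100 = 32 + 15 + 6 + 10 = 63
example = final_score(quality=80, activity=50, influence=30, freshness=100)
```

Because quality carries 40% of the weight, a channel with a strong LLM usefulness rating can outrank a more active or better-connected channel.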

# API

All endpoints are read-only GETs:

GET /search?q=&limit=     ranked channels
GET /channel/{id}         full profile + analytics
GET /categories           categories + counts
GET /stats                pipeline metrics
GET /graph                nodes + edges
GET /graph/hubs           most influential
GET /graph/bridges        cluster connectors
GET /graph/clusters       community summaries
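
A minimal client sketch for the search endpoint, using only the Python standard library. The base URL is an assumption (use wherever your API container is exposed), as is the JSON response body; the response fields are not documented on this page, so the sketch returns the parsed payload untouched. `search_url` and `search` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumption: adjust to your deployment


def search_url(query, limit=10, base_url=BASE_URL):
    """Build the GET /search URL with URL-encoded q and limit parameters."""
    return f"{base_url}/search?{urlencode({'q': query, 'limit': limit})}"


def search(query, limit=10, base_url=BASE_URL):
    """Call GET /search and return the decoded body (assumed to be JSON)."""
    with urlopen(search_url(query, limit, base_url), timeout=10) as response:
        return json.load(response)
```

For example, `search_url("rust lang", 5)` yields `http://localhost:8000/search?q=rust+lang&limit=5`; the other endpoints take the same shape with a different path.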

# Self-host

Clone the repo, bring your own Telegram account and a machine with a GPU (for the local LLM), and run the stack:

git clone https://github.com/your/telegram-search-engine
cd telegram-search-engine
cp .env.prod.example .env.prod   # fill in values
docker compose --env-file .env.prod up -d --build

Full setup, deployment, and data-migration guides are in the repository (README, DOCS, DEPLOY, CONTRIBUTING).

# Safety / ToS

The crawler is intentionally read-only and throttled. It never joins channels, honors Telegram's rate limits in full, and is meant to run on a dedicated, aged account on a residential IP.

Bans come from behavior (mass-joining, ignoring rate limits), not from reading public channels carefully. Keep crawl volume modest.
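
One way to picture client-side throttling is a minimum-interval gate in front of every request. The sketch below is a generic illustration and not the crawler's actual implementation; the `Throttle` class and the two-second interval are assumptions. A real crawler would additionally have to obey server-requested waits, which Telethon surfaces as `FloodWaitError`.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the last request: pause for the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Illustrative value only: at most one request every two seconds.
throttle = Throttle(min_interval=2.0)
```

Calling `throttle.wait()` before each read keeps the request rate bounded regardless of how fast the surrounding loop runs.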