Every morning I get an email with the ten most relevant articles from my RSS feeds, summarised in French, sorted by topic, with a clickable link for each one. Five minutes of reading. No paid subscription. No data sent to any US tech giant.

Here’s how I built it — and why digital sovereignty was a non-negotiable requirement.


The problem: too much information, not enough signal

I follow about a hundred RSS feeds: tech blogs, Red Hat and OpenShift news, home automation, AI tracking… That’s between 100 and 300 new articles every day. Nobody reads that.

The usual solutions didn’t work for me:

  • Feedly, Inoreader: cloud services that host your reading data, sell AI as a premium feature, and leave you entirely dependent on them
  • Newsletters: human editorial curation, with editorial bias and a fixed schedule you don’t control
  • Reading FreshRSS directly: effective but time-consuming, with no automatic prioritisation

What I wanted was a tool that knows my interests, reads for me, gives me a summary of what matters — and runs on my own infrastructure.


The sovereignty requirement

I work daily with internal technical data: architectures, configurations, projects. Sending any of that to OpenAI or any US cloud service subject to the CLOUD Act was out of the question.

My constraints:

  • Self-hosted for critical components (database, orchestration)
  • AI provided by a European vendor, with data staying in Europe
  • No vendor lock-in on the embedding model

My answer: n8n + PostgreSQL/pgvector + Infomaniak AI.


The architecture

FreshRSS (self-hosted)
    ↓ GReader API
n8n (self-hosted) — Phase 1: Ingestion
    ↓ HTTP Request → Infomaniak AI (embeddings)
    ↓ SQL INSERT
PostgreSQL + pgvector (self-hosted)
    ↓
n8n — Phase 2: Digest (triggered automatically)
    ↓ SQL SELECT (cosine similarity)
    ↓ HTTP Request → Infomaniak AI (LLM summary)
    ↓ SMTP
Daily email

Apart from the AI calls, everything runs on my own infrastructure. Those calls go to Infomaniak, a Swiss hosting provider with infrastructure in Europe and native GDPR compliance.


The components

FreshRSS — the sovereign aggregator

FreshRSS is an open source, self-hostable RSS aggregator. It exposes a Google Reader-compatible API (GReader), making it easy to drive programmatically.

I use it to centralise my subscriptions into categories: IT News, Home Automation, etc. It’s my single entry point.

n8n — the orchestrator

n8n is a self-hostable workflow automation platform whose source is available under a fair-code licence. Think Zapier or Make, but without the privacy limitations of a SaaS product.

The workflow runs in two phases that chain together automatically:

Phase 1 — Ingestion (7 AM)

  1. Authenticate with FreshRSS via the GReader API
  2. Fetch articles from the last 24 hours, by category
  3. Strip HTML, detect language (FR/EN by stopword counting)
  4. In-memory deduplication
  5. Generate one embedding per article via Infomaniak AI
  6. Insert into PostgreSQL with database-level deduplication (WHERE NOT EXISTS)
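
Steps 3 and 4 can be sketched in a few lines. This is a minimal illustration, not the workflow's actual code: the stopword lists are deliberately tiny, the function names are invented, and it assumes each article is a dict with a "url" key.

```python
import html
import re

# Tiny illustrative stopword lists (assumption); real lists would be longer.
FR_STOPWORDS = {"le", "la", "les", "des", "est", "et", "un", "une", "dans", "pour", "que", "qui"}
EN_STOPWORDS = {"the", "and", "is", "of", "to", "in", "for", "that", "with", "this", "are", "on"}

TAG_RE = re.compile(r"<[^>]+>")


def strip_html(raw: str) -> str:
    """Remove tags, decode entities, and collapse whitespace."""
    text = html.unescape(TAG_RE.sub(" ", raw))
    return " ".join(text.split())


def detect_language(text: str) -> str:
    """Return 'fr' or 'en' depending on which stopword list matches more words."""
    words = re.findall(r"[a-zàâçéèêëîïôûùüÿœ]+", text.lower())
    fr_hits = sum(word in FR_STOPWORDS for word in words)
    en_hits = sum(word in EN_STOPWORDS for word in words)
    return "fr" if fr_hits > en_hits else "en"


def deduplicate(articles: list[dict]) -> list[dict]:
    """Keep the first article seen for each URL (in-memory deduplication)."""
    seen: set[str] = set()
    unique = []
    for article in articles:
        if article["url"] not in seen:
            seen.add(article["url"])
            unique.append(article)
    return unique
```

Stopword counting is crude, but it is cheap and good enough to tell two known languages apart; ties fall back to English in this sketch.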

Phase 2 — Digest (chained)

  1. A list of interest topics defined in a Set node
  2. Each interest is embedded separately
  3. Cosine similarity SQL query per interest (LIMIT 5 each)
  4. Aggregate, deduplicate, keep top 10
  5. HTML summary generated by the LLM
  6. Sent by SMTP
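
Step 4 of the digest phase (aggregate, deduplicate, keep the top 10) could look roughly like this. The shape of the input is an assumption: one list of hits per interest, each hit a dict with "url" and "distance" keys, as the SQL query would return them.

```python
def build_digest(results_by_interest: dict[str, list[dict]], top_n: int = 10) -> list[dict]:
    """Merge per-interest hits, keep the best distance per URL, return the top_n closest."""
    best_by_url: dict[str, dict] = {}
    for interest, hits in results_by_interest.items():
        for hit in hits:
            candidate = {**hit, "interest": interest}
            current = best_by_url.get(hit["url"])
            # An article matched by several interests is kept once, with its best score.
            if current is None or candidate["distance"] < current["distance"]:
                best_by_url[hit["url"]] = candidate
    return sorted(best_by_url.values(), key=lambda hit: hit["distance"])[:top_n]
```

Keeping the smallest distance per URL means an article relevant to two interests is not listed twice, and it is filed under the interest it matches best.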

PostgreSQL + pgvector — the vector memory

pgvector is a PostgreSQL extension that adds a VECTOR type and distance operators. It turns Postgres into a vector database with no additional infrastructure.

The table is minimal:

CREATE TABLE rss_articles (
    id        UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    text      TEXT NOT NULL,
    metadata  JSONB DEFAULT '{}',
    embedding VECTOR(3584)
);

The metadata column stores the title, URL, source, language, and Unix timestamp. The embedding column stores the 3584-dimensional vector.
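
For illustration, here is how an insert with database-level deduplication might be written against this table. It is a sketch under assumptions: the URL is taken to live under a "url" key in metadata, and conn is any DB-API connection whose driver uses %s placeholders (psycopg, for example).

```python
import json

# Assumption: the article URL is stored under the "url" key of the metadata JSONB.
INSERT_SQL = """
INSERT INTO rss_articles (text, metadata, embedding)
SELECT %s, %s::jsonb, %s::vector
WHERE NOT EXISTS (
    SELECT 1 FROM rss_articles WHERE metadata->>'url' = %s
)
"""


def to_vector_literal(embedding: list[float]) -> str:
    """Format a list of floats as pgvector's text input, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def insert_article(conn, text: str, metadata: dict, embedding: list[float]) -> None:
    """Insert one article unless a row with the same URL already exists."""
    params = (text, json.dumps(metadata), to_vector_literal(embedding), metadata["url"])
    with conn.cursor() as cur:
        cur.execute(INSERT_SQL, params)
    conn.commit()
```

The INSERT ... SELECT ... WHERE NOT EXISTS form inserts zero rows when the URL is already present, so re-running ingestion on the same day is harmless.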

Infomaniak AI — the brain, in Europe

Infomaniak is a Swiss hosting provider offering an OpenAI-compatible AI API. Two models are used here:

  • BGE-Multilingual-Gemma2 (9B params, 3584 dims) for embeddings — natively multilingual FR/EN
  • Gemma 4 31B for generating the HTML summary

Everything stays in Europe. No data transits through US servers.
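
Because the API is OpenAI-compatible, an embedding call follows the usual POST /embeddings shape. The sketch below uses only the standard library; BASE_URL and EMBEDDING_MODEL are placeholders, not Infomaniak's actual endpoint or model identifier, so check the provider's documentation for the real values.

```python
import json
import urllib.request

# Hypothetical values: replace with your provider's base URL and model identifier.
BASE_URL = "https://ai.example.eu/v1"
EMBEDDING_MODEL = "bge-multilingual-gemma2"


def build_embedding_request(text: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style POST /embeddings request."""
    body = json.dumps({"model": EMBEDDING_MODEL, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embedding(response_body: bytes) -> list[float]:
    """Extract the first vector from an OpenAI-style embeddings response."""
    payload = json.loads(response_body)
    return payload["data"][0]["embedding"]


def embed(text: str, api_token: str) -> list[float]:
    """Send the request and return the embedding (performs a network call)."""
    with urllib.request.urlopen(build_embedding_request(text, api_token)) as resp:
        return parse_embedding(resp.read())
```

In n8n the same call is a plain HTTP Request node; the point is that nothing provider-specific is needed beyond the base URL, the token, and the model name.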


Semantic search

This is where the magic happens. Every article is stored as a vector of 3584 numbers, a mathematical representation of its meaning. Each interest topic is also converted into a vector.

The search finds articles whose vector is closest to the interest’s vector. We use cosine distance (the <=> operator in pgvector), which measures the angle between two vectors regardless of their length: the usual metric for text embeddings.

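-- Assumes title, url and language are keys of the metadata JSONB column.
SELECT metadata->>'title' AS titre,
       metadata->>'url'   AS lien,
       (embedding <=> '[...]'::vector)
         * CASE WHEN metadata->>'language' = 'fr' THEN 0.75 ELSE 1.0 END AS distance
FROM rss_articles
WHERE to_timestamp((metadata->>'published_at')::bigint) > NOW() - INTERVAL '30 days'
ORDER BY distance ASC
LIMIT 5;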

The 0.75 multiplier shrinks the distance of French articles by 25%, a soft relevance bonus to compensate for any model bias toward English.
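
To make the scoring concrete, here is the same ranking logic in plain Python on toy vectors. It is only an illustration of what the query computes; the article dicts and their "embedding" and "language" keys are assumptions made for the example.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank(articles: list[dict], interest: list[float],
         fr_bonus: float = 0.75, limit: int = 5) -> list[dict]:
    """Score each article against one interest, applying the French bonus."""
    scored = []
    for article in articles:
        distance = cosine_distance(article["embedding"], interest)
        if article["language"] == "fr":
            distance *= fr_bonus  # smaller distance means ranked higher
        scored.append({**article, "distance": distance})
    return sorted(scored, key=lambda a: a["distance"])[:limit]
```

With a raw distance of 0.5, a French article ends up at 0.375 and overtakes an English article at 0.4, which is exactly the nudge the multiplier is meant to give.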

Why one embedding per interest?

I could have concatenated all my interests into a single string and generated one vector. Simpler, but far less effective.

Consider: “electric car” and “software supply chain” are radically different topics. A single vector for the whole list behaves like an average of the topics and represents neither correctly: it lands somewhere in the middle of nowhere. By embedding each interest separately and running N queries, each topic finds its own relevant articles.
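
A toy example shows the effect. The 3-dimensional vectors below are invented, and modelling the combined embedding as the mean of the two topic vectors is a simplification of what a real model does with a concatenated string.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def closest(query: list[float], articles: dict[str, list[float]]) -> str:
    """Return the name of the article nearest to the query vector."""
    return min(articles, key=lambda name: cosine_distance(query, articles[name]))


# Toy vectors, invented for illustration.
electric_car = [1.0, 0.0, 0.0]
supply_chain = [0.0, 1.0, 0.0]
blended = [(x + y) / 2 for x, y in zip(electric_car, supply_chain)]

articles = {
    "EV battery review": [1.0, 0.0, 0.2],
    "SBOM tooling guide": [0.0, 1.0, 0.2],
    "Generic tech roundup": [1.0, 1.0, 0.2],
}

print(closest(electric_car, articles))  # EV battery review
print(closest(supply_chain, articles))  # SBOM tooling guide
print(closest(blended, articles))       # Generic tech roundup
```

Queried separately, each interest finds its specialised article; the blended query prefers the vague roundup that touches both topics a little.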

Why BGE-Multilingual-Gemma2?

The choice of embedding model is critical. I started with all-MiniLM-L12-v2: an English-only model with 384 dimensions, fast and popular. The result: my French queries produced incoherent vectors, and the articles surfaced had no connection to my actual interests.

BGE-Multilingual-Gemma2 (BAAI) changes everything: 9 billion parameters, 3584 dimensions, natively trained on French and English. Results are immediately relevant.

⚠️ The model must be identical between ingestion and search. Switching models means recomputing all existing embeddings.


The output

Every morning around 7:30 AM I receive a structured email:

  • IT News section: summaries of Red Hat, OpenShift, DevSecOps, AI articles…
  • Home Automation section: Home Assistant, new devices…
  • For each article: 2-sentence summary in French + clickable link
  • A final recommendation from the LLM

All rendered as clean HTML, readable in any email client.


Lessons learned

Cosine distance, not Euclidean. The <-> operator in pgvector measures Euclidean distance, which also depends on vector length and is a poor default for text embeddings unless they are normalised. <=> measures the angle only: that’s the right one.
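
A small numeric illustration of how the two metrics can disagree, using made-up 2-dimensional vectors whose lengths differ:

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


query = [1.0, 0.0]
same_topic_long = [10.0, 0.0]   # same direction as the query, much larger norm
other_topic_short = [0.9, 0.5]  # different direction, norm close to the query's

# Euclidean distance (what <-> computes) prefers the vector of similar length.
print(math.dist(query, same_topic_long))    # 9.0
print(math.dist(query, other_topic_short))  # about 0.51

# Cosine distance (what <=> computes) prefers the vector pointing the same way.
print(cosine_distance(query, same_topic_long))    # 0.0
print(cosine_distance(query, other_topic_short))  # about 0.13
```

When every vector has unit length the two metrics produce the same ordering, so the difference only matters for embeddings that are not normalised.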

LangChain can betray you. I originally used n8n’s native LangChain nodes for ingestion. The document loader treated JSON as raw text and fragmented each article into a dozen separate rows. Bypassing LangChain entirely — direct HTTP calls + raw SQL — fixed it completely.

Timestamps in the database. FreshRSS returns dates as Unix timestamps (seconds). PostgreSQL cannot directly cast an integer to timestamptz. The correct conversion: to_timestamp((metadata->>'published_at')::bigint).
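
The same conversion is easy to mirror outside the database, for instance in a script that checks the data. This sketch assumes the timestamp arrives as a string of seconds, the way metadata->>'published_at' returns it; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def parse_published_at(raw: str) -> datetime:
    """Convert a Unix timestamp stored as text (seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(int(raw), tz=timezone.utc)


def is_recent(raw: str, now: datetime, max_age_days: int = 30) -> bool:
    """Mirror the SQL filter: published within the last max_age_days."""
    return parse_published_at(raw) > now - timedelta(days=max_age_days)


print(parse_published_at("1700000000"))  # 2023-11-14 22:13:20+00:00
```

Passing an explicit UTC timezone avoids the silent local-time conversion that a naive datetime would introduce.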

pgvector and high dimensions. The ivfflat and hnsw index types are limited to 2,000 dimensions for the vector type. BGE produces 3584 dims: no index is possible without the halfvec type (pgvector ≥ 0.7.0), which can be indexed up to 4,000 dimensions. For a few hundred articles, a sequential scan is perfectly fine.
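
Some back-of-the-envelope arithmetic helps explain the limit, on the understanding that an index entry has to fit in one 8 KB PostgreSQL page. The helper names are illustrative, and the generated statement follows the expression-index pattern from the pgvector documentation; it has not been taken from the workflow itself.

```python
PAGE_SIZE_BYTES = 8192  # default PostgreSQL page size


def payload_bytes(dims: int, bytes_per_value: int) -> int:
    """Raw size of the vector values, ignoring per-tuple overhead."""
    return dims * bytes_per_value


def halfvec_index_sql(table: str, column: str, dims: int) -> str:
    """Build an HNSW expression index on a half-precision cast (pgvector >= 0.7.0)."""
    return (
        f"CREATE INDEX ON {table} "
        f"USING hnsw (({column}::halfvec({dims})) halfvec_cosine_ops);"
    )


print(payload_bytes(3584, 4))  # 14336 bytes as vector (4-byte floats): larger than one page
print(payload_bytes(3584, 2))  # 7168 bytes as halfvec (2-byte floats): fits in one page
print(halfvec_index_sql("rss_articles", "embedding", 3584))
```

For such an index to be used, the query has to order by the same cast expression (embedding::halfvec(3584)) rather than by the raw vector column.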


Going further

The workflow is available on GitHub. To adapt it:

  • Replace FreshRSS with any source: n8n’s native RSS node, an HTTP call to another aggregator
  • Change the AI provider: any OpenAI-compatible API works (Mistral, local Ollama…) — the key is to keep the same model between ingestion and search
  • Adjust interests: a plain text field in the workflow, changeable without touching any code

The full stack runs on a modest VPS. It’s the tech monitoring setup I wish I’d had for the past ten years.