From 4011d20dbe6c7a31799a949586e02dad94db975e Mon Sep 17 00:00:00 2001
From: Oneric
Date: Mon, 17 Mar 2025 19:37:54 +0100
Subject: [PATCH] federation/out: tweak publish retry backoff
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

With the current strategy the individual and cumulative backoffs
look like this (the + part denotes max extra random delay; all
values are in seconds unless a unit is given):

attempt  backoff_single   cumulative
   1      16+30               16+30
   2      47+60               63+90
   3     243+90  ≈ 4min      321+180
   4    1024+120 ≈17min     1360+300  ≈23+5min
   5    3125+150 ≈52min     4500+450  ≈75+8min
   6    7776+180 ≈ 2.1h    12291+630  ≈3.4h
   7   16807+210 ≈ 4.6h    29113+840  ≈8h
   8   32768+240 ≈ 9.1h    61896+1080 ≈17h
   9   59049+270 ≈16.4h   120960+1350 ≈33h
  10  100000+300 ≈27.7h   220975+1650 ≈61h

We default to 5 retries, meaning the last backoff runs with attempt=4.
Therefore outgoing activities might already be permanently dropped by
a downtime of only 23 minutes, which doesn't seem too implausible to
occur. Furthermore, it seems excessive to retry this quickly this
often at the beginning.
At the same time, we'd like to have at least one quick-ish retry to
deal with transient issues and maintain reasonable federation
responsiveness.

If an admin wants to tolerate a one-day downtime of remotes,
retries need to be almost doubled.

The new backoff strategy implemented in this commit instead
switches to an exponential backoff after a few initial attempts:

attempt  backoff_single   cumulative
   1      16+30               16+30
   2     143+60              159+90
   3    2202+90  ≈37min     2361+180 ≈40min
   4    8160+120 ≈ 2.3h    10521+300 ≈ 3h
   5   77393+150 ≈21.5h    87914+450 ≈24h

Initial retries are still fast, but the same amount of retries
now allows a remote downtime of roughly 3 hours. Customising
the retry count to 6 allows for whole-day downtimes.
---
 lib/pleroma/workers/publisher_worker.ex | 6 +++++-
 lib/pleroma/workers/worker_helper.ex    | 9 +++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/lib/pleroma/workers/publisher_worker.ex b/lib/pleroma/workers/publisher_worker.ex
index be94134b9..a83f035b0 100644
--- a/lib/pleroma/workers/publisher_worker.ex
+++ b/lib/pleroma/workers/publisher_worker.ex
@@ -9,7 +9,11 @@ defmodule Pleroma.Workers.PublisherWorker do
   use Pleroma.Workers.WorkerHelper, queue: "federator_outgoing"
 
   def backoff(%Job{attempt: attempt}) when is_integer(attempt) do
-    Pleroma.Workers.WorkerHelper.sidekiq_backoff(attempt, 5)
+    if attempt > 3 do
+      Pleroma.Workers.WorkerHelper.exponential_backoff(attempt, 9.5)
+    else
+      Pleroma.Workers.WorkerHelper.sidekiq_backoff(attempt, 7)
+    end
   end
 
   @impl Oban.Worker
diff --git a/lib/pleroma/workers/worker_helper.ex b/lib/pleroma/workers/worker_helper.ex
index ea9ce9d3b..9a95e7fc7 100644
--- a/lib/pleroma/workers/worker_helper.ex
+++ b/lib/pleroma/workers/worker_helper.ex
@@ -22,6 +22,15 @@ def sidekiq_backoff(attempt, pow \\ 4, base_backoff \\ 15) do
     trunc(backoff)
   end
 
+  def exponential_backoff(attempt, base, base_backoff \\ 15) do
+    backoff =
+      :math.pow(base, attempt) +
+        base_backoff +
+        :rand.uniform(2 * base_backoff) * attempt
+
+    trunc(backoff)
+  end
+
   defmacro __using__(opts) do
     caller_module = __CALLER__.module
     queue = Keyword.fetch!(opts, :queue)
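
The figures in the commit message tables can be re-derived with a short standalone script. The following is a minimal sketch and not part of the patch: `BackoffSketch` and its function names are hypothetical, and only the deterministic part of each delay is computed (the random term is reported as its upper bound). It assumes `sidekiq_backoff/3` has the shape `attempt^pow + base_backoff + jitter`, which the diff does not show but the first table's values (16 = 1^5 + 15, 47 = 2^5 + 15) suggest. The power 7 for the first three attempts is likewise inferred from the table values 143 = 2^7 + 15 and 2202 = 3^7 + 15.

```elixir
# Hypothetical sketch module; only the formulas mirror the patch.
defmodule BackoffSketch do
  @base_backoff 15

  # Deterministic part of sidekiq_backoff/3 (assumed shape): attempt^pow + base_backoff.
  def polynomial(attempt, pow), do: trunc(:math.pow(attempt, pow)) + @base_backoff

  # Deterministic part of exponential_backoff/3 from the diff: base^attempt + base_backoff.
  def exponential(attempt, base), do: trunc(:math.pow(base, attempt)) + @base_backoff

  # Upper bound of the random term :rand.uniform(2 * base_backoff) * attempt.
  def max_jitter(attempt), do: 2 * @base_backoff * attempt

  # Old strategy: power 5 for every attempt.
  def old_delay(attempt), do: polynomial(attempt, 5)

  # New strategy: polynomial for attempts 1..3 (power 7 assumed from the table),
  # exponential with base 9.5 for later attempts.
  def new_delay(attempt) when attempt > 3, do: exponential(attempt, 9.5)
  def new_delay(attempt), do: polynomial(attempt, 7)

  # Sum of the deterministic delays for attempts 1..last_attempt.
  def cumulative(delay_fun, last_attempt) do
    1..last_attempt |> Enum.map(delay_fun) |> Enum.sum()
  end
end

# Print one row per attempt: single delay, max jitter, cumulative delay (seconds).
for attempt <- 1..5 do
  single = BackoffSketch.new_delay(attempt)
  total = BackoffSketch.cumulative(&BackoffSketch.new_delay/1, attempt)
  IO.puts("#{attempt}  #{single}+#{BackoffSketch.max_jitter(attempt)}  cumulative #{total}")
end
```

Run with `elixir` on the saved script. The printed single and cumulative values should line up with the second table in the commit message; passing `&BackoffSketch.old_delay/1` to `cumulative/2` gives the cumulative column of the first table.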