:discard marks jobs as "discarded", i.e. jobs which permanently failed
due to e.g. exhausting all retries or explicitly being discared due to a
fatal error.
:cancel marks jobs as "cancelled" which does not imply failure.
While neither method counts as a job "exception" in the set of
telemetries we currently export via Prometheus, the different state
is visible in the (not-exported) metadata of oban job telemetry.
We can use handlers of those events to build bespoke statistics.
Ideally we'd like to distinguish in the receiver worker between
"invalid" and "already present or delete of unknown" documents,
but this is cumbersome to get get right with a list of
free-form, human-readable descriptions oof the violated constraints.
For now, just count both as an fatal error.
# but that is cumbersome to get right with a list of string error descriptions
We now add the htmlMfm key when relevant, store this in the database, and we see it when fetching using e.g.
curl -L -H 'Accept: application/activity+json' "$ap_id"
The `@context` of the Activity Pub message now also contains `htmlMfm: https://w3id.org/fep/c16b#htmlMfm`.
When an incomming post has `htmlMfm: true`, we will not re-parse the content.
FEDERATION.md is adapted to show the `htmlMfm` term is used.
CUrrently internal actors are supposed to be identified in the database
by either a NULL nickname or a nickname prefixed by "internal.". For old
installations this is true, but only if they were created over five
years ago before 70410dfafd.
Newer installations will use "relay" as the nickname of the realy actor
causing ii to be treated as a regular user.
In particular this means all installations in the last five years never
made use of the reduced endpoint case, thus it is dropped.
Simplify this distinction by properly marking internal actors asa an
Application type in the database. This was already implemented before by
ilja in https://akkoma.dev/AkkomaGang/akkoma/pulls/457 but accidentally
reverted during a translation update in
eba3cce77b. This commit effectively
restores this patch together with further changes.
Also service actors unconditionally expose follow* collections atm,
eventhough the internal fetch actor doesn't actually implement them.
Since they are optional per spec and with Mastodon omitting them too
for its instance actor proving the practical viability, we should just
omit them. The relay actor however should continue to expose such
collections and they are properly implemented here.
Here too we now just use the values or their absence in the database.
We do not have any other internal.* actors besides fetch atm.
Fixes: https://akkoma.dev/AkkomaGang/akkoma/issues/855
Co-authored-by: ilja space <git@ilja.space>
E.g. \*oma federates (most) follower-only posts multiple times
to each personal inbox. This commonly leads to race conditions
with jobs of several copies running at the same time and getting
past the initial "already known" check but then later all but
one will crash with an exception from the unique db index.
Since the only special thing we do with copies anyway is to discard them,
just don't create such duplicate jobs in the first place.
For the same reason and since failed jobs don't count towards
duplicates, this should have virtually no effect on federation.
Since we later only consider the Create activity for
access permission checks, but the semantically more
sensible set of fields are the object’s.
Changing the check itself to use the object may have unintended
consequences on already existing legacy posts as the old code
which processed it when it arrived may have never considered
effects on the objects addressing fields.
While the object itself has the expected adressing for an
"unlisted" post, we always use the Create activity’s
adressing fields for permission checks.
To avoid unintended effects on legacy objects
we will continue to use the activity for access perm checks,
but fix its addressing fields based on its object data.
Ref: https://git.pleroma.social/pleroma/pleroma/-/issues/3323
With the current strategy the individual
and cumulative backoff looks like this
(the + part denotes max extra random delay):
attempt backoff_single cumulative
1 16+30 16+30
2 47+60 63+90
3 243+90 ≈ 4min 321+180
4 1024+120 ≈17min 1360+300 ≈23+5min
5 3125+150 ≈20min 4500+450 ≈75+8min
6 7776+180 ≈ 2.1h 12291+630 ≈3.4h
7 16807+210 ≈ 4.6h 29113+840 ≈8h
8 32768+240 ≈ 9.1h 61896+1080 ≈17h
9 59049+270 ≈16.4h 120960+1350 ≈33h
10 100000+300 ≈27.7h 220975+1650 ≈61h
We default to 5 retries meaning the least backoff runs with attempt=4.
Therefore outgoing activiities might already be permanently dropped by a
downtime of only 23 minutes which doesn't seem too implausible to occur.
Furthermore it seems excessive to retry this quickly this often at the
beginning.
At the same time, we’d like to have at least one quick'ish retry to deal
with transient issues and maintain reasonable federation responsiveness.
If an admin wants to tolerate one -day downtime of remotes,
retries need to be almost doubled.
The new backoff strategy implemented in this commit instead
switches to an exponetial after a few initial attempts:
attempt backoff_single cumulative
1 16+30 16+30
2 143+60 159+90
3 2202+90 ≈37min 2361+180 ≈40min
4 8160+120 ≈ 2.3h 10521+300 ≈ 3h
5 77393+150 ≈21.5h 87914+450 ≈24h
Initial retries are still fast, but the same amount of retries
now allows a remote downtime of at least 40 minutes. Customising
the retry count to 5 allows for whole-day downtimes.
This was accidentally broken in c8e0f7848b
due to a one-letter mistake in the plug option name and an absence of
tests. Therefore it was once again possible to serve e.g. Javascript or
CSS payloads via uploads and emoji.
However due to other protections it was still NOT possible for anyone to
serve any payload with an ActivityPub Content-Type. With the CSP policy
hardening from previous JS payload exloits predating the Content-Type
sanitisation, there is currently no known way of abusing this weakened
Content-Type sanitisation, but should be fixed regardless.
This commit fixes the option name and adds tests to ensure
such a regression doesn't occur again in the future.
Reported-by: Lain Soykaf <lain@lain.com>
When note editing support was added, it was omitted to strip internal
fields from edited notes and their history.
This was uncovered due to Mastodon inlining the like count as a "likes"
collection conflicting with our internal "likes" list causing validation
failures. In a spot check with likes/like_count it was not possible to
inject those internal fields into the local db via Update, but this
was not extensively tested for all fields and avenues.
Similarly address normalisation did not normalise addressing in the
object history, although this was never at risk of being exploitable.
The revision history of the Pleroma MR adding edit support reveals
recusrive stripping was intentionally avoided, since it will end up
removing e.g. emoji from outgoing activities. This appears to still
be true. However, all current internal fields ("pleroma_interal"
appears to be unused) contain data already publicised otherwise anyway.
In the interest of fixing a federation bug (and at worst potential data
injection) quickly outgoing stripping is left non-recursive for now.
Of course the ultimate fix here is to not mix remote and internal data
into the same map in the first place, but unfortunately having a single
map of all truth is a core assumption of *oma's AP doc processing.
Changing this is a masive undertaking and not suitable for providing
a short-term fix.
We expect most requests to be made for the actual canonical ID,
so check this one first (starting without query headers matching the
predominant albeit spec-breaking version).
Also avoid unnecessary rerewrites of the digest header on each route
alias by just setting it once before iterating through aliases.
This matches behaviour prioir to the SigningKey migration
and the expected semantics of the http_signatures lib.
Additionally add a min interval paramter, to avoid
refetch floods on bugs causing incompatible signatures
(like e.g. currently with Bridgy)