This allows retaining posts and boosts of remote actors with local
follows regardless of age.
With the "full" setting this can be taken further, treating such
followed actors just like local users and even keeping all posts they
liked or reacted to.
Pinned objects and their threads will be refetched
on user refresh, which by default happens after a day
once a user is encountered again in any form, including a mention.
We observed pruning pinned objects usually results in heavy load for
hours after a database prune due to a clogged up remote fetch queue as
pinned posts and their threads of many (most?) users get refetched.
Thus do not prune pinned posts by default.
Keeping closer to earlier behaviour, this will still prune threads of
pinned posts regardless of --keep-threads if nothing else prevents it.
Statements for keeping and breaking threads differ vastly
and the whole if block doesn't even fit on one screen.
Thus move each version out into its own function to
improve readability.
In theory a pedantic reading of the spec indeed suggests
DMs must only be delivered to personal inboxes. However,
in practice the normative force of real-world implementations
disagrees. Mastodon, Iceshrimp.NET and GtS (the latter notably has a
config option to never use sharedInboxes) all unconditionally prefer
sharedInbox for everything without ill effect. This saves on duplicate
deliveries on the sending and processing on the receiving end.
(Typically the receiving side ends up rejecting
all but the first copy as duplicates)
Furthermore, the current determine_inbox logic actually ends up
forcing personal inboxes for follower-only posts, unless they
additionally explicitly address at least one specific actor.
This is even more wasteful and directly contradicts
the explicit intent of the spec.
There’s one part where the use of sharedInbox falls apart,
namely spec-compliant bcc and bto addressing. AP spec requires
bcc/bto fields to be stripped before delivery and then implicitly
reconstructed by the receiver based on the addressed personal inbox.
In practice however, this addressing mode is almost unused. None of
the three implementations brought up above supports it and while *oma
does use bcc for list addressing, it does not use it in a spec-compliant
way and even copies same-host recipients into cc before delivery.
Messages with bcc addressing are handled in another function clause,
always force personal inboxes for every recipient and are not affected
by this commit.
In theory it would be beneficial to use sharedInbox there too for all
but bcc recipients. But in practice list addressing has been broken for
quite some time already and is not actually exposed in any frontend,
as discussed in https://akkoma.dev/AkkomaGang/akkoma/issues/812.
Therefore any changes here have virtually no effect anyway
and all code concerning it may just be outright removed.
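
A minimal sketch of the inbox preference described above, written in
Python purely for illustration. Recipient and delivery_inboxes are
hypothetical names and do not mirror the real determine_inbox code;
the sketch only shows sharedInbox being preferred and personal inboxes
being forced when bcc addressing is present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Recipient:
    # Hypothetical minimal view of a remote actor.
    inbox: str
    shared_inbox: Optional[str] = None


def delivery_inboxes(recipients, has_bcc=False):
    """Return the deduplicated list of inbox URLs to deliver to."""
    targets = []
    for r in recipients:
        # bcc addressing needs the personal inbox so the receiver can
        # reconstruct the stripped field; otherwise prefer sharedInbox.
        if has_bcc or not r.shared_inbox:
            target = r.inbox
        else:
            target = r.shared_inbox
        if target not in targets:
            targets.append(target)
    return targets
```

Two recipients on the same host collapse into a single delivery unless
bcc addressing is in play.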
This allows discovering a page represents an ActivityPub object
and also where to find the underlying representation.
Other servers already implement this and some tools
have come to rely on or profit from it.
The alternate link is provided both with the "application/activity+json"
format as used by Mastodon and the standard-compliant media type.
Just like the feed provider, ActivityPub links are not configurable
and always enabled unless access to local posts is restricted.
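
For illustration, a small Python sketch of emitting both alternate
links. The function name is hypothetical, and the "standard-compliant
media type" is assumed here to be the ld+json form with the
ActivityStreams profile as named in the ActivityPub spec.

```python
from html import escape

# The short form used by Mastodon, followed by the ld+json profile form
# (assumed to be the "standard-compliant" type meant above).
AP_TYPES = (
    "application/activity+json",
    'application/ld+json; profile="https://www.w3.org/ns/activitystreams"',
)


def alternate_links(object_id):
    """Build one <link rel="alternate"> tag per ActivityPub media type."""
    return [
        '<link rel="alternate" type="{}" href="{}">'.format(
            escape(media_type, quote=True), escape(object_id, quote=True)
        )
        for media_type in AP_TYPES
    ]
```

The quotes inside the profile parameter must be escaped since the
media type is placed inside an HTML attribute.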
The commit is based on earlier work by Charlotte 🦝 Deleńkec
but with fixes and some tweaks.
Co-authored-by: Charlotte 🦝 Deleńkec <lotte@chir.rs>
:discard marks jobs as "discarded", i.e. jobs which permanently failed
due to e.g. exhausting all retries or explicitly being discarded due to a
fatal error.
:cancel marks jobs as "cancelled" which does not imply failure.
While neither method counts as a job "exception" in the set of
telemetries we currently export via Prometheus, the different state
is visible in the (not-exported) metadata of oban job telemetry.
We can use handlers of those events to build bespoke statistics.
Ideally we'd like to distinguish in the receiver worker between
"invalid" and "already present or delete of unknown" documents,
but this is cumbersome to get right with a list of
free-form, human-readable descriptions of the violated constraints.
For now, just count both as a fatal error.
Currently internal actors are supposed to be identified in the database
by either a NULL nickname or a nickname prefixed by "internal.". For old
installations this is true, but only if they were created over five
years ago before 70410dfafd.
Newer installations will use "relay" as the nickname of the relay actor
causing it to be treated as a regular user.
In particular this means all installations in the last five years never
made use of the reduced endpoint case, thus it is dropped.
Simplify this distinction by properly marking internal actors as an
Application type in the database. This was already implemented before by
ilja in https://akkoma.dev/AkkomaGang/akkoma/pulls/457 but accidentally
reverted during a translation update in
eba3cce77b. This commit effectively
restores this patch together with further changes.
Also service actors unconditionally expose follow* collections atm,
even though the internal fetch actor doesn't actually implement them.
Since they are optional per spec and with Mastodon omitting them too
for its instance actor proving the practical viability, we should just
omit them. The relay actor however should continue to expose such
collections and they are properly implemented here.
Here too we now just use the values or their absence in the database.
We do not have any other internal.* actors besides fetch atm.
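
To illustrate why the nickname heuristic misclassifies newer relay
actors, here is a rough Python sketch. The dict keys "nickname" and
"actor_type" are assumptions made for this example, not the actual
schema.

```python
def is_internal_legacy(actor):
    # Old heuristic: NULL nickname or an "internal." prefix.
    nickname = actor.get("nickname")
    return nickname is None or nickname.startswith("internal.")


def is_internal(actor):
    # New rule: rely on the actor type stored in the database.
    return actor.get("actor_type") == "Application"
```

A relay actor stored with the nickname "relay" fails the legacy check
but passes the type-based one.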
Fixes: https://akkoma.dev/AkkomaGang/akkoma/issues/855
Co-authored-by: ilja space <git@ilja.space>
E.g. *oma federates (most) follower-only posts multiple times
to each personal inbox. This commonly leads to race conditions
with jobs of several copies running at the same time and getting
past the initial "already known" check but then later all but
one will crash with an exception from the unique db index.
Since the only special thing we do with copies anyway is to discard them,
just don't create such duplicate jobs in the first place.
For the same reason and since failed jobs don't count towards
duplicates, this should have virtually no effect on federation.
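
The idea can be sketched with a toy in-memory queue in Python. This is
not the real job backend; the class, the state names and the exact set
of states that block a duplicate are assumptions for illustration.

```python
class JobQueue:
    """Toy in-memory queue standing in for a real job backend."""

    # States that block enqueueing a duplicate (assumed set).
    # Failed jobs deliberately do not count.
    BLOCKING = {"available", "executing", "completed"}

    def __init__(self):
        self.jobs = []  # list of [activity_id, state]

    def enqueue(self, activity_id):
        """Add a job unless an equivalent non-failed job already exists."""
        for key, state in self.jobs:
            if key == activity_id and state in self.BLOCKING:
                return False
        self.jobs.append([activity_id, "available"])
        return True
```

A second copy of the same activity is dropped at enqueue time, while a
copy arriving after a failed attempt is still accepted.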
We later only consider the Create activity for
access permission checks, although the semantically more
sensible set of fields are the object’s.
Changing the check itself to use the object may have unintended
consequences on already existing legacy posts as the old code
which processed it when it arrived may have never considered
effects on the object’s addressing fields.
While the object itself has the expected addressing for an
"unlisted" post, we always use the Create activity’s
addressing fields for permission checks.
To avoid unintended effects on legacy objects
we will continue to use the activity for access perm checks,
but fix its addressing fields based on its object data.
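
A hedged Python sketch of the approach: copy the object's addressing
onto the Create activity before it is stored. The function name is
hypothetical and only "to" and "cc" are handled here, which is an
assumption of this example.

```python
def fix_create_addressing(activity):
    """Return a copy of a Create activity whose to/cc mirror its object."""
    obj = activity.get("object")
    if activity.get("type") != "Create" or not isinstance(obj, dict):
        # Nothing to fix for other activities or non-embedded objects.
        return activity
    fixed = dict(activity)
    for field in ("to", "cc"):
        if field in obj:
            fixed[field] = list(obj[field])
    return fixed
```

The permission check itself stays untouched and keeps reading the
activity, so legacy posts are not reinterpreted.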
Ref: https://git.pleroma.social/pleroma/pleroma/-/issues/3323
With the current strategy the individual
and cumulative backoff looks like this
(the + part denotes max extra random delay):
attempt  backoff_single       cumulative
1        16+30                16+30
2        47+60                63+90
3        243+90     ≈ 4min    321+180
4        1024+120   ≈17min    1360+300    ≈23+5min
5        3125+150   ≈52min    4500+450    ≈75+8min
6        7776+180   ≈ 2.1h    12291+630   ≈3.4h
7        16807+210  ≈ 4.6h    29113+840   ≈8h
8        32768+240  ≈ 9.1h    61896+1080  ≈17h
9        59049+270  ≈16.4h    120960+1350 ≈33h
10       100000+300 ≈27.7h    220975+1650 ≈61h
We default to 5 retries, meaning the last backoff runs with attempt=4.
Therefore outgoing activities might already be permanently dropped by a
downtime of only 23 minutes which doesn't seem too implausible to occur.
Furthermore it seems excessive to retry this quickly this often at the
beginning.
At the same time, we’d like to have at least one quick'ish retry to deal
with transient issues and maintain reasonable federation responsiveness.
If an admin wants to tolerate a one-day downtime of remotes,
retries need to be almost doubled.
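
The cumulative column of the first table above can be reproduced with
a few lines of Python. The formula is an inference from that column
(a base delay of 15 + attempt^5 seconds plus up to 30 * attempt seconds
of random delay); it is not stated in the text and should be read as an
assumption.

```python
def old_backoff(attempt):
    """Base delay in seconds and max extra random delay for one attempt."""
    return 15 + attempt ** 5, 30 * attempt


def cumulative(backoff, attempts):
    """Running totals of (base delay, max jitter) over 1..attempts."""
    total_base, total_jitter, rows = 0, 0, []
    for n in range(1, attempts + 1):
        base, jitter = backoff(n)
        total_base += base
        total_jitter += jitter
        rows.append((n, total_base, total_jitter))
    return rows


for n, base, jitter in cumulative(old_backoff, 10):
    print(f"{n:>2} {base}+{jitter}")
```

For attempt=4 this yields 1360+300 seconds, i.e. the roughly 23 minutes
mentioned above.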
The new backoff strategy implemented in this commit instead
switches to exponential growth after a few initial attempts:
attempt  backoff_single       cumulative
1        16+30                16+30
2        143+60               159+90
3        2202+90    ≈37min    2361+180    ≈40min
4        8160+120   ≈ 2.3h    10521+300   ≈ 3h
5        77393+150  ≈21.5h    87914+450   ≈24h
Initial retries are still fast, but the same amount of retries
now allows a remote downtime of at least 40 minutes. Customising
the retry count to 5 allows for whole-day downtimes.
This was accidentally broken in c8e0f7848b
due to a one-letter mistake in the plug option name and an absence of
tests. Therefore it was once again possible to serve e.g. JavaScript or
CSS payloads via uploads and emoji.
However due to other protections it was still NOT possible for anyone to
serve any payload with an ActivityPub Content-Type. With the CSP
hardening prompted by previous JS payload exploits, which predate the
Content-Type sanitisation, there is currently no known way of abusing
this weakened Content-Type sanitisation, but it should be fixed regardless.
This commit fixes the option name and adds tests to ensure
such a regression doesn't occur again in the future.
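
As a rough illustration of what Content-Type sanitisation for uploads
can look like, here is a Python sketch. The allowlist, the fallback
type and the function name are assumptions for this example and do not
reflect the actual plug or its option names.

```python
# Hypothetical allowlist of top-level types safe to serve from uploads.
SAFE_TOPLEVEL = {"image", "audio", "video"}
FALLBACK = "application/octet-stream"


def sanitise_content_type(content_type):
    """Return the media type if allowlisted, else a harmless fallback."""
    # Drop parameters such as "; charset=utf-8" before matching.
    media_type = content_type.split(";", 1)[0].strip().lower()
    toplevel, sep, subtype = media_type.partition("/")
    if sep and subtype and toplevel in SAFE_TOPLEVEL:
        return media_type
    return FALLBACK
```

A test asserting that script and stylesheet types fall back to the
harmless type is what guards against the kind of regression fixed here.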
Reported-by: Lain Soykaf <lain@lain.com>
When note editing support was added, stripping internal fields from
edited notes and their history was omitted.
This was uncovered due to Mastodon inlining the like count as a "likes"
collection conflicting with our internal "likes" list causing validation
failures. In a spot check with likes/like_count it was not possible to
inject those internal fields into the local db via Update, but this
was not extensively tested for all fields and avenues.
Similarly address normalisation did not normalise addressing in the
object history, although this was never at risk of being exploitable.
The revision history of the Pleroma MR adding edit support reveals
recursive stripping was intentionally avoided, since it will end up
removing e.g. emoji from outgoing activities. This appears to still
be true. However, all current internal fields ("pleroma_internal"
appears to be unused) contain data already publicised otherwise anyway.
In the interest of fixing a federation bug (and at worst potential data
injection) quickly, outgoing stripping is left non-recursive for now.
Of course the ultimate fix here is to not mix remote and internal data
into the same map in the first place, but unfortunately having a single
map of all truth is a core assumption of *oma's AP doc processing.
Changing this is a massive undertaking and not suitable for providing
a short-term fix.
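
A sketch in Python of stripping internal fields from an incoming note
including its edit history. The field list is an illustrative subset,
and storing the history under "formerRepresentations" with an
"orderedItems" list is an assumption of this example.

```python
# Illustrative subset of internal fields, not the full list.
INTERNAL_FIELDS = {"likes", "like_count"}


def strip_internal(obj):
    """Return a copy of obj without internal fields, history included."""
    cleaned = {k: v for k, v in obj.items() if k not in INTERNAL_FIELDS}
    history = cleaned.get("formerRepresentations")
    if isinstance(history, dict) and isinstance(history.get("orderedItems"), list):
        cleaned["formerRepresentations"] = {
            **history,
            "orderedItems": [
                strip_internal(item) if isinstance(item, dict) else item
                for item in history["orderedItems"]
            ],
        }
    return cleaned
```

With this, a remote "likes" collection no longer collides with the
internal "likes" list, neither on the note nor on its older revisions.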
We expect most requests to be made for the actual canonical ID,
so check this one first (starting without query headers matching the
predominant albeit spec-breaking version).
Also avoid unnecessary rewrites of the digest header on each route
alias by just setting it once before iterating through aliases.
This matches behaviour prior to the SigningKey migration
and the expected semantics of the http_signatures lib.
Additionally add a min interval parameter, to avoid
refetch floods on bugs causing incompatible signatures
(like e.g. currently with Bridgy).
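
The min interval idea, sketched in Python with hypothetical names: a
key is only refetched if its last refetch is at least min_interval
seconds ago. The in-memory dict is an assumption of the sketch; a real
implementation would more likely rely on a stored timestamp.

```python
import time


class RefetchLimiter:
    """Allow at most one refetch per key within min_interval seconds."""

    def __init__(self, min_interval, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self.last_refetch = {}

    def allow(self, key_id):
        now = self.clock()
        last = self.last_refetch.get(key_id)
        if last is not None and now - last < self.min_interval:
            return False
        self.last_refetch[key_id] = now
        return True
```

A remote which persistently sends signatures that fail to verify then
triggers at most one refetch per interval instead of one per request.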
User updates broke with the migration to separate signing keys
since user data carries signing keys but we didn't allow the
association data to be updated.
Previously there were mainly two attack vectors:
- for raw keys the owner <-> key mapping wasn't verified at all
- keys were retrieved with refetching allowed
  and only the top-level ID was sanitised while
  usually keys are but a subobject
This reintroduces public key checks in the user actor,
previously removed in 9728e2f8f7
but now adapted to account for the new mapping mechanism.
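
A minimal Python sketch of an owner <-> key check on an actor document
using the common "publicKey" layout with "id" and "owner". The function
name is hypothetical and this is not the actual validation code.

```python
def key_belongs_to_actor(actor, key_id):
    """Check the owner <-> key mapping in both directions."""
    key = actor.get("publicKey")
    if not isinstance(key, dict):
        return False
    # The actor must embed exactly this key, and the key must in turn
    # name this actor as its owner.
    return key.get("id") == key_id and key.get("owner") == actor.get("id")
```

Checking only one direction would let an actor claim somebody else's
key or a key document claim an unrelated owner.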
Notably at least two instances were not properly guarded from path
traversal attacks before and are only now fixed by using SafeZip:
- frontend installation never checked for malicious paths.
  But given a malicious frontend could already, e.g. steal
  all user tokens even without this, in the real world
  admins should only use frontends from trusted sources
  and the practical implications are minimal
- the emoji pack update/upload API taking a ZIP file
  did not protect against path traversal. While atm
  only admins can use these emoji endpoints, emoji
  packs are typically considered "harmless" and used
  without prior verification from various sources.
  Thus this appears more concerning.
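
For illustration, the kind of check a safe extraction helper performs,
sketched with Python's standard zipfile module. This is not the SafeZip
module itself; the function name is hypothetical and symlink entries
are not considered here.

```python
import os
import zipfile


def safe_extract(zip_file, target_dir):
    """Extract an archive, refusing entries that would leave target_dir."""
    root = os.path.realpath(target_dir)
    with zipfile.ZipFile(zip_file) as archive:
        # Validate every entry first so nothing is written on failure.
        for name in archive.namelist():
            dest = os.path.realpath(os.path.join(root, name))
            if os.path.commonpath([root, dest]) != root:
                raise ValueError("unsafe path in archive: " + name)
        archive.extractall(root)
```

An entry like "../evil.txt" or an absolute path resolves outside of
the target directory and makes the whole archive get rejected.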