Commit graph

15939 commits

Author SHA1 Message Date
Oneric
bc79bd0edf cosmetic/test/user: replace deprecated clear_config syntax 2025-02-14 22:10:25 +01:00
Oneric
ee61ce61a7 changelog: summarise preceeding changes 2025-02-14 22:10:25 +01:00
Oneric
8a0d130976 Add tests for SigninKey module 2025-02-14 22:10:25 +01:00
Oneric
898b98e5dd db: drop legacy key fields in users table 2025-02-14 22:10:25 +01:00
Oneric
ea2de1f28a signing_key: ensure only one key per user exists
Fixes: AkkomaGang/akkoma issue 858
2025-02-14 22:10:25 +01:00
Oneric
2a4587f201 Fix SigningKey db schema 2025-02-14 22:10:25 +01:00
Oneric
3460f41776 Fix user updates
User updates broke with the migration to separate signing keys
since user data carries signing keys but we didn't allow the
association data to be updated.
2025-02-14 22:10:25 +01:00
Oneric
cc5c1bb10c signing_key: cleanup code
In particular this avoids an unecessary roundtrip
over user_id when searching a key via its primary key_id
2025-02-14 22:10:25 +01:00
Oneric
70fe99d196 Prevent key-actor mapping poisoning and key take overs
Previously there were mainly two attack vectors:
 - for raw keys the owner <-> key mapping wasn't verified at all
 - keys were retrieved with refetching allowed
   and only the top-level ID was sanitised while
   usually keys are but a subobject

This reintroduces public key checks in the user actor,
previously removed in 9728e2f8f7
but now adapted to account for the new mapping mechanism.
2025-02-14 22:10:25 +01:00
Oneric
366065c0f6 fetcher: split out core object fetch validation
To allow reuse for adapted key validation logic
2025-02-14 22:10:25 +01:00
Oneric
b5fa8c6d09 readme: drop mention of YunoHost package
It’s no longer listed in the catalogue and
the git repo wasn't updated in over a year
2025-02-14 22:10:25 +01:00
Oneric
d68a5f6c56 Protected against counterfeit local docs being posted
Only possible if actor keys leaked first
thus log with alert level
2025-02-14 22:10:25 +01:00
Oneric
4231345f4e cosmetic/emoji/pack: fix spelling
There might be further debate about "emoji" vs "emojis" for the plural
but a grep shows the latter is already widely used in our codebase.
2025-02-14 22:10:25 +01:00
Oneric
96fe080e6e Convert all raw :zip usage to SafeZip
Notably at least two instances were not properly guarded from path
traversal attack before and are only now fixed by using SafeZip:

 - frontend installation did never check for malicious paths.
   But given a malicious froontend could already, e.g. steal
   all user tokens even without this, in the real world
   admins should only use frontends from trusted sources
   and the practical implications are minimal

 - the emoji pack update/upload API taking a ZIP file
   did not protect against path traversal. While atm
   only admins can use these emoji endpoints, emoji
   packs are typically considered "harmless" and used
   without prior verification from various sources.
   Thus this appears more concerning.
2025-02-14 22:10:25 +01:00
Oneric
7151ef4718 Add SafeZip module
This will replace all the slightly different safety workarounds at
different ZIP handling sites and ensure safety is actually consistently
enforced everywhere while also making code cleaner and easiert to
follow.
2025-02-14 22:10:25 +01:00
Oneric
c8e0f7848b Migrate back to upstream Plug.Static
Commit a924e117fd bumped the
plug package to 1.16.0 which includes our upstream patch.

Resolves: https://akkoma.dev/AkkomaGang/akkoma/issues/734
2025-02-14 22:10:25 +01:00
Oneric
98c7e9534e Drop obsolete APNG mime override
Commit 9d2c558f64 bumped
to a mime package version including the upstream fix.
2025-02-14 22:10:25 +01:00
Oneric
1c2eb4d799 cosmetic/object: drop is_ prefix from is_tombstone_object?
The question mark suffix already implies it being an indicator function
2025-02-14 22:10:25 +01:00
Oneric
7998a00346 cosmetic/rich_media/parser: fix typo 2025-02-14 22:10:25 +01:00
floatingghost
4c41f8c286 Merge pull request 'Improve stat queries and ReceiverWorker logic' (#862) from Oneric/akkoma:perf_tweaks_stats+jobs into develop
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/862
2025-02-14 19:22:35 +00:00
Oneric
f0a99b4595 article_note_validator: fix handling of Mastodon-style replies collections
The first collection page is (sometimes?) inlined
which caused crashes when attempting to log the fetch failure.
But there’s no need to fetch and we can treat it like the other inline collection
2025-02-14 18:49:51 +01:00
Oneric
a1c841a122 federation.md: list FEP-dc88 formatting mathematics
Implemented by https://akkoma.dev/AkkomaGang/akkoma/pulls/642
2025-02-14 18:49:51 +01:00
Oneric
1b09b9fc22 static_fe: fix HTML quotation for upload alt text
Reported by riley on IRC
2025-02-14 18:49:51 +01:00
Oneric
46148c0825 Don't return garbage on failed collection fetches
And for now treat partial fetches as a success, since for all
current users partial collection data is better than no data at all.

If an error occurred while fetching a page, this previously
returned a bogus {:ok, {:error, _}} success, causing the error
to be attached to the object as an reply list subsequently
leading to the whole post getting rejected during validation.

Also the pinned collection caller did not actually handle
the preexisting error case resulting in process crashes.
2025-02-14 18:49:51 +01:00
Oneric
4701aa2a38 receiver_worker: log processes crashes
Oban cataches crashes to handle job failure and retry,
thus it never bubbles up all the way and nothing is logged by default.
For better debugging, catch and log any crashes.
2025-02-14 18:46:19 +01:00
Oneric
8fa51700d4 changelog: summarise preceeding perf tweaks 2025-01-07 20:27:28 +01:00
Oneric
2ddff7e386 transmogrifier: gracefully ignore Delete of unknown objects
It's quite common to receive spurious Deletes,
so we neither want to waste resources on retrying
nor spam "invalid AP" logs
2025-01-07 20:27:28 +01:00
Oneric
cd8e6a4235 transmogrifier: gracefully ignore duplicated object deletes
The object lookup is later repeated in the validator, but due to
caching shouldn't incur any noticeable performance impact.
It’s actually preferable to check here, since it avoids the otherwise
occuring user lookup and overhead from starting and aborting a
transaction
2025-01-07 20:27:28 +01:00
Oneric
ac2327c8fc transmogrfier: be more selective about Delete retry
If something else renders the Delete invalid,
there’s no point in retrying anyway
2025-01-07 20:27:28 +01:00
Oneric
92bf93a4f7 transmogrifier: avoid crashes on non-validation Delte errors
Happens e.g. for duplicated Deletes.
The remaining tombstone object no longer has an actor,
leading to an error response during side-effect handling.
2025-01-07 20:27:28 +01:00
Oneric
7ad5f8d3c0 object_validators: only query relevant table for object
Most of them actually only accept either activities or a
non-activity object later; querying both is then a waste
of resources and may create false positives.
2025-01-07 20:27:28 +01:00
Oneric
b0387dee14 Gracefully ignore Undo activities referring to unknown objects 2025-01-07 20:27:28 +01:00
Oneric
caa4fbe326 user: avoid database work on superfluous pin
The only thing this does is changing the updated_at field of the user.
Afaict this came to be because prior to pins federating this was split
into two functions, one of which created a changeset, the other applying
a given changeset. When this was merged the bits were just copied into
place.
2025-01-07 20:27:28 +01:00
Oneric
09736431e0 Don't spam logs about deleted users
User.get_or_fetch_by_(apid|nickname) are the only external users of fetch_and_prepare_user_from_ap_id,
thus there’s no point in duplicating logging, expecially not at error level.
Currently (duplicated) _not_found errors for users make up the bulk of my logs
and are created almost every second. Deleted users are a common occurence and not
worth logging outside of debug
2025-01-07 20:27:28 +01:00
Oneric
bcf3e101f6 rich_media: lower log level of update 2025-01-07 20:27:28 +01:00
Oneric
05bbdbf388 nodeinfo: lower log level of regular actions to debug 2025-01-07 20:27:28 +01:00
Oneric
2c75600532 federation/incoming: improve link_resolve retry decision
To facilitate this ObjectValidator.fetch_actor_and_object is adapted to
return an informative error. Otherwise we’d be unable to make an
informed decision on retrying or not later. There’s no point in
retrying to fetch MRF-blocked stuff or private posts for example.
2025-01-07 20:27:28 +01:00
Oneric
0cd4040db6 Error out earlier on missing mandatory reference
This is the only user of fetch_actor_and_object which previously just
always preteneded to be successful. For all the activity types handled
here, we absolutely need the referenced object to be able to process it
(other than Announce whether or not processing those activity types for
unknown remote objects is desirable in the first place is up for debate)

All other users of the similar fetch_actor already properly check success.

Note, this currently lumps all reolv failure reasons together,
so even e.g. boosts of MRF rejected posts will still exhaust all
retries. The following commit improves on this.
2025-01-07 20:27:28 +01:00
Oneric
0ba5c3649d federator: don't nest {:error, _} tuples
It makes decisions based on error sources harder since all possible
nesting levels need to be checked for. As shown by the return values
handled in the receiver worker something else still nests those,
but this is a first start.
2025-01-07 20:27:28 +01:00
Oneric
8e5defe6ca stats: estimate remote user count
This value is currently only used by Prometheus metrics
but (after optimisng the peer query inthe preceeding commit)
the most costly part of instance stats.
2025-01-07 20:27:28 +01:00
Oneric
138b1aea2f stats: use cheaper peers query
This query is one of the top cost offenders during an instances
lifetime. For small instances it was shown to take up 30-50% percent of
the total database query time, while for bigger isntaces it still held
a spot in the top 3 — alost as or even more expensive overall than
timeline queries!

The good news is, there’s a cheaper way using the instance table:
no need to process each entry, no need to filter NULLs
and no need to dedupe. EXPLAIN estimates the cost of the
old query as 13272.39 and the cost of the new query as 395.74
for me; i.e. a 33-fold reduction.

Results can slightly differ. E.g. we might have an old user
predating the instance tables existence and no interaction with since
or no instance table entry due to failure to query nodeinfo.
Conversely, we might have an instance entry but all known users got
deleted since.
However, this seems unproblematic in practice
and well worth the perf improvment.

Given the previous query didn’t exclude unreachable instances
neither does the new query.
2025-01-07 20:27:28 +01:00
Oneric
8b5183cb74 stats: fix stat spec 2025-01-07 20:27:28 +01:00
Oneric
cbb0d4b0a8 receiver_worker: log unecpected errors
This can't handle process crash errors
but i hope those get a stacktrace logged by default
2025-01-07 20:27:28 +01:00
Oneric
be2c857845 receiver_worker: don't reattempt invalid documents
Ideally we’d like to split this up more and count most invalid documents
as an error, but silently drop e.g. Deletes for unknown objects.
However, this is hard to extract from the changeset and jobs canceled
with :discard don’t count as exceptions and I’m not aware of a idiomatic
way to cancel further retries while retaining the exception status.

Thus at least keep a log, but since superfluous "Delete"s
seem kinda frequent, don't log at error, only info level.
2025-01-07 20:27:28 +01:00
Oneric
9f4d3a936f cosmetic/receiver_worker: reformat error cases
The next commit adds a multi-statement case
and then mix format will enforce this anyway
2025-01-07 20:27:28 +01:00
Oneric
f9724b5879 Don’t reattempt insertion of already known objects
Might happen if we receive e.g. a Like before the Note arrives
in our inbox and we thus already queried the Note ourselves.
2025-01-07 20:27:27 +01:00
Oneric
041dedb86e Don't reattempt RichMediaBackfill by default
Retrying seems unlikely to be helpful:
 - if it timed out, chances are the short delay before reattempting
   won't give the remote enough time to recover from its outage and
   a longer delay makes the job pointless as users likely scrolled
   further already. (Likely this is already be the case after the
   first 20s timeout)
 - if the remote data is so borked we couldn't even parse it far enough
   for an "invalid metadata" error, chances are it will remain borked
   upon reattempt
2025-01-07 20:27:27 +01:00
Oneric
280652651c rich_media: don't reattempt parsing on rejected URLs 2025-01-07 20:27:27 +01:00
Oneric
92544e8f99 Don't enqueue a plethora of unnecessary NodeInfoFetcher jobs
There were two issues leading to needles effort:
Most importnatly, the use of AP IDs as "source_url" meant multiple
simultaneous jobs got scheduled for the same instance even with the
default unique settings.
Also jobs were scheduled uncontionally for each processed AP object
meaning we incured oberhead from managing Oban jobs even if we knew it
wasn't necessary. By comparison the single query to check if an update
is needed should be cheaper overall.
2025-01-07 20:27:27 +01:00
Oneric
d283ac52c3 Don't create noop SearchIndexingWorker jobs for passive index 2025-01-07 20:27:27 +01:00