Commit graph

161 commits

Oneric
f576807f1b worker/receiver: don't retry unsupported actions
Observed for e.g. user delete Undos and Bite activities
2025-05-09 22:29:49 +02:00
Oneric
8cdfbf872d Merge pull request 'federation/out: tweak publish retry backoff' (#884) from Oneric/akkoma:publish_backoff into develop
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/884
2025-05-09 20:12:56 +00:00
Oneric
13940a558a Merge pull request 'Expose stats about finally failed AP deliveries in prometheus' (#882) from Oneric/akkoma:telemetry-failed-deliveries into develop
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/882
2025-05-09 20:12:01 +00:00
Oneric
d6f5f4db18 Merge pull request 'receiver_worker: prevent duplicate jobs' (#886) from Oneric/akkoma:receive_dedupe into develop
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/886
2025-05-09 19:13:14 +00:00
Oneric
0d38385d6f publisher: don't mangle between string and atom
Oban jobs can only have string args and there’s no reason to insist on atoms here.

Plus this used unchecked string_to_atom
2025-05-06 17:38:18 +02:00
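
A minimal sketch of keeping job args string-keyed end to end; module and argument names are illustrative, not Akkoma's actual publisher:

    defmodule MyApp.Workers.PublisherWorker do
      # Oban serialises job args to JSON, so keys and values come back as
      # strings; matching on string keys avoids any String.to_atom calls.
      use Oban.Worker, queue: :federator_outgoing

      @impl Oban.Worker
      def perform(%Oban.Job{args: %{"op" => "publish", "activity_id" => id}}) do
        publish(id)
      end

      defp publish(_id), do: :ok
    end
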
Oneric
2fee79e1f5 Use appropriate cancellation type for oban jobs
:discard marks jobs as "discarded", i.e. jobs which permanently failed
due to e.g. exhausting all retries or explicitly being discarded due to a
fatal error.
:cancel marks jobs as "cancelled" which does not imply failure.

While neither method counts as a job "exception" in the set of
telemetries we currently export via Prometheus, the different state
is visible in the (not-exported) metadata of oban job telemetry.
We can use handlers of those events to build bespoke statistics.

Ideally we'd like to distinguish in the receiver worker between
"invalid" and "already present or delete of unknown" documents,
but this is cumbersome to get right with a list of
free-form, human-readable descriptions of the violated constraints.
For now, just count both as a fatal error.
2025-04-15 19:40:26 +02:00
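
A bespoke handler along these lines could count terminal job states via Oban's job telemetry; the event name is Oban's documented job-stop event, but the re-emitted metric and the exact metadata shape are assumptions to verify against the Oban version in use:

    defmodule MyApp.ObanStats do
      # Counts terminal job states (e.g. :cancelled vs :discard) reported in the
      # metadata of Oban's [:oban, :job, :stop] event and re-emits them as a
      # bespoke metric that a Prometheus exporter could pick up.
      def attach do
        :telemetry.attach("oban-job-state-stats", [:oban, :job, :stop], &__MODULE__.handle_event/4, nil)
      end

      def handle_event(_event, _measurements, %{state: state, job: %Oban.Job{worker: worker}}, _config) do
        :telemetry.execute([:myapp, :oban, :job_state], %{count: 1}, %{worker: worker, state: state})
      end
    end
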
Oneric
195042bdc9 receiver_worker: prevent duplicate jobs
E.g. *oma federates (most) follower-only posts multiple times,
once to each personal inbox. This commonly leads to race conditions
where jobs for several copies run at the same time, all getting
past the initial "already known" check, but then all but
one crash with an exception from the unique db index.

Since the only special thing we do with copies anyway is to discard them,
just don't create such duplicate jobs in the first place.
For the same reason and since failed jobs don't count towards
duplicates, this should have virtually no effect on federation.
2025-03-18 03:46:33 +01:00
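
A minimal sketch of deduplicating at enqueue time with Oban's unique option; the period, keys and state list are illustrative assumptions, not the exact values used:

    defmodule MyApp.Workers.ReceiverWorker do
      # Failed/discarded jobs are left out of the unique states, so an earlier
      # failed copy never blocks a genuine retry.
      use Oban.Worker,
        queue: :federator_incoming,
        unique: [
          period: 300,
          keys: [:op, :params],
          states: [:available, :scheduled, :executing, :retryable, :completed]
        ]

      @impl Oban.Worker
      def perform(%Oban.Job{args: %{"op" => "incoming_ap_doc", "params" => _params}}), do: :ok
    end
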
Oneric
4011d20dbe federation/out: tweak publish retry backoff
With the current strategy, the individual
and cumulative backoff look like this
(the + part denotes the max extra random delay):

attempt  backoff_single   cumulative
   1      16+30                16+30
   2      47+60                63+90
   3     243+90  ≈ 4min       321+180
   4    1024+120 ≈17min      1360+300  ≈23+5min
   5    3125+150 ≈20min      4500+450  ≈75+8min
   6    7776+180 ≈ 2.1h    12291+630   ≈3.4h
   7   16807+210 ≈ 4.6h    29113+840   ≈8h
   8   32768+240 ≈ 9.1h    61896+1080  ≈17h
   9   59049+270 ≈16.4h   120960+1350  ≈33h
  10  100000+300 ≈27.7h   220975+1650  ≈61h

We default to 5 retries, meaning the last backoff runs with attempt=4.
Therefore outgoing activities might already be permanently dropped after
a downtime of only 23 minutes, which doesn't seem too implausible to occur.
Furthermore it seems excessive to retry this quickly and this often at the
beginning.
At the same time, we’d like to have at least one quick'ish retry to deal
with transient issues and maintain reasonable federation responsiveness.

If an admin wants to tolerate a one-day downtime of remotes,
retries would need to be almost doubled.

The new backoff strategy implemented in this commit instead
switches to an exponential after a few initial attempts:

attempt  backoff_single   cumulative
   1      16+30              16+30
   2     143+60             159+90
   3    2202+90  ≈37min    2361+180 ≈40min
   4    8160+120 ≈ 2.3h   10521+300 ≈ 3h
   5   77393+150 ≈21.5h   87914+450 ≈24h

Initial retries are still fast, but the same number of retries
now allows a remote downtime of at least 40 minutes. Customising
the retry count to 5 allows for whole-day downtimes.
2025-03-17 19:37:54 +01:00
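
For context, Oban lets a worker override its retry delay via the backoff/1 callback (the return value is in seconds). A sketch of the mechanism with purely illustrative constants, not the exact formula introduced here:

    defmodule MyApp.Workers.PublisherWorker do
      use Oban.Worker, queue: :federator_outgoing, max_attempts: 5

      # polynomial base plus a per-attempt random jitter; constants are made up
      @impl Oban.Worker
      def backoff(%Oban.Job{attempt: attempt}) do
        trunc(:math.pow(attempt, 4) + 15) + :rand.uniform(30 * attempt)
      end

      @impl Oban.Worker
      def perform(_job), do: :ok
    end
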
Floatingghost
f176294d6d elixir 1.18 formatting 2025-03-02 11:54:00 +00:00
Oneric
4701aa2a38 receiver_worker: log processes crashes
Oban catches crashes to handle job failure and retry,
so they never bubble up all the way and nothing is logged by default.
For better debugging, catch and log any crashes.
2025-02-14 18:46:19 +01:00
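
A sketch of such catch-and-log handling (module name illustrative; a rescue only covers exceptions, so the real change may also need to handle exits and throws):

    defmodule MyApp.Workers.ReceiverWorker do
      use Oban.Worker, queue: :federator_incoming
      require Logger

      @impl Oban.Worker
      def perform(%Oban.Job{args: args}) do
        process(args)
      rescue
        e ->
          # Oban records the failure itself and schedules a retry, but without
          # this the stacktrace never reaches the application log.
          Logger.error("ReceiverWorker crash: " <> Exception.format(:error, e, __STACKTRACE__))
          reraise e, __STACKTRACE__
      end

      defp process(_args), do: :ok
    end
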
Oneric
2c75600532 federation/incoming: improve link_resolve retry decision
To facilitate this ObjectValidator.fetch_actor_and_object is adapted to
return an informative error. Otherwise we’d be unable to make an
informed decision on retrying or not later. There’s no point in
retrying to fetch MRF-blocked stuff or private posts for example.
2025-01-07 20:27:28 +01:00
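
A sketch of the retry decision this enables; the error atoms and function names are assumptions for illustration, not Akkoma's exact ones:

    defmodule MyApp.Federator.RetryDecision do
      # A structured error reason lets the receiver pick between cancelling
      # (permanent failures) and retrying (possibly transient ones).
      def handle_incoming(data) do
        case fetch_actor_and_object(data) do
          {:ok, activity} ->
            {:ok, activity}

          # permanent: retrying cannot help (MRF-blocked, private posts, ...)
          {:error, reason} when reason in [:mrf_reject, :forbidden] ->
            {:cancel, reason}

          # possibly transient (remote down, object not yet federated): let Oban retry
          {:error, reason} ->
            {:error, reason}
        end
      end

      defp fetch_actor_and_object(_data), do: {:error, :not_found}
    end
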
Oneric
0cd4040db6 Error out earlier on missing mandatory reference
This is the only user of fetch_actor_and_object, which previously just
always pretended to be successful. For all the activity types handled
here, we absolutely need the referenced object to be able to process it
(other than Announce; whether processing those activity types for
unknown remote objects is desirable in the first place is up for debate).

All other users of the similar fetch_actor already properly check success.

Note, this currently lumps all resolve failure reasons together,
so even e.g. boosts of MRF-rejected posts will still exhaust all
retries. The following commit improves on this.
2025-01-07 20:27:28 +01:00
Oneric
0ba5c3649d federator: don't nest {:error, _} tuples
It makes decisions based on error sources harder since all possible
nesting levels need to be checked for. As shown by the return values
handled in the receiver worker something else still nests those,
but this is a first start.
2025-01-07 20:27:28 +01:00
Oneric
cbb0d4b0a8 receiver_worker: log unexpected errors
This can't handle process crash errors,
but hopefully those get a stacktrace logged by default.
2025-01-07 20:27:28 +01:00
Oneric
be2c857845 receiver_worker: don't reattempt invalid documents
Ideally we’d like to split this up more and count most invalid documents
as an error, but silently drop e.g. Deletes for unknown objects.
However, this is hard to extract from the changeset, jobs canceled
with :discard don't count as exceptions, and I'm not aware of an idiomatic
way to cancel further retries while retaining the exception status.

Thus at least keep a log, but since superfluous "Delete"s
seem kinda frequent, don't log at error, only info level.
2025-01-07 20:27:28 +01:00
Oneric
9f4d3a936f cosmetic/receiver_worker: reformat error cases
The next commit adds a multi-statement case
and then mix format will enforce this anyway
2025-01-07 20:27:28 +01:00
Oneric
f9724b5879 Don’t reattempt insertion of already known objects
Might happen if we receive e.g. a Like before the Note arrives
in our inbox and we thus already queried the Note ourselves.
2025-01-07 20:27:27 +01:00
Oneric
92544e8f99 Don't enqueue a plethora of unnecessary NodeInfoFetcher jobs
There were two issues leading to needless effort:
Most importantly, the use of AP IDs as "source_url" meant multiple
simultaneous jobs got scheduled for the same instance even with the
default unique settings.
Also jobs were scheduled unconditionally for each processed AP object,
meaning we incurred overhead from managing Oban jobs even if we knew it
wasn't necessary. By comparison the single query to check if an update
is needed should be cheaper overall.
2025-01-07 20:27:27 +01:00
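
A sketch of the enqueue-side fixes described above, with illustrative module names and an illustrative freshness check (not the actual Akkoma implementation):

    defmodule MyApp.Workers.NodeInfoFetcherWorker do
      # unique on the instance-level source_url rather than a per-actor AP id,
      # so simultaneous deliveries from one instance collapse into one job
      use Oban.Worker, queue: :nodeinfo_fetcher, unique: [period: 300, keys: [:source_url]]

      @impl Oban.Worker
      def perform(%Oban.Job{args: %{"source_url" => _url}}), do: :ok
    end

    defmodule MyApp.NodeInfo do
      # one cheap freshness check instead of unconditionally creating a job
      # for every processed AP object
      def maybe_enqueue_fetch(%{host: host, metadata_updated_at: updated_at}) do
        if is_nil(updated_at) or DateTime.diff(DateTime.utc_now(), updated_at) > 86_400 do
          %{"source_url" => "https://#{host}/"}
          |> MyApp.Workers.NodeInfoFetcherWorker.new()
          |> Oban.insert()
        else
          :ok
        end
      end
    end
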
Oneric
d283ac52c3 Don't create noop SearchIndexingWorker jobs for passive index 2025-01-07 20:27:27 +01:00
Oneric
ed4019e7a3 workers: make custom filtering ahead of enqueue possible 2025-01-07 20:27:27 +01:00
Haelwenn (lanodan) Monnier
c17681ae1e Purge obsolete ap_enabled indicator
It was used to migrate OStatus connections to ActivityPub if possible,
but support for OStatus was dropped long ago, all new actors are always AP,
and if anything wasn't migrated before, their instance is already marked
as unreachable anyway.

The associated logic was also buggy in several ways, and deleted users
got set to ap_enabled=false, also causing some issues.

This patch is a pretty direct port of the original Pleroma MR;
follow-up commits will further fix and clean up remaining issues.
Changes made (other than trivial merge conflict resolutions):
  - converted CHANGELOG format
  - adapted migration id for Akkoma’s timeline
  - removed ap_enabled from additional tests

Ported-from: https://git.pleroma.social/pleroma/pleroma/-/merge_requests/3880
2025-01-07 20:27:26 +01:00
Oneric
e8bf4422ff Delay attachment deletion
Otherwise attachments have a high chance to disappear with akkoma-fe’s
“delete & redraft” feature when cleanup is enabled in the backend. Since
we don't know whether a deletion was intended to be part of a redraft
process, or whether the redraft was abandoned, we still have to
delete attachments eventually.
A thirty minute delay should provide sufficient time for redrafting.

Fixes: https://akkoma.dev/AkkomaGang/akkoma/issues/775
2025-01-03 20:49:11 +01:00
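
A sketch using Oban's schedule_in option to get such a delay; the worker name and args are illustrative, not necessarily Akkoma's exact ones:

    defmodule MyApp.Workers.AttachmentsCleanupWorker do
      use Oban.Worker, queue: :attachments_cleanup

      @impl Oban.Worker
      def perform(%Oban.Job{args: %{"urls" => _urls}}), do: :ok

      # Enqueue with a thirty minute delay so "delete & redraft" has time to
      # re-reference the files before cleanup actually runs.
      def enqueue(urls) when is_list(urls) do
        %{"urls" => urls}
        |> new(schedule_in: 30 * 60)
        |> Oban.insert()
      end
    end
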
Oneric
bcfbfbcff5 Don't try to cleanup remote attachments
The cleanup attachment worker was run for every deleted post,
even if it’s a remote post whose attachments we don't even store.
This was especially bad due to attachment cleanup involving a
particularly heavy query, wasting a bunch of database perf for nothing.

This was uncovered by comparing statistics from
https://akkoma.dev/AkkomaGang/akkoma/issues/784 and
https://akkoma.dev/AkkomaGang/akkoma/issues/765#issuecomment-12256
2025-01-03 20:48:46 +01:00
Mark Felder
5da9cbd8a5 RichMedia refactor
Rich Media parsing was previously handled on-demand with a 2 second HTTP request timeout and retained only in Cachex. Every time a Pleroma instance is restarted it will have to request and parse the data for each status with a URL detected. When fetching a batch of statuses they were processed in parallel to attempt to keep the maximum latency at 2 seconds, but often resulted in a timeline appearing to hang during loading due to a URL that could not be successfully reached. URLs which had image links that expire (Amazon AWS) were parsed and inserted with a TTL to ensure the image link would not break.

Rich Media data is now cached in the database and fetched asynchronously. Cachex is used as a read-through cache. When the data becomes available we stream an update to the clients. If the result is returned quickly the experience is almost seamless. Activities were already processed for their Rich Media data during ingestion to warm the cache, so users should not normally encounter the asynchronous loading of the Rich Media data.

Implementation notes:

- The async worker is a Task with a globally unique process name to prevent duplicate processing of the same URL
- The Task will attempt to fetch the data 3 times with increasing sleep time between attempts
- The HTTP request obeys the default HTTP request timeout value instead of 2 seconds
- URLs that cannot be successfully parsed due to an unexpected error receive a negative cache entry for 15 minutes
- URLs that fail with an expected error will receive a negative cache with no TTL
- Activities that have no detected URLs insert a nil value in the Cachex :scrubber_cache so we do not repeat parsing the object content with Floki every time the activity is rendered
- Expiring image URLs are handled with an Oban job
- There is no automatic cleanup of the Rich Media data in the database, but it is safe to delete at any time
- The post draft/preview feature makes the URL processing synchronous so the rendered post preview will have an accurate rendering

Overall performance of timelines and creating new posts which contain URLs is greatly improved.
2024-06-09 17:33:48 +01:00
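
A sketch of the read-through pattern with Cachex.fetch/3; the cache name and loader are illustrative:

    defmodule MyApp.RichMedia.Cache do
      # On a cache miss the fallback runs: {:commit, _} stores the value,
      # {:ignore, _} returns it without caching (used here for failures so
      # they can be retried later).
      def get(url) do
        Cachex.fetch(:rich_media_cache, url, fn _key ->
          case load(url) do
            {:ok, card} -> {:commit, card}
            {:error, _} = err -> {:ignore, err}
          end
        end)
      end

      # stand-in for "read from the database, else fetch and parse asynchronously"
      defp load(_url), do: {:error, :not_yet_parsed}
    end
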
Haelwenn (lanodan) Monnier
0c2f200b4d ReceiverWorker: Make sure non-{:ok, _} is returned as {:error, …}
Otherwise an error like `{:signature, {:error, {:error, :not_found}}}`
ends up considered a success.

Cherry-picked-from: a299ddb10e
2024-04-21 20:58:06 +02:00
Floatingghost
370576474c only consider :op and :id args in duplicate checks 2024-04-19 11:39:27 +01:00
Floatingghost
d2cee15c15 mix format says no 2024-04-16 03:07:28 +01:00
Floatingghost
d70fa16383 oban options should be a keyword list 2024-04-16 02:58:50 +01:00
Floatingghost
5043571084 Enable oban job uniqueness
by default just prevent job floods with a 1-second
uniqueness check, but override in RemoteFetcherWorker
for 5 minute uniqueness check over all states

:infinity is an option we can go for maybe at some point,
but that would prevent any refetches so maybe not idk.
2024-04-16 02:53:24 +01:00
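
A sketch of the per-worker override described above; the period and state list follow the commit message, everything else is illustrative:

    defmodule MyApp.Workers.RemoteFetcherWorker do
      # 5 minute uniqueness window over all job states
      use Oban.Worker,
        queue: :remote_fetcher,
        unique: [period: 300, states: Oban.Job.states()]

      @impl Oban.Worker
      def perform(%Oban.Job{args: %{"id" => _id}}), do: :ok

      def enqueue(ap_id) do
        # a duplicate within the window comes back as the existing job with
        # conflict?: true instead of a fresh insert
        %{"id" => ap_id}
        |> new()
        |> Oban.insert()
      end
    end
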
Floatingghost
b7dd739de1 Make sure we return the right format for oban 2024-04-16 02:35:21 +01:00
Floatingghost
2fc25980d1 fix pattern matching in fetch errors 2024-04-13 23:55:26 +01:00
Mark Felder
2e369aef71 Allow the Remote Fetcher to attempt fetching an unreachable instance 2024-04-12 20:33:21 +01:00
Mark Felder
fed7a78c77 Oban jobs should be discarded on permanent errors 2024-04-12 20:33:17 +01:00
Mark Felder
ff515c05c3 Prevent requeuing Remote Fetcher jobs that exceed thread depth 2024-04-12 20:32:31 +01:00
Mark Felder
7e5004b3e2 Leverage existing atoms as return errors for the object fetcher 2024-04-12 20:32:13 +01:00
Mark Felder
e2b04fac5a Skip remote fetch jobs for unreachable instances 2024-04-12 20:28:36 +01:00
Mark Felder
6d368808d3 Remove mistaken duplicate fetch 2024-04-12 20:28:31 +01:00
Mark Felder
132036f951 Cancel remote fetch jobs for deleted objects 2024-04-12 20:28:21 +01:00
Mark Felder
4c29366fe5 Mark instances as unreachable when returning a 403 from an object fetch
This is a definite sign the instance is blocked and they are enforcing authorized_fetch
2024-04-12 20:27:33 +01:00
Oneric
1a7839eaf2 Prune old Update activities
Once processed they serve no purpose anymore afaict.
Therefore, let's prune them like other transient activities
to avoid unnecessarily bloating the table.
2024-02-17 16:57:40 +01:00
floatingghost
6b882a2c0b Purge Rejected Follow requests in daily task (#334)
Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/334
2022-12-03 23:17:43 +00:00
floatingghost
db60640c5b Fixing up deletes a bit (#327)
Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/327
2022-12-01 15:00:53 +00:00
FloatingGhost
ee7059c9cf Spin off imports into n oban jobs 2022-11-27 21:45:41 +00:00
floatingghost
2a1f17e3ed and i yoink (#275)
Co-authored-by: Mark Felder <feld@feld.me>
Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/275
2022-11-14 15:07:26 +00:00
floatingghost
c1127e321b Add configurable timeline per oban job (#273)
Heavily inspired by https://git.pleroma.social/pleroma/pleroma/-/merge_requests/3777

Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/273
2022-11-13 23:55:51 +00:00
floatingghost
b7e8ce2350 Scrape instance nodeinfo (#251)
Co-authored-by: FloatingGhost <hannah@coffee-and-dreams.uk>
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/251
2022-11-06 22:49:39 +00:00
FloatingGhost
d3b9cfb03f use :discard instead of cancel 2022-08-11 19:17:50 +01:00
floatingghost
1245141779 treat rejections in MRF as a reject in federator (#155)
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/155
2022-08-08 15:47:57 +00:00
Tusooa Zhu
f08241c8ab Allow users to create backups without providing email address
Ref: backup-without-email
2022-08-02 22:16:54 -04:00
Ekaterina Vaartis
7aebff799b Fix meilisearch tests and jobs for oban 2022-06-29 20:49:45 +01:00