Commit Graph

126 Commits

Author SHA1 Message Date
Floatingghost 4d6fb43cbd No need to spawn() any more 2024-06-12 02:09:24 +01:00
Floatingghost ad52135bf5 Convert rich media backfill to oban task 2024-06-11 18:06:51 +01:00
Floatingghost 9c5feb81aa fix tests 2024-06-09 21:26:29 +01:00
Floatingghost 840c70c4fa remove prints 2024-06-09 18:52:09 +01:00
Floatingghost c65379afea attempt to fix some tests 2024-06-09 18:45:38 +01:00
Floatingghost 16bed0562d Fix tests 2024-06-09 18:28:00 +01:00
Mark Felder a801dd7b07 Fix module struct matching 2024-06-09 17:38:28 +01:00
Mark Felder 1e86da43f5 Credo 2024-06-09 17:38:24 +01:00
Mark Felder 411831458c Credo 2024-06-09 17:38:18 +01:00
Mark Felder 56463b2121 Fix compile warning
warning: "else" clauses will never match because all patterns in "with" will always match
  lib/pleroma/web/rich_media/parser/ttl/opengraph.ex:10
2024-06-09 17:38:12 +01:00
Mark Felder 4746f98851 Fix broken Rich Media parsing when the image URL is a relative path 2024-06-09 17:36:28 +01:00
Mark Felder 765c7e98d2 Respect the TTL returned in OpenGraph tags 2024-06-09 17:36:15 +01:00
Mark Felder bfe4152385 Increase the :max_body for Rich Media to 5MB
Websites are increasingly getting more bloated with tricks like inlining content (e.g., CNN.com) which puts pages at or above 5MB. This value may still be too low.
2024-06-09 17:34:29 +01:00
Mark Felder 5da9cbd8a5 RichMedia refactor
Rich Media parsing was previously handled on-demand with a 2 second HTTP request timeout and retained only in Cachex. Every time a Pleroma instance is restarted it will have to request and parse the data for each status with a URL detected. When fetching a batch of statuses they were processed in parallel to attempt to keep the maximum latency at 2 seconds, but often resulted in a timeline appearing to hang during loading due to a URL that could not be successfully reached. URLs which had images links that expire (Amazon AWS) were parsed and inserted with a TTL to ensure the image link would not break.

Rich Media data is now cached in the database and fetched asynchronously. Cachex is used as a read-through cache. When the data becomes available we stream an update to the clients. If the result is returned quickly the experience is almost seamless. Activities were already processed for their Rich Media data during ingestion to warm the cache, so users should not normally encounter the asynchronous loading of the Rich Media data.

Implementation notes:

- The async worker is a Task with a globally unique process name to prevent duplicate processing of the same URL
- The Task will attempt to fetch the data 3 times with increasing sleep time between attempts
- The HTTP request obeys the default HTTP request timeout value instead of 2 seconds
- URLs that cannot be successfully parsed due to an unexpected error receives a negative cache entry for 15 minutes
- URLs that fail with an expected error will receive a negative cache with no TTL
- Activities that have no detected URLs insert a nil value in the Cachex :scrubber_cache so we do not repeat parsing the object content with Floki every time the activity is rendered
- Expiring image URLs are handled with an Oban job
- There is no automatic cleanup of the Rich Media data in the database, but it is safe to delete at any time
- The post draft/preview feature makes the URL processing synchronous so the rendered post preview will have an accurate rendering

Overall performance of timelines and creating new posts which contain URLs is greatly improved.
2024-06-09 17:33:48 +01:00
FloatingGhost 98cb255d12 Support elixir1.15
OTP builds to 1.15

Changelog entry

Ensure policies are fully loaded

Fix :warn

use main branch for linkify

Fix warn in tests

Migrations for phoenix 1.17

Revert "Migrations for phoenix 1.17"

This reverts commit 6a3b2f15b74ea5e33150529385215b7a531f3999.

Oban upgrade

Add default empty whitelist

mix format

limit test to amd64

OTP 26 tests for 1.15

use OTP_VERSION tag

baka

just 1.15

Massive deps update

Update locale, deps

Mix format

shell????

multiline???

?

max cases 1

use assert_recieve

don't put_env in async tests

don't async conn/fs tests

mix format

FIx some uploader issues

Fix tests
2023-08-03 17:44:09 +01:00
Haelwenn (lanodan) Monnier 70b0f93865 Apply oembed patch 2023-05-26 20:45:57 +01:00
FloatingGhost bf7ff6a337 Put rich media processing in a Task 2022-12-30 20:11:53 +00:00
FloatingGhost 9d9c26b833 Ensure Gun is Gone 2022-12-11 19:26:21 +00:00
floatingghost dc9f66749c remove all endpoints marked as deprecated (#91)
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/91
2022-07-20 12:00:58 +00:00
floatingghost 364b6969eb Use finch everywhere (#33)
Reviewed-on: https://akkoma.dev/AkkomaGang/akkoma/pulls/33
2022-07-04 16:30:38 +00:00
Haelwenn (lanodan) Monnier c4439c630f
Bump Copyright to 2021
grep -rl '# Copyright © .* Pleroma' * | xargs sed -i 's;Copyright © .* Pleroma .*;Copyright © 2017-2021 Pleroma Authors <https://pleroma.social/>;'
2021-01-13 07:49:50 +01:00
lain e1e7e4d379 Object: Rework how Object.normalize works
Now it defaults to not fetching, and the option is named.
2021-01-04 13:38:31 +01:00
lain 713612c377 Cachex: Make caching provider switchable at runtime.
Defaults to Cachex.
2020-12-18 17:44:46 +01:00
Alexander Strizhakov 8d218ebaf5
Moving some background jobs into simple tasks
- fetching activity data
- attachment prefetching
- using limiter to prevent overload
2020-11-11 13:39:49 +03:00
Alexander Strizhakov fc7151a9c4
more files renamings 2020-10-13 16:38:19 +03:00
Alexander Strizhakov 103f3dcb9e
rich media parser ttl files consistency 2020-10-13 16:38:15 +03:00
Mark Felder 8539e386c3 Add missing Copyright headers 2020-10-12 12:00:50 -05:00
Mark Felder ba7f9459b4 Revert Rich Media censorship for sensitive statuses
The #NSFW hashtag test was broken anyway.
2020-09-28 18:22:59 -05:00
rinpatch db80b9d630 RichMedia: Fix log spam on failures and resetting TTL on cached errors 2020-09-17 16:56:39 +03:00
rinpatch bb407edce4 RichMedia: fix a compilation error due to nonexistent variable
No idea why this passed Gitlab CI
2020-09-14 15:46:00 +03:00
rinpatch f70335002d RichMedia: Do a HEAD request to check content type/length
This shouldn't be too expensive, since the connections are pooled,
but it should save us some bandwidth since we won't fetch non-html
files and files that are too large for us to process (especially
since you can't cancel a request without closing the connection
with HTTP1).
2020-09-14 14:45:58 +03:00
rinpatch f66a15c4a5 RichMedia parser: do not set a cache TTL for unchanging errors 2020-09-14 14:44:25 +03:00
Alexander Strizhakov 696bf09433
passing adapter options directly without adapter key 2020-09-07 19:59:17 +03:00
Alexander Strizhakov a83916fdac
adapter options unification
not needed options deletion
2020-09-07 19:59:17 +03:00
lain fdab01ab56 Merge branch 'fix/rich-media-fake-statuses' into 'develop'
Rich Media: Do not cache URLs for preview statuses

Closes #1987

See merge request pleroma/pleroma!2956
2020-09-07 10:19:19 +00:00
rinpatch 170599c390 RichMedia: do not log webpages missing metadata as errors
Also fixes the return value of Parser.parse on errors, previously
was just `:ok` due to the logger call in the end
2020-09-05 22:05:35 +03:00
rinpatch e198ba492e Rich Media: Do not cache URLs for preview statuses
Closes #1987
2020-09-05 20:53:46 +03:00
rinpatch 19691389b9 Rich media: Add failure tracking 2020-09-02 14:59:52 +03:00
rinpatch 47ff425cfd Merge branch 'fix/2047-rich-media-parser' into 'develop'
RichMedia parser fix

Closes #2047

See merge request pleroma/pleroma!2941
2020-09-02 09:38:43 +00:00
Alexander Strizhakov 79f65b4374
correct pool and uniform headers format 2020-09-02 09:16:51 +03:00
Alexander Strizhakov 03d06062ab
don't fail on url fetch 2020-09-01 19:39:07 +03:00
Mark Felder 016d8d6c56 Consolidate construction of Rich Media Parser HTTP requests 2020-08-03 12:37:31 -05:00
lain 781b270863 ChatMessageReferenceView: Display preview cards. 2020-07-30 19:57:26 +02:00
lain 5b1eeb06d8 Revert "Merge branch 'revert-2b5d9eb1' into 'develop'"
This reverts merge request !2784
2020-07-21 22:18:17 +00:00
lain 696c13ce54 Revert "Merge branch 'linkify' into 'develop'"
This reverts merge request !2677
2020-07-21 22:17:34 +00:00
Alex Gleason 38425ebdbf
Merge remote-tracking branch 'upstream/develop' into linkify 2020-07-16 14:51:36 -05:00
Mark Felder 18438a9bf0 Add "Bot" to User Agent to coerce Twitter into serving OGP <meta> tags. 2020-07-07 11:21:05 -05:00
Alex Gleason 8daacc9114
AutoLinker --> Linkify, update to latest version
https://git.pleroma.social/pleroma/elixir-libraries/linkify
2020-06-30 16:39:15 -05:00
Egor Kislitsyn 58e4e3db8b
Merge remote-tracking branch 'origin/develop' into merge-ogp-twitter-parsers 2020-06-15 16:03:40 +04:00
Egor Kislitsyn 520367d6fd Fix atom leak in Rich Media Parser 2020-06-13 12:08:46 +03:00