Commit graph

21 commits

Author SHA1 Message Date
Oneric
43d4716b5a telemetry: expose count of currently pending jobs per queue
Split into scheduled (intentionally delayed until a later trigger date)
and available (eligible for immediate processing but did not yet start).
This will help in diagnosing overloaded instances or too-low queue
limits as well as expose configuration mishaps like
https://akkoma.dev/AkkomaGang/akkoma/issues/924.
(The latter by violently crashing the telemetry poller process while
attempting put_in for a non-configured queue creating well visible logs)
2025-10-03 00:00:00 +00:00
Oneric
b1e5dda26a telemetry: reduce polling frequency for periodic measurements
Instancce stats are cached only renewed every 5 minutes anyway
and IO stats are cumulative over the entire runtime so no info
is lost.
Polling those every 10s is wasteful and the next commit will add a
periodic measurement which is (comparetively) more costly to compute.
2025-09-21 00:00:00 +00:00
Oneric
7f9823258a metrics: adjust router and db buckets
Most HTTPS requests actually fall into the single-digit millisecond
range or below on average. Even the more costly endpoints almost always
average around the lower third of the millisecond magnitude.
Only endpoints doing synchronous remote HTTP fetches (e.g. for signing
keys) occasionally spike into the order of seconds.
As is, the bucket resolution is completely unfit to reason about
anything and even just averages are better indications.

Most database queries take less than a millisecond and even in total
almost all take less than 50ms for me. Decode time is but a tiny
fraction of that and queue time usually only takes a small part of total
time too (but may spike on high load).

Shift the buckets down to be able to
give insight into all relevant cases.
In particular this allows to determine whether high averages
are the result of generally high processing times or just a few
outliers lifting the whole average up (e.g. slow network fetches).

Exact numbers are biased towards my setup for lack of other comparison
data, but at least the order of magnitude should be ok everywhere.
2025-08-24 00:00:00 +00:00
Oneric
da4d642945 telemetry: log which states get mapped to 'unknown' 2025-08-19 00:00:00 +00:00
Oneric
1f6f5edf85 telemetry: expose stats about failed deliveries
And also log about it which we so far didn't do
2025-04-15 19:41:12 +02:00
Oneric
fb3de8045a Expose Port IO stats via Prometheus 2025-01-27 20:00:30 +01:00
timorl
59d32c10d9
Formatting 2024-04-16 08:02:13 +02:00
Oneric
29f564f700 Use fallbacks of summary metrics for prometheus 2024-02-12 02:00:09 +01:00
Oneric
16197ff57a Display memory as MB in live dashboard
With kilobyte the resulting numbers got too large and were cut off
in the charts, making them useless. However, even an idle Akkoma
server’s memory usage is in the lower hundreths of megabytes, so
we don’t need this much precision to begin with for the dashboard.

Other metric users might prefer base units and can handle scaling in a
smarter way, so keep this configurable.
2024-02-12 02:00:09 +01:00
Oneric
a6df71eebb Don't add summary metrics to prometheus
The exporter doesn’t support them thus we don't lose anything by this,
but it avoids a bunch of warnings each time the server starts up.
2024-02-12 01:59:18 +01:00
FloatingGhost
6e646c4cbc Use a genserver to periodically fetch metrics
Ref https://github.com/beam-telemetry/telemetry_metrics_prometheus_core/issues/52
2023-01-01 18:32:14 +00:00
FloatingGhost
b91e671c0d add remote user count for the heck of it 2022-12-16 17:22:26 +00:00
FloatingGhost
c2054f82ab allow users with admin:metrics to read app metrics 2022-12-16 03:32:51 +00:00
FloatingGhost
e2320f870e Add prometheus metrics to router 2022-12-15 02:02:07 +00:00
Tim Buchwaldt
29584197bb Measure stats-data 2022-12-15 01:04:56 +00:00
Tim Buchwaldt
63be819661 Take tesla telemetry 2022-12-15 01:04:56 +00:00
Tim Buchwaldt
0995fa1410 Track oban failures 2022-12-15 01:04:56 +00:00
Tim Buchwaldt
f8d3383179 Fix oban tags 2022-12-15 01:04:56 +00:00
Tim Buchwaldt
a06bb694c1 Listen to loopback 2022-12-15 01:04:56 +00:00
Tim Buchwaldt
1e9c2cd8ef Fix buckets for query timing 2022-12-15 01:04:56 +00:00
Tim Buchwaldt
33243c56e5 Start adding telemetry 2022-12-15 01:04:55 +00:00