7 Commits

Author SHA1 Message Date
Andrey Antukh
405a73e8ba
Add climit impl and config for file snapshot methods (#9722)
*  Add dedicated concurrency limit for restore-file-snapshot

This adds a dedicated climit configuration for the restore-file-snapshot
RPC method with :permits 1 per profile (plus queue of 2 and 60s timeout)
and a global limit of 3. Previously the method only used the generic
root/by-profile and root/global limits, allowing up to 7 concurrent
restore operations per profile which caused database row lock contention
on FOR UPDATE and connection pool exhaustion.

*  Skip locking on restore! to avoid blocking other operations

Changes the row lock acquisition in restore! from a blocking FOR UPDATE
to FOR UPDATE SKIP LOCKED. If the file row is already locked by another
concurrent operation (e.g., another restore or an update-file), the query
returns no rows and the caller fails fast with a clear conflict error
instead of blocking indefinitely holding a database connection.

*  Add queue and timeout limits to root/by-profile concurrency limit

Previously root/by-profile had no queue limit (unbounded Integer/MAX_VALUE)
and no timeout, allowing requests to pile up indefinitely behind a profile
whose permits were exhausted by long-running operations. This could lead
to memory pressure and cascading failures. Now limited to 30 queued
requests with a 30-second timeout so excess requests fail fast.

*  Move backup snapshot creation outside restore transaction

The backup snapshot (fsnap/create!) is now created in its own short-lived
connection before the actual restore transaction begins. This ensures the
backup is persisted independently of the restore outcome and reduces the
restore transaction window.

The restore itself runs inside a db/tx-run! block with an optimistic
locking check: it reads the file with FOR UPDATE and compares its revn
against the value captured at backup time. If the file was edited
concurrently, the restore aborts with a conflict error to prevent data
loss.

Co-dependent with the SKIP LOCKED change in restore! — the FOR UPDATE
acquired here is in the same transaction as restore!, so the SKIP LOCKED
inside restore! correctly sees the row as unlocked (same transaction).

* ♻️ Remove unused private function get-minimal-file

The local get-minimal-file function in file_snapshots.clj is no longer
used since restore! switched to direct exec-one! with FOR UPDATE SKIP
LOCKED. The sql:get-minimal-file SQL constant is still used directly.

*  Add minor improvements on db connection management

* ♻️ Refactor create-file-snapshot to use explicit transaction management

Remove automatic transaction wrapping (`::db/transaction true`) and
pass `cfg` through the call chain instead of destructured `conn`.
Wrap `fsnap/create!` in an explicit `db/tx-run!` for clearer
transaction boundaries.

Signed-off-by: Andrey Antukh <niwi@niwi.nz>

*  Add dedicated concurrency limit for create-file-snapshot

This adds a dedicated climit configuration for the create-file-snapshot
RPC method with :permits 1 per profile (plus queue of 2 and 60s timeout)
and a global limit of 3. Previously the method only used the generic
root/by-profile and root/global limits, allowing up to 10 concurrent
snapshot creation operations per profile which could cause database
contention and connection pool exhaustion.

Signed-off-by: Andrey Antukh <niwi@niwi.nz>

---------

Signed-off-by: Andrey Antukh <niwi@niwi.nz>
2026-05-19 14:30:44 +02:00
Andrey Antukh
a5c6d78ee5 ♻️ Fix some fundamental bugs on climit module
The climit previously of this commit is heavily used inside a
transactions, so in heavy contention operation such that file thumbnail
creation can cause a db pool exhaust.

This commit fixes this issue setting up a better resource limiting
mechanism that works outside the transactions so, contention will
no longer hold an open connection/transaction.

It also adds general improvement to the traceability to the climit
mechanism: it now properly logs the profile-id that is currently
cause some contention on specific resources.

It also add a general/root climit that is applied to all requests
so if someone start making abussive requests, we can clearly detect
it.
2024-02-01 17:37:49 +01:00
Andrey Antukh
54341d5b22 Make the RPC climit subsystem more robust 2023-11-27 14:25:12 +01:00
Andrey Antukh
da68eae7c6 Add proper climit configuration for file-thumbnails 2023-11-08 17:11:42 +01:00
Andrey Antukh
aafbf6bc15 ♻️ Refactor cocurrency model on backend
Mainly the followin changes:

- Pass majority of code to the old and plain synchronous style
  and start using virtual threads for the RPC (and partially some
  HTTP server middlewares).
- Make some improvements on how CLIMIT is handled, simplifying code
- Improve considerably performance reducing the reflection and
  unnecesary funcion calls on the whole stack-trace of an RPC call.
- Improve efficiency reducing considerably the total threads number.
2023-03-14 12:30:27 +01:00
Andrey Antukh
7f589b09ca ♻️ Move audit http handler to RPC 2022-12-13 16:17:31 +01:00
Andrey Antukh
37ad04d2a6 🎉 Add robust concurrency limiter for RPC 2022-11-07 10:05:56 +01:00