Andrey Antukh 405a73e8ba
Add climit impl and config for file snapshot methods (#9722)
*  Add dedicated concurrency limit for restore-file-snapshot

This adds a dedicated climit configuration for the restore-file-snapshot
RPC method with :permits 1 per profile (plus queue of 2 and 60s timeout)
and a global limit of 3. Previously the method only used the generic
root/by-profile and root/global limits, allowing up to 7 concurrent
restore operations per profile which caused database row lock contention
on FOR UPDATE and connection pool exhaustion.

*  Skip locking on restore! to avoid blocking other operations

Changes the row lock acquisition in restore! from a blocking FOR UPDATE
to FOR UPDATE SKIP LOCKED. If the file row is already locked by another
concurrent operation (e.g., another restore or an update-file), the query
returns no rows and the caller fails fast with a clear conflict error
instead of blocking indefinitely holding a database connection.

*  Add queue and timeout limits to root/by-profile concurrency limit

Previously root/by-profile had no queue limit (unbounded Integer/MAX_VALUE)
and no timeout, allowing requests to pile up indefinitely behind a profile
whose permits were exhausted by long-running operations. This could lead
to memory pressure and cascading failures. Now limited to 30 queued
requests with a 30-second timeout so excess requests fail fast.

*  Move backup snapshot creation outside restore transaction

The backup snapshot (fsnap/create!) is now created in its own short-lived
connection before the actual restore transaction begins. This ensures the
backup is persisted independently of the restore outcome and reduces the
restore transaction window.

The restore itself runs inside a db/tx-run! block with an optimistic
locking check: it reads the file with FOR UPDATE and compares its revn
against the value captured at backup time. If the file was edited
concurrently, the restore aborts with a conflict error to prevent data
loss.

Co-dependent with the SKIP LOCKED change in restore! — the FOR UPDATE
acquired here is in the same transaction as restore!, so the SKIP LOCKED
inside restore! correctly sees the row as unlocked (same transaction).

* ♻️ Remove unused private function get-minimal-file

The local get-minimal-file function in file_snapshots.clj is no longer
used since restore! switched to direct exec-one! with FOR UPDATE SKIP
LOCKED. The sql:get-minimal-file SQL constant is still used directly.

*  Add minor improvements on db connection management

* ♻️ Refactor create-file-snapshot to use explicit transaction management

Remove automatic transaction wrapping (`::db/transaction true`) and
pass `cfg` through the call chain instead of destructured `conn`.
Wrap `fsnap/create!` in an explicit `db/tx-run!` for clearer
transaction boundaries.

Signed-off-by: Andrey Antukh <niwi@niwi.nz>

*  Add dedicated concurrency limit for create-file-snapshot

This adds a dedicated climit configuration for the create-file-snapshot
RPC method with :permits 1 per profile (plus queue of 2 and 60s timeout)
and a global limit of 3. Previously the method only used the generic
root/by-profile and root/global limits, allowing up to 10 concurrent
snapshot creation operations per profile which could cause database
contention and connection pool exhaustion.

Signed-off-by: Andrey Antukh <niwi@niwi.nz>

---------

Signed-off-by: Andrey Antukh <niwi@niwi.nz>
2026-05-19 14:30:44 +02:00
..
2024-10-22 20:23:38 +02:00
2024-10-22 20:23:38 +02:00