Introduce a purpose-agnostic three-step session-based upload API that
allows uploading large binary blobs (media files and .penpot imports)
without hitting multipart size limits.
Backend:
- Migration 0147: new `upload_session` table (profile_id, total_chunks,
created_at) with indexes on profile_id and created_at.
- Three new RPC commands in media.clj:
* `create-upload-session` – allocates a session row; enforces
`upload-sessions-per-profile` and `upload-chunks-per-session`
quota limits (configurable in config.clj, defaults 5 / 20).
* `upload-chunk` – stores each slice as a storage object;
validates chunk index bounds and profile ownership.
* `assemble-file-media-object` – reassembles chunks via the shared
`assemble-chunks!` helper and creates the final media object.
- `assemble-chunks!` is a public helper in media.clj shared by both
`assemble-file-media-object` and `import-binfile`.
- `import-binfile` (binfile.clj): accepts an optional `upload-id` param;
when provided, materialises the temp file from chunks instead of
expecting an inline multipart body, removing the 200 MiB body limit
on .penpot imports. Schema updated with an `:and` validator requiring
either `:file` or `:upload-id`.
- quotes.clj: new `upload-sessions-per-profile` quota check.
- Background GC task (`tasks/upload_session_gc.clj`): deletes stalled
(never-completed) sessions older than 1 hour; scheduled daily at
midnight via the cron system in main.clj.
- backend/AGENTS.md: document the background-task wiring pattern.
Frontend:
- New `app.main.data.uploads` namespace: generic `upload-blob-chunked`
helper drives steps 1–2 (create session + upload all chunks with a
concurrency cap of 2) and emits `{:session-id uuid}` for callers.
- `config.cljs`: expose `upload-chunk-size` (default 25 MiB, overridable
via `penpotUploadChunkSize` global).
- `workspace/media.cljs`: blobs ≥ chunk-size go through the chunked path
(`upload-blob-chunked` → `assemble-file-media-object`); smaller blobs
use the existing direct `upload-file-media-object` path.
`handle-media-error` simplified; `on-error` callback removed.
- `worker/import.cljs`: new `import-blob-via-upload` helper replaces the
inline multipart approach for both binfile-v1 and binfile-v3 imports.
- `repo.cljs`: `:upload-chunk` derived as a `::multipart-upload`;
`form-data?` removed from `import-binfile` (JSON params only).
Tests:
- Backend (rpc_media_test.clj): happy path, idempotency, permission
isolation, invalid media type, missing chunks, session-not-found,
chunk-index out-of-range, and quota-limit scenarios.
- Frontend (uploads_test.cljs): session creation and chunk-count
correctness for `upload-blob-chunked`.
- Frontend (workspace_media_test.cljs): direct-upload path for small
blobs, chunked path for large blobs, and chunk-count correctness for
`process-blobs`.
- `helpers/http.cljs`: shared fetch-mock helpers (`install-fetch-mock!`,
`make-json-response`, `make-transit-response`, `url->cmd`).
Signed-off-by: Andrey Antukh <niwi@niwi.nz>
# Penpot Backend – Agent Instructions

Clojure backend (RPC) service running on the JVM.
Uses Integrant for dependency injection, PostgreSQL for storage, and Redis for messaging/caching.

## General Guidelines

To ensure consistency across the Penpot JVM stack, all contributions must adhere to these criteria:
### 1. Testing & Validation

- **Coverage:** if code is added or modified in `src/`, corresponding tests in `test/backend_tests/` must be added or updated.
- **Execution:**
  - Isolated: run `clojure -M:dev:test --focus backend-tests.my-ns-test` for the specific test namespace.
  - Regression: run `clojure -M:dev:test` to ensure the suite passes without regressions in related functional areas.
### 2. Code Quality & Formatting

- **Linting:** all code must pass `clj-kondo` checks (run `pnpm run lint:clj`).
- **Formatting:** all code must pass the formatting check (run `pnpm run check-fmt`). Use `pnpm run fmt` to fix formatting issues. Avoid "dirty" diffs caused by unrelated whitespace changes.
- **Type Hinting:** use explicit JVM type hints (e.g., `^String`, `^long`) in performance-critical paths to avoid reflection overhead.
## Code Conventions

### Namespace Overview

The source is located under the `src` directory. This is a general overview of the namespace structure:

- `app.rpc.commands.*` – RPC command implementations (`auth`, `files`, `teams`, etc.)
- `app.http.*` – HTTP routes and middleware
- `app.db.*` – database layer
- `app.tasks.*` – background job tasks
- `app.main` – Integrant system setup and entrypoint
- `app.loggers` – internal loggers (auditlog, mattermost, etc.; not to be confused with `app.common.logging`)
## RPC

The RPC methods are implemented using a multimethod-like structure via the
`app.util.services` namespace. The main RPC methods are collected under the
`app.rpc.commands` namespace and exposed under `/api/rpc/command/<cmd-name>`.

An RPC method accepts POST and GET requests indistinctly and uses the `Accept`
header to negotiate the response encoding (which can be Transit — the default —
or plain JSON). It also accepts Transit (default) or JSON as input, which should
be indicated using the `Content-Type` header.

The main convention is: use the `get-` prefix on an RPC name when it is a READ
operation.
Example of RPC method definition:

```clojure
(sv/defmethod ::my-command
  {::rpc/auth true         ;; requires auth
   ::doc/added "1.18"
   ::sm/params [:map ...]  ;; malli input schema
   ::sm/result [:map ...]} ;; malli output schema
  [{:keys [::db/pool] :as cfg} {:keys [::rpc/profile-id] :as params}]
  ;; return a plain map or throw
  {:id (uuid/next)})
```
Look under `src/app/rpc/commands/*.clj` to see more examples.
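As a sketch of the content negotiation described above, an RPC command can be invoked over HTTP like this (the command name, host, port, and auth cookie are placeholders, not taken from any real deployment):

```
# Hypothetical invocation; substitute a real command, host, and session token.
curl "http://localhost:6060/api/rpc/command/get-profile" \
     -X POST \
     -H "Content-Type: application/json" \
     -H "Accept: application/json" \
     -H "Cookie: auth-token=<session-token>" \
     -d '{}'
```

Because the `Accept` header requests JSON, the response is plain JSON instead of the default Transit encoding.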
## Tests

Test namespaces match `.*-test$` under `test/`. Config is in `tests.edn`.
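A minimal test namespace following that convention might look like this (the namespace and test names are illustrative; real backend tests additionally use the project's database fixtures):

```clojure
(ns backend-tests.my-ns-test
  (:require [clojure.test :as t]))

(t/deftest my-command-test
  ;; run in isolation with:
  ;;   clojure -M:dev:test --focus backend-tests.my-ns-test
  (t/is (= 4 (+ 2 2))))
```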
## Integrant System

`src/app/main.clj` declares the system map. Each key is a component; values
are config maps with `::ig/ref` for dependencies. Components implement
`ig/init-key` / `ig/halt-key!`.
## Connecting to the Database

Two PostgreSQL databases are used in this environment:

| Database | Purpose | Connection string |
|---|---|---|
| `penpot` | Development / app | `postgresql://penpot:penpot@postgres/penpot` |
| `penpot_test` | Test suite | `postgresql://penpot:penpot@postgres/penpot_test` |

Interactive psql session:

```shell
# development DB
psql "postgresql://penpot:penpot@postgres/penpot"

# test DB
psql "postgresql://penpot:penpot@postgres/penpot_test"
```

One-shot query (non-interactive):

```shell
psql "postgresql://penpot:penpot@postgres/penpot" -c "SELECT id, name FROM team LIMIT 5;"
```
Useful psql meta-commands:

```
\dt         -- list all tables
\d <table>  -- describe a table (columns, types, constraints)
\di         -- list indexes
\q          -- quit
```
**Migrations table:** applied migrations are tracked in the `migrations` table with columns `module`, `step`, and `created_at`. When renaming a migration's logical name, update this table in both databases to match the new name; otherwise the runner will attempt to re-apply the migration on the next startup.

```shell
# Example: fix a renamed migration entry in the test DB
psql "postgresql://penpot:penpot@postgres/penpot_test" \
  -c "UPDATE migrations SET step = 'new-name' WHERE step = 'old-name';"
```
## Database Access (Clojure)

`app.db` wraps `next.jdbc`. Queries use a SQL builder that auto-converts kebab-case ↔ snake_case.

```clojure
;; Query helpers
(db/get cfg-or-pool :table {:id id})              ;; fetch one row (throws if missing)
(db/get* cfg-or-pool :table {:id id})             ;; fetch one row (returns nil)
(db/query cfg-or-pool :table {:team-id team-id})  ;; fetch multiple rows

(db/insert! cfg-or-pool :table {:name "x" :team-id id})  ;; insert
(db/update! cfg-or-pool :table {:name "y"} {:id id})     ;; update
(db/delete! cfg-or-pool :table {:id id})                 ;; delete

;; Run multiple statements/queries on a single connection
(db/run! cfg (fn [{:keys [::db/conn]}]
               (db/insert! conn :table row1)
               (db/insert! conn :table row2)))

;; Transactions
(db/tx-run! cfg (fn [{:keys [::db/conn]}]
                  (db/insert! conn :table row)))
```

Almost all functions in the `app.db` namespace accept a pool, a connection, or a
`cfg` map as the first parameter.
Migrations live in `src/app/migrations/` as numbered SQL files. They run automatically on startup.
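For orientation, a migration file has this general shape (the file name and exact DDL below are illustrative, not the contents of any actual migration):

```sql
-- e.g. src/app/migrations/sql/NNNN-add-my-table.sql (hypothetical name)
CREATE TABLE my_table (
  id uuid PRIMARY KEY,
  profile_id uuid NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX my_table__profile_id__idx ON my_table (profile_id);
```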
## Error Handling

The exception helpers are defined in the Common module and are available under the
`app.common.exceptions` namespace.

Example of raising an exception:

```clojure
(ex/raise :type :not-found
          :code :object-not-found
          :hint "File does not exist"
          :file-id id)
```

Common types: `:not-found`, `:validation`, `:authorization`, `:conflict`, `:internal`.
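On the consuming side, those keyword properties travel in the exception's data map. This sketch assumes `ex/raise` builds on `ex-info`, as is conventional:

```clojure
(try
  (ex/raise :type :not-found
            :code :object-not-found
            :hint "File does not exist")
  (catch Exception cause
    ;; the properties passed to ex/raise are recoverable via ex-data
    (:code (ex-data cause))))
```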
## Performance Macros (`app.common.data.macros`)

Always prefer these macros over their `clojure.core` equivalents — they provide
optimized implementations:

```clojure
(dm/select-keys m [:a :b])  ;; faster than core/select-keys
(dm/get-in obj [:a :b :c])  ;; faster than core/get-in
(dm/str "a" "b" "c")        ;; string concatenation
```
## Configuration

`src/app/config.clj` reads `PENPOT_*` environment variables, validated with
Malli. Access them anywhere via `(cf/get :smtp-host)`. Feature flags: `(cf/flags :enable-smtp)`.
## Background Tasks

Background tasks live in `src/app/tasks/`. Each task is an Integrant component
that exposes a `::handler` key and follows this three-method pattern:

```clojure
(defmethod ig/assert-key ::handler ;; validate config at startup
  [_ params]
  (assert (db/pool? (::db/pool params)) "expected a valid database pool"))

(defmethod ig/expand-key ::handler ;; inject defaults before init
  [k v]
  {k (assoc v ::my-option default-value)})

(defmethod ig/init-key ::handler ;; return the task fn
  [_ cfg]
  (fn [_task] ;; receives the task row from the worker
    (db/tx-run! cfg (fn [{:keys [::db/conn]}]
                      ;; … do work …
                      ))))
```
Wiring a new task requires two changes in `src/app/main.clj`:

1. **Handler config** – add an entry in `system-config` with the dependencies:

   ```clojure
   :app.tasks.my-task/handler
   {::db/pool (ig/ref ::db/pool)}
   ```

2. **Registry + cron** – register the handler name and schedule it:

   ```clojure
   ;; in ::wrk/registry ::wrk/tasks map:
   :my-task (ig/ref :app.tasks.my-task/handler)

   ;; in worker-config ::wrk/cron ::wrk/entries vector:
   {:cron #penpot/cron "0 0 0 * * ?" ;; daily at midnight
    :task :my-task}
   ```
Useful cron patterns (Quartz format — six fields: s m h dom mon dow):

| Expression | Meaning |
|---|---|
| `"0 0 0 * * ?"` | Daily at midnight |
| `"0 0 */6 * * ?"` | Every 6 hours |
| `"0 */5 * * * ?"` | Every 5 minutes |
Time helpers (`app.common.time`):

```clojure
(ct/now)                          ;; current instant
(ct/duration {:hours 1})          ;; java.time.Duration
(ct/minus (ct/now) some-duration) ;; subtract duration from instant
```

`db/interval` converts a `Duration` (or millis / string) to a PostgreSQL
interval object suitable for use in SQL queries:

```clojure
(db/interval (ct/duration {:hours 1})) ;; → PGInterval "3600.0 seconds"
```
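Putting these helpers together, a cleanup-style task handler could be sketched like this. This is illustrative only, not the actual `upload_session_gc` implementation; the table name, threshold, and use of a raw SQL vector follow the conventions above but are assumptions:

```clojure
(defmethod ig/init-key ::handler
  [_ cfg]
  (fn [_task]
    (db/tx-run! cfg
      (fn [{:keys [::db/conn]}]
        ;; delete rows older than the threshold (1 hour here is illustrative)
        (db/exec-one! conn
          ["DELETE FROM upload_session WHERE created_at < now() - ?"
           (db/interval (ct/duration {:hours 1}))])))))
```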