Penpot Backend Agent Instructions

Clojure backend (RPC) service running on the JVM.

Uses Integrant for dependency injection, PostgreSQL for storage, and Redis for messaging/caching.

General Guidelines

To ensure consistency across the Penpot JVM stack, all contributions must adhere to these criteria.

IMPORTANT: all CLI commands must be executed from the backend/ subdirectory to work correctly.

1. Testing & Validation

  • Coverage: If code is added or modified in src/, corresponding tests in test/backend_tests/ must be added or updated.

  • Execution:

    • Isolated: Run clojure -M:dev:test --focus backend-tests.my-ns-test for the specific test namespace.
    • Regression: Run clojure -M:dev:test to ensure the suite passes without regressions in related functional areas.

2. Code Quality & Formatting

  • Linting: All code must pass linter checks (run pnpm run lint:clj or pnpm run lint at the repository root).
  • Formatting: All code must pass the formatting check (run pnpm run check-fmt). Use pnpm run fmt to fix formatting issues. Avoid "dirty" diffs caused by unrelated whitespace changes.
  • Type Hinting: Use explicit JVM type hints (e.g., ^String, ^long) in performance-critical paths to avoid reflection overhead.
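As an illustration, a minimal sketch of where a hint removes reflection (the function name is hypothetical):

```clojure
;; Without the ^String hint, the (.length s) interop call falls back to
;; runtime reflection; with it, the compiler emits a direct method call.
;; The ^long return hint likewise avoids boxing the primitive result.
(defn count-chars ^long [^String s]
  (.length s))
```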

Code Conventions

Namespace Overview

The source is located under the src/ directory. This is a general overview of the namespace structure:

  • app.rpc.commands.* RPC command implementations (auth, files, teams, etc.)
  • app.http.* HTTP routes and middleware
  • app.db.* Database layer
  • app.tasks.* Background job tasks
  • app.main Integrant system setup and entrypoint
  • app.loggers Internal loggers (auditlog, mattermost, etc.) (not to be confused with app.common.logging)

RPC

The RPC methods are implemented using a multimethod-like structure via the app.util.services namespace. The main RPC methods are collected under the app.rpc.commands namespace and exposed under /api/rpc/command/<cmd-name>.

The RPC methods accept both POST and GET requests and use the Accept header to negotiate the response encoding (Transit, the default, or plain JSON). They also accept Transit (default) or JSON as input, which should be indicated with the Content-Type header.

The main naming convention: use the get- prefix in the RPC name for READ operations.

Example of RPC method definition:

(sv/defmethod ::my-command
  {::rpc/auth true            ;; requires auth
   ::doc/added "1.18"
   ::sm/params [:map ...]     ;; malli input schema
   ::sm/result [:map ...]}    ;; malli output schema
  [{:keys [::db/pool] :as cfg} {:keys [::rpc/profile-id] :as params}]
  ;; return a plain map or throw
  {:id (uuid/next)})

Look under src/app/rpc/commands/*.clj to see more examples.

Tests

Test namespaces match .*-test$ under test/. Config is in tests.edn.
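A minimal test namespace sketch (the namespace and test names are illustrative; real tests typically build on the shared helpers under test/):

```clojure
(ns backend-tests.my-ns-test
  (:require [clojure.test :as t]))

;; The namespace name must match .*-test$ so the runner discovers it.
(t/deftest my-feature-test
  (t/is (= 4 (+ 2 2))))
```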

Integrant System

src/app/main.clj declares the system map. Each key is a component; values are config maps with ::ig/ref entries for dependencies. Components implement ig/init-key / ig/halt-key!.
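A hedged sketch of the component pattern (the component key, dependency wiring, and service functions are illustrative, not actual Penpot keys):

```clojure
;; Component lifecycle: init-key returns the running component value,
;; halt-key! tears it down.
(defmethod ig/init-key ::my-component
  [_ cfg]
  (start-service cfg))

(defmethod ig/halt-key! ::my-component
  [_ service]
  (stop-service service))

;; Corresponding system-map entry, declaring a dependency via ig/ref:
;; ::my-component {::db/pool (ig/ref ::db/pool)}
```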

Connecting to the Database

Two PostgreSQL databases are used in this environment:

Database       Purpose             Connection string
penpot         Development / app   postgresql://penpot:penpot@postgres/penpot
penpot_test    Test suite          postgresql://penpot:penpot@postgres/penpot_test

Interactive psql session:

# development DB
psql "postgresql://penpot:penpot@postgres/penpot"

# test DB
psql "postgresql://penpot:penpot@postgres/penpot_test"

One-shot query (non-interactive):

psql "postgresql://penpot:penpot@postgres/penpot" -c "SELECT id, name FROM team LIMIT 5;"

Useful psql meta-commands:

\dt              -- list all tables
\d <table>       -- describe a table (columns, types, constraints)
\di              -- list indexes
\q               -- quit

Migrations table: Applied migrations are tracked in the migrations table with columns module, step, and created_at. When renaming a migration's logical name, update this table in both databases to match the new name; otherwise the runner will attempt to re-apply the migration on the next startup.

# Example: fix a renamed migration entry in the test DB
psql "postgresql://penpot:penpot@postgres/penpot_test" \
  -c "UPDATE migrations SET step = 'new-name' WHERE step = 'old-name';"

Database Access (Clojure)

app.db wraps next.jdbc. Queries use a SQL builder that auto-converts kebab-case ↔ snake_case.

;; Query helpers
(db/get cfg-or-pool :table {:id id})                    ; fetch one row (throws if missing)
(db/get* cfg-or-pool :table {:id id})                   ; fetch one row (returns nil)
(db/query cfg-or-pool :table {:team-id team-id})        ; fetch multiple rows
(db/insert! cfg-or-pool :table {:name "x" :team-id id}) ; insert
(db/update! cfg-or-pool :table {:name "y"} {:id id})    ; update
(db/delete! cfg-or-pool :table {:id id})                ; delete

;; Run multiple statements/queries on a single connection
(db/run! cfg (fn [{:keys [::db/conn]}]
               (db/insert! conn :table row1)
               (db/insert! conn :table row2)))


;; Transactions
(db/tx-run! cfg (fn [{:keys [::db/conn]}]
                  (db/insert! conn :table row)))

Almost all functions in the app.db namespace accept a pool, a connection, or a cfg map as their first argument.

Migrations live in src/app/migrations/ as numbered SQL files. They run automatically on startup.

Error Handling

The exception helpers are defined in the common module and are available under the app.common.exceptions namespace.

Example of raising an exception:

(ex/raise :type :not-found
          :code :object-not-found
          :hint "File does not exist"
          :file-id id)

Common types: :not-found, :validation, :authorization, :conflict, :internal.
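Exceptions raised this way carry their properties as ex-data, so callers can branch on them; a sketch assuming standard ex-info semantics:

```clojure
(try
  (ex/raise :type :not-found
            :code :object-not-found
            :hint "File does not exist")
  (catch Exception cause
    (let [{:keys [type code]} (ex-data cause)]
      ;; dispatch on the error type/code as needed
      (when (= type :not-found)
        code))))
```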

Performance Macros (app.common.data.macros)

Always prefer these macros over their clojure.core equivalents — they provide optimized implementations:

(dm/select-keys m [:a :b])     ;; faster than core/select-keys
(dm/get-in obj [:a :b :c])     ;; faster than core/get-in
(dm/str "a" "b" "c")           ;; string concatenation

Configuration

src/app/config.clj reads PENPOT_* environment variables, validated with Malli. Access anywhere via (cf/get :smtp-host). Feature flags: (cf/flags :enable-smtp).
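A brief usage sketch (assumes cf/get and cf/flags behave as shown in the text above):

```clojure
(require '[app.config :as cf])

;; Read a validated configuration value (backed by a PENPOT_* env var):
(def smtp-host (cf/get :smtp-host))

;; Gate behaviour on a feature flag:
(when (cf/flags :enable-smtp)
  (println "SMTP enabled, host:" smtp-host))
```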

Background Tasks

Background tasks live in src/app/tasks/. Each task is an Integrant component that exposes a ::handler key and follows this three-method pattern:

(defmethod ig/assert-key ::handler   ;; validate config at startup
  [_ params]
  (assert (db/pool? (::db/pool params)) "expected a valid database pool"))

(defmethod ig/expand-key ::handler   ;; inject defaults before init
  [k v]
  {k (assoc v ::my-option default-value)})

(defmethod ig/init-key ::handler     ;; return the task fn
  [_ cfg]
  (fn [_task]                        ;; receives the task row from the worker
    (db/tx-run! cfg (fn [{:keys [::db/conn]}]
                      ;; … do work …
                      ))))

Wiring a new task requires two changes in src/app/main.clj:

  1. Handler config: add an entry in system-config with the dependencies:
:app.tasks.my-task/handler
{::db/pool (ig/ref ::db/pool)}
  2. Registry + cron: register the handler name and schedule it:
;; in ::wrk/registry ::wrk/tasks map:
:my-task (ig/ref :app.tasks.my-task/handler)

;; in worker-config ::wrk/cron ::wrk/entries vector:
{:cron #penpot/cron "0 0 0 * * ?"   ;; daily at midnight
 :task :my-task}

Useful cron patterns (Quartz format — six fields: s m h dom mon dow):

Expression          Meaning
"0 0 0 * * ?"       Daily at midnight
"0 0 */6 * * ?"     Every 6 hours
"0 */5 * * * ?"     Every 5 minutes

Time helpers (app.common.time):

(ct/now)                          ;; current instant
(ct/duration {:hours 1})          ;; java.time.Duration
(ct/minus (ct/now) some-duration) ;; subtract duration from instant

db/interval converts a Duration (or millis / string) to a PostgreSQL interval object suitable for use in SQL queries:

(db/interval (ct/duration {:hours 1}))  ;; → PGInterval "3600.0 seconds"