penpot/backend/AGENTS.md
Andrey Antukh 7d4be33d4f 🎉 Add telemetry anonymous event collection (#9483)
* 🎉 Add telemetry anonymous event collection

Rewrite the audit logging subsystem to support three operating modes and
add anonymous telemetry event collection:

Modes:
- A (audit-log only): events persisted with full context
- B (audit-log + telemetry): same as A, plus events are collected for
  telemetry shipping
- C (telemetry-only): events stored anonymously with PII stripped,
  telemetry flag active, audit-log flag inactive

Audit system refactoring (app.loggers.audit):
- Replace qualified map keys (::audit/name etc.) with plain keywords
- Rename submit! -> submit, insert! -> insert, prepare-event ->
  prepare-rpc-event
- Add submit* as a lower-level public API
- Add process-event dispatch function that handles all three modes and
  webhooks in a single tx-run!
- Add :id to event schema (auto-generated if omitted)
- Add filter-telemetry-props: anonymises event props per event type.
  Keeps UUID/boolean/number values; for login/identify events preserves
  lang, auth-backend, email-domain; for navigate events preserves route,
  file-id, team-id, page-id; instance-start trigger passes through.
- Add filter-telemetry-context: retains only safe context keys.
  Backend: version, initiator, client-version, client-user-agent.
  Frontend: browser, os, locale, screen metrics, event-origin.
- Timestamps truncated to day precision via ct/truncate for telemetry
  storage
- PII stripped: props emptied, ip-addr zeroed, session-linking and
  access-token fields removed from context

Config (app.config):
- Derive :enable-telemetry flag from telemetry-enabled config option

Email utilities (app.email):
- Add email/clean and email/get-domain helper functions for domain
  extraction from email addresses

Setup (app.setup):
- Emit instance-start trigger event at system startup
- Simplify handle-instance-id (remove read-only check)

RPC layer (app.rpc):
- wrap-audit now activates when :telemetry flag is set
- Add :request-id to RPC params context for event correlation

RPC commands (management, teams_invitations, verify_token, OIDC auth,
webhooks): migrate all audit call sites to use the new plain-key API

SREPL (app.srepl.main):
- Migrate all audit/insert! calls to audit/insert with plain keys

Telemetry task (app.tasks.telemetry):
- Restructure legacy report into make-legacy-request; distinguish
  payload type as :telemetry-legacy-report
- Add collect-and-send-audit-events: loop fetching up to 10,000 rows
  per iteration, encodes and sends each page, deletes on success,
  stops immediately on failure for retry
- Add send-event-batch: POSTs fressian+zstd batch (base64 via
  blob/encode-str) to the telemetry endpoint with instance-id per event
- Add gc-telemetry-events: enforces 100,000-row safety cap by dropping
  oldest rows first
- Add delete-sent-events: deletes successfully shipped rows by id

Blob utilities (app.util.blob):
- Add encode-str/decode-str: combine fressian+zstd encoding with URL-
  safe base64 for JSON-safe string transport

Database:
- Add migration 0145: index on audit_log (source, created_at ASC) for
  efficient telemetry batch collection queries

Frontend:
- Always initialize event system regardless of :audit-log flag
- Defer auth events (signin identify) to after profile is set
- Refactor event subsystem for telemetry support

Tests (21 test vars, 94 assertions in tasks-telemetry-test):
- Cover all code paths: disabled/enabled telemetry, no-events no-op,
  happy-path batch send and delete, failure retention, payload anonymity,
  context stripping, timestamp day precision, batch encoding round-trip,
  multi-page iteration, GC cap enforcement, partial failure handling
- blob encode-str/decode-str round-trip tests (14 test vars)
- RPC audit integration tests (5 test vars)

Signed-off-by: Andrey Antukh <niwi@niwi.nz>

* 📎 Add pr feedback changes

---------

Signed-off-by: Andrey Antukh <niwi@niwi.nz>
2026-05-11 12:42:01 +02:00

263 lines
8.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Penpot Backend Agent Instructions
Clojure backend (RPC) service running on the JVM.
Uses Integrant for dependency injection, PostgreSQL for storage, and
Redis for messaging/caching.
## General Guidelines
To ensure consistency across the Penpot JVM stack, all contributions must adhere
to these criteria.
IMPORTANT: all CLI commands should be executed under backend/
subdirectory for make them work correctly.
### 1. Testing & Validation
* **Coverage:** If code is added or modified in `src/`, corresponding
tests in `test/backend_tests/` must be added or updated.
* **Execution:**
* **Isolated:** Run `clojure -M:dev:test --focus backend-tests.my-ns-test` for the specific test namespace.
* **Regression:** Run `clojure -M:dev:test` to ensure the suite passes without regressions in related functional areas.
### 2. Code Quality & Formatting
* **Linting:** All code must pass linter checks (run `pnpm run lint:clj` or `pnpm run lint` on the repository root)
* **Formatting:** All the code must pass the formatting check (run `pnpm run
check-fmt`). Use `pnpm run fmt` to fix formatting issues. Avoid "dirty"
diffs caused by unrelated whitespace changes.
* **Type Hinting:** Use explicit JVM type hints (e.g., `^String`, `^long`) in
performance-critical paths to avoid reflection overhead.
## Code Conventions
### Namespace Overview
The source is located under `src` directory and this is a general overview of
namespaces structure:
- `app.rpc.commands.*` RPC command implementations (`auth`, `files`, `teams`, etc.)
- `app.http.*` HTTP routes and middleware
- `app.db.*` Database layer
- `app.tasks.*` Background job tasks
- `app.main` Integrant system setup and entrypoint
- `app.loggers` Internal loggers (auditlog, mattermost, etc.) (not to be confused with `app.common.logging`)
### RPC
The RPC methods are implemented using a multimethod-like structure via the
`app.util.services` namespace. The main RPC methods are collected under
`app.rpc.commands` namespace and exposed under `/api/rpc/command/<cmd-name>`.
The RPC method accepts POST and GET requests indistinctly and uses the `Accept`
header to negotiate the response encoding (which can be Transit — the default —
or plain JSON). It also accepts Transit (default) or JSON as input, which should
be indicated using the `Content-Type` header.
The main convention is: use `get-` prefix on RPC name when we want READ
operation.
Example of RPC method definition:
```clojure
(sv/defmethod ::my-command
{::rpc/auth true ;; requires auth
::doc/added "1.18"
::sm/params [:map ...] ;; malli input schema
::sm/result [:map ...]} ;; malli output schema
[{:keys [::db/pool] :as cfg} {:keys [::rpc/profile-id] :as params}]
;; return a plain map or throw
{:id (uuid/next)})
```
Look under `src/app/rpc/commands/*.clj` to see more examples.
### Tests
Test namespaces match `.*-test$` under `test/`. Config is in `tests.edn`.
### Integrant System
The `src/app/main.clj` declares the system map. Each key is a component; values
are config maps with `::ig/ref` for dependencies. Components implement
`ig/init-key` / `ig/halt-key!`.
### Connecting to the Database
Two PostgreSQL databases are used in this environment:
| Database | Purpose | Connection string |
|---------------|--------------------|----------------------------------------------------|
| `penpot` | Development / app | `postgresql://penpot:penpot@postgres/penpot` |
| `penpot_test` | Test suite | `postgresql://penpot:penpot@postgres/penpot_test` |
**Interactive psql session:**
```bash
# development DB
psql "postgresql://penpot:penpot@postgres/penpot"
# test DB
psql "postgresql://penpot:penpot@postgres/penpot_test"
```
**One-shot query (non-interactive):**
```bash
psql "postgresql://penpot:penpot@postgres/penpot" -c "SELECT id, name FROM team LIMIT 5;"
```
**Useful psql meta-commands:**
```
\dt -- list all tables
\d <table> -- describe a table (columns, types, constraints)
\di -- list indexes
\q -- quit
```
> **Migrations table:** Applied migrations are tracked in the `migrations` table
> with columns `module`, `step`, and `created_at`. When renaming a migration
> logical name, update this table in both databases to match the new name;
> otherwise the runner will attempt to re-apply the migration on next startup.
```bash
# Example: fix a renamed migration entry in the test DB
psql "postgresql://penpot:penpot@postgres/penpot_test" \
-c "UPDATE migrations SET step = 'new-name' WHERE step = 'old-name';"
```
### Database Access (Clojure)
`app.db` wraps next.jdbc. Queries use a SQL builder that auto-converts kebab-case ↔ snake_case.
```clojure
;; Query helpers
(db/get cfg-or-pool :table {:id id}) ; fetch one row (throws if missing)
(db/get* cfg-or-pool :table {:id id}) ; fetch one row (returns nil)
(db/query cfg-or-pool :table {:team-id team-id}) ; fetch multiple rows
(db/insert! cfg-or-pool :table {:name "x" :team-id id}) ; insert
(db/update! cfg-or-pool :table {:name "y"} {:id id}) ; update
(db/delete! cfg-or-pool :table {:id id}) ; delete
;; Run multiple statements/queries on single connection
(db/run! cfg (fn [{:keys [::db/conn]}]
(db/insert! conn :table row1)
(db/insert! conn :table row2))
;; Transactions
(db/tx-run! cfg (fn [{:keys [::db/conn]}]
(db/insert! conn :table row)))
```
Almost all methods in the `app.db` namespace accept `pool`, `conn`, or
`cfg` as params.
Migrations live in `src/app/migrations/` as numbered SQL files. They run automatically on startup.
### Error Handling
The exception helpers are defined on Common module, and are available under
`app.common.exceptions` namespace.
Example of raising an exception:
```clojure
(ex/raise :type :not-found
:code :object-not-found
:hint "File does not exist"
:file-id id)
```
Common types: `:not-found`, `:validation`, `:authorization`, `:conflict`, `:internal`.
### Performance Macros (`app.common.data.macros`)
Always prefer these macros over their `clojure.core` equivalents — they provide
optimized implementations:
```clojure
(dm/select-keys m [:a :b]) ;; faster than core/select-keys
(dm/get-in obj [:a :b :c]) ;; faster than core/get-in
(dm/str "a" "b" "c") ;; string concatenation
```
### Configuration
`src/app/config.clj` reads `PENPOT_*` environment variables, validated with
Malli. Access anywhere via `(cf/get :smtp-host)`. Feature flags: `(cf/flags
:enable-smtp)`.
### Background Tasks
Background tasks live in `src/app/tasks/`. Each task is an Integrant component
that exposes a `::handler` key and follows this three-method pattern:
```clojure
(defmethod ig/assert-key ::handler ;; validate config at startup
[_ params]
(assert (db/pool? (::db/pool params)) "expected a valid database pool"))
(defmethod ig/expand-key ::handler ;; inject defaults before init
[k v]
{k (assoc v ::my-option default-value)})
(defmethod ig/init-key ::handler ;; return the task fn
[_ cfg]
(fn [_task] ;; receives the task row from the worker
(db/tx-run! cfg (fn [{:keys [::db/conn]}]
;; … do work …
))))
```
**Wiring a new task** requires two changes in `src/app/main.clj`:
1. **Handler config** add an entry in `system-config` with the dependencies:
```clojure
:app.tasks.my-task/handler
{::db/pool (ig/ref ::db/pool)}
```
2. **Registry + cron** register the handler name and schedule it:
```clojure
;; in ::wrk/registry ::wrk/tasks map:
:my-task (ig/ref :app.tasks.my-task/handler)
;; in worker-config ::wrk/cron ::wrk/entries vector:
{:cron #penpot/cron "0 0 0 * * ?" ;; daily at midnight
:task :my-task}
```
**Useful cron patterns** (Quartz format — six fields: s m h dom mon dow):
| Expression | Meaning |
|------------------------------|--------------------|
| `"0 0 0 * * ?"` | Daily at midnight |
| `"0 0 */6 * * ?"` | Every 6 hours |
| `"0 */5 * * * ?"` | Every 5 minutes |
**Time helpers** (`app.common.time`):
```clojure
(ct/now) ;; current instant
(ct/duration {:hours 1}) ;; java.time.Duration
(ct/minus (ct/now) some-duration) ;; subtract duration from instant
```
`db/interval` converts a `Duration` (or millis / string) to a PostgreSQL
interval object suitable for use in SQL queries:
```clojure
(db/interval (ct/duration {:hours 1})) ;; → PGInterval "3600.0 seconds"
```