5 Commits

Author SHA1 Message Date
Andrey Antukh
7d4be33d4f 🎉 Add telemetry anonymous event collection (#9483)
* 🎉 Add telemetry anonymous event collection

Rewrite the audit logging subsystem to support three operating modes and
add anonymous telemetry event collection:

Modes:
- A (audit-log only): events persisted with full context
- B (audit-log + telemetry): same as A, plus events are collected for
  telemetry shipping
- C (telemetry-only): events stored anonymously with PII stripped,
  telemetry flag active, audit-log flag inactive

Audit system refactoring (app.loggers.audit):
- Replace qualified map keys (::audit/name etc.) with plain keywords
- Rename submit! -> submit, insert! -> insert, prepare-event ->
  prepare-rpc-event
- Add submit* as a lower-level public API
- Add process-event dispatch function that handles all three modes and
  webhooks in a single tx-run!
- Add :id to event schema (auto-generated if omitted)
- Add filter-telemetry-props: anonymises event props per event type.
  Keeps UUID/boolean/number values; for login/identify events preserves
  lang, auth-backend, email-domain; for navigate events preserves route,
  file-id, team-id, page-id; instance-start trigger passes through.
- Add filter-telemetry-context: retains only safe context keys.
  Backend: version, initiator, client-version, client-user-agent.
  Frontend: browser, os, locale, screen metrics, event-origin.
- Timestamps truncated to day precision via ct/truncate for telemetry
  storage
- PII stripped: props emptied, ip-addr zeroed, session-linking and
  access-token fields removed from context

Config (app.config):
- Derive :enable-telemetry flag from telemetry-enabled config option

Email utilities (app.email):
- Add email/clean and email/get-domain helper functions for domain
  extraction from email addresses

Setup (app.setup):
- Emit instance-start trigger event at system startup
- Simplify handle-instance-id (remove read-only check)

RPC layer (app.rpc):
- wrap-audit now activates when :telemetry flag is set
- Add :request-id to RPC params context for event correlation

RPC commands (management, teams_invitations, verify_token, OIDC auth,
webhooks): migrate all audit call sites to use the new plain-key API

SREPL (app.srepl.main):
- Migrate all audit/insert! calls to audit/insert with plain keys

Telemetry task (app.tasks.telemetry):
- Restructure legacy report into make-legacy-request; distinguish
  payload type as :telemetry-legacy-report
- Add collect-and-send-audit-events: loop fetching up to 10,000 rows
  per iteration, encodes and sends each page, deletes on success,
  stops immediately on failure for retry
- Add send-event-batch: POSTs fressian+zstd batch (base64 via
  blob/encode-str) to the telemetry endpoint with instance-id per event
- Add gc-telemetry-events: enforces 100,000-row safety cap by dropping
  oldest rows first
- Add delete-sent-events: deletes successfully shipped rows by id

Blob utilities (app.util.blob):
- Add encode-str/decode-str: combine fressian+zstd encoding with URL-
  safe base64 for JSON-safe string transport

Database:
- Add migration 0145: index on audit_log (source, created_at ASC) for
  efficient telemetry batch collection queries

Frontend:
- Always initialize event system regardless of :audit-log flag
- Defer auth events (signin identify) to after profile is set
- Refactor event subsystem for telemetry support

Tests (21 test vars, 94 assertions in tasks-telemetry-test):
- Cover all code paths: disabled/enabled telemetry, no-events no-op,
  happy-path batch send and delete, failure retention, payload anonymity,
  context stripping, timestamp day precision, batch encoding round-trip,
  multi-page iteration, GC cap enforcement, partial failure handling
- blob encode-str/decode-str round-trip tests (14 test vars)
- RPC audit integration tests (5 test vars)

Signed-off-by: Andrey Antukh <niwi@niwi.nz>

* 📎 Add pr feedback changes

---------

Signed-off-by: Andrey Antukh <niwi@niwi.nz>
2026-05-11 12:42:01 +02:00
Andrey Antukh
e5f9c1e863 🎉 Add chunked upload API for large media and binary files
Introduce a purpose-agnostic three-step session-based upload API that
allows uploading large binary blobs (media files and .penpot imports)
without hitting multipart size limits.

Backend:
- Migration 0147: new `upload_session` table (profile_id, total_chunks,
  created_at) with indexes on profile_id and created_at.
- Three new RPC commands in media.clj:
    * `create-upload-session`  – allocates a session row; enforces
      `upload-sessions-per-profile` and `upload-chunks-per-session`
      quota limits (configurable in config.clj, defaults 5 / 20).
    * `upload-chunk`           – stores each slice as a storage object;
      validates chunk index bounds and profile ownership.
    * `assemble-file-media-object` – reassembles chunks via the shared
      `assemble-chunks!` helper and creates the final media object.
- `assemble-chunks!` is a public helper in media.clj shared by both
  `assemble-file-media-object` and `import-binfile`.
- `import-binfile` (binfile.clj): accepts an optional `upload-id` param;
  when provided, materialises the temp file from chunks instead of
  expecting an inline multipart body, removing the 200 MiB body limit
  on .penpot imports.  Schema updated with an `:and` validator requiring
  either `:file` or `:upload-id`.
- quotes.clj: new `upload-sessions-per-profile` quota check.
- Background GC task (`tasks/upload_session_gc.clj`): deletes stalled
  (never-completed) sessions older than 1 hour; scheduled daily at
  midnight via the cron system in main.clj.
- backend/AGENTS.md: document the background-task wiring pattern.

Frontend:
- New `app.main.data.uploads` namespace: generic `upload-blob-chunked`
  helper drives steps 1–2 (create session + upload all chunks with a
  concurrency cap of 2) and emits `{:session-id uuid}` for callers.
- `config.cljs`: expose `upload-chunk-size` (default 25 MiB, overridable
  via `penpotUploadChunkSize` global).
- `workspace/media.cljs`: blobs ≥ chunk-size go through the chunked path
  (`upload-blob-chunked` → `assemble-file-media-object`); smaller blobs
  use the existing direct `upload-file-media-object` path.
  `handle-media-error` simplified; `on-error` callback removed.
- `worker/import.cljs`: new `import-blob-via-upload` helper replaces the
  inline multipart approach for both binfile-v1 and binfile-v3 imports.
- `repo.cljs`: `:upload-chunk` derived as a `::multipart-upload`;
  `form-data?` removed from `import-binfile` (JSON params only).

Tests:
- Backend (rpc_media_test.clj): happy path, idempotency, permission
  isolation, invalid media type, missing chunks, session-not-found,
  chunk-index out-of-range, and quota-limit scenarios.
- Frontend (uploads_test.cljs): session creation and chunk-count
  correctness for `upload-blob-chunked`.
- Frontend (workspace_media_test.cljs): direct-upload path for small
  blobs, chunked path for large blobs, and chunk-count correctness for
  `process-blobs`.
- `helpers/http.cljs`: shared fetch-mock helpers (`install-fetch-mock!`,
  `make-json-response`, `make-transit-response`, `url->cmd`).

Signed-off-by: Andrey Antukh <niwi@niwi.nz>
2026-04-21 18:51:10 +00:00
Andrey Antukh
cc03f3f884 📚 Add minor improvements to ai agents documentation 2026-03-24 18:00:39 +01:00
Andrey Antukh
2d616cf9c0
📚 Add better organization for AGENTS.md file (#8675) 2026-03-18 14:59:38 +01:00
Andrey Antukh
32cf95265a 📚 Add GitHub Copilot instructions (#8548) 2026-03-10 13:12:15 +01:00