R-Same M2 Data Sources + M3 Query Engine Foundation
Decision
M2 + M3 foundation shipped in commit e55a766 (2026-04-17). +29 files / +2235 lines. M2 Data Sources complete: DataSource + DataSourceGroup SQLAlchemy models + Alembic migration (3 enums), Fernet encryption via HKDF-SHA256-derived key from jwt_secret with domain-separation, slug regex validation, per-source queue_name/scheduled_queue_name isolation, soft-delete, workspace-scoped or global visibility. M2 service with CRUD + encrypt-on-write + decrypt-on-read + merge-patch on options + test-connection + schema introspection (all via asyncio.to_thread boundaries). M2 routes /api/v1/data-sources CRUD + /test + /schema gated by platform-admin OR governance-admin, all mutations audit-logged. M3 Query Engine foundation: Query + QueryResult + QueryJob models with SHA1 query_hash cache keyed by (hash, data_source_id), Query.latest_result_id linkage, QueryJob status machine (pending/running/success/failed). QueryService synchronous execute path (render Mustache params → hash → cache lookup → threadpool driver call → cache write → update Query.latest_result_id). M3 routes /api/v1/queries CRUD + /execute + /results/:id + /jobs/:id gated by workspace roles + platform-admin. New packages/query-runner workspace package with BaseQueryRunner + BaseSQLQueryRunner abstract contract, register/get registry pattern, QueryResult dataclass, QueryExecutionError, auto-limit for bare SELECTs via sqlparse, annotate_query tracer, ParameterizedQuery Mustache engine (6 types: text/number/enum/date/datetime/daterange), SnowflakeQueryRunner driver via snowflake-connector-python (sync lib, threadpool-wrapped, configurable statement timeout 10-600s default 60s, INFORMATION_SCHEMA introspection, RSAME_READ role default). cryptography + aiosmtplib added to apps/api deps; query-runner added to uv workspace. VENDOR.md updated to document fresh-implementation rationale (not verbatim Redash port). 14 new unit-test assertions (4 crypto + 10 parameterized query) bringing Wave-1 total to 46.
Rationale
M2 + M3 are the data-plane pillars. Without them there’s no reason to log in. Shipping them together as foundation means the execution path is testable end-to-end (admin creates data source → user creates query → execute → see result). Synchronous execution in v1 is deliberate: RQ worker + Redis lock dedup is a significant additional piece (next pass) and shouldn’t block end-to-end demo. Fernet via HKDF-derived key avoids a second secret in v1 while preserving domain-separation from JWT signing — Layer 5 graduates to dedicated RSAME_ENCRYPTION_KEY with Vault rotation. Parameterized query engine is security-critical (user-provided query text + user-provided parameters) — enforcing type-check + quote-escape + enum whitelist at render time makes SQL injection via parameters architecturally impossible. Auto-limit on bare SELECTs prevents runaway queries from users who forget LIMIT. Statement timeout bounds query execution server-side. Snowflake RSAME_READ role (admin must provision in Layer 5 IaC) with SELECT-only grant means any DDL/DML is rejected at warehouse level regardless of what query text is submitted. Fresh re-implementation of Redash architecture vs verbatim vendor: avoids Python 2/3-compat cruft, unwinds Flask coupling, cleaner license provenance for proprietary use (Redash is BSD-2; architectural patterns aren’t copyrightable). Query-result cache keyed by (SHA1 of rendered SQL, data_source_id) matches Redash’s proven dedup approach — same query against same source returns cached result instantly. Workspace role gate on query routes honors L2 RBAC matrix. Admin-gate on data-sources routes prevents non-admin users from seeing connection details. All mutations audit-logged in same transaction preserves Law 11 audit integrity.
Alternatives Rejected
Verbatim vendor of Redash query_runner — rejected: VENDOR.md explains. Unwinding Flask coupling + Python 2/3-compat cruft is as expensive as re-implementing. License cleaner without inherited code.
Async-native Snowflake driver — rejected: snowflake-connector-python is sync-only at 3.12; asyncio.to_thread keeps event loop responsive without driver rewrite.
execute inline via synchronous DB write + driver call — rejected original worker-based plan: Worker-based execute with RQ + Redis dedup is ideal but adds 2-3 days of infrastructure work. V1 inline-synchronous unblocks demo; V2 moves to worker without changing API contract.
Store options as plain JSONB with Postgres encryption at rest — rejected: column-level Fernet gives defense-in-depth (protects from DBA snooping + backup leak + replica compromise).
Separate encryption key in env — deferred to Layer 5: HKDF-derived from jwt_secret is sufficient for v1; Vault-rotated key is infrastructure concern.
Single migration for M2 + M3 — rejected: separate migrations preserve module boundaries in history; easier to squash or reorder during dev.
Query execution as GET /queries/:id/execute — rejected: POST /queries/execute allows ad-hoc execution without saving (preview mode); the saved-query execute path uses query_id in the body.
Cache invalidation via TTL — deferred: Redash’s pattern is “cache forever until explicitly refreshed” via scheduled_refresh; we follow that in v2 with RQ scheduler. V1 always tries cache first; explicit use_cache=false bypasses.
Run query as the requesting user’s Snowflake identity — rejected for v1: requires per-user Snowflake credentials or RLS-via-role which isn’t configured yet. V1 uses a single RSAME_READ role per data source; M5 Governance wires per-user identity propagation (Wave 2).
Expose raw Snowflake connection errors to users — rejected: generic QueryExecutionFailedError with driver-detail in audit meta (not user body) prevents credential/internal-schema leak.
Let Query.tags default to NULL — rejected: PostgreSQL ARRAY with server_default=’{}’ means queries always have iterable tags; no None-guard needed in service code.
Defer packages/query-runner to separate commit — rejected: coupled to M3 by nature (services.py imports get_runner); cleaner as single commit.
Outcome
Pending
Related
- R-Same M1 Identity COMPLETE
- r-same-wave-0-scaffold-complete
- r-dash-architecture-freeze-checkpoint-2
Snowflake- fernet-encryption-for-data-source-credentials
- pwa-push-notifications-require-vapid-push-handler
- trpc-protected-procedure-is-authn-only-not-authz
- cube-query-cache-column-masks-reapplied-on-cache-hit
- query-revision-auto-snapshot-wired-into-service-update-not-r
- wave-4-pass-4a-architecture-serwistnext-9510-workbox-style-s
- enterprise-backup-syncsh-drift-detector-auto-discovers-optne
- runwal-aws-account-798513555087-is-vendor-owned-midnight-dig
- tableau-mcp-gap-set-project-default-permissions