Telemetry Backend Research for OFEM (OneLake File Explorer)¶
Context¶
OFEM is an open-source macOS app shipping opt-out telemetry with the schema
{ ts, installId, appVersion, platform, osVersion, tenantId, event, durationMs, success, errorCode }
(~300 B per event). Projected volume: < 100k events/month in year one, possibly
growing toward 1M/month. Data must end up queryable in Microsoft Fabric, but does
not have to land in Fabric. A 24/7 Fabric capacity is a non-starter for a personal
OSS project.
All EUR prices are West Europe pay-as-you-go, rounded; check the Azure pricing calculator before committing.
1. Azure Functions Consumption + ADLS Gen2¶
- Free grant: 1,000,000 executions/month + 400,000 GB-s/month, per subscription.
- 100k events/month sits ~10x inside the free tier. 1M/month still fits.
- Storage: 100k events x ~300 B = ~30 MB/month. ADLS Gen2 hot = ~0.018 EUR/GB; writes ~0.05 EUR per 10k transactions. Batching client side keeps writes well under 1 EUR/mo.
- Cold start: 1-3 s on first hit after idle. Fine for fire-and-forget telemetry.
- Fabric integration: create an ADLS Gen2 shortcut in a Lakehouse
Files/, then a notebook (or a Fabric trial capacity, or your own developer machine) parses JSONL into Delta. Shortcuts themselves consume zero capacity when idle. - Total: effectively 0 EUR/month at 100k; <1 EUR/month at 1M.
2. Application Insights (workspace-based)¶
- Free grant: 5 GB/month ingestion, 31-day retention free.
- 100k events x ~1 KB enriched = ~100 MB/month - 50x headroom. 1M/month = ~1 GB.
- Fabric query path: AI/Log Analytics has no native Fabric connector. Workarounds:
- Diagnostic Settings -> Storage Account (continuous export of the
customEventstable to Blob/ADLS). Then shortcut that storage into OneLake. Export itself costs ~0.13 EUR/GB ingested, so ~0.65 EUR/mo at 1M events. - Or query Log Analytics from a Fabric notebook via the
azure-monitor-querySDK. - Pros: SDK is one line, anomaly detection / live metrics / sampling are free.
- Cons: extra hop into Fabric, double-storage, Azure Monitor egress charge if you export, vendor-flavored schema.
3. Azure Event Hubs Basic¶
- Minimum 1 Throughput Unit, billed hourly. Basic = ~0.011 EUR/hr = ~8 EUR/month baseline, plus 0.025 EUR per million ingress events.
- 1 TU = 1 MB/s, 1,000 events/s. Our peak is maybe 1 event/s. Massively overkill.
- Basic tier max retention is 1 day, no Capture, no Kafka consumer groups.
- Verdict: rule out. Pay-per-streaming-unit doesn't beat Functions free tier.
4. Direct client write to Storage with SAS¶
- Cost: storage only (~cents/mo).
- Security: SAS embedded in a public open-source binary is essentially a public
write token. Anyone can spam your container; the moment it's abused you need to
rotate (and every old client breaks). Account SAS also allows listing/deletion
unless tightly scoped to one container with
add-only and short expiry. - Even with a write-only, time-bounded user-delegated SAS rotated server-side, you still need a server to mint it - which negates the "no backend" advantage.
- Verdict: bad trust model for OSS telemetry. Skip.
5. Cloudflare Workers + R2¶
- Workers free: 100,000 requests/day (3M/month), 10 ms CPU/request. Plenty.
- R2 free: 10 GB storage, 1M Class A (write) ops/month, no egress fees.
- Pattern: Worker validates a static ingest key, appends NDJSON line into a daily R2 object via multipart, or POSTs to a Durable Object that flushes every N seconds.
- Fabric integration: OneLake supports S3-compatible shortcuts, and R2
exposes an S3 API. Create the shortcut with the R2 access key + endpoint
(
https://<accountid>.r2.cloudflarestorage.com). Verified pattern by multiple community blogs; the shortcut caches reads, so cross-cloud egress is minimal. - Total: 0 EUR/month at 100k and 1M.
- Cons: data lives at Cloudflare (US/EU edge), so the privacy notice must say "telemetry is stored at Cloudflare R2 (EU) and analyzed in Microsoft Fabric".
6. Fabric Eventstream Custom Endpoint (HTTPS)¶
- The eventstream custom endpoint requires the workspace to live on a Fabric capacity (or a Trial). There is no serverless/0-CU path for ingestion.
- Smallest paid SKU = F2 at ~0.18 EUR/hr PAYG = ~130 EUR/month 24/7, or ~88 EUR/month with 1-year reservation. You can pause F2, but then events that arrive while paused are dropped.
- Eventstream processing itself is metered only while data flows, but the underlying capacity is hourly. This is the gold-standard architecture - and the wrong fit for a personal OSS project.
- Verdict: ideal experience, unacceptable cost. Skip.
7. Static Web Apps + Managed Functions¶
- SWA Free tier includes managed Functions, but the managed-functions runtime is restricted to ~1M executions/month bundled with SWA - effectively the same as bare Functions Consumption, plus a static landing/privacy page for free.
- Same downstream story as option 1 (write to Storage, shortcut into Fabric).
- Marginal upside: free HTTPS hostname + privacy page co-located with the API.
8. GitHub Actions cron + Git-stored data¶
- Rate-limited, no public ingress endpoint, committing telemetry to a public repo would PII-leak install IDs forever. Ruled out, as requested.
Comparison¶
| # | Option | 100k/mo | 1M/mo | Setup | Latency to query | Fabric-native |
|---|---|---|---|---|---|---|
| 1 | Functions + ADLS | ~0 EUR | <1 EUR | 2 | minutes (shortcut) | Lakehouse + Notebook |
| 2 | App Insights (+export) | 0 EUR | ~1 EUR | 1 | minutes-hours | indirect |
| 3 | Event Hubs Basic | ~8 EUR | ~8 EUR | 3 | seconds | needs F SKU |
| 4 | Direct SAS to Storage | ~0 EUR | ~0 EUR | 1 | minutes | Lakehouse |
| 5 | Cloudflare Workers + R2 | 0 EUR | 0 EUR | 2 | minutes (shortcut) | Lakehouse + Notebook |
| 6 | Fabric Eventstream (F2) | ~88 EUR | ~88 EUR | 3 | seconds | Eventhouse/KQL |
| 7 | SWA + Functions | 0 EUR | <1 EUR | 2 | minutes | Lakehouse |
| 8 | GitHub Actions | n/a | n/a | n/a | n/a | n/a |
Recommendation: Azure Functions Consumption + ADLS Gen2 -> OneLake shortcut¶
+----------------+ HTTPS POST +-------------------------+
| OFEM (macOS) | /v1/events + key | Azure Function (Cons.) |
| opt-out | ---------------------> | Node/.NET, validates, |
| telemetry SDK | | appends NDJSON line |
+----------------+ +------------+------------+
|
v
+------------+-------------+
| ADLS Gen2 Storage Acct |
| container: telemetry/ |
| YYYY/MM/DD/HH.jsonl |
+------------+-------------+
|
OneLake Shortcut
(zero copy)
v
+------------+-------------+
| Fabric Lakehouse |
| Files/telemetry/... |
| + Notebook job (trial |
| capacity / on-demand) |
| -> Delta table |
+--------------------------+
Why this beats the alternatives¶
It is the only option that is simultaneously free at projected volume, fully in Azure (matching Sam's ecosystem and the privacy story end-users will expect from a Microsoft-Fabric-adjacent tool), avoids any 24/7 capacity cost, and lands raw JSONL in a storage account that Fabric reads zero-copy via a OneLake shortcut. Option 5 (Cloudflare R2) is equally free but moves data outside Azure (extra disclosure, extra provider trust); option 2 (App Insights) is great for live debugging but indirect to Fabric; options 3 and 6 cost real money for streaming guarantees we don't need at ~1 event/second. Setup is one Bicep file plus a 30-line HTTP-trigger function, and the same architecture scales to 10M events/month before any line item crosses 5 EUR.