fix: Implement distributed locking in Redis for cron jobs
Some checks failed
Deploy to Staging / Build Images (push) Failing after 30s
Deploy to Staging / Deploy to Staging (push) Has been skipped
Deploy to Staging / Verify Staging (push) Has been skipped
Deploy to Staging / Notify Staging Ready (push) Has been skipped
Deploy to Staging / Notify Staging Failure (push) Successful in 6s
backend/src/core/scheduler/README.md (new file, 92 lines)
@@ -0,0 +1,92 @@
# Scheduler Module

Centralized cron job scheduler using `node-cron` for background tasks.

## Overview

The scheduler runs periodic background jobs. In blue-green deployments, **multiple backend containers may run simultaneously**, so all jobs MUST use distributed locking to prevent duplicate execution.

## Registered Jobs

| Job | Schedule | Description |
|-----|----------|-------------|
| Notification processing | 8 AM daily | Process scheduled notifications |
| Account purge | 2 AM daily | GDPR compliance - purge deleted accounts |
| Backup check | Every minute | Check for due scheduled backups |
| Retention cleanup | 4 AM daily | Clean up old backups (also runs after each backup) |
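
For reference, the schedules above correspond to cron expressions like the following (the constant names are hypothetical; the actual registrations live in `core/scheduler/index.ts`):

```typescript
// Cron expressions matching the schedule column above.
// Constant names are illustrative, not an existing export.
export const JOB_SCHEDULES = {
  notificationProcessing: '0 8 * * *', // 8 AM daily
  accountPurge: '0 2 * * *',           // 2 AM daily
  backupCheck: '* * * * *',            // every minute
  retentionCleanup: '0 4 * * *',       // 4 AM daily
} as const;
```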

## Distributed Locking Requirement

**All scheduled jobs MUST use the `lockService`** from `core/config/redis.ts` to prevent duplicate execution when multiple containers are running.

### Pattern for New Jobs

```typescript
import { v4 as uuidv4 } from 'uuid';
import { lockService } from '../../core/config/redis';
import { logger } from '../../core/logging/logger';

export async function processMyJob(): Promise<void> {
  const lockKey = 'job:my-job-name';
  const lockValue = uuidv4();
  const lockTtlSeconds = 300; // 5 minutes - adjust based on expected job duration

  // Try to acquire lock
  const acquired = await lockService.acquireLock(lockKey, lockTtlSeconds, lockValue);
  if (!acquired) {
    logger.debug('Job already running in another container, skipping');
    return;
  }

  try {
    logger.info('Starting my job');
    // Do work...
    logger.info('My job completed');
  } catch (error) {
    logger.error('My job failed', { error });
    throw error;
  } finally {
    // Always release the lock
    await lockService.releaseLock(lockKey, lockValue);
  }
}
```

### Lock Key Conventions

Use descriptive, namespaced lock keys:

| Pattern | Example | Use Case |
|---------|---------|----------|
| `job:{name}` | `job:notification-processor` | Global jobs (run once) |
| `job:{name}:{id}` | `backup:schedule:uuid-here` | Per-entity jobs |
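
A minimal sketch of helpers for composing such keys (hypothetical; these encode the README's convention and are not an existing API):

```typescript
// Hypothetical key builders following the conventions above.
export const globalJobKey = (name: string): string => `job:${name}`;

// Per-entity keys are namespaced by feature, e.g. backup:schedule:<id>.
export const perEntityKey = (namespace: string, name: string, id: string): string =>
  `${namespace}:${name}:${id}`;
```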

### Lock TTL Guidelines

Set TTL longer than the expected job duration, but short enough to recover from crashes:

| Job Duration | Recommended TTL |
|--------------|-----------------|
| < 10 seconds | 60 seconds |
| < 1 minute | 5 minutes |
| < 5 minutes | 15 minutes |
| Long-running | 30 minutes + heartbeat |
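
The table's guidance can be encoded as a small helper (a sketch only; `recommendedTtlSeconds` is not an existing API):

```typescript
// Maps expected job duration (seconds) to the recommended lock TTL above.
export function recommendedTtlSeconds(expectedDurationSeconds: number): number {
  if (expectedDurationSeconds < 10) return 60;   // < 10 seconds -> 60 seconds
  if (expectedDurationSeconds < 60) return 300;  // < 1 minute   -> 5 minutes
  if (expectedDurationSeconds < 300) return 900; // < 5 minutes  -> 15 minutes
  return 1800; // long-running -> 30 minutes, plus a heartbeat extending the lock
}
```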

## Adding New Jobs

1. Create job file in the feature's `jobs/` directory
2. Implement distributed locking (see pattern above)
3. Register in `core/scheduler/index.ts`
4. Update this README with the new job
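
Step 3 might look like the following. This is a dependency-free sketch: the schedule function is injected, where production code would presumably pass node-cron's `cron.schedule`; all names are hypothetical:

```typescript
// Sketch of job registration for core/scheduler/index.ts.
type ScheduleFn = (cronExpr: string, task: () => void) => unknown;

interface JobRegistration {
  name: string;
  cronExpr: string;
  run: () => Promise<void>; // the job itself acquires the distributed lock
}

export function registerJobs(jobs: JobRegistration[], schedule: ScheduleFn): string[] {
  const registered: string[] = [];
  for (const job of jobs) {
    // Each tick fires the job; locking inside the job handles duplicates.
    schedule(job.cronExpr, () => { void job.run(); });
    registered.push(job.name);
  }
  return registered;
}
```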

## Blue-Green Deployment Behavior

When both blue and green containers are running:

1. Both schedulers trigger at the same time
2. Both attempt to acquire the lock
3. Only one succeeds (atomic Redis operation)
4. The other skips the job execution
5. The lock is released when the job completes

This ensures each job executes in exactly one container per trigger, regardless of how many containers are running.
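
The atomic acquire and safe release in steps 2-5 can be illustrated with an in-memory stand-in (in Redis this is `SET key value NX EX ttl` for acquisition and a compare-and-delete Lua script for release; `lockService` presumably wraps these, and TTL expiry is omitted here):

```typescript
// In-memory stand-in for the Redis lock semantics (TTL expiry omitted).
const locks = new Map<string, string>();

export function acquireLock(key: string, value: string): boolean {
  if (locks.has(key)) return false; // NX semantics: fail if the key exists
  locks.set(key, value);
  return true;
}

export function releaseLock(key: string, value: string): boolean {
  // Compare-and-delete: only the holder's value may release the lock,
  // so a container can never free a lock another container now owns.
  if (locks.get(key) !== value) return false;
  locks.delete(key);
  return true;
}
```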