fix: Implement distributed locking in Redis for cron jobs
Eric Gullickson
2026-01-01 11:02:54 -06:00
parent ffd8ecd1d0
commit d8ea0c7297
6 changed files with 271 additions and 6 deletions

@@ -3,9 +3,45 @@
## Configuration (`src/core/config/`)
- `config-loader.ts` — Load and validate environment variables
- `database.ts` — PostgreSQL connection pool
- `redis.ts` — Redis client, cache helpers, and distributed locking
- `user-context.ts` — User context utilities
### Distributed Lock Service
The `DistributedLockService` in `redis.ts` provides Redis-based distributed locking so that an operation is not duplicated when several backend containers run at once (for example, during a blue-green deployment).
**All scheduled jobs MUST use distributed locking** to prevent duplicate execution when multiple backend containers are running.
```typescript
import { lockService } from '../core/config/redis';
import { v4 as uuidv4 } from 'uuid';

// Acquire lock (returns false if already held, or if the Redis call failed)
const lockKey = 'job:my-scheduled-task';
const lockValue = uuidv4(); // Unique identifier for this execution
const ttlSeconds = 300; // Auto-release after 5 minutes
const acquired = await lockService.acquireLock(lockKey, ttlSeconds, lockValue);
if (!acquired) {
  // Another container is already running this job
  return;
}

try {
  // Do work...
} finally {
  // Always release the lock; this does nothing if the lock expired and another holder owns it now
  await lockService.releaseLock(lockKey, lockValue);
}
```
**API:**
| Method | Description |
|--------|-------------|
| `acquireLock(key, ttlSeconds, lockValue)` | Acquire lock atomically (SET NX EX) |
| `releaseLock(key, lockValue)` | Release only if we hold it (Lua script) |
| `isLocked(key)` | Check if lock exists |
## Plugins (`src/core/plugins/`)
- `auth.plugin.ts` — Auth0 JWT via JWKS (@fastify/jwt, get-jwks)
- `error.plugin.ts` — Error handling

@@ -82,3 +82,75 @@ export class CacheService {
}
export const cacheService = new CacheService();
/**
 * Distributed lock service for preventing concurrent operations across containers
 */
export class DistributedLockService {
  private prefix = 'mvp:lock:';

  /**
   * Attempts to acquire a lock with the given key
   * @param key Lock identifier
   * @param ttlSeconds Time-to-live in seconds (auto-release)
   * @param lockValue Unique identifier for this lock holder
   * @returns true if lock acquired, false if already held or if the Redis call failed
   */
  async acquireLock(key: string, ttlSeconds: number, lockValue: string): Promise<boolean> {
    try {
      // SET NX (only if not exists) with EX (expiry)
      const result = await redis.set(
        this.prefix + key,
        lockValue,
        'EX',
        ttlSeconds,
        'NX'
      );
      return result === 'OK';
    } catch (error) {
      // A Redis error is reported as "not acquired", so the caller skips the job
      logger.error('Lock acquisition error', { key, error });
      return false;
    }
  }
  /**
   * Releases a lock only if we hold it (compare lockValue)
   * @param key Lock identifier
   * @param lockValue The value used when acquiring the lock
   * @returns true if lock was released, false if we didn't hold it
   */
  async releaseLock(key: string, lockValue: string): Promise<boolean> {
    try {
      // Lua script to atomically check and delete
      const script = `
        if redis.call("get", KEYS[1]) == ARGV[1] then
          return redis.call("del", KEYS[1])
        else
          return 0
        end
      `;
      const result = await redis.eval(script, 1, this.prefix + key, lockValue);
      return result === 1;
    } catch (error) {
      logger.error('Lock release error', { key, error });
      return false;
    }
  }
  /**
   * Checks if a lock is currently held
   * @param key Lock identifier
   * @returns true if lock exists, false if it does not or if the Redis call failed
   */
  async isLocked(key: string): Promise<boolean> {
    try {
      const exists = await redis.exists(this.prefix + key);
      return exists === 1;
    } catch (error) {
      logger.error('Lock check error', { key, error });
      return false;
    }
  }
}

export const lockService = new DistributedLockService();

@@ -0,0 +1,92 @@
# Scheduler Module
Centralized cron job scheduler using `node-cron` for background tasks.
## Overview
The scheduler runs periodic background jobs. In blue-green deployments, **multiple backend containers may run simultaneously**, so all jobs MUST use distributed locking to prevent duplicate execution.
## Registered Jobs
| Job | Schedule | Description |
|-----|----------|-------------|
| Notification processing | 8 AM daily | Process scheduled notifications |
| Account purge | 2 AM daily | GDPR compliance - purge deleted accounts |
| Backup check | Every minute | Check for due scheduled backups |
| Retention cleanup | 4 AM daily | Clean up old backups (also runs after each backup) |
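
The schedules in the table can be written as standard five-field cron expressions, which is the format `node-cron` accepts. The sketch below is illustrative only: the `JOB_SCHEDULES` name and its keys are hypothetical, the registration code in `core/scheduler/index.ts` is not part of this commit, and the times are assumed to be in the scheduler's default timezone.

```typescript
// Hypothetical mapping of the jobs above to five-field cron expressions
// (minute, hour, day of month, month, day of week). Names are illustrative.
const JOB_SCHEDULES = {
  notificationProcessing: '0 8 * * *', // 8 AM daily
  accountPurge: '0 2 * * *', // 2 AM daily
  backupCheck: '* * * * *', // every minute
  retentionCleanup: '0 4 * * *', // 4 AM daily
} as const;
```
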
## Distributed Locking Requirement
**All scheduled jobs MUST use the `lockService`** from `core/config/redis.ts` to prevent duplicate execution when multiple containers are running.
### Pattern for New Jobs
```typescript
import { v4 as uuidv4 } from 'uuid';
import { lockService } from '../../core/config/redis';
import { logger } from '../../core/logging/logger';

export async function processMyJob(): Promise<void> {
  const lockKey = 'job:my-job-name';
  const lockValue = uuidv4();
  const lockTtlSeconds = 300; // 5 minutes - adjust based on expected job duration

  // Try to acquire lock
  const acquired = await lockService.acquireLock(lockKey, lockTtlSeconds, lockValue);
  if (!acquired) {
    logger.debug('Job already running in another container, skipping');
    return;
  }

  try {
    logger.info('Starting my job');
    // Do work...
    logger.info('My job completed');
  } catch (error) {
    logger.error('My job failed', { error });
    throw error;
  } finally {
    // Always release the lock
    await lockService.releaseLock(lockKey, lockValue);
  }
}
```
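
If several jobs repeat the acquire / try / finally sequence above, it could be factored into a small wrapper. The following is a minimal sketch, not part of this commit: `withLock` and the `LockLike` interface are hypothetical names, and `LockLike` only mirrors the two `lockService` methods the pattern uses.

```typescript
// Minimal shape of the lock service needed here (hypothetical interface that
// mirrors acquireLock/releaseLock on DistributedLockService).
interface LockLike {
  acquireLock(key: string, ttlSeconds: number, lockValue: string): Promise<boolean>;
  releaseLock(key: string, lockValue: string): Promise<boolean>;
}

// Hypothetical helper: runs `work` only if the lock is acquired.
// Resolves to true if the work ran, false if the job was skipped.
async function withLock(
  locks: LockLike,
  key: string,
  ttlSeconds: number,
  lockValue: string,
  work: () => Promise<void>,
): Promise<boolean> {
  const acquired = await locks.acquireLock(key, ttlSeconds, lockValue);
  if (!acquired) {
    return false; // held elsewhere, or the lock service reported a failure
  }
  try {
    await work();
    return true;
  } finally {
    // Runs on success and on error; an error thrown by work() still propagates.
    await locks.releaseLock(key, lockValue);
  }
}
```

A job would then call something like `await withLock(lockService, 'job:my-job-name', 300, uuidv4(), async () => { /* work */ })` and log a skip when the result is `false`.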
### Lock Key Conventions
Use descriptive, namespaced lock keys:
| Pattern | Example | Use Case |
|---------|---------|----------|
| `job:{name}` | `job:notification-processor` | Global jobs (run once) |
| `job:{name}:{id}` | `backup:schedule:uuid-here` | Per-entity jobs |
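
As an illustration of the Pattern column, a key builder might look like the sketch below. `jobLockKey` is a hypothetical helper, not something this commit adds. Note that `DistributedLockService` prepends its own `mvp:lock:` prefix, so the key stored in Redis for `job:notification-processor` would be `mvp:lock:job:notification-processor`.

```typescript
// Hypothetical helper that builds lock keys following the `job:{name}` and
// `job:{name}:{id}` patterns. The lock service adds the `mvp:lock:` prefix.
function jobLockKey(name: string, entityId?: string): string {
  return entityId === undefined ? `job:${name}` : `job:${name}:${entityId}`;
}

const globalKey = jobLockKey('notification-processor'); // 'job:notification-processor'
const entityKey = jobLockKey('backup-schedule', 'uuid-here'); // 'job:backup-schedule:uuid-here'
```
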
### Lock TTL Guidelines
Set TTL longer than the expected job duration, but short enough to recover from crashes:
| Job Duration | Recommended TTL |
|--------------|-----------------|
| < 10 seconds | 60 seconds |
| < 1 minute | 5 minutes |
| < 5 minutes | 15 minutes |
| Long-running | 30 minutes + heartbeat |
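
The "heartbeat" in the last row means periodically extending the lock's TTL while the job is still running. `DistributedLockService` in this commit has no such method, so the following is only a sketch under assumptions: `EvalClient`, `extendLock`, and `startHeartbeat` are hypothetical names, the client is assumed to expose the same `eval(script, numKeys, ...)` call shape that `releaseLock` uses, and `fullKey` must already include the `mvp:lock:` prefix.

```typescript
// Minimal client shape, modelled on the eval(script, numKeys, ...args) call
// used by releaseLock. Hypothetical interface for this sketch.
interface EvalClient {
  eval(script: string, numKeys: number, ...args: (string | number)[]): Promise<unknown>;
}

// Extend the TTL only if the lock still holds our value (atomic inside Lua).
const EXTEND_SCRIPT = `
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("expire", KEYS[1], ARGV[2])
else
  return 0
end
`;

async function extendLock(
  client: EvalClient,
  fullKey: string,
  lockValue: string,
  ttlSeconds: number,
): Promise<boolean> {
  const result = await client.eval(EXTEND_SCRIPT, 1, fullKey, lockValue, ttlSeconds);
  return result === 1;
}

// Hypothetical heartbeat: refreshes the TTL every intervalMs until the
// returned stop function is called.
function startHeartbeat(
  client: EvalClient,
  fullKey: string,
  lockValue: string,
  ttlSeconds: number,
  intervalMs: number,
): () => void {
  const timer = setInterval(() => {
    // Errors are swallowed in this sketch; a real job would log them and
    // would probably stop working once the lock is lost.
    extendLock(client, fullKey, lockValue, ttlSeconds).catch(() => undefined);
  }, intervalMs);
  return () => clearInterval(timer);
}
```

The interval should be well below the TTL (a third of it is a common rule of thumb) so that one missed refresh does not let the lock expire, and the stop function belongs in the job's `finally` block before `releaseLock`.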
## Adding New Jobs
1. Create job file in the feature's `jobs/` directory
2. Implement distributed locking (see pattern above)
3. Register in `core/scheduler/index.ts`
4. Update this README with the new job
## Blue-Green Deployment Behavior
When both blue and green containers are running:
1. Both schedulers trigger at the same time
2. Both attempt to acquire the lock
3. Only one succeeds (atomic Redis operation)
4. The other skips the job execution
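5. The lock is released when the job completes, or expires after its TTL if the container crashes
This prevents concurrent duplicate execution regardless of how many containers are running, provided the TTL outlasts the job. It is not a strict exactly-once guarantee: if one container finishes and releases the lock before another container's scheduler fires, the job runs again.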
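
The sequence above can be simulated in a single process. The sketch below is an illustration only: `InMemoryLock` is a hypothetical stand-in for Redis `SET NX` plus compare-and-delete (it ignores TTLs and is not shared across processes), and `runJob` and `simulate` are made-up names.

```typescript
// In-memory stand-in for the Redis-backed lock. Illustration only: it ignores
// TTLs and only works inside one process.
class InMemoryLock {
  private held = new Map<string, string>();

  async acquireLock(key: string, _ttlSeconds: number, lockValue: string): Promise<boolean> {
    if (this.held.has(key)) return false;
    this.held.set(key, lockValue);
    return true;
  }

  async releaseLock(key: string, lockValue: string): Promise<boolean> {
    if (this.held.get(key) !== lockValue) return false;
    return this.held.delete(key);
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Each simulated container runs the same job body against the shared lock.
async function runJob(locks: InMemoryLock, container: string, log: string[]): Promise<void> {
  const acquired = await locks.acquireLock('job:demo', 60, container);
  if (!acquired) return; // the other container holds the lock, so skip
  try {
    await sleep(20); // simulated work, long enough for the two runs to overlap
    log.push(container);
  } finally {
    await locks.releaseLock('job:demo', container);
  }
}

// Both containers trigger at the same moment; only the first to acquire runs.
async function simulate(): Promise<string[]> {
  const locks = new InMemoryLock();
  const log: string[] = [];
  await Promise.all([runJob(locks, 'blue', log), runJob(locks, 'green', log)]);
  return log;
}
```

Here `simulate()` resolves to `['blue']`, because `blue` acquires the lock first and `green` skips. If `green` were started only after `blue` had released the lock, it would acquire the lock and run the job a second time.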