fix: Implement distributed locking in Redis for cron jobs
Some checks failed
Deploy to Staging / Build Images (push) Failing after 30s
Deploy to Staging / Deploy to Staging (push) Has been skipped
Deploy to Staging / Verify Staging (push) Has been skipped
Deploy to Staging / Notify Staging Ready (push) Has been skipped
Deploy to Staging / Notify Staging Failure (push) Successful in 6s
backend/src/core/scheduler/README.md (new file, 92 lines)
@@ -0,0 +1,92 @@
# Scheduler Module

Centralized cron job scheduler using `node-cron` for background tasks.

## Overview

The scheduler runs periodic background jobs. In blue-green deployments, **multiple backend containers may run simultaneously**, so all jobs MUST use distributed locking to prevent duplicate execution.

## Registered Jobs

| Job | Schedule | Description |
|-----|----------|-------------|
| Notification processing | 8 AM daily | Process scheduled notifications |
| Account purge | 2 AM daily | GDPR compliance - purge deleted accounts |
| Backup check | Every minute | Check for due scheduled backups |
| Retention cleanup | 4 AM daily | Clean up old backups (also runs after each backup) |
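
For reference, the schedules above correspond to cron expressions like the following (the constant names are hypothetical; the actual registrations live in `core/scheduler/index.ts`):

```typescript
// Cron expressions matching the schedule column above.
// Constant names are illustrative, not an existing export.
export const JOB_SCHEDULES = {
  notificationProcessing: '0 8 * * *', // 8 AM daily
  accountPurge: '0 2 * * *',           // 2 AM daily
  backupCheck: '* * * * *',            // every minute
  retentionCleanup: '0 4 * * *',       // 4 AM daily
} as const;
```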

## Distributed Locking Requirement

**All scheduled jobs MUST use the `lockService`** from `core/config/redis.ts` to prevent duplicate execution when multiple containers are running.

### Pattern for New Jobs

```typescript
import { v4 as uuidv4 } from 'uuid';
import { lockService } from '../../core/config/redis';
import { logger } from '../../core/logging/logger';

export async function processMyJob(): Promise<void> {
  const lockKey = 'job:my-job-name';
  const lockValue = uuidv4();
  const lockTtlSeconds = 300; // 5 minutes - adjust based on expected job duration

  // Try to acquire lock
  const acquired = await lockService.acquireLock(lockKey, lockTtlSeconds, lockValue);
  if (!acquired) {
    logger.debug('Job already running in another container, skipping');
    return;
  }

  try {
    logger.info('Starting my job');
    // Do work...
    logger.info('My job completed');
  } catch (error) {
    logger.error('My job failed', { error });
    throw error;
  } finally {
    // Always release the lock
    await lockService.releaseLock(lockKey, lockValue);
  }
}
```

### Lock Key Conventions

Use descriptive, namespaced lock keys:

| Pattern | Example | Use Case |
|---------|---------|----------|
| `job:{name}` | `job:notification-processor` | Global jobs (run once) |
| `job:{name}:{id}` | `backup:schedule:uuid-here` | Per-entity jobs |
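
A minimal sketch of helpers for composing such keys (hypothetical; these encode the README's convention and are not an existing API):

```typescript
// Hypothetical key builders following the conventions above.
export const globalJobKey = (name: string): string => `job:${name}`;

// Per-entity keys are namespaced by feature, e.g. backup:schedule:<id>.
export const perEntityKey = (namespace: string, name: string, id: string): string =>
  `${namespace}:${name}:${id}`;
```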

### Lock TTL Guidelines

Set TTL longer than the expected job duration, but short enough to recover from crashes:

| Job Duration | Recommended TTL |
|--------------|-----------------|
| < 10 seconds | 60 seconds |
| < 1 minute | 5 minutes |
| < 5 minutes | 15 minutes |
| Long-running | 30 minutes + heartbeat |
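
The table's guidance can be encoded as a small helper (a sketch only; `recommendedTtlSeconds` is not an existing API):

```typescript
// Maps expected job duration (seconds) to the recommended lock TTL above.
export function recommendedTtlSeconds(expectedDurationSeconds: number): number {
  if (expectedDurationSeconds < 10) return 60;   // < 10 seconds -> 60 seconds
  if (expectedDurationSeconds < 60) return 300;  // < 1 minute   -> 5 minutes
  if (expectedDurationSeconds < 300) return 900; // < 5 minutes  -> 15 minutes
  return 1800; // long-running -> 30 minutes, plus a heartbeat extending the lock
}
```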

## Adding New Jobs

1. Create job file in the feature's `jobs/` directory
2. Implement distributed locking (see pattern above)
3. Register in `core/scheduler/index.ts`
4. Update this README with the new job
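
Step 3 might look like the following. This is a dependency-free sketch: the schedule function is injected, where production code would presumably pass node-cron's `cron.schedule`; all names are hypothetical:

```typescript
// Sketch of job registration for core/scheduler/index.ts.
type ScheduleFn = (cronExpr: string, task: () => void) => unknown;

interface JobRegistration {
  name: string;
  cronExpr: string;
  run: () => Promise<void>; // the job itself acquires the distributed lock
}

export function registerJobs(jobs: JobRegistration[], schedule: ScheduleFn): string[] {
  const registered: string[] = [];
  for (const job of jobs) {
    // Each tick fires the job; locking inside the job handles duplicates.
    schedule(job.cronExpr, () => { void job.run(); });
    registered.push(job.name);
  }
  return registered;
}
```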

## Blue-Green Deployment Behavior

When both blue and green containers are running:

1. Both schedulers trigger at the same time
2. Both attempt to acquire the lock
3. Only one succeeds (atomic Redis operation)
4. The other skips the job execution
5. The lock is released when the job completes

This ensures each job executes in exactly one container per trigger, regardless of how many containers are running.
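
The atomic acquire and safe release in steps 2-5 can be illustrated with an in-memory stand-in (in Redis this is `SET key value NX EX ttl` for acquisition and a compare-and-delete Lua script for release; `lockService` presumably wraps these, and TTL expiry is omitted here):

```typescript
// In-memory stand-in for the Redis lock semantics (TTL expiry omitted).
const locks = new Map<string, string>();

export function acquireLock(key: string, value: string): boolean {
  if (locks.has(key)) return false; // NX semantics: fail if the key exists
  locks.set(key, value);
  return true;
}

export function releaseLock(key: string, value: string): boolean {
  // Compare-and-delete: only the holder's value may release the lock,
  // so a container can never free a lock another container now owns.
  if (locks.get(key) !== value) return false;
  locks.delete(key);
  return true;
}
```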