Project Requirements Document
Overview
This document outlines the requirements for implementing rate limiting across the application to protect against abuse, ensure fair usage, and maintain service reliability. Rate limiting controls how many requests a client can make within a given time period.
Implementation Status: Rate limiting is not yet implemented. This PRD describes the complete implementation required to protect all API endpoints.
Security Priority: Rate limiting is a critical security feature. Without it, the application may be more vulnerable to brute force attacks, denial of service, credential stuffing, and email spam abuse.
Product Context
Purpose
Implement rate limiting to:
- Protect against denial of service (DoS) attacks
- Prevent brute force attacks on authentication endpoints
- Stop credential stuffing attempts
- Limit email spam from OTP/notification endpoints
- Ensure fair resource usage across all users
- Maintain service availability during traffic spikes
Target Threats
| Threat | Description | Without Rate Limiting |
|---|---|---|
| DoS Attack | Flooding server with requests | Service becomes unavailable |
| Brute Force | Automated password/OTP guessing | Account compromise |
| Credential Stuffing | Using leaked credentials for mass logins | Unauthorized access |
| Email Spam | Triggering excessive OTP emails | Provider blacklisting, user harassment |
| API Abuse | Scraping or exhausting API resources | Increased costs, degraded performance |
| Resource Exhaustion | Overwhelming database/compute resources | Service degradation for legitimate users |
Success Metrics
- All API endpoints are protected by rate limiting by default
- Authentication endpoints have stricter, customized limits
- Legitimate users never encounter rate limits during normal usage
- Abusive traffic is blocked before consuming significant resources
- Rate limit events are logged for security monitoring
- (You will learn more about structured logging in a later lab. For now, you can log to the console using `console.log()`.)
- System gracefully degrades if Redis becomes unavailable
Implementation Status
| Component | Status | Priority |
|---|---|---|
| Rate limit service | 🔲 TODO | High |
| Redis integration | 🔲 TODO | High |
| tRPC middleware | 🔲 TODO | High |
| Per-endpoint configuration | 🔲 TODO | High |
| Fingerprinting (user/IP) | 🔲 TODO | High |
| Custom error responses | 🔲 TODO | Medium |
| Memory fallback | 🔲 TODO | Medium |
| Monitoring/logging | 🔲 TODO | Medium |
| Frontend error handling | 🔲 TODO | Low |
Prereads
You are strongly encouraged to read these before starting this lab, so that you understand the theory and rationale behind rate limiting, as well as the implementation details and strategies that make for a robust solution.
User Stories
Security & Protection
| ID | As a... | I want... | So that... | Priority |
|---|---|---|---|---|
| RL-1 | System operator | All endpoints rate limited by default | New endpoints are automatically protected | High |
| RL-2 | System operator | Authentication endpoints to have stricter limits | Brute force attacks are prevented | High |
| RL-3 | System operator | OTP endpoints to have strict per-email limits | Users aren't harassed with spam emails | High |
| RL-4 | System operator | Rate limiting to work across multiple servers | Attackers can't bypass limits via load balancing | High |
| RL-5 | System operator | Rate limiting to continue if Redis fails | Service remains protected during Redis outages | Medium |
User Experience
| ID | As a... | I want... | So that... | Priority |
|---|---|---|---|---|
| RL-6 | Authenticated user | Normal usage to never trigger rate limits | My experience isn't disrupted | High |
| RL-7 | User | Clear error messages when rate limited | I understand why my request failed | Medium |
| RL-8 | User | To know when I can retry | I don't keep trying and getting blocked | Medium |
| RL-9 | User | Quick page navigation without hitting limits | I can browse normally | High |
Developer Experience
| ID | As a... | I want... | So that... | Priority |
|---|---|---|---|---|
| RL-10 | Developer | To easily customize limits per endpoint | I can tune limits for specific use cases | Medium |
| RL-11 | Developer | To disable rate limiting for specific endpoints | Health checks and internal endpoints work freely | Medium |
| RL-12 | Developer | Rate limiting disabled in test environment | Tests run quickly without artificial delays | Medium |
| RL-13 | Developer | Clear logging of rate limit events | I can debug and monitor the system | Medium |
API Client (Optional, Future Consideration)
These user stories are only relevant for applications with API consumers integrating with our services. You do not have to implement these requirements in this lab; they will be covered in future labs on designing for API consumers.
| ID | As a... | I want... | So that... | Priority |
|---|---|---|---|---|
| RL-14 | API client | To receive rate limit headers in responses | I can proactively manage my request rate | - |
| RL-16 | API client | Consistent retry-after values in 429 responses | I can implement reliable exponential backoff | - |
| RL-18 | API client | Different rate limits for different API keys | I can upgrade for higher limits if needed | - |
Functional Requirements
FR-1: Rate Limit Service
- FR-1.1: System MUST implement a rate limit service that tracks request counts per key
- FR-1.2: Service MUST support configurable limits (points, duration, burst points, burst duration)
- FR-1.3: Service MUST implement bursty rate limiting to allow natural usage patterns
- FR-1.4: Service MUST cache rate limiter instances to avoid recreation overhead
- FR-1.5: Service MUST use the `rate-limiter-flexible` library for implementation
The rate-limiter-flexible library provides robust rate limiting features, including support for Redis storage and bursty limiting. It is generally not recommended to implement your own rate limiting mechanisms.
Refer to its documentation for more details.
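To make FR-1.3 concrete, here is a hand-rolled sketch of the dual-bucket (sustained + burst) idea using fixed windows: requests draw from the sustained bucket first and dip into the burst bucket when it is empty. This is illustration only; per FR-1.5 the lab implementation should use `rate-limiter-flexible`'s `BurstyRateLimiter` (which follows the same try-main-then-burst order) rather than custom code like this.

```typescript
// Illustration only: a fixed-window dual-bucket limiter. The real lab
// implementation should use rate-limiter-flexible's BurstyRateLimiter.
class FixedWindowBucket {
  private used = 0;
  private windowStart = 0;

  constructor(private points: number, private durationMs: number) {}

  tryConsume(now: number): boolean {
    // Start a fresh window once the current one has elapsed
    if (now - this.windowStart >= this.durationMs) {
      this.windowStart = now;
      this.used = 0;
    }
    if (this.used < this.points) {
      this.used++;
      return true;
    }
    return false;
  }
}

class DualBucketLimiter {
  constructor(
    private sustained: FixedWindowBucket, // steady-state allowance
    private burst: FixedWindowBucket,     // extra headroom for short spikes
  ) {}

  // A request is allowed if either bucket still has points;
  // the sustained bucket is tried first.
  consume(now: number = Date.now()): boolean {
    return this.sustained.tryConsume(now) || this.burst.tryConsume(now);
  }
}
```

With the suggested FR-5 defaults (2 req/s sustained, 5 req/10s burst), a client can make 7 requests in the first second (2 sustained + 5 burst) before being blocked, then 2 more each subsequent second.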
FR-2: Storage Backend
- FR-2.1: System MUST use distributed storage (e.g. Redis, DB) as the primary storage for rate limit counters (Redis is preferred)
- FR-2.2: System MUST implement in-memory fallback when the distributed storage is unavailable
- FR-2.3: The rate limiter MUST use an insurance strategy for resilience
- FR-2.4: System MUST handle storage connection failures gracefully
- FR-2.5: Rate limit keys MUST be namespaced to prevent collisions
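The fallback behaviour required by FR-2.2/FR-2.3 can be sketched as a wrapper that catches storage errors and retries against a per-server in-memory limiter. `rate-limiter-flexible` supports this natively via its `insuranceLimiter` option, so treat the `Limiter` interface and class below as a conceptual illustration, not the lab API:

```typescript
// Conceptual sketch of the insurance pattern (FR-2.2, FR-2.3): if the
// primary, distributed limiter throws (storage unreachable), fall back
// to a per-server in-memory limiter so some protection remains.
interface Limiter {
  consume(key: string): Promise<boolean>; // true = request allowed
}

class InsuredLimiter implements Limiter {
  constructor(
    private primary: Limiter,   // e.g. Redis-backed, shared across servers
    private insurance: Limiter, // in-memory, per-server (NFR-2.2: degraded protection)
  ) {}

  async consume(key: string): Promise<boolean> {
    try {
      return await this.primary.consume(key);
    } catch {
      // FR-2.4: a storage failure must not crash the request path
      return this.insurance.consume(key);
    }
  }
}
```

Note the degraded-protection trade-off: while the insurance limiter is active, counters are per-server, so an attacker spread across N servers gets roughly N times the limit.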
FR-3: Request Fingerprinting
- FR-3.1: System MUST identify requests using a composite key strategy
- FR-3.2: Authenticated requests MUST be keyed by user ID
- This is to ensure that logged-in users are rate limited based on their account, regardless of IP address
- FR-3.3: Unauthenticated requests MUST be keyed by a unique fingerprint (usually the IP address)
- FR-3.4: IPv6 addresses MUST be sanitized for Redis key compatibility
- FR-3.5: Unknown identifiers MUST fall back to a safe default key
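A possible shape for the fingerprint function follows. The signature comes from the API specification later in this document; the `user:`/`ip:` prefixes and the underscore substitution for IPv6 colons are illustrative choices, not requirements:

```typescript
// Possible implementation of createRateLimitFingerprint (FR-3).
function createRateLimitFingerprint(params: {
  userId: string | undefined;
  ipAddress: string | null;
}): string {
  // FR-3.2: authenticated requests are keyed by account, not by IP
  if (params.userId) return `user:${params.userId}`;
  // FR-3.3/FR-3.4: fall back to the IP, replacing IPv6 colons so the
  // key plays nicely with colon-delimited Redis key conventions
  if (params.ipAddress) return `ip:${params.ipAddress.replace(/:/g, "_")}`;
  // FR-3.5: safe shared default when no identifier is available
  return "unknown";
}
```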
FR-4: Middleware Integration
- FR-4.1: Rate limiting MUST be implemented as tRPC middleware
- FR-4.2: Middleware MUST apply to all procedures by default (opt-out approach)
- FR-4.3: Middleware MUST check rate limits BEFORE executing procedure logic
- FR-4.4: Middleware MUST support per-procedure configuration via a metadata configuration
- FR-4.5: Middleware MUST allow procedures to opt-out of rate limiting
- FR-4.6: Middleware MUST NOT affect testing environments (i.e., disabled when `NODE_ENV === 'test'`)
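The FR-4 flow can be sketched framework-free as below; in the lab this logic lives inside a `t.middleware(...)` call in `trpc.ts`, and every name here except `NODE_ENV` is an assumption about the scaffold:

```typescript
// Framework-free sketch of the rate-limit middleware flow (FR-4).
interface RateLimiterConfig { points?: number; duration?: number }

interface MiddlewareOpts {
  meta?: { rateLimitOptions?: RateLimiterConfig | null };
  fingerprint: string; // derived server-side (NFR-3.1), never from client input
  checkRateLimit: (key: string, options?: RateLimiterConfig | null) => Promise<void>; // throws when exceeded
  next: () => Promise<unknown>;
}

async function rateLimitMiddleware(opts: MiddlewareOpts): Promise<unknown> {
  // FR-4.6: skip entirely in the test environment
  if (process.env.NODE_ENV === "test") return opts.next();
  // FR-4.5: explicit opt-out via meta
  if (opts.meta?.rateLimitOptions === null) return opts.next();
  // FR-4.3: check BEFORE the procedure handler runs; throws on limit
  await opts.checkRateLimit(opts.fingerprint, opts.meta?.rateLimitOptions);
  return opts.next();
}
```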
FR-5: Default Configuration
- FR-5.1: There should be a default rate limit configuration applied to all endpoints:
- Suggested sustained limit: 2 requests per second
- Suggested bursty limit: 5 requests per 10 seconds
- FR-5.2: Defaults MUST be overridable per procedure
FR-6: Error Handling
- FR-6.1: Rate limit exceeded MUST return the `TOO_MANY_REQUESTS` error code
- FR-6.2: Error response header MUST include `Retry-After`, indicating when the client can retry
- FR-6.3: System MUST log rate limit exceeded events for monitoring
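One possible shape for the custom error class (the `TRPCRateLimitError` named in the implementation checklist) is shown below; the scaffolded class may differ. `retryAfterSeconds` drives both the `Retry-After` header and the user-facing message:

```typescript
// Sketch of TRPCRateLimitError (FR-6); field names beyond those in the
// PRD's error response example are assumptions.
class TRPCRateLimitError extends Error {
  readonly code = "TOO_MANY_REQUESTS";

  constructor(readonly retryAfterSeconds: number) {
    super(`Rate limit exceeded. Please try again in ${retryAfterSeconds} seconds.`);
    this.name = "TRPCRateLimitError";
  }
}
```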
FR-7: Endpoint-Specific Limits
The following endpoints require custom rate limits:
| Endpoint Category | Limit | Duration | Rationale |
|---|---|---|---|
| Login/Auth | 5 | 60s | Prevent brute force & email spam |
| Thread Create | 10 | 60s | Prevent spam |
| Comment Create | 20 | 60s | Allow discussion |
Non-Functional Requirements
NFR-1: Performance
- NFR-1.1: Rate limit check MUST complete within 10ms under normal conditions
- NFR-1.2: Rate limiter instances MUST be cached to avoid recreation
- NFR-1.3: Rate limiter storage operations SHOULD use connection pooling
- NFR-1.4: Memory fallback MUST NOT significantly impact response time
NFR-2: Reliability
- NFR-2.1: System MUST continue functioning if distributed storage becomes unavailable
- NFR-2.2: Insurance limiter MUST provide degraded protection during distributed storage outage
- NFR-2.3: Rate limiting MUST NOT cause application crashes
- NFR-2.4: System MUST handle malformed IP addresses gracefully
NFR-3: Security
- NFR-3.1: Rate limit keys MUST be derived server-side, never from client input
- NFR-3.2: Rate limiting MUST occur before business logic execution
- NFR-3.3: System MUST NOT leak information about other users' rate limit status
- NFR-3.4: Logs MUST NOT contain sensitive user data
NFR-4: Observability
- NFR-4.1: All rate limit exceeded events SHOULD be logged
- NFR-4.2: Logs SHOULD include key prefix, timestamp, and retry-after value
- NFR-4.3: System SHOULD support metrics export for monitoring dashboards
- NFR-4.4: Alerts SHOULD be configurable for rate limit spikes
NFR-5: Developer Experience
- NFR-5.1: Configuration API MUST be type-safe
- NFR-5.2: Rate limiting MUST be easily toggleable per endpoint
- NFR-5.3: Test utilities SHOULD be provided for testing rate limit behavior
Technical Architecture
Component Diagram
┌─────────────────────────────────────────────────────────────────┐
│ tRPC Router │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌──────────────────┐ ┌───────────────┐ │
│ │ Request │───▶│ Rate Limit │───▶│ Procedure │ │
│ │ │ │ Middleware │ │ Handler │ │
│ └─────────────┘ └────────┬─────────┘ └───────────────┘ │
│ │ │
└──────────────────────────────┼──────────────────────────────────┘
│
▼
┌────────────────────┐
│ Rate Limit │
│ Service │
├────────────────────┤
│ • checkRateLimit │
│ • createFingerprint│
│ • createLimiter │
└────────┬───────────┘
│
┌──────────────┴───────────────┐
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Redis Limiter │ │ Memory Limiter │
│ (Primary) │ │ (Fallback) │
├──────────────────┤ ├──────────────────┤
│ • Distributed │ │ • Per-server │
│ • Persistent │ │ • Fast │
│ • Coordinated │ │ • No dependencies│
└──────────────────┘ └──────────────────┘
File Structure
apps/web/src/server/
├── api/
│ └── trpc.ts # Rate limit middleware integration
└── modules/
└── rate-limit/
├── index.ts # Public exports
├── rate-limit.service.ts # Core rate limiting logic
├── types.ts # TypeScript interfaces
└── errors.ts # Custom error classes
packages/redis/
└── src/
└── index.ts # Redis client singleton
API Specification
The files have been scaffolded for you in the lab repository. You need to implement the logic as per the requirements.
Rate Limiter Configuration Interface
interface RateLimiterConfig {
/** Number of points (requests) allowed in the duration window */
points?: number;
/** Duration window in seconds for sustained rate */
duration?: number;
/** Number of points allowed for burst traffic */
burstPoints?: number;
/** Duration window in seconds for burst rate */
burstDuration?: number;
/** Prefix for Redis keys (for namespacing) */
keyPrefix?: string;
}
tRPC Meta Interface
interface Meta {
/**
* Rate limit options for this procedure.
* - undefined: Apply default rate limiting
* - RateLimiterConfig: Apply custom rate limiting
* - null: Disable rate limiting for this procedure
*/
rateLimitOptions?: RateLimiterConfig | null;
}
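The three states of `rateLimitOptions` could be resolved into effective limits as sketched below. The `DEFAULTS` values mirror the suggested FR-5.1 limits (2 req/s sustained, 5 req/10s burst); the merge-over-defaults behaviour and the `resolveOptions` name are assumptions, not part of the scaffold:

```typescript
// Sketch of resolving Meta.rateLimitOptions into effective limits.
interface RateLimiterConfig {
  points?: number;
  duration?: number;
  burstPoints?: number;
  burstDuration?: number;
  keyPrefix?: string;
}

// Suggested FR-5.1 defaults: 2 req/s sustained, 5 req/10s burst
const DEFAULTS: RateLimiterConfig = {
  points: 2,
  duration: 1,
  burstPoints: 5,
  burstDuration: 10,
};

// null → rate limiting disabled; undefined → defaults; config → merged over defaults
function resolveOptions(
  rateLimitOptions?: RateLimiterConfig | null,
): RateLimiterConfig | null {
  if (rateLimitOptions === null) return null;
  if (rateLimitOptions === undefined) return DEFAULTS;
  return { ...DEFAULTS, ...rateLimitOptions };
}
```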
Rate Limit Service Functions
// Check rate limit for a key, throws TRPCRateLimitError if exceeded
checkRateLimit(params: {
key: string
options?: RateLimiterConfig
}): Promise<void>
// Create a fingerprint key from user/IP information
createRateLimitFingerprint(params: {
userId: string | undefined
ipAddress: string | null
}): string
Error Response
// HTTP 429 Too Many Requests
{
error: {
code: "TOO_MANY_REQUESTS",
message: "Rate limit exceeded. Please try again in 30 seconds.",
}
}
Best Practices Requirements
The implementation SHOULD follow these best practices:
| Best Practice | Requirement | Validation |
|---|---|---|
| BP-1: Middleware Layer | Rate limiting MUST be applied at the middleware layer before business logic | Code review |
| BP-2: Composite Keys | MUST use userId for authenticated, IP for unauthenticated requests | Unit tests |
| BP-3: Bursty Limiting | MUST implement dual-bucket (sustained + burst) rate limiting | Unit tests |
| BP-4: Redis Storage | SHOULD use Redis for distributed coordination with memory fallback | Integration tests |
| BP-5: Error Responses | MUST return 429 status with retryAfterSeconds | API tests |
| BP-6: Per-Endpoint Config | MUST support custom limits via tRPC metadata field | Code review |
| BP-7: Namespaced Keys | MUST use keyPrefix to namespace rate limit keys | Unit tests |
| BP-8: Multiple Scopes | SHOULD support per-procedure and global rate limits | Design review |
| BP-9: Monitoring | MUST log rate limit exceeded events | Log review |
| BP-10: Test Support | SHOULD skip rate limiting (or opt into rate limiting) in test environment | Test verification |
Common Pitfalls to Avoid
| Pitfall | Mitigation | Validation |
|---|---|---|
| Rate limiting after business logic | Middleware checks BEFORE handler | Code review |
| Trusting client identifiers | Derive fingerprint from server context only | Security review |
| Single point of rate limiting | Apply default + custom limits | Design review |
| No fallback for Redis failure | Insurance limiter pattern | Chaos testing |
| Blocking legitimate users | Bursty limiting with appropriate defaults | User testing |
Acceptance Criteria
AC-1: Default Rate Limiting
- All tRPC procedures are rate limited by default
- Default sustained limit is 2 requests per second (per FR-5.1)
- Default burst allowance is 5 requests per 10 seconds
- Exceeding limit returns HTTP 429 with retryAfterSeconds
AC-2: Authentication Endpoints
- Login endpoint has custom limit of 5 requests per 60 seconds
- OTP request endpoint has limit of 3 requests per 10 minutes
- Rate limit applies per-email for OTP requests
AC-3: Fingerprinting
- Authenticated requests are keyed by userId
- Unauthenticated requests are keyed by IP address
- IPv6 addresses are properly sanitized
- Fallback to "unknown" for missing identifiers
AC-4: Redis Integration
- Rate limit counters are stored in Redis
- Keys are properly namespaced with prefix
- Memory fallback activates when Redis unavailable
- No application crash on Redis connection failure
AC-5: Developer Experience
- Procedures can opt out with `rateLimitOptions: null`
- Procedures can customize with `rateLimitOptions: { ... }`
AC-6: User Experience
- Normal browsing (5 page loads in 10 seconds) doesn't trigger limits
- Error message clearly explains rate limit and retry time
- Frontend handles 429 errors gracefully with toast notification
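Client-side, the AC-6 toast requirement might look like the sketch below. The `data.code` field matches tRPC's standard client-side error shape, while `handleTrpcError`, `TrpcClientError`, and the `toast` callback are illustrative names, not part of the lab scaffold:

```typescript
// Hypothetical client-side handling for 429 responses (AC-6).
interface TrpcClientError {
  message: string;
  data?: { code?: string };
}

function handleTrpcError(
  err: TrpcClientError,
  toast: (message: string) => void,
): boolean {
  if (err.data?.code === "TOO_MANY_REQUESTS") {
    // Surface the server's message, which already includes the retry time,
    // e.g. "Rate limit exceeded. Please try again in 30 seconds."
    toast(err.message);
    return true; // handled: don't show a generic error
  }
  return false; // let other errors fall through to default handling
}
```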
AC-7: Observability
- Rate limit exceeded events are logged with key and retry time
- Logs do not contain sensitive user information
- Key prefix is included in logs for filtering
Implementation Checklist
Phase 1: Core Infrastructure
- Create
packages/rediswith Redis client singleton - Create
rate-limitmodule directory structure - Implement
RateLimiterConfigTypeScript interface - Implement
TRPCRateLimitErrorcustom error class
Phase 2: Rate Limit Service
- Implement
createRateLimiterwith Redis + memory fallback - Implement
BurstyRateLimiterwith dual-bucket approach - Implement
createRateLimitFingerprintfunction - Implement
checkRateLimitfunction with error handling
Phase 3: Middleware Integration
- Add
Metainterface withrateLimitOptions - Implement
rateLimitMiddlewarein tRPC setup - Apply middleware to default procedure
- Add test environment bypass
Phase 4: Endpoint Configuration
- Configure authentication endpoints with custom limits
- Configure OTP endpoints with per-email limits
- Configure read endpoints with higher limits
- Document all custom configurations
Phase 5: Frontend & Polish
- Add frontend error handling for 429 responses
- Add toast notifications for rate limit errors
- Verify logging and monitoring
- Update documentation