Project Requirements Document

Overview

This document outlines the requirements for implementing rate limiting across the application to protect against abuse, ensure fair usage, and maintain service reliability. Rate limiting controls how many requests a client can make within a given time period.

info

Implementation Status: Rate limiting is not yet implemented. This PRD describes the complete implementation required to protect all API endpoints.

warning

Security Priority: Rate limiting is a critical security feature. Without it, the application is vulnerable to brute force attacks, denial of service, credential stuffing, and email spam abuse.


Product Context

Purpose

Implement rate limiting to:

  • Protect against denial of service (DoS) attacks
  • Prevent brute force attacks on authentication endpoints
  • Stop credential stuffing attempts
  • Limit email spam from OTP/notification endpoints
  • Ensure fair resource usage across all users
  • Maintain service availability during traffic spikes

Target Threats

| Threat | Description | Without Rate Limiting |
| --- | --- | --- |
| DoS Attack | Flooding server with requests | Service becomes unavailable |
| Brute Force | Automated password/OTP guessing | Account compromise |
| Credential Stuffing | Using leaked credentials for mass logins | Unauthorized access |
| Email Spam | Triggering excessive OTP emails | Provider blacklisting, user harassment |
| API Abuse | Scraping or exhausting API resources | Increased costs, degraded performance |
| Resource Exhaustion | Overwhelming database/compute resources | Service degradation for legitimate users |

Success Metrics

  • All API endpoints are protected by rate limiting by default
  • Authentication endpoints have stricter, customized limits
  • Legitimate users never encounter rate limits during normal usage
  • Abusive traffic is blocked before consuming significant resources
  • Rate limit events are logged for security monitoring
    • (You will learn more about structured logging in a later lab. For now, you can log to the console using console.log().)
  • System gracefully degrades if Redis becomes unavailable

Implementation Status

| Component | Status | Priority |
| --- | --- | --- |
| Rate limit service | 🔲 TODO | High |
| Redis integration | 🔲 TODO | High |
| tRPC middleware | 🔲 TODO | High |
| Per-endpoint configuration | 🔲 TODO | High |
| Fingerprinting (user/IP) | 🔲 TODO | High |
| Custom error responses | 🔲 TODO | Medium |
| Memory fallback | 🔲 TODO | Medium |
| Monitoring/logging | 🔲 TODO | Medium |
| Frontend error handling | 🔲 TODO | Low |

Prereads

info

Strongly recommended reading before starting this lab: these cover the theory and rationale behind rate limiting, along with the implementation details and strategies needed for a robust solution.


User Stories

Security & Protection

| ID | As a... | I want... | So that... | Priority |
| --- | --- | --- | --- | --- |
| RL-1 | System operator | All endpoints rate limited by default | New endpoints are automatically protected | High |
| RL-2 | System operator | Authentication endpoints to have stricter limits | Brute force attacks are prevented | High |
| RL-3 | System operator | OTP endpoints to have strict per-email limits | Users aren't harassed with spam emails | High |
| RL-4 | System operator | Rate limiting to work across multiple servers | Attackers can't bypass limits via load balancing | High |
| RL-5 | System operator | Rate limiting to continue if Redis fails | Service remains protected during Redis outages | Medium |

User Experience

| ID | As a... | I want... | So that... | Priority |
| --- | --- | --- | --- | --- |
| RL-6 | Authenticated user | Normal usage to never trigger rate limits | My experience isn't disrupted | High |
| RL-7 | User | Clear error messages when rate limited | I understand why my request failed | Medium |
| RL-8 | User | To know when I can retry | I don't keep trying and getting blocked | Medium |
| RL-9 | User | Quick page navigation without hitting limits | I can browse normally | High |

Developer Experience

| ID | As a... | I want... | So that... | Priority |
| --- | --- | --- | --- | --- |
| RL-10 | Developer | To easily customize limits per endpoint | I can tune limits for specific use cases | Medium |
| RL-11 | Developer | To disable rate limiting for specific endpoints | Health checks and internal endpoints work freely | Medium |
| RL-12 | Developer | Rate limiting disabled in test environment | Tests run quickly without artificial delays | Medium |
| RL-13 | Developer | Clear logging of rate limit events | I can debug and monitor the system | Medium |

API Client (Optional, Future Consideration)

info

These user stories are relevant only for applications with API consumers integrating with our services. You do not have to implement these requirements in this lab; they will be covered in future labs on designing for API consumers.

| ID | As a... | I want... | So that... | Priority |
| --- | --- | --- | --- | --- |
| RL-14 | API client | To receive rate limit headers in responses | I can proactively manage my request rate | - |
| RL-16 | API client | Consistent retry-after values in 429 responses | I can implement reliable exponential backoff | - |
| RL-18 | API client | Different rate limits for different API keys | I can upgrade for higher limits if needed | - |

Functional Requirements

FR-1: Rate Limit Service

  • FR-1.1: System MUST implement a rate limit service that tracks request counts per key
  • FR-1.2: Service MUST support configurable limits (points, duration, burst points, burst duration)
  • FR-1.3: Service MUST implement bursty rate limiting to allow natural usage patterns
  • FR-1.4: Service MUST cache rate limiter instances to avoid recreation overhead
  • FR-1.5: Service MUST use the rate-limiter-flexible library for implementation
info

The rate-limiter-flexible library provides robust rate limiting features, including support for Redis storage and bursty limiting. It is generally not recommended to implement your own rate limiting mechanisms. Refer to its documentation for more details.

FR-2: Storage Backend

  • FR-2.1: System MUST use distributed storage (e.g. Redis, DB) as the primary storage for rate limit counters (Redis is preferred)
  • FR-2.2: System MUST implement in-memory fallback when the distributed storage is unavailable
  • FR-2.3: The rate limiter MUST use rate-limiter-flexible's insurance strategy (an in-memory insurance limiter backing the primary limiter) for resilience
  • FR-2.4: System MUST handle storage connection failures gracefully
  • FR-2.5: Rate limit keys MUST be namespaced to prevent collisions
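
One way to wire FR-2.2 through FR-2.5 together is the library's insuranceLimiter option. The following configuration sketch assumes ioredis and a REDIS_URL environment variable; the client setup and key prefix are illustrative.

```typescript
import Redis from "ioredis";
import { RateLimiterMemory, RateLimiterRedis } from "rate-limiter-flexible";

// Assumed connection setup; REDIS_URL and the fallback URL are illustrative.
const redisClient = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// In-memory limiter used only while Redis is unreachable (FR-2.2).
const insuranceLimiter = new RateLimiterMemory({ points: 2, duration: 1 });

// Primary Redis-backed limiter. If a Redis command fails, the library
// transparently serves the check from insuranceLimiter instead of throwing,
// so the application keeps degraded protection (FR-2.3, FR-2.4).
export const primaryLimiter = new RateLimiterRedis({
  storeClient: redisClient,
  keyPrefix: "rl:default", // namespaced keys (FR-2.5)
  points: 2,
  duration: 1,
  insuranceLimiter,
});
```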

FR-3: Request Fingerprinting

  • FR-3.1: System MUST identify requests using a composite key strategy
  • FR-3.2: Authenticated requests MUST be keyed by user ID
    • This is to ensure that logged-in users are rate limited based on their account, regardless of IP address
  • FR-3.3: Unauthenticated requests MUST be keyed by a unique fingerprint (usually the IP address)
  • FR-3.4: IPv6 addresses MUST be sanitized for Redis key compatibility
  • FR-3.5: Unknown identifiers MUST fall back to a safe default key
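
One possible shape for the fingerprint helper follows. The function name matches the API specification later in this document, but the exact key format and sanitization rules here are assumptions.

```typescript
export function createRateLimitFingerprint(params: {
  userId: string | undefined;
  ipAddress: string | null;
}): string {
  const { userId, ipAddress } = params;

  // FR-3.2: authenticated requests are keyed by user ID, regardless of IP.
  if (userId) return `user:${userId}`;

  // FR-3.3/FR-3.4: unauthenticated requests are keyed by IP; IPv6 colons are
  // replaced so the key stays unambiguous alongside `:`-delimited prefixes.
  if (ipAddress) return `ip:${ipAddress.replace(/:/g, "_")}`;

  // FR-3.5: safe default when no identifier is available.
  return "unknown";
}
```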

FR-4: Middleware Integration

  • FR-4.1: Rate limiting MUST be implemented as tRPC middleware
  • FR-4.2: Middleware MUST apply to all procedures by default (opt-out approach)
  • FR-4.3: Middleware MUST check rate limits BEFORE executing procedure logic
  • FR-4.4: Middleware MUST support per-procedure configuration via a metadata configuration
  • FR-4.5: Middleware MUST allow procedures to opt-out of rate limiting
  • FR-4.6: Middleware MUST be disabled in the test environment (i.e., when NODE_ENV === 'test')
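
The tri-state opt-out semantics (FR-4.2, FR-4.4–FR-4.6) can be isolated into a pure helper that the middleware consults before touching the limiter. The helper name is illustrative; the default values are taken from FR-5.

```typescript
// Mirrors the RateLimiterConfig interface from the API specification.
interface RateLimiterConfig {
  points?: number;
  duration?: number;
  burstPoints?: number;
  burstDuration?: number;
  keyPrefix?: string;
}

// Suggested defaults from FR-5.1.
const DEFAULT_CONFIG: RateLimiterConfig = {
  points: 2,
  duration: 1,
  burstPoints: 5,
  burstDuration: 10,
};

// Returns the config the middleware should enforce, or null to skip entirely.
// undefined -> defaults (FR-4.2), object -> custom (FR-4.4), null -> opt-out (FR-4.5).
export function resolveRateLimitConfig(
  rateLimitOptions: RateLimiterConfig | null | undefined,
  env: string | undefined = process.env.NODE_ENV,
): RateLimiterConfig | null {
  if (env === "test") return null;            // FR-4.6: disabled in tests
  if (rateLimitOptions === null) return null; // FR-4.5: explicit opt-out
  if (rateLimitOptions === undefined) return DEFAULT_CONFIG;
  return { ...DEFAULT_CONFIG, ...rateLimitOptions }; // per-procedure override
}
```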

FR-5: Default Configuration

  • FR-5.1: There should be a default rate limit configuration applied to all endpoints:
    • Suggested sustained limit: 2 requests per second
    • Suggested bursty limit: 5 requests per 10 seconds
  • FR-5.2: Defaults MUST be overridable per procedure

FR-6: Error Handling

  • FR-6.1: Rate limit exceeded MUST return TOO_MANY_REQUESTS error code
  • FR-6.2: Error response header MUST include Retry-After indicating when the client can retry
  • FR-6.3: System MUST log rate limit exceeded events for monitoring
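
rate-limiter-flexible reports the remaining wait as msBeforeNext on the rejected result, which maps onto the Retry-After header and error message roughly as follows; the helper names are illustrative.

```typescript
// Round up to whole seconds so clients never retry before the window
// resets (FR-6.2); clamp to at least 1 second.
export function retryAfterSeconds(msBeforeNext: number): number {
  return Math.max(1, Math.ceil(msBeforeNext / 1000));
}

// Message format mirrors the Error Response sample in the API specification.
export function rateLimitMessage(msBeforeNext: number): string {
  return `Rate limit exceeded. Please try again in ${retryAfterSeconds(msBeforeNext)} seconds.`;
}
```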

FR-7: Endpoint-Specific Limits

The following endpoints require custom rate limits:

| Endpoint Category | Limit | Duration | Rationale |
| --- | --- | --- | --- |
| Login/Auth | 5 | 60s | Prevent brute force & email spam |
| Thread Create | 10 | 60s | Prevent spam |
| Comment Create | 20 | 60s | Allow discussion |
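
Expressed as values of the RateLimiterConfig interface from the API specification, the table above might become a map like the following; the object keys and key prefixes are illustrative.

```typescript
// Mirrors the RateLimiterConfig interface from the API specification.
interface RateLimiterConfig {
  points?: number;
  duration?: number;
  burstPoints?: number;
  burstDuration?: number;
  keyPrefix?: string;
}

// Custom limits from the endpoint table; durations are in seconds.
export const endpointLimits: Record<string, RateLimiterConfig> = {
  login:         { points: 5,  duration: 60, keyPrefix: "rl:login" },
  threadCreate:  { points: 10, duration: 60, keyPrefix: "rl:thread-create" },
  commentCreate: { points: 20, duration: 60, keyPrefix: "rl:comment-create" },
};
```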

Non-Functional Requirements

NFR-1: Performance

  • NFR-1.1: Rate limit check MUST complete within 10ms under normal conditions
  • NFR-1.2: Rate limiter instances MUST be cached to avoid recreation
  • NFR-1.3: Rate limiter storage operations SHOULD use connection pooling
  • NFR-1.4: Memory fallback MUST NOT significantly impact response time

NFR-2: Reliability

  • NFR-2.1: System MUST continue functioning if distributed storage becomes unavailable
  • NFR-2.2: Insurance limiter MUST provide degraded protection during distributed storage outage
  • NFR-2.3: Rate limiting MUST NOT cause application crashes
  • NFR-2.4: System MUST handle malformed IP addresses gracefully

NFR-3: Security

  • NFR-3.1: Rate limit keys MUST be derived server-side, never from client input
  • NFR-3.2: Rate limiting MUST occur before business logic execution
  • NFR-3.3: System MUST NOT leak information about other users' rate limit status
  • NFR-3.4: Logs MUST NOT contain sensitive user data

NFR-4: Observability

  • NFR-4.1: All rate limit exceeded events SHOULD be logged
  • NFR-4.2: Logs SHOULD include key prefix, timestamp, and retry-after value
  • NFR-4.3: System SHOULD support metrics export for monitoring dashboards
  • NFR-4.4: Alerts SHOULD be configurable for rate limit spikes

NFR-5: Developer Experience

  • NFR-5.1: Configuration API MUST be type-safe
  • NFR-5.2: Rate limiting MUST be easily toggleable per endpoint
  • NFR-5.3: Test utilities SHOULD be provided for testing rate limit behavior

Technical Architecture

Component Diagram

┌─────────────────────────────────────────────────────────────────┐
│ tRPC Router │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌──────────────────┐ ┌───────────────┐ │
│ │ Request │───▶│ Rate Limit │───▶│ Procedure │ │
│ │ │ │ Middleware │ │ Handler │ │
│ └─────────────┘ └────────┬─────────┘ └───────────────┘ │
│ │ │
└──────────────────────────────┼──────────────────────────────────┘


┌────────────────────┐
│ Rate Limit │
│ Service │
├────────────────────┤
│ • checkRateLimit │
│ • createFingerprint│
│ • createLimiter │
└────────┬───────────┘

┌──────────────┴───────────────┐
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Redis Limiter │ │ Memory Limiter │
│ (Primary) │ │ (Fallback) │
├──────────────────┤ ├──────────────────┤
│ • Distributed │ │ • Per-server │
│ • Persistent │ │ • Fast │
│ • Coordinated │ │ • No dependencies│
└──────────────────┘ └──────────────────┘

Possible Data Flow

File Structure

apps/web/src/server/
├── api/
│   └── trpc.ts                    # Rate limit middleware integration
└── modules/
    └── rate-limit/
        ├── index.ts               # Public exports
        ├── rate-limit.service.ts  # Core rate limiting logic
        ├── types.ts               # TypeScript interfaces
        └── errors.ts              # Custom error classes

packages/redis/
└── src/
    └── index.ts                   # Redis client singleton

API Specification

info

The files have been scaffolded for you in the lab repository. You need to implement the logic as per the requirements.

Rate Limiter Configuration Interface

interface RateLimiterConfig {
  /** Number of points (requests) allowed in the duration window */
  points?: number;
  /** Duration window in seconds for sustained rate */
  duration?: number;
  /** Number of points allowed for burst traffic */
  burstPoints?: number;
  /** Duration window in seconds for burst rate */
  burstDuration?: number;
  /** Prefix for Redis keys (for namespacing) */
  keyPrefix?: string;
}

tRPC Meta Interface

interface Meta {
  /**
   * Rate limit options for this procedure.
   * - undefined: Apply default rate limiting
   * - RateLimiterConfig: Apply custom rate limiting
   * - null: Disable rate limiting for this procedure
   */
  rateLimitOptions?: RateLimiterConfig | null;
}

Rate Limit Service Functions

// Check rate limit for a key, throws TRPCRateLimitError if exceeded
checkRateLimit(params: {
  key: string;
  options?: RateLimiterConfig;
}): Promise<void>;

// Create a fingerprint key from user/IP information
createRateLimitFingerprint(params: {
  userId: string | undefined;
  ipAddress: string | null;
}): string;

Error Response

// HTTP 429 Too Many Requests
{
error: {
code: "TOO_MANY_REQUESTS",
message: "Rate limit exceeded. Please try again in 30 seconds.",
}
}

Best Practices Requirements

The implementation SHOULD follow these best practices:

| Best Practice | Requirement | Validation |
| --- | --- | --- |
| BP-1: Middleware Layer | Rate limiting MUST be applied at the middleware layer before business logic | Code review |
| BP-2: Composite Keys | MUST use userId for authenticated, IP for unauthenticated requests | Unit tests |
| BP-3: Bursty Limiting | MUST implement dual-bucket (sustained + burst) rate limiting | Unit tests |
| BP-4: Redis Storage | SHOULD use Redis for distributed coordination with memory fallback | Integration tests |
| BP-5: Error Responses | MUST return 429 status with retryAfterSeconds | API tests |
| BP-6: Per-Endpoint Config | MUST support custom limits via tRPC metadata field | Code review |
| BP-7: Namespaced Keys | MUST use keyPrefix to namespace rate limit keys | Unit tests |
| BP-8: Multiple Scopes | SHOULD support per-procedure and global rate limits | Design review |
| BP-9: Monitoring | MUST log rate limit exceeded events | Log review |
| BP-10: Test Support | SHOULD skip rate limiting (or opt into rate limiting) in test environment | Test verification |

Common Pitfalls to Avoid

| Pitfall | Mitigation | Validation |
| --- | --- | --- |
| Rate limiting after business logic | Middleware checks BEFORE handler | Code review |
| Trusting client identifiers | Derive fingerprint from server context only | Security review |
| Single point of rate limiting | Apply default + custom limits | Design review |
| No fallback for Redis failure | Insurance limiter pattern | Chaos testing |
| Blocking legitimate users | Bursty limiting with appropriate defaults | User testing |

Acceptance Criteria

AC-1: Default Rate Limiting

  • All tRPC procedures are rate limited by default
  • Default sustained limit is 2 requests per second (per FR-5.1)
  • Default burst allowance is 5 requests per 10 seconds
  • Exceeding limit returns HTTP 429 with retryAfterSeconds

AC-2: Authentication Endpoints

  • Login endpoint has custom limit of 5 requests per 60 seconds
  • OTP request endpoint has limit of 3 requests per 10 minutes
  • Rate limit applies per-email for OTP requests

AC-3: Fingerprinting

  • Authenticated requests are keyed by userId
  • Unauthenticated requests are keyed by IP address
  • IPv6 addresses are properly sanitized
  • Fallback to "unknown" for missing identifiers

AC-4: Redis Integration

  • Rate limit counters are stored in Redis
  • Keys are properly namespaced with prefix
  • Memory fallback activates when Redis unavailable
  • No application crash on Redis connection failure

AC-5: Developer Experience

  • Procedures can opt-out with rateLimitOptions: null
  • Procedures can customize with rateLimitOptions: { ... }

AC-6: User Experience

  • Normal browsing (5 page loads in 10 seconds) doesn't trigger limits
  • Error message clearly explains rate limit and retry time
  • Frontend handles 429 errors gracefully with toast notification

AC-7: Observability

  • Rate limit exceeded events are logged with key and retry time
  • Logs do not contain sensitive user information
  • Key prefix is included in logs for filtering

Implementation Checklist

Phase 1: Core Infrastructure

  • Create packages/redis with Redis client singleton
  • Create rate-limit module directory structure
  • Implement RateLimiterConfig TypeScript interface
  • Implement TRPCRateLimitError custom error class

Phase 2: Rate Limit Service

  • Implement createRateLimiter with Redis + memory fallback
  • Implement BurstyRateLimiter with dual-bucket approach
  • Implement createRateLimitFingerprint function
  • Implement checkRateLimit function with error handling

Phase 3: Middleware Integration

  • Add Meta interface with rateLimitOptions
  • Implement rateLimitMiddleware in tRPC setup
  • Apply middleware to default procedure
  • Add test environment bypass

Phase 4: Endpoint Configuration

  • Configure authentication endpoints with custom limits
  • Configure OTP endpoints with per-email limits
  • Configure read endpoints with higher limits
  • Document all custom configurations

Phase 5: Frontend & Polish

  • Add frontend error handling for 429 responses
  • Add toast notifications for rate limit errors
  • Verify logging and monitoring
  • Update documentation

References