Rate Limiting Module

Production-ready rate limiting for Gradio applications with Redis support and graceful fallback.

Features

  • Token Bucket Algorithm: Configurable capacity and refill rate
  • Thread-Safe: Works with concurrent requests
  • Async Support: Compatible with async/await handlers
  • Redis Integration: Distributed rate limiting with Lua scripts
  • Graceful Fallback: Automatic in-memory fallback when Redis unavailable
  • Multi-Tier: Support for anonymous, authenticated, and premium users
  • Gradio Integration: Built-in middleware for Gradio applications
  • Production-Ready: Comprehensive error handling and logging

Quick Start

from backend.rate_limiting import (
    TieredRateLimiter,
    GradioRateLimitMiddleware,
    UserTier
)
import gradio as gr

# Create rate limiter
limiter = TieredRateLimiter(
    tier_limits={
        UserTier.ANONYMOUS: (10, 0.1),  # 10-token burst, refill 0.1 tokens/sec
    },
    redis_url=None  # Optional Redis URL
)

# Create middleware
middleware = GradioRateLimitMiddleware(limiter)

# Use in Gradio handler
def my_handler(text: str, request: gr.Request = None):
    middleware.enforce(request)  # Raises gr.Error if limit exceeded
    # ... your handler code

Classes

ThreadSafeTokenBucket

In-memory token bucket with thread safety.

from backend.rate_limiting import ThreadSafeTokenBucket

bucket = ThreadSafeTokenBucket(capacity=10, refill_rate=1.0)
result = bucket.consume()

if result.allowed:
    print(f"Request allowed, {result.remaining} remaining")
else:
    print(f"Rate limited, retry after {result.retry_after}s")
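The lock is what makes concurrent consumption safe: without it, two threads could both read the same token count and over-grant. A self-contained sketch of the idea (a minimal bucket for illustration, not the module's ThreadSafeTokenBucket) shows 20 threads racing for a 10-token bucket, with exactly 10 succeeding:

```python
import threading
import time

class MiniBucket:
    """Minimal lock-protected token bucket (illustration only)."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.last = time.monotonic()
        self._lock = threading.Lock()

    def consume(self, tokens: int = 1) -> bool:
        with self._lock:
            now = time.monotonic()
            # Refill based on elapsed time, capped at capacity
            self.tokens = min(
                self.capacity,
                self.tokens + (now - self.last) * self.refill_rate,
            )
            self.last = now
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False

# 20 concurrent requests against a 10-token bucket (no refill):
# exactly 10 are allowed, regardless of thread interleaving.
bucket = MiniBucket(capacity=10, refill_rate=0.0)
results = []
threads = [
    threading.Thread(target=lambda: results.append(bucket.consume()))
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sum(results))  # 10
```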

AsyncTokenBucket

Async-compatible token bucket.

from backend.rate_limiting import AsyncTokenBucket

bucket = AsyncTokenBucket(capacity=10, refill_rate=1.0)
result = await bucket.consume()
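For reference, the async variant of the same pattern can be sketched with an asyncio.Lock instead of a thread lock (illustrative only, not the module's AsyncTokenBucket):

```python
import asyncio
import time

class MiniAsyncBucket:
    """Illustrative token bucket guarded by an asyncio.Lock."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)
        self.last = time.monotonic()
        self._lock = asyncio.Lock()

    async def consume(self, tokens: int = 1) -> bool:
        async with self._lock:
            now = time.monotonic()
            # Refill based on elapsed time, capped at capacity
            self.tokens = min(
                self.capacity,
                self.tokens + (now - self.last) * self.refill_rate,
            )
            self.last = now
            if self.tokens >= tokens:
                self.tokens -= tokens
                return True
            return False

async def main():
    bucket = MiniAsyncBucket(capacity=2, refill_rate=0.0)
    return [await bucket.consume() for _ in range(3)]

print(asyncio.run(main()))  # [True, True, False]
```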

HybridRateLimiter

Redis primary with in-memory fallback.

from backend.rate_limiting import HybridRateLimiter

limiter = HybridRateLimiter(
    capacity=10,
    refill_rate=1.0,
    redis_url="redis://localhost:6379/0",  # Optional
    key_prefix="myapp"
)

result = limiter.consume(identifier="user_123")

TieredRateLimiter

Multi-tier rate limiting.

from backend.rate_limiting import TieredRateLimiter, UserTier

limiter = TieredRateLimiter(
    tier_limits={
        UserTier.ANONYMOUS: (10, 0.1),
        UserTier.AUTHENTICATED: (50, 0.5),
        UserTier.PREMIUM: (200, 2.0),
    },
    redis_url="redis://localhost:6379/0"
)

result = limiter.consume("user_123", UserTier.AUTHENTICATED)

GradioRateLimitMiddleware

Gradio integration middleware.

from backend.rate_limiting import (
    TieredRateLimiter,
    GradioRateLimitMiddleware,
    UserTier
)

limiter = TieredRateLimiter(...)
middleware = GradioRateLimitMiddleware(limiter)

# Check rate limit
info = middleware.check_rate_limit(request)

# Enforce rate limit (raises gr.Error if exceeded)
middleware.enforce(
    request,
    tokens=1,
    error_message="Custom error message"
)

Configuration

Rate limits are configured via capacity and refill rate:

  • Capacity: Maximum number of tokens (burst requests)
  • Refill Rate: Tokens added per second (sustained rate)

Example configurations:

# 10 requests burst, 1 request per 10 seconds sustained
(capacity=10, refill_rate=0.1)

# 50 requests burst, 1 request per 2 seconds sustained
(capacity=50, refill_rate=0.5)

# 100 requests burst, 10 requests per second sustained
(capacity=100, refill_rate=10.0)
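The retry delay a client sees follows directly from these two numbers: once the burst capacity is exhausted, accumulating enough tokens for the next request takes tokens / refill_rate seconds. A quick check against the configurations above:

```python
def sustained_interval(refill_rate: float, tokens_per_request: int = 1) -> float:
    """Seconds between requests once the burst capacity is spent."""
    return tokens_per_request / refill_rate

print(sustained_interval(0.1))   # 10.0 -> one request per 10 seconds
print(sustained_interval(0.5))   # 2.0  -> one request per 2 seconds
print(sustained_interval(10.0))  # 0.1  -> ten requests per second
```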

Redis Integration

The HybridRateLimiter uses Redis for distributed rate limiting:

  1. Lua scripts for atomic operations
  2. Automatic script caching
  3. Key expiration via TTL
  4. Connection pooling
  5. Graceful fallback to in-memory
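For illustration, an atomic token-bucket check in Lua might be shaped like the sketch below (hypothetical; the module's actual script may differ in keys, arguments, and TTL handling). Because Redis runs the whole script atomically, the read-refill-consume sequence cannot interleave across instances:

```python
# Hypothetical sketch of a token-bucket Lua script, NOT the module's
# actual script. KEYS[1]/KEYS[2] hold the token count and last-refill
# timestamp; ARGV carries capacity, refill rate, tokens requested, and now.
TOKEN_BUCKET_LUA = """
local tokens = tonumber(redis.call('GET', KEYS[1]) or ARGV[1])
local last = tonumber(redis.call('GET', KEYS[2]) or ARGV[4])
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local requested = tonumber(ARGV[3])
local now = tonumber(ARGV[4])

tokens = math.min(capacity, tokens + (now - last) * rate)
local allowed = tokens >= requested
if allowed then tokens = tokens - requested end

-- TTL so idle keys expire rather than accumulating forever
redis.call('SET', KEYS[1], tokens, 'EX', 3600)
redis.call('SET', KEYS[2], now, 'EX', 3600)
return allowed and 1 or 0
"""
```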

Example Redis URL formats:

# Local Redis
redis_url="redis://localhost:6379/0"

# Redis with authentication
redis_url="redis://:password@localhost:6379/0"

# Redis SSL
redis_url="rediss://host:port/0"

# Upstash Redis
redis_url="rediss://user:pass@endpoint:port"

Error Handling

All components include comprehensive error handling:

  • Redis connection failures → automatic fallback to in-memory
  • Invalid requests → safe defaults
  • Rate limit exceeded → clear error messages with retry timing

try:
    middleware.enforce(request)
except gr.Error as e:
    # Gradio will display this error to the user
    print(f"Rate limited: {e}")

Testing

Run the test suite:

python test_rate_limiting_simple.py

Best Practices

  1. Use appropriate tier limits: Set limits based on your application's needs
  2. Use Redis for production: Enables distributed rate limiting across instances
  3. Monitor logs: Watch for rate limit violations
  4. Customise error messages: Provide clear feedback to users
  5. Test fallback: Ensure in-memory fallback works when Redis is down
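The fallback behaviour (practice 5) is easy to unit-test in isolation. Below is a self-contained sketch of the pattern, using hypothetical FlakyRedis, InMemory, and FallbackLimiter stand-ins rather than the module's classes:

```python
class FlakyRedis:
    """Stand-in backend that always fails, simulating a Redis outage."""
    def consume(self, identifier: str) -> bool:
        raise ConnectionError("redis down")

class InMemory:
    """Trivial in-memory backend that always allows (illustration only)."""
    def consume(self, identifier: str) -> bool:
        return True

class FallbackLimiter:
    """Try the primary backend; on error, fall back and remember it."""
    def __init__(self, primary, fallback):
        self.primary = primary
        self.fallback = fallback
        self.using_fallback = False

    def consume(self, identifier: str) -> bool:
        if not self.using_fallback:
            try:
                return self.primary.consume(identifier)
            except ConnectionError:
                self.using_fallback = True  # would also log a WARNING here
        return self.fallback.consume(identifier)

limiter = FallbackLimiter(FlakyRedis(), InMemory())
print(limiter.consume("user_123"), limiter.using_fallback)  # True True
```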

Example: Full Gradio Integration

import gradio as gr
from backend.config import settings
from backend.rate_limiting import (
    TieredRateLimiter,
    GradioRateLimitMiddleware,
    UserTier
)

# Initialize rate limiter
limiter = TieredRateLimiter(
    tier_limits={
        UserTier.ANONYMOUS: (
            settings.rate_limit_anonymous_capacity,
            settings.rate_limit_anonymous_refill_rate
        ),
    },
    redis_url=settings.redis_url
)

middleware = GradioRateLimitMiddleware(limiter)

# Gradio handler
async def analyse_portfolio(
    portfolio_text: str,
    request: gr.Request = None
):
    # Enforce rate limit
    middleware.enforce(request)

    # Your analysis code here
    return "Analysis complete"

# Gradio interface
with gr.Blocks() as demo:
    text_input = gr.Textbox()
    submit_btn = gr.Button("Analyse")
    output = gr.Textbox()

    submit_btn.click(
        analyse_portfolio,
        inputs=text_input,  # gr.Request is injected automatically via the type hint
        outputs=output
    )

demo.launch()

Logging

The module logs important events:

import logging

logger = logging.getLogger('backend.rate_limiting')
logger.setLevel(logging.INFO)

Log messages:

  • INFO: Rate limiter initialisation
  • WARNING: Rate limit exceeded
  • ERROR: Redis errors, fallback activation

Performance

  • In-memory: <1ms overhead per request
  • Redis: ~2-5ms overhead per request
  • Fallback: Automatic, no service interruption

Dependencies

  • redis>=5.0.0 (optional, for distributed rate limiting)
  • upstash-redis>=0.15.0 (optional, for serverless Redis)
  • gradio (for middleware integration)

License

Part of Portfolio Intelligence Platform.