---
title: jobin-dsri
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
app_port: 8000
pinned: false
license: apache-2.0
---

# ML Inference Service

FastAPI service for serving ML models over HTTP. Comes with ResNet-18 for image classification out of the box, but you can swap in any model you want.

## Quick Start

**Install `uv`:**
https://docs.astral.sh/uv/getting-started/installation/

**Local development:**
```bash
# Install dependencies
make setup
source venv/bin/activate

# Download the example model
make download

# Run it
make serve
```

In a second terminal:
```bash
# Process an example input
./prompt.sh cat.json
```

The server runs on `http://127.0.0.1:8000`. Check `/docs` for the interactive API documentation.

**Docker:**
```bash
# Build
make docker-build

# Run
make docker-run
```

## Testing the API

```bash
# Using curl
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{
    "image": {
      "mediaType": "image/jpeg",
      "data": "<base64-encoded-image>"
    }
  }'
```

Example response:
```json
{
  "logprobs": [-0.859380304813385, -1.2701971530914307, -2.1918208599090576, -1.69235098361969],
  "localizationMask": {
    "mediaType": "image/png",
    "data": "iVBORw0KGgoAAAANSUhEUgAAA8AAAAKDAQAAAAD9Fl5AAAAAu0lEQVR4nO3NsREAMAgDMWD/nZMVKEwn1T5/FQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMCl3g5f+HC24TRhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAj70gwKsTlmdBwAAAABJRU5ErkJggg=="
  }
}
```
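
The same request body can be assembled programmatically before sending it with any HTTP client. A minimal stdlib sketch (the `build_predict_request` helper name is ours, not part of the repo):

```python
import base64
import json

def build_predict_request(image_bytes: bytes, media_type: str = "image/jpeg") -> str:
    """Serialize raw image bytes into a /predict request body."""
    return json.dumps({
        "image": {
            "mediaType": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        }
    })

# Example: wrap the contents of a local file (path is illustrative)
# with open("cat.jpg", "rb") as f:
#     body = build_predict_request(f.read())
```

Pipe the result to `curl -d @-` or pass it to `requests.post(..., data=body)`.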

## Project Structure

```
example-submission/
├── main.py                      # Entry point
├── app/
│   ├── core/
│   │   ├── app.py               # <= INSTANTIATE YOUR DETECTOR HERE
│   │   └── logging.py           # Logging setup
│   ├── api/
│   │   ├── models.py            # Request/response schemas
│   │   ├── controllers.py       # Business logic
│   │   └── routes/
│   │       └── prediction.py    # POST /predict
│   └── services/
│       ├── base.py              # <= YOUR DETECTOR IMPLEMENTS THIS INTERFACE
│       └── inference.py         # Example service based on ResNet-18
├── models/
│   └── microsoft/
│       └── resnet-18/           # Model weights and config
├── scripts/
│   ├── model_download.bash
│   ├── generate_test_datasets.py
│   └── test_datasets.py
├── Dockerfile
├── .env.example                 # Environment config template
├── cat.json                     # An example /predict request object
├── makefile
├── prompt.sh                    # Script that makes a /predict request
├── requirements.in
├── requirements.txt
└── response.json                # An example /predict response object
```

## How to Plug In Your Own Model

To integrate your model, implement the `InferenceService` abstract class defined in `app/services/base.py`. You can follow the example implementation in `app/services/inference.py`, which is based on ResNet-18. After implementing the required interface, instantiate your model in the `lifespan()` function in `app/core/app.py`, replacing the `ResNetInferenceService` instance.

### Step 1: Create Your Service Class

```python
# app/services/your_model_service.py
from app.services.base import InferenceService
from app.api.models import ImageRequest, PredictionResponse


class YourModelService(InferenceService[ImageRequest, PredictionResponse]):
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.model_path = f"models/{model_name}"
        self.model = None
        self._is_loaded = False

    def load_model(self) -> None:
        """Load your model here. Called once at startup."""
        self.model = load_your_model(self.model_path)  # your own loading helper
        self._is_loaded = True

    def predict(self, request: ImageRequest) -> PredictionResponse:
        """Actual inference happens here."""
        image = decode_base64_image(request.image.data)  # your own decoding helper
        result = self.model(image)

        logprobs = ...  # map `result` to per-label log-probabilities
        mask = ...      # optional localization mask, or None

        return PredictionResponse(
            logprobs=logprobs,
            localizationMask=mask,
        )

    @property
    def is_loaded(self) -> bool:
        return self._is_loaded
```
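
The template above leans on a `decode_base64_image` helper that you supply yourself. One minimal stdlib sketch (a real service would typically hand the resulting bytes to PIL or torchvision to decode them into pixels):

```python
import base64

def decode_base64_image(data: str) -> bytes:
    """Turn the base64 payload from an ImageRequest into raw image bytes.

    validate=True makes malformed payloads fail fast with binascii.Error
    instead of being silently truncated.
    """
    return base64.b64decode(data, validate=True)
```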

### Step 2: Register Your Service

Open `app/core/app.py` and find the `lifespan()` function:

```python
# Change this line:
service = ResNetInferenceService(model_name="microsoft/resnet-18")

# To this:
service = YourModelService(...)
```

That's it. The `/predict` endpoint now serves your model.

### Model Files

Put your model files under the `models/` directory:

```
models/
└── your-org/
    └── your-model/
        ├── config.json
        ├── weights.bin
        └── (other files)
```
|
| | ## Configuration |
| |
|
| | Settings are managed via environment variables or a `.env` file. See `.env.example` for all available options. |
| |
|
| | **Default values:** |
| | - `APP_NAME`: "ML Inference Service" |
| | - `APP_VERSION`: "0.1.0" |
| | - `DEBUG`: false |
| | - `HOST`: "0.0.0.0" |
| | - `PORT`: 8000 |
| | - `MODEL_NAME`: "microsoft/resnet-18" |
| |
|
| | **To customize:** |
| | ```bash |
| | # Copy the example |
| | cp .env.example .env |
| | |
| | # Edit values |
| | vim .env |
| | ``` |
| |
|
| | Or set environment variables directly: |
| | ```bash |
| | export MODEL_NAME="google/vit-base-patch16-224" |
| | uvicorn main:app --reload |
| | ``` |
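
The resolution order is the usual one: an environment variable wins over the default. The service resolves this through its own settings module, but the behavior can be sketched with the stdlib alone (the `load_settings` helper below is illustrative, not part of the repo):

```python
import os

# Defaults mirroring the table above
DEFAULTS = {
    "APP_NAME": "ML Inference Service",
    "APP_VERSION": "0.1.0",
    "DEBUG": "false",
    "HOST": "0.0.0.0",
    "PORT": "8000",
    "MODEL_NAME": "microsoft/resnet-18",
}

def load_settings() -> dict:
    """Resolve each setting: environment variable first, default second."""
    return {key: os.environ.get(key, default) for key, default in DEFAULTS.items()}
```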

## Deployment

**Development:**
```bash
uvicorn main:app --reload
```

**Production:**
```bash
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000
```

The service runs on CPU by default. For GPU inference, install CUDA-enabled PyTorch and modify your service to move tensors to the GPU device.

**Docker:**
- Multi-stage build keeps the image small
- Runs as non-root user (`appuser`)
- Python dependencies installed in user site-packages
- Model files baked into the image

## What Happens When You Start the Server

```
INFO: Starting ML Inference Service...
INFO: Initializing ResNet service: models/microsoft/resnet-18
INFO: Loading model from models/microsoft/resnet-18
INFO: Model loaded: 1000 classes
INFO: Startup completed successfully
INFO: Uvicorn running on http://0.0.0.0:8000
```

If you see "Model directory not found", check that your model files exist at the expected path with the full org/model structure.
|
| | ## API Reference |
| |
|
| | **Endpoint:** `POST /predict` |
| |
|
| | **Request:** |
| | ```json |
| | { |
| | "image": { |
| | "mediaType": "image/jpeg", // or "image/png" |
| | "data": "<base64 string>" |
| | } |
| | } |
| | ``` |
| |
|
| | **Response:** |
| | ```json |
| | { |
| | "logprobs": [float], // Log-probabilities of each label |
| | "localizationMask": { // [Optional] binary mask |
| | "mediaType": "image/png", // Always png |
| | "data": "<base64 string>" // Image data |
| | } |
| | } |
| | ``` |
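
A client can sanity-check a response against this shape before trusting it. A minimal stdlib sketch (the `check_prediction_response` helper name is ours, not part of the repo):

```python
import json

def check_prediction_response(raw: str) -> dict:
    """Parse a /predict response and verify the documented shape."""
    response = json.loads(raw)
    assert isinstance(response["logprobs"], list), "logprobs must be a list"
    assert all(isinstance(v, float) for v in response["logprobs"]), \
        "logprobs entries must be floats"
    mask = response.get("localizationMask")  # optional field
    if mask is not None:
        assert mask["mediaType"] == "image/png", "mask is always PNG"
        assert isinstance(mask["data"], str), "mask data is a base64 string"
    return response
```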

**Docs:**
- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
- OpenAPI JSON: `http://localhost:8000/openapi.json`

## PyArrow Test Datasets

We've included a test dataset system for validating your model. It generates 100 standardized test datasets covering normal inputs, edge cases, performance benchmarks, and model comparisons.

### Generate Datasets

```bash
python scripts/generate_test_datasets.py
```

This creates:
- `scripts/test_datasets/*.parquet` - Test data (images, requests, expected responses)
- `scripts/test_datasets/*_metadata.json` - Human-readable descriptions
- `scripts/test_datasets/datasets_summary.json` - Overview of all datasets

### Run Tests

```bash
# Start your service first
make serve
```

In another terminal:

```bash
# Quick test (5 samples per dataset)
python scripts/test_datasets.py --quick

# Full validation
python scripts/test_datasets.py

# Test specific category
python scripts/test_datasets.py --category edge_case
```

### Dataset Categories (25 datasets each)

**1. Standard Tests** (`standard_test_*.parquet`)
- Normal images: random patterns, shapes, gradients
- Common sizes: 224x224, 256x256, 299x299, 384x384
- Formats: JPEG, PNG
- Purpose: Baseline validation

**2. Edge Cases** (`edge_case_*.parquet`)
- Tiny images (32x32, 1x1)
- Huge images (2048x2048)
- Extreme aspect ratios (1000x50)
- Corrupted data, malformed requests
- Purpose: Test error handling

**3. Performance Benchmarks** (`performance_test_*.parquet`)
- Batch sizes: 1, 5, 10, 25, 50, 100 images
- Latency and throughput tracking
- Purpose: Performance profiling

**4. Model Comparisons** (`model_comparison_*.parquet`)
- Same inputs across different architectures
- Models: ResNet-18/50, ViT, ConvNext, Swin
- Purpose: Cross-model benchmarking

### Test Output

```
DATASET TESTING SUMMARY
============================================================
Datasets tested: 100
Successful datasets: 95
Failed datasets: 5
Total samples: 1,247
Overall success rate: 87.3%
Test duration: 45.2s

Performance:
  Avg latency: 123.4ms
  Median latency: 98.7ms
  p95 latency: 342.1ms
  Max latency: 2,341.0ms
  Requests/sec: 27.6

Category breakdown:
  standard: 25 datasets, 94.2% avg success
  edge_case: 25 datasets, 76.8% avg success
  performance: 25 datasets, 91.1% avg success
  model_comparison: 25 datasets, 89.3% avg success
```

## Common Issues

**Port 8000 already in use:**
```bash
# Find what's using it
lsof -i :8000

# Or just use a different port
uvicorn main:app --port 8080
```

**Model not loading:**
- Check the path: models should be in `models/<org>/<model-name>/`
- If you're trying to run the example ResNet-based model, make sure you ran `make download` to fetch the model weights.
- Check logs for the exact error

**Slow inference:**
- Inference runs on CPU by default
- For GPU: install CUDA PyTorch and modify your service to use the GPU device
- Consider using smaller models or quantization

## License

Apache 2.0