---
library_name: transformers
license: cc-by-nc-4.0
tags:
- code-review
- security-analysis
- static-analysis
- python
- code-quality
- peft
- qlora
- fine-tuned
- sql-injection
- vulnerability-detection
- python-security
- code-optimization
pipeline_tag: text-generation
datasets:
- alenphilip/Code-Review-Assistant
- alenphilip/Code-Review-Assistant-Eval
language:
- en
metrics:
- rouge
- bleu
base_model:
- Qwen/Qwen2.5-7B-Instruct
---
# Code Review Assistant Model

A specialized Python code review assistant fine-tuned for security analysis, performance optimization, and Pythonic code quality. The model identifies security vulnerabilities and performance issues, and provides corrected code examples with detailed explanations, specifically for Python codebases.

## Model Details

### Model Description

This model is a fine-tuned version of Qwen2.5-7B-Instruct, optimized for Python code analysis. It targets security vulnerabilities, performance bottlenecks, and code quality issues, and returns actionable fixes with corrected code examples.

- **Developed by:** Alen Philip
- **Model type:** Causal Language Model
- **Language(s) (NLP):** English, with specialized Python code understanding
- **License:** cc-by-nc-4.0
- **Finetuned from model:** Qwen/Qwen2.5-7B-Instruct
- **Supported Languages:** Python only

### Model Sources

- **Repository:** [Hugging Face Hub](https://huggingface.co/alenphilip/Code_Review_Assistant_Model)
- **Base Model:** [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
- **Training Dataset:** [Code Review Dataset](https://huggingface.co/datasets/alenphilip/Code-Review-Assistant)
- **Evaluation Dataset:** [Code Review (Eval) Dataset](https://huggingface.co/datasets/alenphilip/Code-Review-Assistant-Eval)
## Uses

### Direct Use

This model is specifically designed for:

- Automated Python code review in development pipelines
- Security vulnerability detection in Python code
- Python code quality assessment and improvement suggestions
- Performance optimization recommendations for Python applications
- Educational purposes for learning Python best practices
- Integration into Python IDEs and code editors

### Downstream Use

The model can be integrated into:

- CI/CD pipelines for automated Python code review
- Python code quality monitoring tools
- Security scanning platforms for Python applications
- Educational platforms for Python programming
- Code review assistance tools for Python developers
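As a rough illustration of the CI/CD integration listed above, the sketch below filters a list of changed paths down to Python files and builds one chat request per file, reusing the system and user prompt wording from this card's usage example. `select_python_files` and `build_review_messages` are hypothetical helper names, not part of the model or any library, and how the changed paths are obtained (for example from `git diff --name-only`) is left to the pipeline.

```python
from pathlib import PurePosixPath

SYSTEM_PROMPT = "You are a helpful AI assistant specialized in code review and security analysis."
FENCE = "`" * 3  # built programmatically to avoid nesting fences in this example


def select_python_files(changed_paths):
    """Keep only paths ending in .py (the model targets Python only)."""
    return [p for p in changed_paths if PurePosixPath(p).suffix == ".py"]


def build_review_messages(code_snippet):
    """Build the chat messages for one review request (hypothetical helper)."""
    user_prompt = (
        "Review this Python code and provide improvements with fixed code:\n\n"
        f"{FENCE}python\n{code_snippet}\n{FENCE}"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]


changed = ["app/db.py", "README.md", "static/site.css", "tests/test_db.py"]
print(select_python_files(changed))  # ['app/db.py', 'tests/test_db.py']
```

The resulting message lists can be passed to either of the inference snippets in "How to Get Started with the Model"; any suggestions should still go through human review before they gate a merge.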
### Out-of-Scope Use

- Analysis of non-Python programming languages
- Non-code related text generation
- Legal or compliance advice
- Production deployment without human validation
- Real-time security monitoring without additional safeguards

## Bias, Risks, and Limitations

- **Language Specificity:** Trained only on Python code; it will not perform well on other programming languages
- **False Positives/Negatives:** May occasionally miss edge cases or flag non-issues
- **Training Data Bias:** Reflects patterns and conventions present in the training dataset
- **Security Critical Systems:** Should not be the sole security measure for critical systems

### Recommendations

Users should:

- Always validate model suggestions with human review
- Use the model as an assistant tool rather than an autonomous system
- Test suggested fixes thoroughly before deployment
- Combine it with other security scanning tools for critical applications
## How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "alenphilip/Code_Review_Assistant_Model"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)

# Example usage for code review
def review_python_code(code_snippet):
    messages = [
        {"role": "system", "content": "You are a helpful AI assistant specialized in code review and security analysis."},
        {"role": "user", "content": f"Review this Python code and provide improvements with fixed code:\n\n```python\n{code_snippet}\n```"}
    ]

    # add_generation_prompt=True appends the assistant header so the model
    # answers the request instead of continuing the user turn
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    # temperature only takes effect when sampling is enabled
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.1)
    # Decode only the newly generated tokens, not the echoed prompt
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    response = tokenizer.decode(new_tokens, skip_special_tokens=True)
    return response

# Test with vulnerable code
vulnerable_code = '''
def get_user_by_email(email):
    query = "SELECT * FROM users WHERE email = '" + email + "'"
    cursor.execute(query)
    return cursor.fetchone()
'''

result = review_python_code(vulnerable_code)
print(result)
```
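For comparison, the usual remedy for the string-concatenated query in `get_user_by_email` is a parameterized query. The sketch below is hand-written, not model output. It assumes a `sqlite3` connection and a hypothetical `users` table with `id` and `email` columns, and it passes the connection in explicitly rather than relying on a global cursor.

```python
import sqlite3


def get_user_by_email(conn, email):
    """Look up a user with a parameterized query (sqlite3 uses ? placeholders)."""
    cursor = conn.execute("SELECT id, email FROM users WHERE email = ?", (email,))
    return cursor.fetchone()


# Demo against an in-memory database (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))

print(get_user_by_email(conn, "alice@example.com"))  # (1, 'alice@example.com')
print(get_user_by_email(conn, "' OR '1'='1"))        # None: the payload is treated as data
```

A review produced by the model can be checked against this kind of reference: the fix should bind user input as a parameter instead of splicing it into the SQL string.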
#### OR

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="alenphilip/Code_Review_Assistant_Model")

prompt = "Review this Python code and provide improvements with fixed code:\n\n```python\nclass LockManager:\n def __init__(self, lock1, lock2):\n self.lock1 = lock1\n self.lock2 = lock2\n\n def acquire_both(self):\n self.lock1.acquire()\n self.lock2.acquire() # This might fail\n\n def release_both(self):\n self.lock1.release()\n self.lock2.release()\n```"

messages = [
    {"role": "system", "content": "You are a helpful AI assistant specialized in code review and security analysis."},
    {"role": "user", "content": prompt},
]

result = pipe(messages)

# With chat-style input, generated_text holds the whole conversation,
# including the new assistant message
conversation = result[0]['generated_text']
for message in conversation:
    print(f"\n{message['role'].upper()}:")
    print("-" * 50)
    print(message['content'])
    print()
print("=" * 70)
```
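Because the prompts above ask for fixed code, it can be convenient to pull the suggested fix out of the response before testing it. The helper below is a minimal sketch that extracts fenced Python blocks from a response string. `extract_python_blocks` is a hypothetical name, and the assumption that the model wraps its fixes in python-tagged fences is not guaranteed by this card.

```python
import re

FENCE = "`" * 3  # avoids writing a literal fence inside this example

# Match a fence tagged python/py and capture everything up to the closing fence
_BLOCK_RE = re.compile(
    re.escape(FENCE) + r"(?:python|py)[ \t]*\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def extract_python_blocks(response):
    """Return the contents of all python-tagged fenced blocks in a response."""
    return [block.strip("\n") for block in _BLOCK_RE.findall(response)]


sample = (
    "The query is vulnerable to SQL injection.\n\n"
    f"{FENCE}python\ncursor.execute(query, (email,))\n{FENCE}\n"
    "Use placeholders instead of string concatenation."
)
print(extract_python_blocks(sample))  # ['cursor.execute(query, (email,))']
```

Extracted snippets are still untrusted suggestions; run them through tests and review rather than applying them automatically.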
## Training Details

## Training Data

The model was trained on a dataset of Python code review examples covering:

### 🔐 SECURITY

- SQL Injection Prevention
- XSS Prevention in Web Frameworks
- Authentication Bypass Vulnerabilities
- Insecure Deserialization
- Command Injection Prevention
- JWT Token Security
- Hardcoded Secrets Detection
- Input Validation & Sanitization
- Secure File Upload Handling
- Broken Access Control
- Password Hashing & Storage

### ⚡ PERFORMANCE

- Algorithm Complexity Optimization
- Database Query Optimization
- Memory Leak Detection
- I/O Bound Operations Optimization
- CPU Bound Operations Optimization
- Async/Await Performance
- Caching Strategies Implementation
- Loop Optimization Techniques
- Data Structure Selection
- Concurrent Execution Patterns

### 🐍 PYTHONIC CODE

- Type Hinting Implementation
- Mutable Default Arguments
- Context Manager Usage
- Decorator Best Practices
- List/Dict/Set Comprehensions
- Class Design Principles
- Dunder Method Implementation
- Property Decorator Usage
- Generator Expressions
- Class vs Static Methods
- Import Organization
- Exception Handling & Hierarchy
- EAFP vs LBYL Patterns
- Basic Syntax Validation
- Variable Scope Validation
- Type Operation Compatibility

### 🔧 PRODUCTION RELIABILITY

- Error Handling and Logging
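To make one of the listed categories concrete, the snippet below shows the mutable default argument pitfall and the conventional `None`-sentinel fix. It is a hand-written illustration of the topic, not a sample from the training dataset, and the function names are hypothetical.

```python
def add_tag_buggy(tag, tags=[]):
    # The default list is created once, at function definition time,
    # so it is shared by every call that omits `tags`.
    tags.append(tag)
    return tags


def add_tag_fixed(tag, tags=None):
    # Use None as a sentinel and create a fresh list per call.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


print(add_tag_buggy("a"), add_tag_buggy("b"))  # ['a', 'b'] ['a', 'b'] (same list object)
print(add_tag_fixed("a"), add_tag_fixed("b"))  # ['a'] ['b']
```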
## Training Procedure

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/alenphilip2071-google/huggingface/runs/d27nrifd)

### Training Hyperparameters

- **Training regime:** bf16 mixed precision with SFT & QLoRA
- **Base Model:** Qwen2.5-7B-Instruct
- **LoRA Rank:** 32
- **LoRA Alpha:** 64
- **LoRA Dropout:** 0.1
- **Learning Rate:** 2e-4
- **Batch Size:** 16 (with gradient accumulation 4)
- **Epochs:** 2
- **Max Sequence Length:** 2048 tokens
- **Optimizer:** Paged AdamW 8-bit

### Speeds, Sizes, Times

- **Base Model Size:** 7B parameters
- **Adapter Size:** ~45MB
- **Training Time:** ~68 minutes for 400 steps
- **Training Examples:** 13,670 training, 1,726 evaluation
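The reported figures can be cross-checked with a little arithmetic. The sketch below assumes that "16 (with gradient accumulation 4)" means a batch of 16 accumulated over 4 steps on a single GPU, giving an effective batch of 64. The card does not state this explicitly, so treat the result as a consistency check rather than a record of the run.

```python
import math

train_examples = 13_670   # from the card
epochs = 2                # from the card
batch_size = 16           # from the card
grad_accum = 4            # from the card
reported_steps = 400      # "~68 minutes for 400 steps"
reported_minutes = 68

# Assumption: effective batch = batch size x accumulation steps, single GPU
effective_batch = batch_size * grad_accum
steps_per_epoch = math.ceil(train_examples / effective_batch)
expected_steps = steps_per_epoch * epochs

print(effective_batch)                                   # 64
print(steps_per_epoch, expected_steps)                   # 214 428
print(round(reported_minutes * 60 / reported_steps, 1))  # 10.2 (seconds per step)
```

Under that assumption, two full epochs come to about 428 optimizer steps, which is in the same range as the 400 steps reported.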
## Evaluation

### Metrics

- **ROUGE-L:** 0.754
- **BLEU:** 61.99
- **Validation Loss:** 0.595
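ROUGE-L scores the longest common subsequence (LCS) shared by a generated review and its reference. The card does not say which implementation produced the scores above, so the following is only a minimal whitespace-tokenized sketch of the LCS-based F-measure, not the evaluation script; `rouge_l_f1` is a hypothetical name. Note that the ROUGE-L value is on a 0 to 1 scale, while the BLEU value is evidently on a 0 to 100 scale.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """LCS-based F1 between two strings, split on whitespace."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "use a parameterized query to prevent sql injection"
candidate = "use a parameterized query to avoid injection"
print(round(rouge_l_f1(candidate, reference), 3))  # 0.8
```

In this toy pair the LCS has 6 tokens, so precision is 6/7, recall is 6/8, and the F-measure is 0.8.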
## Results

The model achieved strong performance on code review tasks (ROUGE-L 0.754, BLEU 61.99), particularly on:

- Security vulnerability detection (SQL injection, XSS, etc.)
- Pythonic code improvements
- Performance optimization suggestions
- Providing corrected code examples

## Summary

The model identifies and fixes common Python code issues, with particular strength in security vulnerability detection and code quality improvements.
## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact/#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- Hardware Type: NVIDIA H100 80GB VRAM
- Hours used: ~1.5 hours
- Training Approach: QLoRA for efficient fine-tuning

## Technical Specifications

### Model Architecture and Objective

- **Architecture:** Transformer-based causal language model
- **Objective:** Supervised fine-tuning for code review tasks
- **Context Window:** 32K tokens (base model)

### Compute Infrastructure

**Hardware**

- Training was performed on a GPU cluster with NVIDIA H100 GPUs (80GB VRAM)

**Software**

- Transformers, PEFT, TRL, BitsAndBytes
- QLoRA for parameter-efficient fine-tuning
## Citation

```bibtex
@misc{alen_philip_george_2025,
  author    = {Alen Philip George},
  title     = {Code_Review_Assistant_Model (Revision 233d438)},
  year      = 2025,
  url       = {https://huggingface.co/alenphilip/Code_Review_Assistant_Model},
  doi       = {10.57967/hf/6836},
  publisher = {Hugging Face}
}
```

## Model Card Authors

Alen Philip George

## Model Card Contact

- Hugging Face: [alenphilip](https://huggingface.co/alenphilip)
- LinkedIn: [alenphilipgeorge](https://linkedin.com/in/alen-philip-george-130226254)
- Email: [alenphilipgeorge@gmail.com](mailto:alenphilipgeorge@gmail.com)

For questions about this model, please use the Hugging Face model repository discussions or contact via the above channels.