CVE Risk Scoring Model (Mistral-7B QLoRA)

📋 Model Description

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 specifically optimized for CVE (Common Vulnerabilities and Exposures) risk assessment and CVSS scoring.

Key Features

  • 🎯 Predicts CVSS scores (0-10 scale)
  • 🚨 Classifies severity levels (Critical, High, Medium, Low)
  • 🔍 Analyzes attack vectors (Network, Adjacent, Local, Physical)
  • ⚡ Assesses exploitability (Access complexity, authentication requirements)
  • 🛡️ Evaluates impact (Confidentiality, Integrity, Availability)
  • 🏷️ Identifies CWE categories (Common Weakness Enumeration)
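
The relationship between a predicted score and a severity level can be sketched as a simple lookup. This is a minimal sketch that assumes the CVSS v3.x qualitative rating bands; the card does not state which thresholds the model's severity labels follow, and the function name `severity_from_score` is hypothetical.

```python
def severity_from_score(score: float) -> str:
    """Map a CVSS base score to a severity label (assumed CVSS v3.x bands)."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


print(severity_from_score(9.8))
print(severity_from_score(5.3))
```

A lookup like this can serve as a consistency check: if the model's predicted severity disagrees with the band of its own predicted score, the prediction is worth a second look.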

🚀 Quick Start

Installation

pip install transformers torch peft bitsandbytes accelerate

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
model_id = "Swapnanil09/cve-risk-scoring-mistral-qlora"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Prepare prompt
prompt = """<s>[INST] You are a cybersecurity risk assessment model.

Analyze the following CVE and provide:
- CVSS score
- Severity
- Attack Vector
- Access Complexity
- Impact (Confidentiality, Integrity, Availability)

CVE Description:
A remote code execution vulnerability exists in Apache Log4j2 when configured to use a JNDI Lookup. An attacker can exploit this by sending a crafted request containing a malicious JNDI lookup string.
[/INST]"""

# Generate prediction
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
    repetition_penalty=1.1
)

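# Decode only the newly generated tokens so the prompt is not echoed back
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
prediction = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(prediction)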

Expected Output

CVSS Score: 9.8
Severity: Critical
Attack Vector: NETWORK
Access Complexity: LOW
Access Authentication: NONE
Impact - Confidentiality: COMPLETE
Impact - Integrity: COMPLETE
Impact - Availability: COMPLETE
CWE: CWE-502 - Deserialization of Untrusted Data
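
For downstream tooling, the text output usually needs to be turned into structured fields. The following is a minimal sketch that assumes the model emits one `Key: Value` pair per line, as in the output shown above; the helper name `parse_prediction` is hypothetical, and real generations may deviate from this layout.

```python
import re


def parse_prediction(text: str) -> dict:
    """Parse 'Key: Value' lines from the model output into a dictionary."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons or dashes
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    # Convert the score to a float when it is present and numeric
    score = fields.get("CVSS Score")
    if score is not None:
        match = re.search(r"\d+(?:\.\d+)?", score)
        fields["CVSS Score"] = float(match.group()) if match else None
    return fields


sample = """CVSS Score: 9.8
Severity: Critical
Attack Vector: NETWORK
Access Complexity: LOW
Access Authentication: NONE
Impact - Confidentiality: COMPLETE
Impact - Integrity: COMPLETE
Impact - Availability: COMPLETE
CWE: CWE-502 - Deserialization of Untrusted Data"""

parsed = parse_prediction(sample)
print(parsed["CVSS Score"], parsed["Severity"], parsed["CWE"])
```

Because sampling is enabled in the usage example, a parser like this should tolerate missing fields rather than assume every key is present.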

📊 Performance Metrics

| Metric | Value |
|---|---|
| MAE (CVSS Score) | 0.85 |
| RMSE (CVSS Score) | 1.23 |
| Accuracy within ±1.0 | 78.9% |
| Accuracy within ±2.0 | 92.6% |
| Severity Classification Accuracy | 85.0% |
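
For readers who want to reproduce metrics of this kind on their own data, here is a minimal sketch of how MAE, RMSE, and the "accuracy within ±k" figures can be computed. The function name `score_metrics` is hypothetical, the toy values are illustrative and not taken from the evaluation set, and treating the tolerance as inclusive is an assumption.

```python
import math


def score_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and tolerance accuracies for CVSS score predictions."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    n = len(errors)
    return {
        "mae": sum(errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        # Assumption: a prediction counts as a hit when |error| <= tolerance
        "within_1": sum(e <= 1.0 for e in errors) / n,
        "within_2": sum(e <= 2.0 for e in errors) / n,
    }


# Toy reference and predicted scores (illustrative only)
y_true = [9.8, 7.5, 5.0, 4.3]
y_pred = [9.0, 7.5, 6.5, 1.3]

metrics = score_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
```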

Detailed Classification Report

| Severity | Precision | Recall | F1-Score |
|---|---|---|---|
| Critical | 0.89 | 0.85 | 0.87 |
| High | 0.83 | 0.88 | 0.85 |
| Medium | 0.82 | 0.79 | 0.80 |
| Low | 0.88 | 0.92 | 0.90 |
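
The F1 column is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The short check below uses only the numbers from the report above; the helper name `f1` is hypothetical.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the classification report
report = {
    "Critical": (0.89, 0.85),
    "High": (0.83, 0.88),
    "Medium": (0.82, 0.79),
    "Low": (0.88, 0.92),
}

for label, (precision, recall) in report.items():
    print(f"{label}: F1 = {f1(precision, recall):.2f}")
```

The recomputed values agree with the reported F1 scores to two decimal places.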

🔧 Training Details

Base Model

  • Model: mistralai/Mistral-7B-Instruct-v0.2
  • Architecture: Decoder-only transformer (7B parameters)

Fine-Tuning Method

  • Technique: QLoRA (Quantized Low-Rank Adaptation)
  • Quantization: 4-bit NF4 with double quantization
  • LoRA Configuration:
    • Rank (r): 16
    • Alpha: 32
    • Target modules: q_proj, k_proj, v_proj, o_proj
    • Dropout: 0.05
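
The settings above map fairly directly onto the `transformers` and `peft` configuration objects. This is a sketch of what such a configuration might look like, not the author's training script; the compute dtype, `bias`, and `task_type` values are assumptions that the card does not state.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization, as listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: matches the bfloat16 training precision
)

# LoRA adapter on the attention projections, using the listed rank, alpha, and dropout
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",            # assumption: not stated in the card
    task_type="CAUSAL_LM",  # assumption: causal language modeling objective
)
```

With rank 16 and alpha 32, the adapter update is scaled by alpha / r = 2.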

Training Hyperparameters

  • Optimizer: AdamW (8-bit paged)
  • Learning Rate: 3e-4
  • Batch Size: 8 (per device)
  • Gradient Accumulation: 1
  • Epochs: 1
  • Max Sequence Length: 256 tokens
  • Precision: bfloat16
  • Warmup Steps: 10

Dataset

  • Source: Custom CVE dataset with CVSS v2/v3 annotations
  • Size: ~80,000 CVE entries
  • Train/Eval Split: 90/10
  • Features:
    • CVE descriptions
    • CVSS base scores
    • Attack vectors
    • Access complexity
    • Impact metrics (CIA triad)
    • CWE classifications
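
Taken together, the dataset size and the training hyperparameters imply a rough step count for the single epoch. The arithmetic below is a sketch: the dataset size is approximate ("~80,000"), and the single-device setting (`num_devices = 1`) is an assumption, since the card does not say how many devices were used.

```python
import math

dataset_size = 80_000   # approximate size from the card
train_fraction = 0.9    # 90/10 train/eval split
per_device_batch = 8
grad_accum = 1
num_devices = 1         # assumption: not stated in the card
epochs = 1
warmup_steps = 10

train_examples = round(dataset_size * train_fraction)
effective_batch = per_device_batch * grad_accum * num_devices
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs

print(f"train examples:  {train_examples}")
print(f"effective batch: {effective_batch}")
print(f"optimizer steps: {total_steps}")
print(f"warmup share:    {warmup_steps / total_steps:.2%}")
```

Under these assumptions, warmup covers only a small fraction of training, so the learning rate of 3e-4 is reached almost immediately.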

📚 Use Cases

1. Security Operations Centers (SOC)

Automatically triage and prioritize vulnerability alerts based on predicted severity and CVSS scores.
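
As an illustration of what such triage could look like, the sketch below orders alerts by predicted score. The `Alert` class, the `triage` function, and the placeholder identifiers are all hypothetical; a real SOC queue would likely weigh asset criticality and exposure as well.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    cve_id: str              # placeholder identifiers below are hypothetical
    predicted_score: float
    predicted_severity: str


def triage(alerts):
    """Order alerts from most to least urgent by predicted CVSS score."""
    return sorted(alerts, key=lambda a: a.predicted_score, reverse=True)


queue = triage([
    Alert("CVE-A", 5.4, "Medium"),
    Alert("CVE-B", 9.8, "Critical"),
    Alert("CVE-C", 7.5, "High"),
])
for alert in queue:
    print(f"{alert.cve_id}: {alert.predicted_score} ({alert.predicted_severity})")
```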

2. Vulnerability Management

Assess newly discovered vulnerabilities before official CVSS scores are published.

3. Threat Intelligence

Analyze threat reports and security advisories to extract risk metrics.

4. DevSecOps Automation

Integrate into CI/CD pipelines for automated security assessment of dependencies.
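
One way to wire this into a pipeline is a gate that fails the build when any dependency's predicted score crosses a threshold. This is a minimal sketch; the threshold value, the function name `blocked_dependencies`, and the dependency names are assumptions for illustration.

```python
FAIL_THRESHOLD = 9.0  # assumption: block only Critical-range predictions


def blocked_dependencies(predictions, threshold=FAIL_THRESHOLD):
    """Return the dependency names whose predicted score meets the threshold."""
    return sorted(name for name, score in predictions.items() if score >= threshold)


# Hypothetical predicted scores keyed by dependency name
predictions = {"example-lib-a": 9.8, "example-lib-b": 4.3}

blocked = blocked_dependencies(predictions)
exit_code = 1 if blocked else 0  # a pipeline step would pass this to sys.exit()
print(f"blocked={blocked} exit_code={exit_code}")
```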

5. Security Research

Analyze patterns in vulnerability characteristics and predict potential impact.

⚠️ Limitations and Biases

Known Limitations

  • Training data bias: The model was trained primarily on historical CVE data (pre-2025), which may not fully represent emerging vulnerability classes
  • Context window: Limited to 256-512 tokens, so very detailed CVE descriptions may be truncated
  • CVSS version: Primarily trained on CVSS v2 data; performance on CVSS v3.1/v4.0 may vary
  • Language: Optimized for English-language CVE descriptions only
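
Given the context-window limitation, it can help to check prompt length before generation. The sketch below accepts any `encode` callable that returns token ids (for example `tokenizer.encode` from the usage example); the helper name `exceeds_limit` is hypothetical, and the whitespace encoder is only a stand-in so the example runs without downloading a tokenizer.

```python
from typing import Callable, Sequence

MAX_TOKENS = 256  # training-time maximum sequence length from the card


def exceeds_limit(text: str,
                  encode: Callable[[str], Sequence[int]],
                  limit: int = MAX_TOKENS) -> bool:
    """Return True when the encoded text is longer than the token limit."""
    return len(encode(text)) > limit


def whitespace_encode(text: str) -> list:
    """Stand-in encoder: one fake token id per whitespace-separated word."""
    return list(range(len(text.split())))


short_prompt = "A buffer overflow in the parser allows remote code execution."
long_prompt = "word " * 300

print(exceeds_limit(short_prompt, whitespace_encode))
print(exceeds_limit(long_prompt, whitespace_encode))
```

Note that the instruction text wrapped around the CVE description also counts toward the limit, so the check should run on the full prompt.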

Recommended Best Practices

  1. ⚠️ Do not use as sole source of truth - Always validate predictions with official CVSS scores when available
  2. 🔍 Human review required - Critical decisions should involve security expert review
  3. 📊 Confidence thresholds - Implement confidence scoring for production use
  4. 🔄 Regular updates - Retrain periodically on new CVE data to maintain accuracy
  5. 🎯 Domain-specific tuning - Consider fine-tuning on organization-specific vulnerability data
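
The card does not prescribe a confidence method. One possible approach, sketched below, is to sample several generations for the same CVE and treat the spread of the predicted scores as a rough confidence signal. The function name `score_with_confidence`, the `max_spread` threshold, and the sample values are assumptions for illustration.

```python
import statistics


def score_with_confidence(samples, max_spread=1.0):
    """Summarize repeated score predictions for a single CVE."""
    if len(samples) < 2:
        raise ValueError("need at least two sampled predictions")
    spread = statistics.pstdev(samples)
    return {
        "score": statistics.median(samples),
        "spread": spread,
        # Route to a human reviewer when the samples disagree too much
        "needs_review": spread > max_spread,
    }


# Hypothetical scores parsed from four sampled generations each
consistent = score_with_confidence([9.8, 9.8, 9.1, 9.8])
divergent = score_with_confidence([9.8, 4.3, 7.5, 6.1])

print(consistent["score"], consistent["needs_review"])
print(divergent["score"], divergent["needs_review"])
```

Agreement between samples is not the same as correctness, so this complements rather than replaces the human review recommended above.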

🔒 Ethical Considerations

This model is designed for defensive cybersecurity purposes only. Users are responsible for ensuring compliance with:

  • Applicable laws and regulations
  • Responsible disclosure practices
  • Ethical security research guidelines

Prohibited Uses:

  • Facilitating malicious attacks or exploitation
  • Circumventing security measures without authorization
  • Weaponizing vulnerability information

📖 Citation

@misc{cve-risk-scoring-mistral-qlora,
  author = {Swapnanil Chatterjee},
  title = {CVE Risk Scoring Model: Fine-tuned Mistral-7B for Vulnerability Assessment},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Swapnanil09/cve-risk-scoring-mistral-qlora}}
}

📜 License

This model is released under the Apache 2.0 License, the same license as the base Mistral-7B model.

🙏 Acknowledgments

  • Mistral AI for the base Mistral-7B-Instruct model
  • NIST NVD for the CVE database
  • MITRE for the CWE classification system
  • Hugging Face for model hosting and tools

📞 Contact & Support

🔄 Model Updates

| Version | Date | Changes |
|---|---|---|
| v1.0 | 2025-01-XX | Initial release |

⚡ Built with QLoRA • 🤗 Hosted on HuggingFace • 🔒 For Defensive Security Only