tanaos-spam-detection-v1: A small but performant base spam detection model

This model was created by Tanaos with the Artifex Python library.

This is a multilingual spam detection model (it supports 15+ languages) based on distilbert-base-multilingual-cased and fine-tuned on a synthetic dataset to classify text as spam or not_spam. It is intended to be used as a first-layer spam filter for email systems, messaging applications or any other text-based communication platform.

The following categories are considered spam:

Unsolicited commercial advertisement or non-commercial proselytizing.
Fraudulent schemes, including get-rich-quick and pyramid schemes.
Phishing attempts, unrealistic offers or announcements.
Content with deceptive or misleading information.
Malware or harmful links.
Adult content or explicit material.
Excessive use of capitalization or punctuation to grab attention.

How to Use

Via the Artifex library (`pip install artifex`)

from artifex import Artifex

spam_detection = Artifex().spam_detection

print(spam_detection("You won an IPhone 16! Click here to claim your prize."))

# >>> [{'label': 'spam', 'score': 0.9945}]

Via the Transformers library

from transformers import pipeline

clf = pipeline("text-classification", model="tanaos/tanaos-spam-detection-v1")

print(clf("You won an IPhone 16! Click here to claim your prize."))

# >>> [{'label': 'spam', 'score': 0.9945}]
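
Both interfaces above return one `{'label': ..., 'score': ...}` dictionary per input. A minimal sketch of using that output to split a batch of messages follows. The helper name `split_by_label` and the stub `fake_classifier` are hypothetical, and the sketch assumes the classifier accepts a list of strings and returns one prediction per string, as the Transformers pipeline does.

```python
from typing import Callable, Dict, List, Tuple

Prediction = Dict[str, object]  # e.g. {'label': 'spam', 'score': 0.9945}


def split_by_label(
    messages: List[str],
    classify: Callable[[List[str]], List[Prediction]],
) -> Tuple[List[str], List[str]]:
    """Return (kept, flagged) message lists based on the predicted label."""
    predictions = classify(messages)
    kept, flagged = [], []
    for message, prediction in zip(messages, predictions):
        # 'spam' is the label shown in the example output above.
        if prediction["label"] == "spam":
            flagged.append(message)
        else:
            kept.append(message)
    return kept, flagged


def fake_classifier(texts: List[str]) -> List[Prediction]:
    """Hypothetical stand-in so the sketch runs without downloading the model."""
    return [
        {"label": "spam" if "prize" in t.lower() else "not_spam", "score": 0.99}
        for t in texts
    ]


kept, flagged = split_by_label(
    ["You won an IPhone 16! Click here to claim your prize.", "Lunch at noon?"],
    fake_classifier,
)
print(kept)
print(flagged)
```

To run this against the real model, pass the `clf` pipeline object created above in place of `fake_classifier`.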

Model Description

Base model: distilbert/distilbert-base-multilingual-cased
Task: Text classification (spam detection)
Languages: Multilingual (15+ languages)
Fine-tuning data: A custom synthetic dataset of spam and non-spam examples.

Training Details

This model was trained using the Artifex Python library (`pip install artifex`) by providing the following instructions and generating 10,000 synthetic training samples:

from artifex import Artifex

spam_detection = Artifex().spam_detection

spam_detection.train(
    spam_content=[
        "Unsolicited commercial advertisement or non-commercial proselytizing",
        "Fraudulent schemes, including get-rich-quick and pyramid schemes",
        "Phishing attempts, unrealistic offers or announcements",
        "Content with deceptive or misleading information",
        "Malware or harmful links",
        "Adult content or explicit material",
        "Excessive use of capitalization or punctuation to grab attention",
    ],
    num_samples=10000
)

Intended Uses

This model is intended to:

Serve as a first-layer spam filter for email systems, messaging applications, or any other text-based communication platform.
Help reduce unwanted or harmful messages by classifying text as spam or not spam.

Not intended for:

Use in high-stakes scenarios where misclassification could lead to significant consequences without further human review.
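
One way to keep a human in the loop is to act automatically only on confident predictions and queue the rest for review. The sketch below is illustrative only: the function names and the 0.9 and 0.5 thresholds are assumptions, not values recommended by the model's authors, and converting a `not_spam` score to a spam probability as `1 - score` assumes the model has exactly the two labels `spam` and `not_spam`.

```python
from typing import Dict


def spam_probability(prediction: Dict[str, object]) -> float:
    """Convert a top-label prediction into a probability of spam.

    Assumes a two-label model, so P(spam) = 1 - P(not_spam).
    """
    score = float(prediction["score"])
    return score if prediction["label"] == "spam" else 1.0 - score


def triage(prediction, block_at: float = 0.9, review_at: float = 0.5) -> str:
    """Return 'block', 'review', or 'deliver' (thresholds are hypothetical)."""
    p_spam = spam_probability(prediction)
    if p_spam >= block_at:
        return "block"
    if p_spam >= review_at:
        return "review"
    return "deliver"


print(triage({"label": "spam", "score": 0.9945}))    # block
print(triage({"label": "spam", "score": 0.62}))      # review
print(triage({"label": "not_spam", "score": 0.97}))  # deliver
```

Suitable thresholds depend on how costly a missed spam message is compared with a blocked legitimate one, so they are best chosen on a sample of real traffic.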

Safetensors

Model size: 0.1B params
Tensor type: F32
