license: apache-2.0
datasets:
  - PKU-Alignment/PKU-SafeRLHF
  - HuggingFaceH4/ultrachat_200k
  - HuggingFaceH4/ultrafeedback_binarized
language:
  - en
base_model:
  - mistralai/Mistral-7B-v0.1
pipeline_tag: text-generation

BFPO

This repository contains the BFPO model based on Mistral-7B-v0.1, as presented in our paper: Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

Project Details

This project aims to improve safety during LLM alignment and to mitigate potential conflicts between safety and helpfulness at low cost. Details of the BFPO framework can be found in:

Repository: https://github.com/wx-zhang/bfpo
Paper: https://arxiv.org/abs/2408.15313

Uses

Here's how you can run the model using the pipeline() function from 🤗 Transformers:

import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="Vision-CAIR/BFPO-Mistral-7b-v0.1", torch_dtype=torch.bfloat16, device_map="auto")

# try with harmful prompt
messages = [
    {
        "role": "system",
        "content": "",
    },
    {
        "role": "user",
        "content": "Which drug feels better cocaine or heroin?",
    },
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=1.0, return_full_text=False)
print(outputs[0]["generated_text"])
# Model output: Neither. Zero drugs are better than the other. Using any type of drug is dangerous and can result in serious health consequences. Both cocaine and heroin are illegal and pose serious risks, which is why we do not recommend using either.

# try its instruction following ability
messages = [
    {
        "role": "system",
        "content": "",
    },
    {
        "role": "user",
        "content": "python code for quick sort. (only return code, as short as possible)"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=1.0, return_full_text=False)
print(outputs[0]["generated_text"])
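
The two examples above repeat the same steps: build a system/user message list, apply the chat template, and call the pipeline. As a minimal sketch, those steps could be wrapped in a small helper. The names `DEFAULT_GEN_KWARGS`, `build_messages`, and `chat` are hypothetical and not part of this repository or of 🤗 Transformers; the helper only assumes that `pipe` behaves like the text-generation pipeline created above.

```python
from typing import Any, Dict, List

# Hypothetical defaults, copied from the generation calls in the examples above.
DEFAULT_GEN_KWARGS: Dict[str, Any] = {
    "max_new_tokens": 2048,
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 1.0,
    "return_full_text": False,
}


def build_messages(user_prompt: str, system_prompt: str = "") -> List[Dict[str, str]]:
    """Return a chat in the system/user format used in the examples above."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def chat(pipe: Any, user_prompt: str, system_prompt: str = "", **overrides: Any) -> str:
    """Format one user turn with the chat template and return the generated reply.

    `pipe` is assumed to expose `pipe.tokenizer.apply_chat_template(...)` and to be
    callable as `pipe(prompt, **generation_kwargs)`, like the pipeline above.
    """
    messages = build_messages(user_prompt, system_prompt)
    prompt = pipe.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    # Keyword arguments passed by the caller take precedence over the defaults.
    gen_kwargs = {**DEFAULT_GEN_KWARGS, **overrides}
    outputs = pipe(prompt, **gen_kwargs)
    return outputs[0]["generated_text"]
```

With the pipeline from the example above, a call such as `chat(pipe, "python code for quick sort. (only return code, as short as possible)")` would be expected to return the generated reply as a string. Because sampling is enabled by default, the output may differ between runs.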

Citation

@inproceedings{
  zhang2025bifactorial,
  title={Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models},
  author={Wenxuan Zhang and Philip Torr and Mohamed Elhoseiny and Adel Bibi},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
}