This is a text classification model, fully fine-tuned from allenai/scibert_scivocab_uncased. It reuses the main BERT encoder and fits an ordinal regression head on the [CLS] token. The model is fine-tuned on the certainty labels collected in Wurl et al. (2024), *Understanding Fine-Grained Distortions in Reports of Scientific Findings*. The authors originally collected certainty annotations from humans on a 4-point Likert scale ranging from (1) Uncertain to (4) Certain. Because the resulting datasets suffer from severe class imbalance, we merge the classes (1) Uncertain and (2) Somewhat Uncertain.
## Dataset Statistics
There are 1330 examples in the training set and 334 in the test set. Each example is a single sentence. Examples are filtered from the copenlu/spiced dataset to those with a final score greater than or equal to 4.
The original base rates are as follows:
| Class | Base Rate in Training Set (%) | Base Rate in Test Set (%) |
|---|---|---|
| 0 - Uncertain | 5.5970 | 7.1856 |
| 1 - Somewhat Uncertain | 15.2985 | 17.6647 |
| 2 - Somewhat Certain | 32.3881 | 33.2335 |
| 3 - Certain | 46.7164 | 41.9162 |
After combining classes 0 and 1, we obtain the base rates below. Note that this mimics the merging procedure adopted in the original paper.
| Class | Base Rate in Training Set (%) | Base Rate in Test Set (%) |
|---|---|---|
| 0 - Uncertain | 20.8955 | 24.8503 |
| 1 - Somewhat Certain | 32.3881 | 33.2335 |
| 2 - Certain | 46.7164 | 41.9162 |
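The merge itself is a simple relabeling. As a minimal sketch, the snippet below collapses the original 4-point labels into the 3-point scale and checks that the merged base rates are the sums of the originals; the 0-3 integer encoding of the original scale is an assumption for illustration.

```python
# Hypothetical relabeling: original 4-point labels (0-3) -> merged 3-point labels.
# Classes 0 (Uncertain) and 1 (Somewhat Uncertain) collapse into a single class.
MERGE = {0: 0, 1: 0, 2: 1, 3: 2}

def merge_labels(labels):
    """Map a list of 4-point labels onto the merged 3-point scale."""
    return [MERGE[y] for y in labels]

# Base rates from the first table (training set, in %).
train_rates = {0: 5.5970, 1: 15.2985, 2: 32.3881, 3: 46.7164}

merged_rates = {}
for old_class, rate in train_rates.items():
    new_class = MERGE[old_class]
    merged_rates[new_class] = merged_rates.get(new_class, 0.0) + rate
# merged_rates[0] == 5.5970 + 15.2985 == 20.8955, matching the second table.
```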
## Hyperparameter Optimization
The published model is the best of 29 different fine-tuning configurations. The selected model maximizes Quadratic Weighted Kappa (QWK, implemented using `cohen_kappa` with quadratic weights), which is better suited to ordinal scales because it penalizes predictions more heavily the further they fall from the true class. Under this metric, a random model scores 0 in expectation. We adopt QWK rather than accuracy or macro F1 to account for the class imbalance.
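To illustrate why quadratically weighted kappa fits ordinal labels, the toy example below compares two prediction sets against the same gold labels using scikit-learn's `cohen_kappa_score` (the labels and predictions are made up for illustration): the second set makes fewer mistakes, so plain accuracy prefers it, but its mistakes are two classes away from the truth, so its QWK is lower.

```python
from sklearn.metrics import cohen_kappa_score

# Toy gold labels on the merged 3-point scale
# (0 = Uncertain, 1 = Somewhat Certain, 2 = Certain).
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_near = [0, 1, 1, 2, 2, 2, 2, 1]  # 3 errors, each one class off (accuracy 0.625)
y_far  = [2, 0, 1, 1, 0, 2, 2, 2]  # 2 errors, each two classes off (accuracy 0.750)

qwk_near = cohen_kappa_score(y_true, y_near, weights="quadratic")
qwk_far = cohen_kappa_score(y_true, y_far, weights="quadratic")
# Despite its higher accuracy, y_far scores a lower QWK because its
# errors land farther from the true class.
```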
Here are the classification report and the test-set metrics:

```
test loss=0.9565  acc=0.578  QWK=0.5004

              precision    recall  f1-score   support

           0       0.58      0.51      0.54        83
           1       0.47      0.46      0.46       111
           2       0.65      0.71      0.68       140

    accuracy                           0.58       334
   macro avg       0.57      0.56      0.56       334
weighted avg       0.57      0.58      0.57       334
```
We conduct a sweep over the following hyperparameters:
- Freeze / unfreeze the encoder
- Learning rate: 1e-6 through 1e-3
- Batch size: 16, 32
- Hidden size dimensions: 128, 256
- Warmup ratio: 0.05, 0.1, 0.2, 0.3
- Epochs: 30 (with early-stopping patience)
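Since the ranges above admit more combinations than the 29 configurations actually trained, the sweep presumably samples from the grid. The sketch below shows one hypothetical way to draw 29 distinct configurations; the discretized grid values and key names are assumptions, not the exact sweep code.

```python
import itertools
import random

random.seed(0)

# Hypothetical sweep grid mirroring the ranges listed above.
GRID = {
    "freeze_encoder": [True, False],
    "learning_rate": [1e-6, 1e-5, 1e-4, 1e-3],
    "batch_size": [16, 32],
    "hidden_size": [128, 256],
    "warmup_ratio": [0.05, 0.1, 0.2, 0.3],
}

# Enumerate the full grid (2 * 4 * 2 * 2 * 4 = 128 combinations), then
# draw 29 distinct configurations, matching the count reported above.
all_configs = [dict(zip(GRID, values)) for values in itertools.product(*GRID.values())]
configs = random.sample(all_configs, k=29)
```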
## Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(
    "cbelem/scibert-certainty-ordinal", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "cbelem/scibert-certainty-ordinal", trust_remote_code=True
)
```
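How the ordinal head's outputs map to labels depends on the custom code loaded via `trust_remote_code`. As an illustration only, the sketch below decodes a CORAL-style cumulative head, where each of the K-1 logits answers "is the label greater than k?" and the prediction is the number of thresholds passed; this head layout is an assumption, not the model's documented behavior.

```python
import math

LABELS = {0: "Uncertain", 1: "Somewhat Certain", 2: "Certain"}

def decode_cumulative(logits):
    """Hypothetical CORAL-style decoding: count thresholds with P(y > k) > 0.5."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return sum(p > 0.5 for p in probs)

# Two logits for the merged 3-class scale: P(y > 0) and P(y > 1).
label = decode_cumulative([2.0, -1.0])  # only the first threshold is passed
```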