
This is a text classification model, fully fine-tuned from allenai/scibert_scivocab_uncased. It reuses the main BERT encoder and fits an ordinal regression head on the [CLS] token. The model is fine-tuned on the certainty labels collected in Wührl et al. (2024), *Understanding Fine-grained Distortions in Reports of Scientific Findings*. The authors originally collected certainty annotations from humans on a 4-point Likert scale ranging from (1) Uncertain to (4) Certain. Because the resulting dataset suffers from severe class imbalance, we merge the classes (1) Uncertain and (2) Somewhat Uncertain.
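One common way to implement an ordinal regression head is the cumulative-link (ordinal logistic) formulation: a single score is compared against ordered thresholds, and class probabilities fall out as differences of adjacent cumulative terms. The sketch below illustrates the idea in NumPy with hypothetical threshold values — the model's actual head ships with its remote code and may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ordinal_probs(score, thresholds):
    """Cumulative-link ordinal model: P(y > k) = sigmoid(score - theta_k).
    Class probabilities are differences of adjacent cumulative terms."""
    cum = sigmoid(score - np.asarray(thresholds))  # P(y > k) for each cutpoint
    probs = np.concatenate(([1.0], cum)) - np.concatenate((cum, [0.0]))
    return probs  # length = num_thresholds + 1

# Two (hypothetical) cutpoints -> three ordered classes, matching the merged label set
thresholds = [-0.5, 0.5]
p = ordinal_probs(0.5, thresholds)
pred = int(np.argmax(p))  # most likely ordinal class
```

Because the thresholds are shared and ordered, predictions respect the ordering of the certainty scale, which a plain softmax head would ignore.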

## Dataset Statistics

There are 1,330 examples in the training set and 334 in the test set. Each example is one sentence long. Examples are filtered from the copenlu/spiced dataset to have a final score greater than or equal to 4.
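The filtering step might look like the sketch below; the field name `final_score` is hypothetical, as the card does not name the exact column in copenlu/spiced.

```python
def keep_example(example, min_score=4):
    # Keep examples whose aggregated annotation score is >= min_score.
    # "final_score" is a placeholder for the actual score field in copenlu/spiced.
    return example["final_score"] >= min_score

# Toy records standing in for dataset rows
examples = [{"final_score": 3.5}, {"final_score": 4.0}, {"final_score": 4.8}]
filtered = [ex for ex in examples if keep_example(ex)]
```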

The original base rates are as follows:

| Class | Base Rate in Training Set (%) | Base Rate in Test Set (%) |
|---|---|---|
| 0 - Uncertain | 5.5970 | 7.1856 |
| 1 - Somewhat Uncertain | 15.2985 | 17.6647 |
| 2 - Somewhat Certain | 32.3881 | 33.2335 |
| 3 - Certain | 46.7164 | 41.9162 |

After combining classes 0 and 1, we obtain the base rates below. Note that this mimics the procedure adopted in the original paper.

| Class | Base Rate in Training Set (%) | Base Rate in Test Set (%) |
|---|---|---|
| 0 - Uncertain | 20.8955 | 24.8503 |
| 1 - Somewhat Certain | 32.3881 | 33.2335 |
| 2 - Certain | 46.7164 | 41.9162 |
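The merge can be expressed as a simple label remapping (a sketch; the exact preprocessing code is not part of this card):

```python
# Merge the original 4-point labels (0..3) into 3 classes:
# 0 (Uncertain) and 1 (Somewhat Uncertain) -> 0, 2 -> 1, 3 -> 2
MERGE = {0: 0, 1: 0, 2: 1, 3: 2}

def merge_label(y):
    return MERGE[y]

labels = [0, 1, 2, 3, 3]
merged = [merge_label(y) for y in labels]
```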

## Hyperparameter Optimization

The published model is one of 29 different fine-tuning configurations. The selected model maximizes Quadratic Weighted Kappa (QWK, implemented using cohen_kappa with quadratic weights), a metric well suited to ordinal scales. Under this metric, a random model scores 0. We adopt QWK rather than accuracy or macro F1 to account for the class imbalance.
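In scikit-learn, cohen_kappa with quadratic weights corresponds to `cohen_kappa_score(..., weights="quadratic")`; a minimal sketch with toy labels:

```python
from sklearn.metrics import cohen_kappa_score

# Toy ordinal labels on the merged 3-class scale
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Quadratic weights penalize predictions by the squared distance from the
# true class, so confusing 0 with 2 costs more than confusing 0 with 1.
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")
```

Perfect agreement yields 1.0, and chance-level agreement yields 0, which is why a random model scores 0 under this metric.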

Here are the classification report and test-set metrics:

```
17:44:36  INFO        test  loss=0.9565  acc=0.578  QWK=0.5004
17:44:36  INFO
              precision    recall  f1-score   support

           0       0.58      0.51      0.54        83
           1       0.47      0.46      0.46       111
           2       0.65      0.71      0.68       140

    accuracy                           0.58       334
   macro avg       0.57      0.56      0.56       334
weighted avg       0.57      0.58      0.57       334
```

We conduct a sweep over the following hyperparameters:

- Encoder: freeze / unfreeze
- Learning rate: 1e-6 through 1e-3
- Batch size: 16, 32
- Hidden size: 256, 128
- Warmup ratio: 0.05, 0.1, 0.2, 0.3
- Epochs: up to 30 (with early-stopping patience)
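The search space above can be sketched as a grid; note that with four log-spaced learning rates (an assumption, since only the 1e-6 to 1e-3 range is given) the full Cartesian product has 128 combinations, so the 29 trained configurations were presumably a sampled subset rather than an exhaustive sweep.

```python
from itertools import product

freeze_opts = [True, False]
lrs = [1e-6, 1e-5, 1e-4, 1e-3]  # assumed log-spaced points in the stated range
batch_sizes = [16, 32]
hidden_sizes = [256, 128]
warmup_ratios = [0.05, 0.1, 0.2, 0.3]

# Full Cartesian product of the search space
grid = [
    dict(freeze=f, lr=lr, batch_size=bs, hidden=h, warmup=w)
    for f, lr, bs, h, w in product(
        freeze_opts, lrs, batch_sizes, hidden_sizes, warmup_ratios
    )
]
```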

## Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("cbelem/scibert-certainty-ordinal", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("cbelem/scibert-certainty-ordinal", trust_remote_code=True)

inputs = tokenizer("We find a strong association between X and Y.", return_tensors="pt")
outputs = model(**inputs)
```