This is a text classification model, fully fine-tuned from allenai/scibert_scivocab_uncased. It reuses the main BERT encoder and fits an ordinal regression head on the [CLS] token. The model is fine-tuned on the certainty labels collected in Wurl et al. (2024), *Understanding Fine-Grained Distortions in Reports of Scientific Findings*. The authors originally collected certainty annotations from humans on a 4-point Likert scale ranging from (1) Uncertain to (4) Certain. Because the resulting datasets suffer from severe class imbalance, we merge the classes (1) Uncertain and (2) Somewhat Uncertain.
## Dataset Statistics
There are 1330 examples in the training set and 334 in the test set. Each example is a single sentence. Examples are filtered from the copenlu/spiced dataset to those with a final score greater than or equal to 4.
The original base rates are as follows:
| Class | Base Rate in Training Set (%) | Base Rate in Test Set (%) |
|---|---|---|
| 0 - Uncertain | 5.5970 | 7.1856 |
| 1 - Somewhat Uncertain | 15.2985 | 17.6647 |
| 2 - Somewhat Certain | 32.3881 | 33.2335 |
| 3 - Certain | 46.7164 | 41.9162 |
After combining classes 0 and 1, we obtain the base rates below. Note that this mimics the merging procedure adopted in the original paper.
| Class | Base Rate in Training Set (%) | Base Rate in Test Set (%) |
|---|---|---|
| 0 - Uncertain | 20.8955 | 24.8503 |
| 1 - Somewhat Certain | 32.3881 | 33.2335 |
| 2 - Certain | 46.7164 | 41.9162 |
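The merge itself is a simple relabeling. As a minimal sketch, the snippet below collapses the original 4-point labels into the 3-point scale and checks that the merged base rates are the sums of the originals; the 0-3 integer encoding of the original scale is an assumption for illustration.

```python
# Hypothetical relabeling: original 4-point labels (0-3) -> merged 3-point labels.
# Classes 0 (Uncertain) and 1 (Somewhat Uncertain) collapse into a single class.
MERGE = {0: 0, 1: 0, 2: 1, 3: 2}

def merge_labels(labels):
    """Map a list of 4-point labels onto the merged 3-point scale."""
    return [MERGE[y] for y in labels]

# Base rates from the first table (training set, in %).
train_rates = {0: 5.5970, 1: 15.2985, 2: 32.3881, 3: 46.7164}

merged_rates = {}
for old_class, rate in train_rates.items():
    new_class = MERGE[old_class]
    merged_rates[new_class] = merged_rates.get(new_class, 0.0) + rate
# merged_rates[0] == 5.5970 + 15.2985 == 20.8955, matching the second table.
```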
## Hyperparameter Optimization
The published model is the best of 29 different fine-tuning configurations. The selected model maximizes Quadratic Weighted Kappa (QWK, implemented using `cohen_kappa` with quadratic weights), which is better suited to ordinal scales because it penalizes predictions more heavily the further they fall from the true class. Under this metric, a random model scores 0 in expectation. We adopt QWK rather than accuracy or macro F1 to account for the class imbalance.
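To illustrate why quadratically weighted kappa fits ordinal labels, the toy example below compares two prediction sets against the same gold labels using scikit-learn's `cohen_kappa_score` (the labels and predictions are made up for illustration): the second set makes fewer mistakes, so plain accuracy prefers it, but its mistakes are two classes away from the truth, so its QWK is lower.

```python
from sklearn.metrics import cohen_kappa_score

# Toy gold labels on the merged 3-point scale
# (0 = Uncertain, 1 = Somewhat Certain, 2 = Certain).
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_near = [0, 1, 1, 2, 2, 2, 2, 1]  # 3 errors, each one class off (accuracy 0.625)
y_far  = [2, 0, 1, 1, 0, 2, 2, 2]  # 2 errors, each two classes off (accuracy 0.750)

qwk_near = cohen_kappa_score(y_true, y_near, weights="quadratic")
qwk_far = cohen_kappa_score(y_true, y_far, weights="quadratic")
# Despite its higher accuracy, y_far scores a lower QWK because its
# errors land farther from the true class.
```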
Here are the classification report and the test-set metrics:

```
test loss=0.9565  acc=0.578  QWK=0.5004

              precision    recall  f1-score   support

           0       0.58      0.51      0.54        83
           1       0.47      0.46      0.46       111
           2       0.65      0.71      0.68       140

    accuracy                           0.58       334
   macro avg       0.57      0.56      0.56       334
weighted avg       0.57      0.58      0.57       334
```
We conduct a sweep over the following hyperparameters:
- Freeze / unfreeze the encoder
- Learning rate: 1e-6 through 1e-3
- Batch size: 16, 32
- Hidden size dimensions: 128, 256
- Warmup ratio: 0.05, 0.1, 0.2, 0.3
- Epochs: 30 (with early-stopping patience)
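Since the ranges above admit more combinations than the 29 configurations actually trained, the sweep presumably samples from the grid. The sketch below shows one hypothetical way to draw 29 distinct configurations; the discretized grid values and key names are assumptions, not the exact sweep code.

```python
import itertools
import random

random.seed(0)

# Hypothetical sweep grid mirroring the ranges listed above.
GRID = {
    "freeze_encoder": [True, False],
    "learning_rate": [1e-6, 1e-5, 1e-4, 1e-3],
    "batch_size": [16, 32],
    "hidden_size": [128, 256],
    "warmup_ratio": [0.05, 0.1, 0.2, 0.3],
}

# Enumerate the full grid (2 * 4 * 2 * 2 * 4 = 128 combinations), then
# draw 29 distinct configurations, matching the count reported above.
all_configs = [dict(zip(GRID, values)) for values in itertools.product(*GRID.values())]
configs = random.sample(all_configs, k=29)
```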
## Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained(
    "cbelem/scibert-certainty-ordinal", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "cbelem/scibert-certainty-ordinal", trust_remote_code=True
)
```
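How the ordinal head's outputs map to labels depends on the custom code loaded via `trust_remote_code`. As an illustration only, the sketch below decodes a CORAL-style cumulative head, where each of the K-1 logits answers "is the label greater than k?" and the prediction is the number of thresholds passed; this head layout is an assumption, not the model's documented behavior.

```python
import math

LABELS = {0: "Uncertain", 1: "Somewhat Certain", 2: "Certain"}

def decode_cumulative(logits):
    """Hypothetical CORAL-style decoding: count thresholds with P(y > k) > 0.5."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return sum(p > 0.5 for p in probs)

# Two logits for the merged 3-class scale: P(y > 0) and P(y > 1).
label = decode_cumulative([2.0, -1.0])  # only the first threshold is passed
```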