---
title: NIST_MT
emoji: π€
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
- machine-translation
description: >-
  DARPA commissioned NIST to develop an MT evaluation facility based on the BLEU score.
---
# Metric Card for NIST's MT metric

## Metric Description
DARPA commissioned NIST to develop an MT evaluation facility based on the BLEU
score. The official script used by NIST to compute the BLEU and NIST scores is
mteval-14.pl. The main differences between NIST and BLEU are:
- BLEU uses the geometric mean of the n-gram overlaps; NIST uses the arithmetic mean.
- NIST uses a different brevity penalty.
- The NIST score from mteval-14.pl has a self-contained tokenizer (in the Hugging Face implementation we rely on NLTK's
  implementation of the NIST-specific tokenizer).
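The aggregation difference in the first bullet can be illustrated with a minimal sketch. The precision values below are hypothetical, and the sketch omits NIST's information-gain weighting of each n-gram:

```python
import math

# Hypothetical 1- to 4-gram precision values for a candidate translation.
precisions = [0.8, 0.6, 0.4, 0.2]

# BLEU-style aggregation: geometric mean of the n-gram precisions.
geometric_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))

# NIST-style aggregation: arithmetic mean (real NIST additionally weights
# each n-gram by its information gain, which this sketch leaves out).
arithmetic_mean = sum(precisions) / len(precisions)

print(round(geometric_mean, 4), arithmetic_mean)  # → 0.4427 0.5
```

Because the geometric mean is dragged down by the low higher-order precisions, the two schemes can rank systems differently even on the same n-gram counts.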
## Intended Uses
NIST was developed for machine translation evaluation.
## How to Use

```python
import evaluate

nist_mt = evaluate.load("nist_mt")
hypothesis1 = "It is a guide to action which ensures that the military always obeys the commands of the party"
reference1 = "It is a guide to action that ensures that the military will forever heed Party commands"
reference2 = "It is the guiding principle which guarantees the military forces always being under the command of the Party"
nist_mt.compute(predictions=hypothesis1, references=[reference1, reference2])
# {'nist_mt': 3.3709935957649324}
```
### Inputs
- **predictions**: tokenized predictions to score. For sentence-level NIST, a list of tokens (str);
  for corpus-level NIST, a list (sentences) of lists of tokens (str).
- **references**: potentially multiple tokenized references for each prediction. For sentence-level NIST, a
  list (multiple potential references) of lists of tokens (str); for corpus-level NIST, a list (corpus) of lists
  (multiple potential references) of lists of tokens (str).
- **n**: highest n-gram order.
- **tokenize_kwargs**: arguments passed to the tokenizer (see: https://github.com/nltk/nltk/blob/90fa546ea600194f2799ee51eaf1b729c128711e/nltk/tokenize/nist.py#L139).
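The nesting for corpus-level inputs is easy to get wrong, so here is a sketch of the shapes described above. The token lists are invented for illustration, and the commented-out call assumes a metric loaded as in the example further up:

```python
# Corpus-level predictions: one list of tokens per hypothesis sentence.
predictions = [
    ["It", "is", "a", "guide", "to", "action"],
    ["he", "read", "the", "book"],
]

# Corpus-level references: one entry per hypothesis; each entry is itself a
# list of reference token lists (multiple potential references per sentence).
references = [
    [
        ["It", "is", "a", "guide", "to", "action"],
        ["It", "is", "the", "guiding", "principle"],
    ],
    [
        ["he", "read", "the", "book", "yesterday"],
    ],
]

# With `nist_mt = evaluate.load("nist_mt")`, the corpus-level call would be:
# nist_mt.compute(predictions=predictions, references=references, n=5)

assert len(predictions) == len(references)  # one reference set per hypothesis
```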
### Output Values
- **nist_mt** (`float`): NIST score

Output Example:

```python
{'nist_mt': 3.3709935957649324}
```
## Citation

```bibtex
@inproceedings{10.5555/1289189.1289273,
  author = {Doddington, George},
  title = {Automatic Evaluation of Machine Translation Quality Using N-Gram Co-Occurrence Statistics},
  year = {2002},
  publisher = {Morgan Kaufmann Publishers Inc.},
  address = {San Francisco, CA, USA},
  booktitle = {Proceedings of the Second International Conference on Human Language Technology Research},
  pages = {138--145},
  numpages = {8},
  location = {San Diego, California},
  series = {HLT '02}
}
```
## Further References
This Hugging Face implementation uses [the NLTK implementation](https://github.com/nltk/nltk/blob/develop/nltk/translate/nist_score.py).