---
title: NIST_MT
emoji: π€
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
tags:
- evaluate
- metric
- machine-translation
description: >-
  DARPA commissioned NIST to develop an MT evaluation facility based on the BLEU score.
---
# Metric Card for NIST's MT metric

## Metric Description
DARPA commissioned NIST to develop an MT evaluation facility based on the BLEU
score. The official script used by NIST to compute the BLEU and NIST scores is
mteval-14.pl. The main differences between NIST and BLEU are:
- BLEU uses the geometric mean of the n-gram overlaps; NIST uses the arithmetic mean.
- NIST uses a different brevity penalty.
- The NIST score from mteval-14.pl has a self-contained tokenizer (in the Hugging Face implementation we rely on NLTK's
  implementation of the NIST-specific tokenizer).
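The aggregation difference in the first bullet can be illustrated with a minimal sketch. The precision values below are hypothetical, and the sketch omits NIST's information-gain weighting of each n-gram:

```python
import math

# Hypothetical 1- to 4-gram precision values for a candidate translation.
precisions = [0.8, 0.6, 0.4, 0.2]

# BLEU-style aggregation: geometric mean of the n-gram precisions.
geometric_mean = math.exp(sum(math.log(p) for p in precisions) / len(precisions))

# NIST-style aggregation: arithmetic mean (real NIST additionally weights
# each n-gram by its information gain, which this sketch leaves out).
arithmetic_mean = sum(precisions) / len(precisions)

print(round(geometric_mean, 4), arithmetic_mean)  # → 0.4427 0.5
```

Because the geometric mean is dragged down by the low higher-order precisions, the two schemes can rank systems differently even on the same n-gram counts.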
## Intended Uses
NIST was developed for machine translation evaluation.
## How to Use

```python
import evaluate

nist_mt = evaluate.load("nist_mt")
hypothesis1 = "It is a guide to action which ensures that the military always obeys the commands of the party"
reference1 = "It is a guide to action that ensures that the military will forever heed Party commands"
reference2 = "It is the guiding principle which guarantees the military forces always being under the command of the Party"
nist_mt.compute(predictions=hypothesis1, references=[reference1, reference2])
# {'nist_mt': 3.3709935957649324}
```
### Inputs
- **predictions**: tokenized predictions to score. For sentence-level NIST, a list of tokens (str);
  for corpus-level NIST, a list (sentences) of lists of tokens (str).
- **references**: potentially multiple tokenized references for each prediction. For sentence-level NIST, a
  list (multiple potential references) of lists of tokens (str); for corpus-level NIST, a list (corpus) of lists
  (multiple potential references) of lists of tokens (str).
- **n**: highest n-gram order.
- **tokenize_kwargs**: arguments passed to the tokenizer (see: https://github.com/nltk/nltk/blob/90fa546ea600194f2799ee51eaf1b729c128711e/nltk/tokenize/nist.py#L139).
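The nesting for corpus-level inputs is easy to get wrong, so here is a sketch of the shapes described above. The token lists are invented for illustration, and the commented-out call assumes a metric loaded as in the example further up:

```python
# Corpus-level predictions: one list of tokens per hypothesis sentence.
predictions = [
    ["It", "is", "a", "guide", "to", "action"],
    ["he", "read", "the", "book"],
]

# Corpus-level references: one entry per hypothesis; each entry is itself a
# list of reference token lists (multiple potential references per sentence).
references = [
    [
        ["It", "is", "a", "guide", "to", "action"],
        ["It", "is", "the", "guiding", "principle"],
    ],
    [
        ["he", "read", "the", "book", "yesterday"],
    ],
]

# With `nist_mt = evaluate.load("nist_mt")`, the corpus-level call would be:
# nist_mt.compute(predictions=predictions, references=references, n=5)

assert len(predictions) == len(references)  # one reference set per hypothesis
```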
### Output Values
- **nist_mt** (`float`): NIST score

Output Example:

```python
{'nist_mt': 3.3709935957649324}
```
## Citation

```bibtex
@inproceedings{10.5555/1289189.1289273,
  author = {Doddington, George},
  title = {Automatic Evaluation of Machine Translation Quality Using N-Gram Co-Occurrence Statistics},
  year = {2002},
  publisher = {Morgan Kaufmann Publishers Inc.},
  address = {San Francisco, CA, USA},
  booktitle = {Proceedings of the Second International Conference on Human Language Technology Research},
  pages = {138--145},
  numpages = {8},
  location = {San Diego, California},
  series = {HLT '02}
}
```
## Further References
This Hugging Face implementation uses [the NLTK implementation](https://github.com/nltk/nltk/blob/develop/nltk/translate/nist_score.py).