Incorrect comments in example
#4 opened by mjspeck
Under `FlavaForPreTraining`, the value of `outputs.multimodal_embeddings` is actually `None`, contrary to what the adjacent comment implies: `# Batch size X (Number of image patches + Text Sequence Length + 3) X Hidden size => 2 X 275 x 768`. Why? It doesn't look like the README author expected this.
I assume it has to do with this line: `inputs.bool_masked_pos.zero_()`.
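As a side note, the effect of that call is easy to see in isolation: `Tensor.zero_()` clears the mask in place, so afterwards no image patch is marked as masked. Whether that is actually what makes `multimodal_embeddings` come back as `None` is just my guess above; this sketch only illustrates the zeroing itself, and the shape used here is hypothetical, not the FLAVA example's real patch count:

```python
import torch

# Hypothetical stand-in for inputs.bool_masked_pos: batch of 2, 196 image
# patches, each True if that patch is masked. (Shape is illustrative only.)
bool_masked_pos = torch.randint(0, 2, (2, 196), dtype=torch.bool)
print("masked patches before:", bool_masked_pos.sum().item())

# In-place zeroing, as in the README snippet: every entry becomes False,
# i.e. no patch is masked anymore.
bool_masked_pos.zero_()
print("masked patches after:", bool_masked_pos.sum().item())  # 0
```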