Incorrect comments in example
#4 opened by mjspeck
Under `FlavaForPreTraining`, the value of `outputs.multimodal_embeddings` is actually `None`, contrary to what the adjacent comment implies: `# Batch size X (Number of image patches + Text Sequence Length + 3) X Hidden size => 2 X 275 x 768`. Why? It doesn't look like the README author expected this.
I assume it has to do with this line: `inputs.bool_masked_pos.zero_()`.
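As a side note, the effect of that call is easy to see in isolation: `Tensor.zero_()` clears the mask in place, so afterwards no image patch is marked as masked. Whether that is actually what makes `multimodal_embeddings` come back as `None` is just my guess above; this sketch only illustrates the zeroing itself, and the shape used here is hypothetical, not the FLAVA example's real patch count:

```python
import torch

# Hypothetical stand-in for inputs.bool_masked_pos: batch of 2, 196 image
# patches, each True if that patch is masked. (Shape is illustrative only.)
bool_masked_pos = torch.randint(0, 2, (2, 196), dtype=torch.bool)
print("masked patches before:", bool_masked_pos.sum().item())

# In-place zeroing, as in the README snippet: every entry becomes False,
# i.e. no patch is masked anymore.
bool_masked_pos.zero_()
print("masked patches after:", bool_masked_pos.sum().item())  # 0
```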