How much GPU memory is required for 32k context embedding?

#32

by Labmem009 - opened Mar 4, 2024

Mar 4, 2024

I tried to use this model to embed long text, but it repeatedly failed with out-of-memory (OOM) errors on 6×A100 GPUs with data parallelism (DP). Are there any suggestions for managing memory with long text?

intfloat

Owner Mar 6, 2024

For 32k context, it needs to run on an 80GB A100 GPU with float16 / bfloat16 and FlashAttention enabled. The batch size also needs to be reduced to 1.
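A rough back-of-the-envelope sketch of why these settings matter. The head count and parameter count below are illustrative assumptions, not figures stated in this thread; substitute the values from the model's config. Without FlashAttention, a standard attention implementation materializes a full `seq_len × seq_len` score matrix per head, which grows quadratically with context length. Data parallelism does not help here either, since each GPU holds a full model replica and must still fit an entire sequence on its own.

```python
GIB = 1024 ** 3


def attention_scores_bytes(seq_len, num_heads, batch_size=1, bytes_per_elem=2):
    """Bytes needed to materialize ONE layer's full attention score matrix.

    This is the cost of naive attention; FlashAttention avoids storing
    this matrix by computing attention in tiles.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_elem


def weights_bytes(num_params, bytes_per_elem=2):
    """Bytes needed to hold the model weights (2 bytes/elem for fp16/bf16)."""
    return num_params * bytes_per_elem


# Assumed, illustrative values (hypothetical; check the model's config).
SEQ_LEN = 32 * 1024          # 32k tokens
NUM_HEADS = 32               # assumption
NUM_PARAMS = 7_000_000_000   # assumption

scores_gib = attention_scores_bytes(SEQ_LEN, NUM_HEADS) / GIB
weights_gib = weights_bytes(NUM_PARAMS) / GIB

print(f"naive attention scores, one layer, batch 1: {scores_gib:.1f} GiB")
print(f"fp16/bf16 weights: {weights_gib:.1f} GiB")
```

Under these assumptions, the score matrix for a single layer at batch size 1 would already be 64 GiB in half precision, and it doubles with each additional sequence in the batch. This is only an estimate of one term; real usage also includes other activations and framework overhead.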
