How to use intfloat/e5-mistral-7b-instruct with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-mistral-7b-instruct")

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)

similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
How to use intfloat/e5-mistral-7b-instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="intfloat/e5-mistral-7b-instruct")

# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("intfloat/e5-mistral-7b-instruct")
model = AutoModel.from_pretrained("intfloat/e5-mistral-7b-instruct")
```
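The Transformers snippet above loads the model but stops before producing sentence embeddings. E5-mistral-style embedders are commonly used with last-token pooling (the hidden state of each sequence's final non-padding token) followed by L2 normalization. The sketch below assumes that convention and checks it on a toy tensor instead of the real 7B model; `last_token_pool` is a hypothetical helper name, and the exact tokenization and instruction prefixes should be taken from the model card.

```python
import torch
import torch.nn.functional as F


def last_token_pool(last_hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Return the hidden state of the last non-padding token of each sequence."""
    # With left padding, every row's last real token sits at the final position.
    left_padding = bool(attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    # With right padding, look up the index of the last real token per row.
    sequence_lengths = attention_mask.sum(dim=1) - 1
    batch_indices = torch.arange(last_hidden_states.shape[0], device=last_hidden_states.device)
    return last_hidden_states[batch_indices, sequence_lengths]


# Toy batch: 2 sequences, 4 positions, hidden size 3 (stands in for model(**batch).last_hidden_state).
hidden = torch.arange(2 * 4 * 3, dtype=torch.float32).reshape(2, 4, 3)
mask = torch.tensor([[1, 1, 1, 1], [1, 1, 0, 0]])  # second row is right-padded
pooled = last_token_pool(hidden, mask)
embeddings = F.normalize(pooled, p=2, dim=1)  # unit-length vectors for cosine similarity
print(pooled)
```

With the real model, `hidden` would come from `model(**batch).last_hidden_state` and `mask` from the tokenizer's `attention_mask`.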
CUDA out-of-memory error when using sentence-transformers on a Tesla V100-PCIE-32GB
Hello, I'm getting a CUDA out-of-memory error while trying to embed documents (fewer than 4096 tokens).
I'm loading the model with sentence-transformers on a Tesla V100-PCIE-32GB GPU.
Here is the error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 31.74 GiB total capacity; 23.36 GiB already allocated; 11.06 MiB free; 23.37 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Any idea how to solve this, or does my GPU simply not have enough memory?
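Whether the card is simply too small can be roughly checked from the numbers in the error message itself. This sketch copies those figures and estimates the weights-only footprint, assuming about 7 billion parameters for the model (an approximation, not a figure from this thread) and ignoring activations entirely.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Figures copied from the OutOfMemoryError message above (GPU 0).
total_gib = 31.74
reserved_gib = 23.37
free_gib = 11.06 * MIB / GIB

# Memory on the card that is neither reserved by this PyTorch process nor free.
unaccounted_gib = total_gib - reserved_gib - free_gib

# Weights-only footprint for an assumed ~7e9-parameter model (activations excluded).
params = 7e9
bytes_per_param = {"float32": 4, "float16": 2, "4-bit": 0.5}
weights_gib = {dtype: params * b / GIB for dtype, b in bytes_per_param.items()}

print(f"unaccounted: {unaccounted_gib:.2f} GiB")
for dtype, size in weights_gib.items():
    print(f"{dtype:>8}: {size:.1f} GiB")
```

Two observations, both hedged: reserved memory (23.37 GiB) is almost equal to allocated memory (23.36 GiB), so the `max_split_size_mb` hint, which targets the case where reserved is much larger than allocated, is unlikely to help; and roughly 8 GiB sits outside PyTorch's pool, which may point to another process sharing the GPU (`nvidia-smi` lists per-process usage). Float32 weights alone would fill most of a 32 GB card, so loading in half precision (recent sentence-transformers releases accept `model_kwargs={"torch_dtype": torch.float16}`) and lowering `model.max_seq_length` are the usual first levers.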
Hi, I ran into similar issues after quantizing this embedder to 4-bit and embedding a series of texts of varying lengths. I only have a 16GB V100 GPU, and the longest text in the series is under 1k words. I noticed that GPU memory wasn't released even after moving the model outputs to the CPU, deleting them, and then running gc.collect() and torch.cuda.empty_cache(). I'd appreciate any ideas or suggestions.
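One thing worth ruling out: `torch.cuda.empty_cache()` can only hand back cached blocks that no live tensor still references, and the loaded model's own weights stay allocated for as long as the model object exists, so some memory is expected to remain. A minimal sketch for telling the two apart, comparing `torch.cuda.memory_allocated()` (memory held by live tensors) with `torch.cuda.memory_reserved()` (allocated plus PyTorch's cache) before and after cleanup. The helper names are hypothetical, and the functions return `None` on a machine without CUDA.

```python
import gc
from typing import Dict, Optional

import torch


def gpu_memory_gib() -> Optional[Dict[str, float]]:
    """Return PyTorch's allocated and reserved GPU memory in GiB, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    gib = 1024 ** 3
    return {
        "allocated": torch.cuda.memory_allocated() / gib,
        "reserved": torch.cuda.memory_reserved() / gib,
    }


def release_cached_memory() -> Optional[Dict[str, float]]:
    """Collect unreachable Python objects, return unused cached blocks to the driver, then report."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    return gpu_memory_gib()


# `del` your own references to GPU tensors first, then compare the two reports.
before = gpu_memory_gib()
after = release_cached_memory()
print(before, after)
```

If "allocated" stays high after cleanup, something still references GPU tensors (often the model itself or a list of un-moved outputs); if only "reserved" was high, `empty_cache()` should bring it down.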
@cc-wei it looks like you are doing batch processing.
I faced the same issue: even with a batch size as small as 4, the problem persisted.
Try embedding one chunk at a time:
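```python
import time

import torch
from sentence_transformers import SentenceTransformer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = SentenceTransformer("intfloat/e5-mistral-7b-instruct", device=device)

def embed_texts(batch_texts, batch_number):
    embeddings = []
    with torch.no_grad():
        batch_start = time.time()
        for text in batch_texts:  # process each chunk individually rather than in batches
            embedding = model.encode([text], prompt_name="web_search_query", convert_to_tensor=True, device=device)
            embeddings.append(embedding.cpu())  # move each result off the GPU right away
    elapsed_time = time.time() - batch_start
    print(f"Time taken for embedding batch {batch_number}: {elapsed_time:.2f} seconds")
    return torch.cat(embeddings)
```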
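Embedding one text at a time bounds memory but leaves the GPU underused. A possible middle ground, offered here as an assumption rather than something tested in this thread, is to cap the padded size of each batch (number of texts × length of the longest text) instead of the number of texts. In this sketch `token_budget_batches` is a hypothetical helper, and whitespace word counts stand in for real token counts; passing a tokenizer-based length function would be more accurate. Attention cost can grow faster than linearly with length, so treat `max_tokens` as a rough knob.

```python
from typing import Callable, Iterator, List, Sequence


def token_budget_batches(
    texts: Sequence[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda t: len(t.split()),  # crude stand-in for a tokenizer
) -> Iterator[List[int]]:
    """Yield lists of indices into `texts` whose padded size stays within `max_tokens`."""
    lengths = [count_tokens(t) for t in texts]
    # Shortest texts first, so each new text is the longest in its batch so far.
    order = sorted(range(len(texts)), key=lengths.__getitem__)
    batch: List[int] = []
    for i in order:
        n = lengths[i]
        # Padded cost if this text joins: (batch size + 1) * its length.
        if batch and n * (len(batch) + 1) > max_tokens:
            yield batch
            batch = []
        batch.append(i)  # a single over-budget text still gets its own batch
    if batch:
        yield batch


texts = ["a b c", "a", "a b c d e f", "a b"]
batches = list(token_budget_batches(texts, max_tokens=6))
print(batches)  # [[1, 3], [0], [2]]
```

Because each batch holds indices rather than texts, results can be written back in the original order, e.g. by encoding `[texts[i] for i in batch]` and storing each row at its index.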