# MemReader-4B-f32-GGUF

MemReader-4B is a 4-billion-parameter causal language model, fine-tuned from Qwen3-4B, designed specifically for memory operations within the MemOS system. It performs high-quality memory extraction from conversations and documents in both English and Chinese, and supports a 32,768-token context window. Compared to larger models such as Qwen3-14B, it delivers faster and more accurate memory extraction with over 70% lower resource consumption, while also outperforming GPT-4o-mini. Because it can run entirely locally, MemReader-4B is suitable for restricted environments, and it is well matched to summarizing and integrating memory information from chats and documents. The model can be used through MemOS configurations or directly via Hugging Face.
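Since the weights ship as GGUF files, one way to run memory extraction locally is through llama-cpp-python. The sketch below is illustrative only: the file name, prompt wording, and sampling settings are assumptions, not a documented MemOS interface.

```python
from llama_cpp import Llama

# Load a local GGUF file. The quant choice and path are illustrative;
# any file from the table below works the same way.
llm = Llama(
    model_path="MemReader-4B.Q4_K_M.gguf",
    n_ctx=32768,  # matches the model's advertised context window
)

# Hypothetical extraction prompt -- MemOS defines its own prompts,
# which are not reproduced here.
messages = [
    {
        "role": "system",
        "content": "Extract durable user memories from the conversation "
                   "as short bullet points.",
    },
    {
        "role": "user",
        "content": "User: I moved to Berlin last month, and I'm allergic "
                   "to peanuts.\nAssistant: Got it, noted!",
    },
]

result = llm.create_chat_completion(messages=messages, temperature=0.2)
print(result["choices"][0]["message"]["content"])
```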

## Model Files

| File name | Size | Quant type |
|-----------|------|------------|
| MemReader-4B.F32.gguf | 16.1 GB | F32 |
| MemReader-4B.BF16.gguf | 8.05 GB | BF16 |
| MemReader-4B.F16.gguf | 8.05 GB | F16 |
| MemReader-4B.Q8_0.gguf | 4.28 GB | Q8_0 |
| MemReader-4B.Q6_K.gguf | 3.31 GB | Q6_K |
| MemReader-4B.Q5_K_M.gguf | 2.89 GB | Q5_K_M |
| MemReader-4B.Q5_K_S.gguf | 2.82 GB | Q5_K_S |
| MemReader-4B.Q4_K_M.gguf | 2.5 GB | Q4_K_M |
| MemReader-4B.Q4_K_S.gguf | 2.38 GB | Q4_K_S |
| MemReader-4B.Q3_K_L.gguf | 2.24 GB | Q3_K_L |
| MemReader-4B.Q3_K_M.gguf | 2.08 GB | Q3_K_M |
| MemReader-4B.Q3_K_S.gguf | 1.89 GB | Q3_K_S |
| MemReader-4B.Q2_K.gguf | 1.67 GB | Q2_K |
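To fetch a single quant rather than the whole repository, huggingface_hub's hf_hub_download can pull one file by name. The quant chosen below is just an example; substitute any file name from the table.

```python
from huggingface_hub import hf_hub_download

# Download one GGUF file from the repo; Q4_K_M is picked here as an
# example mid-size quant. Returns the local cache path of the file.
model_path = hf_hub_download(
    repo_id="prithivMLmods/MemReader-4B-f32-GGUF",
    filename="MemReader-4B.Q4_K_M.gguf",
)
print(model_path)
```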

## Quants Usage

The files above are sorted by size, which does not necessarily track quality. IQ-quants are often preferable to similarly sized non-IQ quants.

For a comparison of some lower-quality quant types, see the graph by ikawrakow (lower is better).
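One pragmatic way to use the table is to pick the largest quant that fits the memory you can spare, since within one family larger quants generally preserve more quality. Below is a minimal sketch using the sizes listed above; the 1 GB headroom for KV cache and runtime overhead is a rough assumption, not a measured figure.

```python
# File sizes in GB, taken from the Model Files table above.
QUANTS = {
    "Q2_K": 1.67, "Q3_K_S": 1.89, "Q3_K_M": 2.08, "Q3_K_L": 2.24,
    "Q4_K_S": 2.38, "Q4_K_M": 2.50, "Q5_K_S": 2.82, "Q5_K_M": 2.89,
    "Q6_K": 3.31, "Q8_0": 4.28, "F16": 8.05, "BF16": 8.05, "F32": 16.1,
}

def pick_quant(ram_budget_gb: float, headroom_gb: float = 1.0) -> str:
    """Return the largest quant whose file size plus headroom fits the
    budget. The headroom default is an assumption, not a rule."""
    fitting = {q: s for q, s in QUANTS.items()
               if s + headroom_gb <= ram_budget_gb}
    if not fitting:
        raise ValueError("No quant fits the given RAM budget.")
    return max(fitting, key=fitting.get)

print(pick_quant(6.0))  # e.g. "Q8_0" with roughly 6 GB free
```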

## Model tree for prithivMLmods/MemReader-4B-f32-GGUF

- Base model: Qwen/Qwen3-4B-Base
- Fine-tuned: Qwen/Qwen3-4B
- Quantized: this model