threshold-4to2-compressor
4:2 compressor for high-speed multiplier trees. Reduces 4 input bits plus carry-in to 2 output bits plus carry-out while preserving arithmetic value.
Circuit
x y z w cin
β β β β β
ββββ¬ββββ΄βββ¬ββββ΄βββ¬ββββ β
β β β β
βΌ β β β
βββββββ β β β
βXOR β β β β
β(x,y)β β β β
ββββ¬βββ β β β
β β β β
βΌ βΌ β β
βββββββββββββββ β β
β XOR(xy,z) β β β
ββββββββ¬βββββββ β β
β β β
βΌ βΌ β
ββββββββββββββββ β
β XOR(xyz,w) β β
ββββββββ¬ββββββββ β
β β
βΌ βΌ
βββββββββββββββββββββββ
β XOR(xyzw, cin) βββββΊ Sum
βββββββββββββββββββββββ
cout = MAJ(x,y,z) (independent of w, cin)
carry = MAJ(XOR(x,y,z), w, cin)
Function
compress_4to2(x, y, z, w, cin) -> (sum, carry, cout)
Invariant: x + y + z + w + cin = sum + 2*carry + 2*cout
Truth Table (partial - 32 combinations)
| x | y | z | w | cin | sum | carry | cout | verify |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0=0 |
| 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1=1 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 2=2 |
| 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 3=3 |
| 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 4=4 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 5=5 |
Input sum range: 0 to 5 Output encoding: sum + 2carry + 2cout (range 0-5)
Mechanism
The 4:2 compressor is built from two cascaded 3:2 compressors with a twist:
Stage 1: Compress (x, y, z)
- sum1 = x XOR y XOR z
- cout = MAJ(x, y, z) β This goes to next column
Stage 2: Compress (sum1, w, cin)
- sum = sum1 XOR w XOR cin
- carry = MAJ(sum1, w, cin) β This goes to next column
Key insight: The cout is computed early and can propagate horizontally while sum/carry are still being computed.
Architecture
| Component | Function | Neurons | Layers |
|---|---|---|---|
| XOR(x,y) | First pair | 3 | 2 |
| XOR(xy,z) | Add third | 3 | 2 |
| MAJ(x,y,z) | cout | 1 | 1 |
| XOR(xyz,w) | Add fourth | 3 | 2 |
| XOR(xyzw,cin) | sum | 3 | 2 |
| MAJ(xyz,w,cin) | carry | 1 | 1 |
Total: 14 neurons
Parameters
| Inputs | 5 (x, y, z, w, cin) |
| Outputs | 3 (sum, carry, cout) |
| Neurons | 14 |
| Layers | 8 |
| Parameters | 44 |
| Magnitude | 46 |
Delay Analysis
Critical path for sum: 4 XOR stages = 8 layers Critical path for carry: 4 XOR stages + 1 MAJ = 9 layers Critical path for cout: 1 MAJ = 1 layer (very fast!)
The early cout enables fast horizontal carry propagation in multiplier arrays.
Usage
from safetensors.torch import load_file
import torch
w = load_file('model.safetensors')
def compress_4to2(x, y, z, w_in, cin):
# Implementation details in model.py
pass
# Example: sum of 5 bits
s, carry, cout = compress_4to2(1, 1, 1, 1, 1)
print(f"1+1+1+1+1 = {s} + 2*{carry} + 2*{cout} = {s + 2*carry + 2*cout}")
# Output: 1+1+1+1+1 = 1 + 2*1 + 2*1 = 5
Applications
- Booth multipliers (radix-4)
- Wallace/Dadda tree reduction
- FMA (fused multiply-add) units
- High-performance DSP
Comparison with 3:2 Compressor
| Property | 3:2 | 4:2 |
|---|---|---|
| Inputs | 3 | 5 (4 + cin) |
| Outputs | 2 | 3 (2 + cout) |
| Reduction ratio | 3β2 | 4β2 per column |
| Neurons | 7 | 14 |
| Tree depth for n bits | O(logβ.β n) | O(logβ n) |
4:2 compressors provide faster reduction in multiplier trees.
Files
threshold-4to2-compressor/
βββ model.safetensors
βββ model.py
βββ create_safetensors.py
βββ config.json
βββ README.md
License
MIT
- Downloads last month
- 8