Performance tuning¶
Measured on the repo's benchmark suite, single workstation:
| Workload | Throughput / latency |
|---|---|
| Small payload latency | mean 556 µs, p99 1075 µs |
| 1 MB array throughput | 7.7k msg/s, 8.0 GB/s |
| 4 MB array throughput | 2.25k msg/s, 9.4 GB/s |
| 1080p RGB | 1422 fps, 8.8 GB/s |
See Benchmarks to reproduce.
Copy-on-use¶
Decoded NumPy arrays alias the ZMQ frame memory; that zero-copy path is how large-payload throughput reaches memcpy bandwidth. Two consequences:

- Mutating the array? `arr = arr.copy()` first.
- Holding the array past the callback? Copy first.
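A minimal demonstration of the aliasing, using `np.frombuffer` as a stand-in for the library's zero-copy decode (the buffer and variable names here are illustrative, not the library's API):

```python
import numpy as np

# Stand-in for a received ZMQ frame: the decoded array aliases this buffer.
frame = bytearray(8)                        # zero-filled frame buffer
arr = np.frombuffer(frame, dtype=np.uint8)  # zero-copy view into frame
safe = arr.copy()                           # detached copy, safe to keep

frame[0] = 1                                # transport reuses the buffer...
print(arr[0], safe[0])                      # the view sees 1, the copy still 0
```

`arr.flags["OWNDATA"]` is `False` for the view and `True` for the copy, which is a quick way to check whether an array is safe to hold.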
Queue sizing¶
Per-socket HWM defaults to 10. Increase `queue_size` on high-rate producers whose subscribers are slow, but remember that ZMQ drops messages silently once the HWM is reached.
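The drop semantics can be modelled in a few lines. This is a sketch of the behaviour, not the library's API; `hwm` here mirrors the role of `queue_size`:

```python
from collections import deque

def publish(queue, msg, hwm=10):
    """Model ZMQ's high-water mark: a full queue silently drops the
    new message, with no exception and no signal to the application."""
    if len(queue) >= hwm:
        return False          # dropped on the floor
    queue.append(msg)
    return True

q = deque()
delivered = sum(publish(q, i) for i in range(25))
print(delivered)  # 10: everything past the HWM was silently discarded
```

The only evidence of a drop is a gap in the data downstream, which is why slow subscribers on high-rate topics need the larger queue rather than a retry.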
When to prefer the single-blob path¶
Tiny messages (primitives only, under 1 KB) see no benefit from multipart framing; the inline `to_bytes` path is fine there. Note that the transport always uses multipart today.
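As a rough illustration (the serializer below is a stand-in, not the library's actual `to_bytes`), a primitive-only control message comfortably fits one small frame, so splitting it into multipart buys nothing:

```python
import json

# Illustrative inline serializer for primitive-only messages;
# the library's real to_bytes implementation may differ.
def to_bytes(msg: dict) -> bytes:
    return json.dumps(msg).encode()

blob = to_bytes({"cmd": "start", "rate_hz": 100, "enabled": True})
print(len(blob))  # well under 1 KB
```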
uvloop¶
uvloop is installed by default on Unix and reduces tail latency on high-rate small messages.
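If you need to enable it explicitly in your own entry point, a guarded import keeps the code portable to platforms where uvloop is unavailable (e.g. Windows):

```python
import asyncio

try:
    import uvloop
    uvloop.install()   # swap asyncio's default event loop for uvloop
except ImportError:
    pass               # no uvloop: fall back to the stock asyncio loop

async def main():
    await asyncio.sleep(0)   # runs on uvloop when it is installed

asyncio.run(main())
```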
Sync subscriber mode¶
For control loops that need sub-100 µs p99 latency, see Sync subscriber mode, which bypasses asyncio and zmq.asyncio on the receive path.