A specialized FHE bootstrapping approach

Bootstrapping Bits with CKKS

Authors: Youngjin Bae, Jung Hee Cheon, Jaehyung Kim, Damien Stehlé

Context: RLWE-based schemes (CKKS) offer high throughput via SIMD. LWE-based schemes (TFHE) offer low latency for binary circuits.

Thesis: Redesigning the EvalMod step explicitly for binary inputs saves modulus at every CtS and EvalMod level and improves CKKS bootstrapping performance.

02

Prior Work & The State of the Art

CKKS for Binary (BLEACH)

Encodes bits as real numbers with small noise ε
Binary gates implemented as real arithmetic
Relies on standard CKKS bootstrapping (black-box)
At high parallelism, beats DM/CGGI (AES in 11.5 min vs 28 s)
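
To make "binary gates implemented as real arithmetic" concrete, here is a minimal sketch using the standard polynomial forms of AND, OR and XOR on real numbers close to 0 or 1. These are the textbook encodings and are an assumption here; the exact polynomials and cleaning steps used in BLEACH may differ. The function names are hypothetical.

```python
# Standard real-arithmetic encodings of binary gates (assumed, not taken from BLEACH).
def and_gate(x, y):
    return x * y

def or_gate(x, y):
    return x + y - x * y

def xor_gate(x, y):
    return (x - y) ** 2

# Bits carrying a small noise eps: outputs stay close to the exact gate value,
# but the noise does not vanish, which is why periodic cleaning is needed.
eps = 1e-3
for b1 in (0, 1):
    for b2 in (0, 1):
        x, y = b1 + eps, b2 - eps
        print(b1, b2, and_gate(x, y), or_gate(x, y), xor_gate(x, y))
```

On exact bits the three functions reproduce the truth tables; on noisy inputs the output error is of the same order as the input error.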

DM/CGGI (TFHE)

Native binary message space
Lower latency for sequential/thin circuits
~10.5 ms per gate (CGGI16b)
LWE-format ciphertext with scaling factor q/4

The Compatibility Gap: CKKS uses a scaling factor Δ ≪ q, whereas DM/CGGI uses q/4. Standard CKKS bootstrapping requires a gap of roughly 10 bits between Δ₀ and q₀, modulus that is wasted when each message is a single bit.

03

What This Paper Contributes

1. BinBoot

Binary Bootstrapping

Targets ciphertexts encoding bits in MSBs
Replaces sine-based EvalMod with cosine function
Allows Δ₀ = q₀/2 (saves >100 bits of modulus)
Inherent quadratic error shrinkage
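
The quadratic error shrinkage can be checked on plaintext numbers. The sketch below assumes the bit is encoded with Δ₀ = q₀/2, so that after dividing by q₀ the value is (b + ε)/2 plus an unknown integer, and it assumes the cosine-based map f(x) = (1 - cos(2πx))/2. Both the exact map and the names `binboot_plain` and `wrap` are illustrative assumptions, not the paper's stated construction.

```python
import math

def binboot_plain(b, eps, wrap=0):
    """Plaintext model of a cosine-based EvalMod for one noisy bit.

    x is the encoded value divided by q0: (b + eps)/2 plus an integer wrap,
    which models the unknown multiple of q0 that EvalMod must remove.
    """
    x = (b + eps) / 2 + wrap
    return (1 - math.cos(2 * math.pi * x)) / 2

for eps in (1e-2, 1e-3):
    for b in (0, 1):
        out = binboot_plain(b, eps, wrap=3)
        # Output error is about (pi * eps)^2 / 4, i.e. quadratic in eps.
        print(b, eps, abs(out - b))
```

Because the cosine is flat at both encoded points, the first-order term in ε cancels and only the ε² term remains; the integer wrap drops out by periodicity.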

2. GateBoot

Combined Eval & Bootstrap

Adds input ciphertexts before bootstrapping
Bootstraps once with gate-specific trig function
Evaluates gate and refreshes ciphertext in one step
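
One way to picture the three GateBoot steps on plaintext values: add the two bits, so the sum is 0, 1 or 2, then apply a periodic function that takes the gate's output at each of those three points. The sketch assumes the sum is placed at x = s/3 modulo 1 and fits a + b·cos(2πx) + c·sin(2πx) through the three required values. The 1/3 scaling, the function shape, and all names are assumptions for illustration and may not match the paper's choices.

```python
import numpy as np

# Assumed sample points: sum of two bits s in {0, 1, 2}, placed at s/3 modulo 1.
XS = np.array([0.0, 1.0 / 3.0, 2.0 / 3.0])

def gate_coeffs(truth_by_sum):
    """Solve for (a, b, c) so that a + b*cos(2*pi*x) + c*sin(2*pi*x)
    equals truth_by_sum[s] at x = s/3, for s = 0, 1, 2."""
    basis = np.stack(
        [np.ones(3), np.cos(2 * np.pi * XS), np.sin(2 * np.pi * XS)], axis=1
    )
    return np.linalg.solve(basis, np.asarray(truth_by_sum, dtype=float))

def gate_boot_plain(b1, b2, coeffs, wrap=0):
    """Add the inputs, then evaluate the gate-specific periodic function once."""
    x = (b1 + b2) / 3.0 + wrap  # wrap models the integer multiple removed by periodicity
    a, b, c = coeffs
    return a + b * np.cos(2 * np.pi * x) + c * np.sin(2 * np.pi * x)

# Symmetric gates are fully described by their output for each input sum.
GATES = {"AND": [0, 0, 1], "OR": [0, 1, 1], "XOR": [0, 1, 0]}
for name, truth in GATES.items():
    coeffs = gate_coeffs(truth)
    outs = [float(gate_boot_plain(b1, b2, coeffs, wrap=2))
            for b1 in (0, 1) for b2 in (0, 1)]
    print(name, np.round(outs, 6))
```

In this sketch the fitted function is generally not flat at the sample points, so input noise passes through to first order; a separate cleaning step would be needed to shrink it.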

3. DM/CGGI Sync

Format Compatibility

Natively compatible with DM/CGGI
Combined with HERMES ring packing for massive parallel bootstraps
First full-slot bootstrapping at N = 2¹⁴

04

Modulus Engineering & Efficiency Gains

By reducing q₀ / Δ₀ from 2¹³ down to 2, the proposed BinBoot saves 12 bits of modulus for all levels corresponding to CtS and EvalMod. This translates directly to more functional depth.
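
The 12-bit figure follows from the ratio change alone, as this small calculation shows. The level counts passed to `total_saved_bits` are hypothetical inputs, since the number of CtS and EvalMod levels depends on the parameter set.

```python
import math

old_ratio = 2 ** 13  # q0 / Delta0 in standard CKKS bootstrapping (from the text)
new_ratio = 2        # q0 / Delta0 with BinBoot (Delta0 = q0 / 2)

# Bits saved at each level whose modulus tracks q0 / Delta0.
saved_bits_per_level = math.log2(old_ratio) - math.log2(new_ratio)

def total_saved_bits(num_levels):
    """Total modulus saved if num_levels (hypothetical) levels each shrink by the same amount."""
    return num_levels * saved_bits_per_level

print(saved_bits_per_level)   # 12.0
print(total_saved_bits(9))    # 108.0
```

With 12 bits saved per level, nine or more affected levels already exceed 100 bits in total.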

Available Multiplication Depth

Extra depth available for actual computation inside a bootstrapping cycle.

BLEACH [DMPS24]: 7
BinBoot (Naive): 13
BinBoot (Opt.): 29

Parameter Set         Base modulus q₀   Depth
BLEACH                2⁵⁸               7
Proposed (Naive)      2⁴⁶               13
Proposed (Optimized)  2³¹               29

05

Latency vs. Throughput

Tested on an Intel Xeon Gold 6242 @ 2.8 GHz, single-threaded.

High Throughput (N=2¹⁶)

Implementation	Time/gate
LMSS23 (CGGI)	16.49 ms
CGGI16b (TFHE)	10.5 ms
BLEACH improved	27.7 µs
BinBoot (This work)	17.6 µs

597× faster than native TFHE (CGGI16b, 10.5 ms per gate)
1.57× faster than prior state-of-the-art (BLEACH improved, 27.7 µs per gate)
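
The two speedup factors can be reproduced directly from the per-gate times in the table. The dictionary keys below are just labels chosen for this sketch.

```python
# Per-gate times from the table, converted to microseconds.
times_us = {
    "LMSS23 (CGGI)": 16490.0,
    "CGGI16b (TFHE)": 10500.0,
    "BLEACH improved": 27.7,
    "BinBoot": 17.6,
}

def speedup(baseline, ours="BinBoot"):
    """Ratio of the baseline's per-gate time to ours."""
    return times_us[baseline] / times_us[ours]

print(round(speedup("CGGI16b (TFHE)")))      # 597
print(round(speedup("BLEACH improved"), 2))  # 1.57
print(round(speedup("LMSS23 (CGGI)")))       # 937
```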

Low Latency (N=2¹⁴)

First-ever full-slot CKKS bootstrapping at N=2¹⁴
BinBoot: 1.36s for 2¹³ slots
GateBoot: 1.39s for 2¹³ slots
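
Dividing these latencies by the slot count gives an amortized per-bit cost, which is useful for comparing against the N=2¹⁶ throughput numbers. These amortized values are derived here from the figures above rather than reported separately.

```python
def amortized_us(latency_s, slots):
    """Amortized time per slot in microseconds."""
    return latency_s * 1e6 / slots

slots = 2 ** 13
print(round(amortized_us(1.36, slots), 1))  # 166.0 (BinBoot)
print(round(amortized_us(1.39, slots), 1))  # 169.7 (GateBoot)
```

At roughly 166 µs per bit, the N=2¹⁴ setting trades about an order of magnitude of throughput (compared with 17.6 µs at N=2¹⁶) for a bootstrap that completes in under 1.5 s.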

DM/CGGI Crossover

Using GateBoot + HERMES ring packing, the CKKS-based approach becomes favorable over native TFHE once ≥ 262 gates are evaluated in parallel.

06

Limitations & Future Directions

Current Limitations

Current RNS implementations treat all moduli as 64-bit words, masking some of the modulus savings in raw hardware runtime.
GateBoot lacks inherent noise cleaning (adding an h₁ cleaning step consumes 2 extra multiplication levels).
HERMES ring packing is used only in its simplest form (the column method).

Future Work

Optimize RNS arithmetic for sub-64-bit moduli (batching consecutive small primes could cut NTT cost by ~2×).
Full optimization of HERMES + BinBoot for hybrid DM/CGGI circuits.
Explore programmable bootstrapping (multiple gates simultaneously).

FIN

Thank You

Presented By

011221311 — Abdul Taha Mahmud

011221571 — Shahrier Azad Shezan