01

A specialized FHE bootstrapping approach

Bootstrapping Bits
with CKKS

Authors: Youngjin Bae, Jung Hee Cheon, Jaehyung Kim, Damien Stehlé

Context: RLWE-based schemes (CKKS) offer high throughput via SIMD. LWE-based schemes (TFHE) offer low latency for binary circuits.

Thesis: Redesigning the EvalMod step specifically for binary inputs saves more than 100 bits of modulus and yields performance gains in CKKS.

02

Prior Work & The State of the Art

CKKS for Binary (BLEACH)

  • Encodes bits as real numbers with small noise ε
  • Binary gates implemented as real arithmetic
  • Relies on standard CKKS bootstrapping (black-box)
  • At high parallelism, beats DM/CGGI (AES in 11.5m vs 28s)

DM/CGGI (TFHE)

  • Native binary message space
  • Lower latency for sequential/thin circuits
  • ~10.5ms per gate (CGGI16b)
  • LWE-format ciphertext with scaling factor q/4

The Compatibility Gap: CKKS uses a scaling factor Δ ≪ q, whereas DM/CGGI uses q/4. Standard CKKS bootstrapping requires a ~10-bit gap between Δ₀ and q₀, which wastes modulus when each message is a single bit.

03

What This Paper Contributes

1. BinBoot

Binary Bootstrapping

  • Targets ciphertexts encoding bits in MSBs
  • Replaces sine-based EvalMod with cosine function
  • Allows Δ₀ = q₀/2 (saves >100 bits of modulus)
  • Inherent quadratic error shrinkage
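
The quadratic error shrinkage can be checked numerically. The sketch below assumes a bit b ∈ {0, 1} with noise ε is encoded as (q₀/2)(b + ε) plus an unknown integer multiple of q₀, and uses f(x) = (1 − cos(2πx/q₀))/2 as the cosine-based replacement for EvalMod. The function names, the value of q₀, and the noise level are illustrative choices, not parameters taken from the paper, and everything runs on plain floats rather than ciphertexts.

```python
import math

Q0 = 2.0 ** 31  # illustrative base modulus (assumption); any positive value works


def encode(bit, eps, overflow, q0=Q0):
    """Place the noisy bit in the top bit of the modulus (Delta0 = q0/2).

    `overflow` models the unknown integer multiple of q0 that EvalMod must remove.
    """
    return (q0 / 2.0) * (bit + eps) + q0 * overflow


def cos_evalmod(x, q0=Q0):
    """Map x = (q0/2)*(b + eps) + q0*I to approximately b, for any integer I."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * x / q0))


eps = 1e-3
for bit in (0, 1):
    for overflow in (-3, 0, 5):
        out = cos_evalmod(encode(bit, eps, overflow))
        # The output error is close to (pi*eps/2)**2, i.e. quadratic in eps.
        print(bit, overflow, abs(out - bit), (math.pi * eps / 2) ** 2)
```

The cosine has zero slope at both encoded bit values, so the first-order noise term vanishes and only the ε² term survives. This is the property the slide calls inherent quadratic error shrinkage.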

2. GateBoot

Combined Eval & Bootstrap

  • Adds input ciphertexts before bootstrapping
  • Bootstraps once with gate-specific trig function
  • Evaluates gate and refreshes ciphertext in one step
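
The idea of a gate-specific trigonometric function can be illustrated on plain numbers. The sketch below assumes the sum s = b₁ + b₂ ∈ {0, 1, 2} is placed at thirds of the period and interpolates a function a + b·cos(2πs/3) + c·sin(2πs/3) through the three target outputs of each symmetric gate. This construction, and the names `GATES`, `gate_coeffs`, and `gate_boot`, are assumptions for illustration. The paper's actual functions and scaling may differ.

```python
import math

# Target outputs for s = b1 + b2 = 0, 1, 2 (symmetric two-input gates).
GATES = {
    "AND": (0, 0, 1),
    "OR": (0, 1, 1),
    "XOR": (0, 1, 0),
    "NAND": (1, 1, 0),
}


def gate_coeffs(values):
    """Interpolate a + b*cos(2*pi*s/3) + c*sin(2*pi*s/3) through three points.

    Uses the discrete orthogonality of 1, cos, sin on the points s = 0, 1, 2.
    """
    a = sum(values) / 3.0
    b = (2.0 / 3.0) * sum(
        v * math.cos(2 * math.pi * s / 3) for s, v in enumerate(values)
    )
    c = (2.0 / 3.0) * sum(
        v * math.sin(2 * math.pi * s / 3) for s, v in enumerate(values)
    )
    return a, b, c


def gate_boot(gate, s):
    """Evaluate the gate-specific trig function on the (possibly noisy) sum s."""
    a, b, c = gate_coeffs(GATES[gate])
    t = 2 * math.pi * s / 3
    return a + b * math.cos(t) + c * math.sin(t)


for name in GATES:
    table = [round(gate_boot(name, b1 + b2)) for b1 in (0, 1) for b2 in (0, 1)]
    print(name, table)
```

Because the function has period 3 in s, adding a multiple of the period (the modular overflow) leaves the output unchanged. In this toy construction the interpolant generally has nonzero slope at the sample points, so input noise passes through at first order rather than being squared.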

3. DM/CGGI Sync

Format Compatibility

  • Natively compatible with DM/CGGI
  • Combined with HERMES ring packing for massively parallel bootstrapping
  • First full-slot bootstrapping at N = 2¹⁴

04

Modulus Engineering & Efficiency Gains

By reducing q₀ / Δ₀ from 2¹³ down to 2, the proposed BinBoot saves 12 bits of modulus for all levels corresponding to CtS and EvalMod. This translates directly to more functional depth.
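
The per-level saving follows directly from the two ratios quoted above. The short check below is only arithmetic on the slide's numbers. The variable names are illustrative, and the 58-bit starting point is the BLEACH base modulus from the table on this slide.

```python
import math

old_ratio = 2 ** 13  # q0 / Delta0 before BinBoot (from the slide)
new_ratio = 2        # q0 / Delta0 with BinBoot (Delta0 = q0 / 2)

# Bits of modulus saved at each level that carries the q0/Delta0 gap.
bits_saved_per_level = int(math.log2(old_ratio) - math.log2(new_ratio))
print(bits_saved_per_level)  # 12

# Applying the saving to the BLEACH base modulus of 58 bits.
bleach_q0_bits = 58
print(bleach_q0_bits - bits_saved_per_level)  # 46
```

The result of 46 bits matches the base modulus listed for the naive BinBoot parameter set.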

Available Multiplication Depth

Extra depth available for actual computation inside a bootstrapping cycle.

Parameter Set          Base (q₀)   Depth
BLEACH [DMPS24]        2⁵⁸         7
BinBoot (Naive)        2⁴⁶         13
BinBoot (Optimized)    2³¹         29

05

Latency vs. Throughput

Tested on an Intel Xeon Gold 6242 @ 2.8 GHz, single-threaded.

High Throughput (N=2¹⁶)

Implementation        Time/gate
LMSS23 (CGGI)         16.49 ms
CGGI16b (TFHE)        10.5 ms
BLEACH improved       27.7 µs
BinBoot (This work)   17.6 µs

  • 597× faster than native TFHE
  • 1.57× faster than prior state-of-the-art (BLEACH)
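
The two speedup figures can be reproduced from the per-gate times in the table. The snippet below is a plain arithmetic check. The dictionary and the `speedup` helper are illustrative names, and the 597× figure is assumed to be measured against the CGGI16b entry.

```python
# Amortized time per gate in microseconds, copied from the table above.
times_us = {
    "LMSS23 (CGGI)": 16.49e3,
    "CGGI16b (TFHE)": 10.5e3,
    "BLEACH improved": 27.7,
    "BinBoot": 17.6,
}


def speedup(baseline, candidate="BinBoot"):
    """Return how many times faster `candidate` is than `baseline`."""
    return times_us[baseline] / times_us[candidate]


print(round(speedup("CGGI16b (TFHE)")))      # 597
print(round(speedup("BLEACH improved"), 2))  # 1.57
```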

Low Latency (N=2¹⁴)

  • First-ever full-slot CKKS bootstrapping at N=2¹⁴
  • BinBoot: 1.36s for 2¹³ slots
  • GateBoot: 1.39s for 2¹³ slots

DM/CGGI Crossover

  • Using GateBoot + HERMES ring packing.
  • BinBoot becomes favorable over native TFHE once ≥ 262 gates are evaluated in parallel.
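
A rough way to read the latency numbers is to amortize one batch over its slots and to ask how many sequential TFHE gates fit into one batch. The sketch below does exactly that and nothing more. The helper names are illustrative, and the model ignores ring packing and any other conversion cost.

```python
import math

SLOTS = 2 ** 13        # slots per bootstrap at N = 2^14 (from the slide)
TFHE_GATE_S = 10.5e-3  # CGGI16b time per gate in seconds (from the slide)


def amortized_us(batch_seconds, slots=SLOTS):
    """Time per slot in microseconds when the whole batch is used."""
    return batch_seconds / slots * 1e6


def break_even_gates(batch_seconds, per_gate_seconds=TFHE_GATE_S):
    """Smallest gate count at which one batch beats gate-by-gate evaluation."""
    return math.ceil(batch_seconds / per_gate_seconds)


print(amortized_us(1.36))      # BinBoot, about 166 microseconds per slot
print(amortized_us(1.39))      # GateBoot, about 170 microseconds per slot
print(break_even_gates(1.39))  # 133 under this simplified model
```

This naive model gives a lower threshold than the 262 gates quoted on the slide. That would be consistent with the slide's figure also counting costs left out here, such as HERMES ring packing, although the slide does not break the number down.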

06

Limitations & Future Directions

Current Limitations

  • Current RNS implementations treat all moduli as 64-bit words, masking some of the modulus savings in raw hardware runtime.
  • GateBoot lacks inherent noise cleaning (adding an h₁ cleaning step consumes 2 extra multiplication levels).
  • HERMES ring packing is used only in its simplest form, the column method.

Future Work

  • Optimize RNS arithmetic for sub-64-bit moduli (batching consecutive small primes could cut NTT cost by ~2×).
  • Full optimization of HERMES + BinBoot for hybrid DM/CGGI circuits.
  • Explore programmable bootstrapping (multiple gates simultaneously).

FIN

Thank You

Presented By

011221311 Abdul Taha Mahmud

011221571 Shahrier Azad Shezan