## CryptoDB

### Yu Yu

#### Publications

**Year**

**Venue**

**Title**

2021

TOSC

Provable Security of SP Networks with Partial Non-Linear Layers
Abstract

Motivated by the recent trend towards low multiplicative complexity blockciphers (e.g., Zorro, CHES 2013; LowMC, EUROCRYPT 2015; HADES, EUROCRYPT 2020; MALICIOUS, CRYPTO 2020), we study their underlying structure partial SPNs, i.e., Substitution-Permutation Networks (SPNs) with parts of the substitution layer replaced by an identity mapping, and put forward the first provable security analysis for such partial SPNs built upon dedicated linear layers. For different instances of partial SPNs using MDS linear layers, we establish strong pseudorandom security as well as practical provable security against impossible differential attacks. By extending the well-established MDS code-based idea, we also propose the first principled design of linear layers that ensures optimal differential propagation. Our results formally confirm the conjecture that partial SPNs achieve the same security as normal SPNs while consuming less non-linearity, in a well-established framework.

2021

CRYPTO

Pushing the Limits of Valiant's Universal Circuits: Simpler, Tighter and More Compact
📺
Abstract

A universal circuit (UC) is a general-purpose circuit that can simulate arbitrary circuits (up to a certain size $n$). Valiant provides a $k$-way recursive construction of UCs (STOC 1976), where $k$ tunes the complexity of the recursion. More concretely, Valiant gives theoretical constructions of 2-way and 4-way UCs of asymptotic (multiplicative) sizes $5n\log n$ and $4.75 n\log n$ respectively, which matches the asymptotic lower bound $\Omega(n\log n)$ up to some constant factor.
Motivated by various privacy-preserving cryptographic applications, Kiss et al. (Eurocrypt 2016) validated the practicality of $2$-way universal circuits by giving example implementations for private function evaluation. G{\"{u}}nther et al. (Asiacrypt 2017) and Alhassan et al. (J. Cryptology 2020) implemented the 2-way/4-way hybrid UCs with various optimizations in place towards making universal circuits more practical. Zhao et al. (Asiacrypt 2019) optimized Valiant's 4-way UC to asymptotic size $4.5 n\log n$ and proved a lower bound $3.64 n\log n$ for UCs under the Valiant framework.
As the scale of computation goes beyond 10-million-gate ($n=10^7$) or even billion-gate level ($n=10^9$), the constant factor in UCs size plays an increasingly important role in application performance. In this work, we investigate Valiant's universal circuits and present an improved framework for constructing universal circuits with the following advantages.
[Simplicity.] Parameterization is no longer needed. In contrast to that previous implementations resorted to a hybrid construction combining $k=2$ and $k=4$ for a tradeoff between fine granularity and asymptotic size-efficiency, our construction gets the best of both worlds when configured at the lowest complexity (i.e., $k=2$).
[Compactness.] Our universal circuits have asymptotic size $3n\log n$, improving upon the best previously known $4.5n\log n$ by 33\% and beating the $3.64n\log n$ lower bound for UCs constructed under Valiant's framework (Zhao et al., Asiacrypt 2019).
[Tightness.] We show that under our new framework the UCs size is lower bounded by $2.95 n\log n$, which almost matches the $3n\log n$ circuit size of our $2$-way construction.
We implement the 2-way universal circuits and evaluate its performance with other implementations, which confirms our theoretical analysis.

2021

CRYPTO

Smoothing Out Binary Linear Codes and Worst-case Sub-exponential Hardness for LPN
📺
Abstract

Learning parity with noise (LPN) is a notorious (average-case) hard problem that has been well studied in learning theory, coding theory and cryptography since the early 90's. It further inspires the Learning with Errors (LWE) problem [Regev, STOC 2005], which has become one of the central building blocks for post-quantum cryptography and advanced cryptographic. Unlike LWE whose hardness can be reducible from worst-case lattice problems, no corresponding worst-case hardness results were known for LPN until very recently. At Eurocrypt 2019, Brakerski et al. [BLVW19] established the first feasibility result that the worst-case hardness of nearest codeword problem (NCP) (on balanced linear code) at the extremely low noise rate $\frac{\log^2 n}{n}$ implies the quasi-polynomial hardness of LPN at the extremely high noise rate $1/2-1/\poly(n)$. It remained open whether a worst-case to average-case reduction can be established for standard (constant-noise) LPN, ideally with sub-exponential hardness.
We start with a simple observation that the hardness of high-noise LPN over large fields is implied by that of the LWE of the same modulus, and is thus reducible from worst-case hardness of lattice problems. We then revisit [BLVW19], which is the main focus of this work. We first expand the underlying binary linear codes (of the NCP) to not only the balanced code considered in [BLVW19] but also to another code (in some sense dual to balanced code). At the core of our reduction is a new variant of smoothing lemma (for both binary codes) that circumvents the barriers (inherent in the underlying worst-case randomness extraction) and admits tradeoffs for a wider spectrum of parameter choices. In addition to the worst-case hardness result obtained in [BLVW19], we show that for any constant $0<c<1$ the constant-noise LPN problem is ($T=2^{\Omega(n^{1-c})},\epsilon=2^{-\Omega(n^{\min(c,1-c)})},q=2^{\Omega(n^{\min(c,1-c)})}$)-hard assuming that the NCP at the low-noise rate $\tau=n^{-c}$ is ($T'={2^{\Omega(\tau n)}}$, $\epsilon'={2^{-\Omega(\tau n)}}$,$m={2^{\Omega(\tau n)}}$)-hard in the worst case, where $T$, $\epsilon$, $q$ and $m$ are time complexity, success rate, sample complexity, and codeword length respectively. Moreover, refuting the worst-case hardness assumption would imply arbitrary polynomial speedups over the current state-of-the-art algorithms for solving the NCP (and LPN), which is a win-win result. Unfortunately, public-key encryptions and collision resistant hash functions need constant-noise LPN with ($T={2^{\omega(\sqrt{n})}}$, $\epsilon'={2^{-\omega(\sqrt{n})}}$,$q={2^{\sqrt{n}}}$)-hardness (Yu et al., CRYPTO 2016 \& ASIACRYPT 2019), which is almost (up to an arbitrary $\omega(1)$ factor in the exponent) what is reducible from the worst-case NCP when $c= 0.5$. We leave it as an open problem whether the gap can be closed or there is a separation in place.

2021

TCHES

Learning Parity with Physical Noise: Imperfections, Reductions and FPGA Prototype
📺
Abstract

Hard learning problems are important building blocks for the design of various cryptographic functionalities such as authentication protocols and post-quantum public key encryption. The standard implementations of such schemes add some controlled errors to simple (e.g., inner product) computations involving a public challenge and a secret key. Hard physical learning problems formalize the potential gains that could be obtained by leveraging inexact computing to directly generate erroneous samples. While they have good potential for improving the performances and physical security of more conventional samplers when implemented in specialized integrated circuits, it remains unknown whether physical defaults that inevitably occur in their instantiation can lead to security losses, nor whether their implementation can be viable on standard platforms such as FPGAs. We contribute to these questions in the context of the Learning Parity with Physical Noise (LPPN) problem by: (1) exhibiting new (output) data dependencies of the error probabilities that LPPN samples may suffer from; (2) formally showing that LPPN instances with such dependencies are as hard as the standard LPN problem; (3) analyzing an FPGA prototype of LPPN processor that satisfies basic security and performance requirements.

2020

TOSC

Efficient Side-Channel Secure Message Authentication with Better Bounds
📺
Abstract

We investigate constructing message authentication schemes from symmetric cryptographic primitives, with the goal of achieving security when most intermediate values during tag computation and verification are leaked (i.e., mode-level leakage-resilience). Existing efficient proposals typically follow the plain Hash-then-MAC paradigm T = TGenK(H(M)). When the domain of the MAC function TGenK is {0, 1}128, e.g., when instantiated with the AES, forgery is possible within time 264 and data complexity 1. To dismiss such cheap attacks, we propose two modes: LRW1-based Hash-then-MAC (LRWHM) that is built upon the LRW1 tweakable blockcipher of Liskov, Rivest, and Wagner, and Rekeying Hash-then-MAC (RHM) that employs internal rekeying. Built upon secure AES implementations, LRWHM is provably secure up to (beyond-birthday) 278.3 time complexity, while RHM is provably secure up to 2121 time. Thus in practice, their main security threat is expected to be side-channel key recovery attacks against the AES implementations. Finally, we benchmark the performance of instances of our modes based on the AES and SHA3 and confirm their efficiency.

2020

PKC

Tweaking the Asymmetry of Asymmetric-Key Cryptography on Lattices: KEMs and Signatures of Smaller Sizes
📺
Abstract

Currently, lattice-based cryptosystems are less efficient than their number-theoretic counterparts (based on RSA, discrete logarithm, etc.) in terms of key and ciphertext (signature) sizes. For adequate security the former typically needs thousands of bytes while in contrast the latter only requires at most hundreds of bytes. This significant difference has become one of the main concerns in replacing currently deployed public-key cryptosystems with lattice-based ones. Observing the inherent asymmetries in existing lattice-based cryptosystems, we propose asymmetric variants of the (module-)LWE and (module-)SIS assumptions, which yield further size-optimized KEM and signature schemes than those from standard counterparts. Following the framework of Lindner and Peikert (CT-RSA 2011) and the Crystals-Kyber proposal (EuroS&P 2018), we propose an IND-CCA secure KEM scheme from the hardness of the asymmetric module-LWE (AMLWE), whose asymmetry is fully exploited to obtain shorter public keys and ciphertexts. To target at a 128-bit quantum security, the public key (resp., ciphertext) of our KEM only has 896 bytes (resp., 992 bytes). Our signature scheme bears most resemblance to and improves upon the Crystals-Dilithium scheme (ToCHES 2018). By making full use of the underlying asymmetric module-LWE and module-SIS assumptions and carefully selecting the parameters, we construct an SUF-CMA secure signature scheme with shorter public keys and signatures. For a 128-bit quantum security, the public key (resp., signature) of our signature scheme only has 1312 bytes (resp., 2445 bytes). We adapt the best known attacks and their variants to our AMLWE and AMSIS problems and conduct a comprehensive and thorough analysis of several parameter choices (aiming at different security strengths) and their impacts on the sizes, security and error probability of lattice-based cryptosystems. Our analysis demonstrates that AMLWE and AMSIS problems admit more flexible and size-efficient choices of parameters than the respective standard versions.

2020

CRYPTO

Better Concrete Security for Half-Gates Garbling (in the Multi-Instance Setting)
📺
Abstract

We study the concrete security of high-performance implementations of half-gates garbling, which all rely on (hardware-accelerated) AES. We find that current instantiations using k-bit wire labels can be completely broken—in the sense that the circuit evaluator learns all the inputs of the circuit garbler—in time O(2k/C), where C is the total number of (non-free) gates that are garbled, possibly across multiple independent executions. The attack can be applied to existing circuit-garbling libraries using k = 80 when C ≈ $10^9$, and would require 267 machine-months and cost about $3500 to implement on the Google Cloud Platform. Since the attack can be entirely parallelized, the attack could be carried out in about a month using ≈ 250 machines.
With this as our motivation, we seek a way to instantiate the hash function in the half-gates scheme so as to achieve better concrete security. We present a construction based on AES that achieves optimal security in the single-instance setting (when only a single circuit is garbled). We also show how to modify the half-gates scheme so that its concrete security does not degrade in the multi-instance setting. Our modified scheme is as efficient as prior work in networks with up to 2 Gbps bandwidth.

2020

ASIACRYPT

Packed Multiplication: How to Amortize the Cost of Side-channel Masking?
📺
Abstract

Higher-order masking countermeasures provide strong provable security against side-channel attacks at the cost of incurring significant overheads, which largely hinders its applicability. Previous works towards remedying cost mostly concentrated on ``local'' calculations, i.e., optimizing the cost of computation units such as a single AND gate or a field multiplication. This paper explores a complementary ``global'' approach, i.e., considering multiple operations in the masked domain as a batch and reducing randomness and computational cost via amortization. In particular, we focus on the amortization of $\ell$ parallel field multiplications for appropriate integer $\ell > 1$, and design a kit named {\it packed multiplication} for implementing such a batch.
Higher-order masking countermeasures provide strong provable security against side-channel attacks at the cost of incurring significant overheads, which largely hinders its applicability. Previous works towards remedying cost mostly concentrated on ``local'' calculations, i.e., optimizing the cost of computation units such as a single AND gate or a field multiplication. This paper explores a complementary ``global'' approach, i.e., considering multiple operations in the masked domain as a batch and reducing randomness and computational cost via amortization.
In particular, we focus on the amortization of $\ell$ parallel field multiplications for appropriate integer $\ell > 1$, and design a kit named {\it packed multiplication} for implementing such a batch.
For $\ell+d\leq2^m$, when $\ell$ parallel multiplications over $\mathbb{F}_{2^{m}}$ with $d$-th order probing security are implemented, packed multiplication consumes $d^2+2\ell d + \ell$ bilinear multiplications and $2d^2 + d(d+1)/2$ random field variables, outperforming the state-of-the-art results with $O(\ell d^2)$ multiplications and $\ell \left \lfloor d^2/4\right \rfloor + \ell d$ randomness. To prove $d$-probing security for packed multiplications, we introduce some weaker security notions for multiple-inputs-multiple-outputs gadgets and use them as intermediate steps, which may be of independent interest.
As parallel field multiplications exist almost everywhere in symmetric cryptography, lifting optimizations from ``local'' to ``global'' substantially enlarges the space of improvements. To demonstrate, we showcase the method on the AES Subbytes step, GCM and TET (a popular disk encryption). Notably, when $d=8$, our implementation of AES Subbytes in ARM Cortex M architecture achieves a gain of up to $33\%$ in total speeds and saves up to $68\%$ random bits than the state-of-the-art bitsliced implementation reported at ASIACRYPT~2018.

2019

ASIACRYPT

Valiant’s Universal Circuits Revisited: An Overall Improvement and a Lower Bound
Abstract

A universal circuit (UC) is a general-purpose circuit that can simulate arbitrary circuits (up to a certain size n). At STOC 1976 Valiant presented a graph theoretic approach to the construction of UCs, where a UC is represented by an edge universal graph (EUG) and is recursively constructed using a dedicated graph object (referred to as supernode). As a main end result, Valiant constructed a 4-way supernode of size 19 and an EUG of size $$4.75n\log n$$ (omitting smaller terms), which remained the most size-efficient even to this day (after more than 4 decades).Motivated by the emerging applications of UCs in various privacy preserving computation scenarios, we revisit Valiant’s universal circuits, and propose a 4-way supernode of size 18, and an EUG of size $$4.5n\log n$$. As confirmed by our implementations, we reduce the size of universal circuits (and the number of AND gates) by more than 5% in general, and thus improve upon the efficiency of UC-based cryptographic applications accordingly. Our approach to the design of optimal supernodes is computer aided (rather than by hand as in previous works), which might be of independent interest. As a complement, we give lower bounds on the size of EUGs and UCs in Valiant’s framework, which significantly improves upon the generic lower bound on UC size and therefore reduces the gap between theory and practice of universal circuits.

2019

ASIACRYPT

Collision Resistant Hashing from Sub-exponential Learning Parity with Noise
Abstract

The Learning Parity with Noise (LPN) problem has recently found many cryptographic applications such as authentication protocols, pseudorandom generators/functions and even asymmetric tasks including public-key encryption (PKE) schemes and oblivious transfer (OT) protocols. It however remains a long-standing open problem whether LPN implies collision resistant hash (CRH) functions. Inspired by the recent work of Applebaum et al. (ITCS 2017), we introduce a general construction of CRH from LPN for various parameter choices. We show that, just to mention a few notable ones, under any of the following hardness assumptions (for the two most common variants of LPN) 1.constant-noise LPN is $$2^{n^{0.5+\varepsilon }}$$-hard for any constant $$\varepsilon >0$$;2.constant-noise LPN is $$2^{\varOmega (n/\log n)}$$-hard given $$q=\mathsf {poly}(n)$$ samples;3.low-noise LPN (of noise rate $$1/\sqrt{n}$$) is $$2^{\varOmega (\sqrt{n}/\log n)}$$-hard given $$q=\mathsf {poly}(n)$$ samples. there exists CRH functions with constant (or even poly-logarithmic) shrinkage, which can be implemented using polynomial-size depth-3 circuits with NOT, (unbounded fan-in) AND and XOR gates. Our technical route LPN $$\rightarrow $$ bSVP $$\rightarrow $$ CRH is reminiscent of the known reductions for the large-modulus analogue, i.e., LWE $$\rightarrow $$ SIS $$\rightarrow $$ CRH, where the binary Shortest Vector Problem (bSVP) was recently introduced by Applebaum et al. (ITCS 2017) that enables CRH in a similar manner to Ajtai’s CRH functions based on the Short Integer Solution (SIS) problem.Furthermore, under additional (arguably minimal) idealized assumptions such as small-domain random functions or random permutations (that trivially imply collision resistance), we still salvage a simple and elegant collision-resistance-preserving domain extender combining the best of the two worlds, namely, maximized (depth one) parallelizability and polynomial shrinkage. In particular, assume $$2^{n^{0.5+\varepsilon }}$$-hard constant-noise LPN or $$2^{n^{0.25+\varepsilon }}$$-hard low-noise LPN, we obtain a collision resistant hash function that evaluates in parallel only a single layer of small-domain random functions (or random permutations) and shrinks polynomially.

2015

CRYPTO

2013

ASIACRYPT

#### Program Committees

- Eurocrypt 2020
- Asiacrypt 2020
- PKC 2019
- TCC 2019
- Asiacrypt 2018
- TCC 2017

#### Coauthors

- Boaz Barak (1)
- Davide Bellizia (1)
- Gaëtan Cassiers (1)
- Yevgeniy Dodis (2)
- Shuqin Fan (1)
- Rong Fu (1)
- Dawu Gu (3)
- Zheng Guo (1)
- Chun Guo (5)
- Clément Hoffmann (1)
- Zhenkai Hu (1)
- Dina Kamel (1)
- Jonathan Katz (1)
- Hugo Krawczyk (1)
- Xiangxue Li (4)
- Wenling Liu (1)
- Junrong Liu (1)
- Hanlin Liu (3)
- Pierrick Méaux (1)
- Olivier Pereira (2)
- Krzysztof Pietrzak (1)
- François-Xavier Standaert (7)
- John P. Steinberger (1)
- Weijia Wang (4)
- Xiao Wang (2)
- Jian Weng (4)
- Chenkai Weng (1)
- Sen Xu (1)
- Kang Yang (1)
- Jiang Zhang (7)
- Zhenfeng Zhang (1)
- Shuoyao Zhao (2)