
some notes on pseudorandom correlation generators

25 Jun 2024

These are some notes from a talk. I will clean them up at some point in the future, but there are probably many mistakes. Use at your own risk! If (when) you find mistakes feel free to shoot me an email :)

  1. How to multiply, securely
    1. Beaver Triples [Beaver91]
  2. Interlude: oblivious transfer
    1. normal OT
    2. random OT
    3. oblivious linear evaluations (OLE)
  3. Beaver Triples from VOLE
  4. How to VOLE
  5. Silent VOLE
  6. Learning Parity with Noise
    1. Structured LPN
    2. LPN Dual
      1. Efficient calculation
  7. Putting it together
    1. final word on small VOLE
  8. Summary
  9. Extra tidbits

How to multiply, securely

Environment: (dishonest majority) two-party computation. Here, we’ll assume an additive sharing. Parties hold shares of \(x,y\) and want to compute shares of \(x\cdot y\):

\[\begin{gather*} x=[x]_1+[x]_2\\ y=[y]_1+[y]_2\\ z=[z]_1+[z]_2\overset {???} = [x\cdot y] \end{gather*}\]

Note on secret sharing notation: \([x]_i\) means a share of \(x\) held by party \(i\). I might drop the \(i\) below (\([x]\)), which just refers to both shares, in general.

Unlike addition, multiplication can’t be computed locally.

\[[x][y] =[x]_1[y]_1+[x]_2[y]_2\neq[xy]\]

The cross-terms are missing. How can we compute them?

Beaver Triples [Beaver91]

Assume we have a magic cloud that gives us Beaver Triples: shares of random numbers \([a],[b],[c]\) such that \(a\cdot b = c\). Then, we perform the following:

  1. Parties compute \([A]:=[x+a]\) and \([B]:=[y+b]\). Since \(a,b\) are random, this is effectively a one-time pad, so we can safely open to get cleartext \(A\) and \(B\).
  2. Parties locally compute
\[\begin{align*} A[y]-B[a]+[c]&=(x+a)[y]-(y+b)[a]+[c]\\ &=[xy]+[ay]-[ay]-[ab]+[c]\\ &=[xy] \end{align*}\]

Tada! The cost of this procedure is one triple and two opens.
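
To make the bookkeeping concrete, here's a toy Python sketch of one such multiplication. A trusted dealer stands in for the magic cloud, arithmetic is mod a prime \(p\), and all the variable names are just for illustration.

```python
import random

p = 2**61 - 1  # a prime; all arithmetic is mod p

def share(v):
    """Additively share v: v = s1 + s2 (mod p)."""
    s1 = random.randrange(p)
    return s1, (v - s1) % p

# trusted dealer: one Beaver triple a*b = c, handed out as shares
a, b = random.randrange(p), random.randrange(p)
a1, a2 = share(a)
b1, b2 = share(b)
c1, c2 = share(a * b % p)

# the inputs, already secret-shared between the two parties
x, y = 12345, 67890
x1, x2 = share(x)
y1, y2 = share(y)

# step 1: mask and open A = x + a and B = y + b (one round of communication)
A = (x1 + a1 + x2 + a2) % p
B = (y1 + b1 + y2 + b2) % p

# step 2: each party locally computes its share of z = x*y via A*[y] - B*[a] + [c]
z1 = (A * y1 - B * a1 + c1) % p
z2 = (A * y2 - B * a2 + c2) % p

assert (z1 + z2) % p == x * y % p
```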

Note: Similar to a one-time pad, you also cannot reuse triples! Using a triple twice leaks information about the inputs. Specifically, if the same \((a,b,c)\) is used for two multiplications with inputs \(x^{(1)},y^{(1)}\) and \(x^{(2)},y^{(2)}\), everyone learns \(A^{(1)}=x^{(1)}+a\) and \(A^{(2)}=x^{(2)}+a\), and thus \(A^{(2)}-A^{(1)}=x^{(2)}-x^{(1)}\) (and likewise \(B^{(2)}-B^{(1)}=y^{(2)}-y^{(1)}\)).

Interlude: oblivious transfer

normal OT

In (1-out-of-2) OT, the sender holds two messages \(m_0,m_1\), and the receiver holds a choice bit \(b\) and learns \(m_b\). Security properties: the sender learns nothing about \(b\), and the receiver learns nothing about \(m_{1-b}\).

random OT

A non-interactive, randomized version of OT: the messages \(m_0,m_1\) and the choice bit \(b\) are all random rather than chosen.

Now we can also think of the messages as being correlated, i.e., \(m_1=m_0 + \Delta\). Then the receiver actually gets \(m_b=m_0+b\Delta\).

rOT implies OT. We can build “chosen message” OT out of a single rOT plus a bit more communication, by using the rOT to mask our custom messages.
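
Here's a toy sketch of that derandomization, with a trusted dealer handing out the rOT and XOR used as the one-time pad. The variable names (`r0`, `r1`, `c`, etc.) are just for illustration.

```python
import secrets

# random OT, dealt by a trusted dealer: sender gets (r0, r1), receiver gets (c, rc)
r0, r1 = secrets.randbits(128), secrets.randbits(128)
c = secrets.randbits(1)
rc = r1 if c else r0

# derandomize into a chosen-message OT
m0, m1 = 1111, 2222   # sender's chosen messages (at most 128 bits here)
bit = 1               # receiver's chosen bit

d = bit ^ c                      # receiver -> sender: one bit
y0 = m0 ^ (r1 if d else r0)      # sender masks m0 with r_d ...
y1 = m1 ^ (r0 if d else r1)      # ... and m1 with r_{1-d}, then sends both
mb = (y1 if bit else y0) ^ rc    # receiver unmasks only its chosen message

assert mb == (m1 if bit else m0)
```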

You might wonder if we even need Beaver Triples. Maybe there’s a simpler procedure (with no magical clouds involved) that just requires some simple local operations and a bit of communication (e.g., mirroring existing 3PC and 4PC multiplication protocols). Unfortunately, there’s a separation here: 2PC multiplication can be used to build oblivious transfer, and OT is known to require PK crypto or other heavy assumptions.

oblivious linear evaluations (OLE)

Let's allow \(b\in\mathbb F\) for an arbitrary field \(\mathbb F\), rather than a single bit. Then rOT instead looks like the joint evaluation of a random line at a random point: the sender holds a random line \(f(t)=m_0+t\Delta\), and the receiver learns its value \(m_b=f(b)\) at a random point \(b\).

Alternatively, we can view the output as \(m_b-m_0=b\Delta\), a random additive sharing of a random product. Maybe you can start to see how this is useful.

We can also build vector oblivious linear evaluations (VOLE), where instead \(\overrightarrow {m_b} - \overrightarrow {m_0} = \overrightarrow b\Delta\). We let \(\Delta\) be a constant scalar across all OLEs; this is fine for security.
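
Functionally, VOLE is just a dealer for this correlation. Here's a toy sketch of what it hands out (trusted dealer, arithmetic mod a prime, toy length \(N\)):

```python
import random

p = 2**61 - 1
N = 8

# trusted dealer samples the VOLE correlation
delta = random.randrange(p)                           # sender's global scalar
m0 = [random.randrange(p) for _ in range(N)]          # sender's vector
b = [random.randrange(p) for _ in range(N)]           # receiver's vector
mb = [(m0[i] + b[i] * delta) % p for i in range(N)]   # receiver's other vector

# the two outputs form an additive sharing of b * delta, entry by entry
for i in range(N):
    assert (mb[i] - m0[i]) % p == b[i] * delta % p
```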

Beaver Triples from VOLE

Now let’s assume we have a magic cloud in the sky that spits out VOLEs, \(\overrightarrow W\cdot x = \overrightarrow Y + \overrightarrow Z\). (I’m changing the sign here for convenience but that doesn’t matter. \(x\) is a scalar.)

Party 1 (the sender above) has \(x\) and \(Z\). Party 2 (the receiver above) has \(W\) and \(Y\). I’ll mark them with party-subscripts for clarity: the VOLE outputs \((x_1,Z_1);\ (W_2,Y_2)\) s.t. \(W_2\cdot x_1=Y_2+Z_1\).

To get a Beaver Triple, we need two instances of OLE, so let's use the first two entries (indices \(0\) and \(1\)) of each vector.

Each party can locally compute a candidate share of \(c\): party 1 takes \(c_1 = x_1^2 + Z_1[0]+Z_1[1]\), and party 2 takes \(c_2 = W_2[0]\,W_2[1]+Y_2[0]+Y_2[1]\). What do these two values sum to?

\[\begin{align*} c_1+c_2&=x_1^2 + Z_1[0]+Z_1[1]+W_2[0] W_2[1]+Y_2[0]+Y_2[1]\\ &=x_1^2 + (Y_2[0] + Z_1[0]) + (Y_2[1] + Z_1[1]) + W_2[0] W_2[1]\\ &=x_1^2 + W_2[0]x_1+W_2[1]x_1+W_2[0] W_2[1]\\ &=x_1(x_1+W_2[0])+W_2[1](x_1+W_2[0])\\ &=(x_1+W_2[1])(x_1+W_2[0]) \end{align*}\]

If we let \(a=x_1+W_2[0]\) and \(b=x_1+W_2[1]\) (shared as \([a]_1=[b]_1=x_1\), \([a]_2=W_2[0]\), \([b]_2=W_2[1]\)), and take \(c_1,c_2\) above as the shares of \(c\)…

…then we’ve got a Beaver triple! The first term of each \([c]\) share, above, is the party’s “local” component of the product (\([x]_i[y]_i\), in the notation used at the beginning). Each VOLE computes one cross term.

One change you could make is to swap the roles of the two parties in the two VOLEs (maybe have an “even” VOLE and an “odd” VOLE). Then both parties would have one scalar and one vector. I don’t think it’s actually an issue for Party 1’s share to always be the same (the VOLE scalar). But that could be wrong…

Also, instead of considering this operation element-by-element, we could do it for the entire vector, thus getting a (half-sized) vector of Beaver Triples.
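
Here's a quick numerical check of this extraction, with a trusted dealer playing the VOLE cloud. Names follow the text: party 1 holds \((x_1, Z_1)\), party 2 holds \((W_2, Y_2)\).

```python
import random

p = 2**61 - 1
N = 2   # two OLEs make one triple

# trusted dealer: W2[i] * x1 = Y2[i] + Z1[i] for each i
x1 = random.randrange(p)
W2 = [random.randrange(p) for _ in range(N)]
Y2 = [random.randrange(p) for _ in range(N)]
Z1 = [(W2[i] * x1 - Y2[i]) % p for i in range(N)]

# shares of the Beaver triple (a, b, c), as in the derivation above
a1, a2 = x1, W2[0]
b1, b2 = x1, W2[1]
c1 = (x1 * x1 + Z1[0] + Z1[1]) % p
c2 = (W2[0] * W2[1] + Y2[0] + Y2[1]) % p

a, b, c = (a1 + a2) % p, (b1 + b2) % p, (c1 + c2) % p
assert a * b % p == c
```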

How to VOLE

How do we actually instantiate the VOLE cloud?

  1. Terrible: use OT to “manually” perform all multiplications. We can compute multiplication using (effectively) the schoolbook method and local addition. This is extremely inefficient: for a length-\(N\) vector of \(\ell\)-bit values, we need \(N\ell\) OTs and thus \(O(N\ell)\) public-key operations.

  2. Much better: use OT extension like MASCOT ([KOS16]) or [IKNP03]. At a very high level, we can use hashing and a PRG to reduce \(2^{o(\lambda)}\) OTs to \(\lambda\) calls of \(\lambda\)-bit OT (where \(\lambda\) is a security parameter). In practice, this means we usually perform 128 “base” OTs, and then get effectively unlimited OTs (technically a bit less than \(2^{128}\), so, uh…). This is great for computation (no more public-key operations per OT), but communication still scales with \(N\), and we still need \(O(N)\) calls to a PRG.

    The IKNP reduction looks something like this:

    • Send a random matrix in the opposite direction (receiver \(\rightarrow\) sender) and use a hash function over its rows to reduce \(N\) calls of \(\ell\)-bit OT to \(\lambda\) calls of \(N\)-bit OT
    • Use a PRG with stretch \(N/\lambda\) to reduce \(\lambda\) calls of \(N\)-bit OT to \(\lambda\) calls of \(\lambda\)-bit OT
    • Actually perform those \(\lambda\) OTs.

    MASCOT makes this maliciously secure.

  3. Awesome: Silent VOLE!

Silent VOLE

…with pseudorandom correlation generators (PCGs)1

First, what’s a PCG? It has two algorithms:

  • Setup (or Gen): produces a pair of short, correlated keys (seeds) \(k_0,k_1\), one for each party.
  • Expand: each party locally stretches its key into a long output, such that the pair of outputs looks like a fresh sample from the target correlation \(\mathcal C\).

For security, we want to argue that one party’s key + its output should leak nothing about the other party’s, besides the fact that the outputs are correlated. There’s a funny thing with the proof here; because this would require a simulator to effectively “compress” high-entropy randomness into short seeds, PCGs cannot be proven simulation secure. Instead we make an indistinguishability argument. (I don’t really understand super well yet why the simulation proof fails.)

\(\mathcal C\) can be any correlation you want. Here, we’re looking at VOLEs, but you might also want rOT correlations, sharings-of-zero, square numbers, matrix multiplication triples, authenticated Beaver triples, etc.

Good news: we can build PCGs!

Bad news: they only have an expansion factor of around 2.

Good news: with an additional assumption we can make the expansion orders of magnitude better.

Learning Parity with Noise

That assumption is Learning Parity with Noise, or LPN. Briefly, LPN says that given a public random matrix \(A\), seed \(\vec s\), sparse noise \(\vec e\) (of weight \(t\)), and random vector \(\vec r\),

\[(A, A\vec s +\vec e)\approx(A,\vec r)\]

where \(\approx\) means indistinguishable.
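
For concreteness, here's what a single primal LPN sample looks like over \(\mathbb F_2\) (a toy numpy sketch; the dimensions and weight are arbitrary, not secure parameters):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, t = 1024, 256, 16   # output length, seed length, noise weight (toy sizes)

A = rng.integers(0, 2, size=(n, k))   # public random matrix
s = rng.integers(0, 2, size=k)        # secret seed
e = np.zeros(n, dtype=int)            # t-sparse noise
e[rng.choice(n, size=t, replace=False)] = 1

b = (A @ s + e) % 2   # LPN: (A, b) should be indistinguishable from (A, uniform)
```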

This looks basically identical to LWE, so it’s worth clarifying some of the differences:

The main structural difference is the noise: in LWE the error is small in magnitude (e.g., a discrete Gaussian), while in LPN it is sparse (most coordinates are exactly zero). LPN relates to the average-case problem of decoding linear codes. (In LWE, the decoding step rounds to the nearest lattice point. The analog under LPN is decoding an error-correcting code.) The worst-case version of that decoding problem is NP-hard, but LPN seems like a strictly less powerful assumption than LWE: every known LPN-based construction can also be instantiated with LWE, but LPN does not give FHE, at least under current techniques.

Different variants of LPN also have different powers. “Constant noise” LPN (noise rate \(\Theta(1)\), i.e., weight \(t=\Theta(n)\)) only gives authentication schemes. However, we can get public-key crypto in the “low noise” regime, where the noise rate is \(O\left(n^{-1/2}\right)\) (weight \(t=O(\sqrt n)\)).

This writeup is helpful.

What is \(A\)? It’s the generator matrix for a linear code (a code where the sum of two codewords is itself a valid codeword.) Let’s call it \(G\) instead.

To use a linear code when communicating over a noisy channel, we just multiply our message by the generator. Its inverse is the decoder.

\[\begin{align*} cw=G\cdot m\quad\rightarrow\text{...noisy channel...}\rightarrow\quad m&\approx G^{-1}(cw+e)\\ &\approx G^{-1}cw+G^{-1}e\\ &\approx m+0 \end{align*}\]

(We can assume the error vector \(e\) is sparse because the noisy channel only flips a few bits.)
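
As a degenerate example of this encode/decode picture, here's a 3x repetition code in numpy (playing the role of \(G\); majority vote plays the role of \(G^{-1}\)):

```python
import numpy as np

G = np.array([[1], [1], [1]])      # generator of the 3x repetition code (3x1)

m = np.array([1])                  # 1-bit message
cw = (G @ m) % 2                   # encode: codeword is the bit repeated 3 times

e = np.array([0, 1, 0])            # sparse channel noise: one flipped bit
received = (cw + e) % 2

decoded = int(received.sum() > 1)  # decode by majority vote, correcting the error
assert decoded == m[0]
```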

Linear codes are widely used in communication systems for their efficiency and error performance. A lot of research has gone into creating efficient schemes for linear codes, which we can take advantage of here to build PCGs. An important fact about effective coding schemes is that we want them to create “random-looking” codewords (then, any noise from the channel is “spread” across the entire message and can be easily corrected).

As currently stated, LPN isn’t getting us much. Most codes of interest to us have a rate between (maybe) \(1/5\) and \(1/2\). This means we can get, at best, \(5\times\) expansion from the seed to our PCG output. We also need to perform a relatively expensive matrix multiplication, so the Expand step won’t even be that efficient.

Structured LPN

To fix this issue, we introduce structured LPN. The idea came from the fact that using random generator matrices isn’t efficient for communications. Instead, a number of linear codes have been developed – with highly structured generators – that nonetheless create “random looking” codewords. This structure, in turn, admits more efficient encoding algorithms than just naive matrix multiplication.

Concretely, the codes we’ll use in practice for PCGs will have generator matrices of size approximately \(1\text M\times 2\text M\), but with only a few dozen non-zero elements per column. This allows for much more efficient representation and computation.

This setting begins to diverge a bit from the usual constraints of coding theory and communications, where we’re encoding (maybe) a few hundred bits at a time. Also, PCGs don’t require efficient decoding, so we don’t have to worry about the encode/decode tradeoff that is usually important for (e.g.) radio communications like 5G and WiFi.

Structured LPN is of course a much stronger assumption, and necessitates a more heuristic argument of security. We’ll discuss this more below.

LPN Dual

The definition above is known as the “LPN primal”. There is also a dual notion of LPN which gives significantly more efficient constructions. (LPN primal only gives subquadratic stretch, while Dual gives arbitrary poly stretch.)

Define a matrix \(H\) such that \(HG=0\). (A nonzero such matrix doesn’t always exist, but for a linear code’s generator \(G\) it does, since \(G\) is a tall matrix that adds redundancy; \(H\) is the “parity-check” matrix and we use it to check for errors in received codewords.)

\[\begin{align*} HG&=0\\ Gs+e&\approx r&\text{(LPN primal)}\\ H(Gs+e)&\approx Hr\\ HGs+He&\approx\hat r\\ He&\approx \hat r \end{align*}\]

Unstructured primal LPN seems eminently believable. But structured dual LPN is pretty counterintuitive to me: we’re multiplying an extremely sparse matrix by another sparse vector and somehow getting something pseudorandom! But such is the magic of linear codes. (As a partial answer, think about \(\vec e\) as taking a random \(t\)-sum of columns of \(H\). Related to subset sum, I guess?)
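
To see that subset-sum flavor concretely: \(H\vec e\) is just the sum of the \(t\) columns of \(H\) picked out by the noise positions. A toy numpy check (a uniformly random \(H\) over \(\mathbb F_2\), standing in for a real parity-check matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, t = 256, 512, 8   # output length, noise length, noise weight (toy sizes)

H = rng.integers(0, 2, size=(N, n))
positions = rng.choice(n, size=t, replace=False)
e = np.zeros(n, dtype=int)
e[positions] = 1

# H @ e equals the sum mod 2 of the t selected columns of H
assert np.array_equal(H @ e % 2, H[:, positions].sum(axis=1) % 2)
```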

Efficient calculation

Even though \(\vec e\) is sparse, we still need \(O(n^2)\) operations to do a matrix multiplication (assuming \(H\in\mathbb F^{n\times n}\), or so): in the protocol, the parties mostly apply \(H\) to dense, pseudorandom-looking vectors (their shares), so there’s no sparsity to exploit. This is where the structured codes come in, and we use encoding algorithms rather than naive multiplication.

Other papers might change the underlying structure of the problem (e.g., rings instead of fields, [BCG+20]) but the basic idea is the same: much of the research in this area is designing new codes to plug into the PCG framework. I’d imagine certain codes might also be more friendly towards certain correlations.

In general, there are a few properties \(H\) needs to satisfy:

  1. \(H\) is the parity-check matrix of a structured linear code. Codes with a “repeat-accumulate” structure seem to be particularly well-suited.

  2. \(H\cdot \vec e\) is indistinguishable from random…

  3. …and computable in linear-ish time (in the size of the output, \(\tilde O(N)\))

For #2, it turns out that we need the code to have a “high minimum codeword distance”. And due to the properties of linear codes, minimum distance equals minimum Hamming weight (if \(X,X'\) are two codewords at minimum distance, then \(X-X'=Z\) is itself a codeword, of minimum weight). So, #2 reduces to bounding the minimum Hamming weight of nonzero codewords. This is how papers like [CRR21] can make security mistakes: it’s impossible to exhaustively check all codewords, so your heuristic might miss something.
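
For a feel of what that check involves, here's a brute-force minimum-weight computation for a tiny random code. This only works because \(2^k\) is tiny here; at real parameters, exhaustive search is hopeless, which is exactly why the arguments end up heuristic.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)
n, k = 12, 5                         # toy code: only 2^5 = 32 codewords
G = rng.integers(0, 2, size=(n, k))  # random generator matrix over F_2

weights = [int(((G @ np.array(m)) % 2).sum())
           for m in product([0, 1], repeat=k) if any(m)]
print("minimum nonzero codeword weight:", min(weights))
```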

While it may not seem like it, we now have all the tools to build silent VOLE.

Putting it together

First, we use the magic VOLE cloud (sorry) to generate a small number of sparse VOLEs. Let’s call that \(\mathrm wx=\mathrm y+\mathrm z\). This will form the keys for our PCG: \(k_0=(x,\mathrm z);\ k_1=(\mathrm w,\mathrm y)\).

While these will technically be large vectors, they’ll be extremely sparse, so we can use a distributed point function (DPF) to encode them. A DPF can store a \(t\)-sparse length-\(N\) vector in about \(O(t^2\log N)\) space. The communication overhead for the Setup stage will mostly consist of exchanging DPF seeds.

Next, in the Expand stage, we use \(H\) to apply the linear code. This is an entirely offline operation! LPN tells us that the result will be indistinguishable from random. Let \(W=H\mathrm w\) (similarly for \(Y,Z\)).

\[\begin{align*} \mathrm w x &= \mathrm y + \mathrm z\\ H\mathrm w x &= H(\mathrm y + \mathrm z)\\ Wx &= Y+Z \end{align*}\]

There we have it!
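
Here's an end-to-end correctness check of that expansion in numpy: a trusted dealer hands out the small sparse VOLE (standing in for the DPF-based Setup), and a uniformly random \(H\) stands in for a structured, fast-encodable code. Parameters are toy-sized; this checks only the algebra, not security or efficiency.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 2**16 + 1              # a small prime field (toy)
n, N, t = 2048, 512, 16    # sparse seed length, output length, noise weight

# Setup (dealt here): a sparse VOLE, w * x = y + z elementwise, with w t-sparse
x = int(rng.integers(p))                       # party 1's scalar
w = np.zeros(n, dtype=np.int64)                # party 2's sparse vector
w[rng.choice(n, size=t, replace=False)] = rng.integers(1, p, size=t)
y = rng.integers(0, p, size=n)                 # party 2's dense share
z = (w * x - y) % p                            # party 1's dense share

# Expand: both parties apply the public matrix H locally -- no communication
H = rng.integers(0, p, size=(N, n))
W = H @ w % p   # party 2
Y = H @ y % p   # party 2
Z = H @ z % p   # party 1

# a length-N VOLE; by (dual) LPN, W should look uniformly random
assert np.array_equal(W * x % p, (Y + Z) % p)
```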

In this case, the PCG seeds themselves are a (small number of) instances of the target correlation \(\mathcal C\). But this isn’t true in general. In some constructions, we may perform additional processing on the result of encoding to arrive at the usable correlation.

final word on small VOLE

Ok, but this is just pushing the answer further down the road. How do we instantiate the small VOLE cloud? Well, that can be one of the “inefficient” manual protocols. You can use OT or homomorphic encryption, for example, to get just a few VOLEs (really, to set up the DPF seeds) to seed the PCG.

Maybe we only do ~128 base (= expensive) OTs, then use OT extension to perform the multiplications needed for the seed VOLEs (to get a million VOLEs, we might need a few thousand seed VOLEs).

Summary

  1. Structured LPN with DPFs allows for efficient, compact PCGs
  2. PCGs plus OT extension give silent VOLE
  3. VOLE gives Beaver Triples
  4. Beaver Triples give multiplication

Extra tidbits

A few items I didn’t mention above:

DPFs: a point function is a function which is zero everywhere except for one location. We write \(f_{\alpha,\beta}\) to represent the point function which has \(f(\alpha)=\beta\) and \(\forall x\neq\alpha : f(x)=0\). A distributed point function is a way of sharing such a function between multiple parties. Usually we want DPFs to hide both \(\alpha,\beta\) from both parties; in the PCG context, it is sufficient for one of the parties to know \(\alpha\). This formulation is called “known-index DPF”.
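
Just to pin down the object being shared, here's a naive, uncompressed sharing of a point function's truth table. (This is not a real DPF; actual constructions get keys of size \(O(\log N)\) instead of \(O(N)\), as described next.)

```python
import random

p = 2**61 - 1
N = 16
alpha, beta = 5, 42   # f(alpha) = beta, and f is zero everywhere else

table = [beta if i == alpha else 0 for i in range(N)]   # the point function's truth table

# naive sharing: each party's key is a full length-N vector (real DPF keys are tiny)
k0 = [random.randrange(p) for _ in range(N)]
k1 = [(table[i] - k0[i]) % p for i in range(N)]

assert all((k0[i] + k1[i]) % p == table[i] for i in range(N))
```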

There are a few ways of building DPFs. Some constructions like [Ds17] come from the ORAM literature. One efficient construction uses GGM-style trees (but with a punctured PRF), where one party knows a “correction word” which gives \(\beta\), but not where to apply it; the other party knows the location \(\alpha\) but not the correction word. Alternatively, we can create a “grid” construction following a similar design, but where we instead share \(\sqrt N\) seeds to a PRG with an output length of \(\sqrt N\); again, only one party knows the correction word. Elette Boyle has a nice writeup discussing this.

Multiparty PCGs: PCGs are only super efficient in the two-party environment. [AS22] develops PCGs for arbitrary parties by creating an \(n\)-party DPF. Unfortunately, while two-party (tree-based) constructions have seeds of size \(O(\log N)\) (for \(N\) target correlations), \(n\)-party constructions require \(O(\sqrt N)\)-sized seeds. Concretely, [BCG+20] achieves 1.25 MB seeds whereas [AS22] needs hundreds of MBs (and up to GBs in some scenarios). Additionally, [AS22] only becomes “worth it” (e.g., amortized expansion \(>1\)) after a few million triples.

Bootstrapping: we don’t need to re-run expensive Setup when we run out of correlations. Instead, we can use the last few (thousand) correlations to “bootstrap” a new PCG. I’m not sure about the exact implementation here; this might be specific to VOLEs (or, at least, correlations that imply multiplication).

Malicious security & authenticated triples: is actually not that hard. The construction of [BCG+20] requires only a 2x overhead in communication and computation to obtain authenticated triples. It’s possible to build authenticated triples in a black-box manner, directly from VOLE, but the more efficient option is to rebuild the PCG for an “authenticated triples” correlation itself. We use a MAC to authenticate each value in the triple, where the MAC \(m_x=\alpha x\) for some global MAC key \(\alpha\).

Regular noise: another assumption we can make is LPN with regular noise. Say we want a length-\(N\) \(t\)-sparse vector. Rather than having the nonzero indices uniformly distributed over the entire vector, we can divide the vector into \(t\) chunks, and randomly put one nonzero element into each chunk. Then, rather than encoding a length-\(N\) vector as the sum of \(t\) DPFs of length \(N\) (seed size \(O(t^2\log N)\)), we only need to concatenate \(t\) DPFs of length \(N/t\) (seed size \(O(t\log(N/t))\)).
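
Sampling regular noise is simple; a quick sketch (toy parameters, nonzero values mod a prime):

```python
import random

p = 2**61 - 1
N, t = 1024, 16
chunk = N // t

e = [0] * N
for j in range(t):
    pos = j * chunk + random.randrange(chunk)   # one random position in chunk j
    e[pos] = random.randrange(1, p)             # a random nonzero value there

assert sum(1 for v in e if v != 0) == t
```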

  1. we made it!