FWIW: I created a github repo for compact zero-knowledge proofs that could be useful for privacy-preserving ML models of reasonable size (https://github.com/logannye/space-efficient-zero-knowledge-p...). Unfortunately, FHE's computational overhead is still prohibitive for running ML workloads except on very small models. Hoping to help make ZKML a little more practical.
This sounds super interesting. Can you elaborate on how you apply ZK to ML? (or can you point me to any resources?)
Did you check Zama.ai's work on FHE?
I was under the impression that, for any FHE scheme with "good" security, (a) there was a finite and not very large limit to the number of operations you could do on encrypted data before the result became undecryptable, and (b) each operation on the encrypted side was a lot more expensive than the corresponding operation on plaintext numbers or whatever.
Am I wrong? I freely admit I don't know how it's supposed to work inside, because I've never taken the time to learn, because I believed those limitations made it unusable for most purposes.
Yet the abstract suggests that FHE is useful for running machine learning models, and I assume that means models of significant size.
The difference between homomorphic schemes and fully homomorphic schemes is that FHE can be bootstrapped; there's a circuit that can be homomorphically evaluated that removes the noise from an encrypted value, allowing any homomorphic calculation's result to have its noise removed for further computation.
My understanding is largely ten years old and high level and only for one kind of fully homomorphic encryption. Things have changed and there is more than one kind.
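To make the noise/bootstrapping idea concrete, here is a toy Python sketch. To be clear, this is not cryptography at all; it only simulates the "noise budget" bookkeeping that limits somewhat-homomorphic schemes, and the reset that bootstrapping provides:

    # Toy model of noise growth and bootstrapping. No real crypto here:
    # it only tracks the noise budget that caps how many operations a
    # somewhat-homomorphic scheme can do before decryption fails.
    class ToyCiphertext:
        NOISE_BUDGET = 100

        def __init__(self, value, noise=4):
            self.value, self.noise = value, noise

        def mul(self, other):
            # Multiplication compounds noise quickly in real schemes.
            return ToyCiphertext(self.value * other.value,
                                 self.noise * other.noise)

        def decryptable(self):
            return self.noise < self.NOISE_BUDGET

    def bootstrap(ct):
        # Stand-in for homomorphically evaluating the decryption circuit:
        # the same value comes back in a fresh, low-noise ciphertext.
        return ToyCiphertext(ct.value, noise=4)

    a, b = ToyCiphertext(3), ToyCiphertext(4)
    c = a.mul(b).mul(a).mul(b)           # noise: 16 -> 64 -> 256
    print(c.value, c.decryptable())      # 144 False (too noisy to decrypt)
    print(bootstrap(c).decryptable())    # True, computation can continue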
I heard it described as a system that encrypts each bit and then evaluates the "encrypted bit" in a virtual gate-based circuit that implements the desired operations that one wants applied to the plaintext. The key to (de|en)crypt plaintext will be at least one gigabyte. Processing this exponentially larger data is why FHE based on the system I've described is so slow.
So, if you wanted to, say, add numbers, that would involve implementing a full adder [0] circuit in the FHE system.
[0] https://en.wikipedia.org/wiki/Adder_(electronics)#/media/Fil...
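For concreteness, a plain-Python sketch of that gate-level idea (no encryption here; in a bitwise scheme like TFHE, each XOR/AND/OR below would instead be a homomorphic gate evaluated on encrypted bits):

    def full_adder(a, b, carry_in):
        # One-bit full adder built only from XOR/AND/OR gates.
        s = a ^ b ^ carry_in
        carry_out = (a & b) | (carry_in & (a ^ b))
        return s, carry_out

    def add_numbers(x_bits, y_bits):
        # Ripple-carry addition of two little-endian bit lists.
        carry, out = 0, []
        for a, b in zip(x_bits, y_bits):
            s, carry = full_adder(a, b, carry)
            out.append(s)
        return out + [carry]

    # 5 + 3 = 8: [1,0,1,0] and [1,1,0,0] are 5 and 3, little-endian.
    print(add_numbers([1, 0, 1, 0], [1, 1, 0, 0]))  # [0, 0, 0, 1, 0] = 8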
For a better overview that is shorter than the linked 250-page paper, I encourage you to consider Jeremy Kun's 2024 overview [1].
[1] https://www.jeremykun.com/2024/05/04/fhe-overview/
Both of these are correct-ish. You can do a renormalization that resets the operation counter without decrypting in FHE schemes, so in that sense there is no strict limit on operation count. However, FHE operations are still about 6 orders of magnitude more expensive than normal ones, so you are not going to be running an LLM, for instance, any time soon. A small classifier, maybe.
LLMs are at the current forefront of FHE research. There are a few papers doing tweaked versions of BERT in <1 minute per token, which is only ~4 orders of magnitude slower than cleartext.
https://arxiv.org/html/2410.02486v1#S5
This paper uses a very heavily modified version of an encoder-only BERT model. The forward pass on a single 4090 is cited there at 13 seconds after swapping softmax out for a different kernel (21 seconds with softmax). They are missing a non-FHE baseline, but judging by its size, that model has only about 35 million parameters. At FP16, you would expect this to be about 100x faster than a normal BERT because it's so damn small. On a 4090, that model's forward pass probably runs at something like 100k-1M tokens per second with some batching. So it sounds like 6 orders of magnitude is still about right.
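A rough back-of-the-envelope check, using the figures from the comment above (the 100k-1M tokens/s number is that comment's own guess):

    # Back-of-the-envelope slowdown estimate from the numbers above.
    fhe_seconds_per_token = 13          # cited FHE forward pass on a 4090
    plain_tokens_per_second = 1e5       # low end of the 100k-1M guess

    slowdown = fhe_seconds_per_token * plain_tokens_per_second
    print(f"{slowdown:.1e}x")           # 1.3e+06x, i.e. ~6 orders of magnitude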
Given that individual LLM parameters are not easily interpreted (they are naturally obfuscated by the diffuse nature of their impact), I would think leaning into that would be a more efficient route.
Obfuscating input and output formats could be very effective.
Obfuscation layers can be incorporated into training: an input (output) layer that passes information forward, but whose output (input) is optimized to have statistically flat characteristics, resistant to attempts at interpretation.
Nothing like apparent pure noise for obfuscation!
The core of the model would then be trained, and run inference, on the obfuscated data.
When used, the core model would operate publicly on the obfuscated data, while the obfuscation/de-obfuscation layers would be applied privately.
In addition to obfuscating, the pre- and post-layers could also reduce data dimensionality, naturally increasing obfuscation and reducing data transfer costs. It is a really good fit.
Even the most elaborate obfuscation layers will be orders and orders of magnitude faster than today's homomorphic approaches.
(Given the natural level of parameter obfuscation, and the highly limited set of operations in most deep models, I wouldn't be surprised if efficient homomorphic approaches were found in the future.)
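A minimal PyTorch sketch of this proposal (the module layout and the "flatness" penalty are hypothetical illustrations of the idea, not an established technique):

    import torch
    import torch.nn as nn

    class ObfuscatedModel(nn.Module):
        def __init__(self, d_in=128, d_obf=64, d_out=10):
            super().__init__()
            self.obfuscate = nn.Linear(d_in, d_obf)     # private, also reduces dimension
            self.core = nn.Sequential(                  # public part
                nn.Linear(d_obf, 256), nn.ReLU(), nn.Linear(256, d_obf))
            self.deobfuscate = nn.Linear(d_obf, d_out)  # private

        def forward(self, x):
            z = self.obfuscate(x)                       # what the public side sees
            return self.deobfuscate(self.core(z)), z

    def flatness_penalty(z):
        # Crude moment matching: push the obfuscated activations
        # toward looking like N(0, 1) noise.
        return z.mean() ** 2 + (z.var() - 1.0) ** 2

    model = ObfuscatedModel()
    x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
    logits, z = model(x)
    loss = nn.functional.cross_entropy(logits, y) + 0.1 * flatness_penalty(z)
    loss.backward()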
Does this mean, according to Moore's Law, FHE can operate at speeds from 6 years ago?
Moore's Law roughly states that we get a doubling of speed every 2 years.
If we're 6 orders of magnitude off, then we need to double our speed 20 times (2^20 = 1,048,576 ≈ 10^6), which would give us speeds approximately in line with computers from 40 years ago. Unless my understanding is completely off.
The rule of thumb is "about a 100,000x slowdown". With Moore's law doublings every 2 years, that means it would operate at the speed of computers from roughly 40 years ago. Although really, that still makes it seem faster than it is; making direct comparisons is hard.
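A quick sanity check of the arithmetic in the two comments above:

    import math

    for slowdown in (1e5, 1e6):            # the two rules of thumb quoted
        years = 2 * math.log2(slowdown)    # 2-year doublings to close the gap
        print(f"{slowdown:.0e}x -> ~{years:.0f} years")
    # 1e+05x -> ~33 years; 1e+06x -> ~40 years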
The goalposts moved: it's not private anymore, just private enough.
Let's assume for a second that the problem of computational cost is solved and that using FHE is similar to using plaintext data.
My question might be very naive, but I'd like to better understand the impact of FHE. Discussions here seem to revolve very much around the use of FHE in ML, but are there other uses for it?
For example, could it be used for everyday work in an OS or a messaging app?
Also, is it the path for true obfuscation?
That's a big stretch for the premise, but...
There's no value to it in circumstances where you control all the hardware processing the data. So "everyday work in an OS" - only if that OS is hosted on someone else's hardware; "a messaging app" - only if you expect some of the messages or metadata to undergo processing on someone else's hardware.
It seems wildly unlikely that the performance characteristics will improve dramatically, so in practice the uses are going to remain somewhat niche.
> There's no value to it in circumstances where you control all the hardware processing data
But what about the case where you don't have much control over what runs next to your program? Could an attacker run a program in order to extract some data while your program runs?
Also, could FHE offer some protection against vulnerabilities like Meltdown and Spectre?
> It seems wildly unlikely that the performance characteristics will improve dramatically
Why? Are there specific signs of this already? I had the impression that every time people believe that about a technology, they get proven wrong later.
The typical, and also most useful, example use case for FHE is running computational tasks on some cloud service without having to trust it. And yes, it would provide protection against Meltdown and Spectre (if the attack is performed on the hardware running the computation), as the attacker would only be able to extract encrypted data.
The data has to be decrypted at some point in order to display it... unless we're envisioning FHE hardware in the monitor as well - honestly, I think we're well across the threshold into fantasy already though.
Of course the data has to be decrypted, but in this case you would decrypt it on your client machine, so that you don't need to trust the cloud provider or other third parties using VMs on the same server (side channel attacks can sometimes be exploited from another VM running on the same hardware, although this is rarely considered as part of one's threat model).
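To make the trust model concrete, here is a minimal sketch using the additively homomorphic Paillier scheme via the python-paillier library (pip install phe). Paillier is only partially homomorphic, not FHE, but the client/cloud split is the same:

    from phe import paillier

    # Client side: encrypt the data; the private key never leaves.
    public_key, private_key = paillier.generate_paillier_keypair()
    salaries = [public_key.encrypt(s) for s in (52_000, 61_500, 48_250)]

    # Cloud side: computes on ciphertexts, never sees any plaintext.
    encrypted_total = sum(salaries[1:], salaries[0])

    # Client side again: only the key holder can read the result.
    print(private_key.decrypt(encrypted_total))  # 161750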
What is the computational burden of FHE over doing the same operation in plaintext? I realize that many cloud proponents think FHE may allow them to work with data without security worries (if it is all encrypted, and we don't have the keys, it ain't our problem), but if FHE requires a 100x or 1000x increase in processor capacity, then I am not sure it will be practical at scale.
It's at least a million times slower than non-encrypted computation; 1000x or 100x would be huge progress.
Oh. It really is that bad still. So if the question is between wrapping the plaintext in layers of security, or building out a million new server instances to do it via FHE, i know which one everyone will choose.
It's so bad that the only way FHE can get substantially more efficient is to use a non-conventional compute technology. Some want to do it in the optical domain.
It is not that bad these days, closer to 10,000x.
Accelerators are being developed that claim to get down to 10x, though i think they will be more like 100-1000x, which would still be a huge improvement considering how people use LLMs today for basic tasks like string matching.
Are those accelerators software-only? 10x could let a $4 VPS run server-side checks for backup software (so malicious clients can't wipe backups) and git forges (e.g., don't allow X to push to main).
It's really not that bad. We're close to using FHE in a production consumer app.
https://vishakh.blog/2025/08/06/lessons-from-using-fhe-to-bu...
If you're talking about doing database queries on a 5 MB database, why not just ship the database client-side and have the client do the computation?
You may wish to build a protocol where third parties can asynchronously operate on user data. You may also want to have separation between the end app and the compute layer for legal or practical purposes. Finally, you may not want to store large payloads on client devices.
5 MB is hardly a "large payload".
I'm giving you general reasons why this is the case. For our own app, we hope to build a protocol where third parties can operate async on user data (with consent).
Funny thing is, since neural networks are differentiable, they can be homomorphically encrypted!
That’s right, your LLM can be made to secretly produce stuff hehe
That's pretty cool, but can't any computable function be computed via FHE? I'm not sure the differentiable part is necessary.
Any program which you apply FHE to needs to be expressed as a circuit, which implies that the time taken to run a computation needs to be fixed in advance. It's therefore impossible to express a branch instruction (or "if" statement, if you prefer).
The circuits are built out of "+" and "×" gates, which are enough to express any polynomial. In turn, these are enough to approximate any continuous function (Weierstrass's approximation theorem). In turn, every computable function on the real numbers is a continuous function - so FHE is very powerful.
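To illustrate how an "if" disappears when a program is flattened into a +/× circuit: a branch becomes an arithmetic select (multiplexer) that always evaluates both arms:

    def select(cond, a, b):
        # cond is an (encrypted) 0 or 1; equivalent to `a if cond else b`
        # but expressed purely with + and x, so no branching is needed.
        return cond * a + (1 - cond) * b

    print(select(1, 10, 20))  # 10
    print(select(0, 10, 20))  # 20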
> In turn, every computable function on the real numbers is a continuous function
That doesn't seem right. Consider the function f(x: ℝ) = 1 if x ≥ 0, 0 otherwise. That's computable but not continuous.
That's uncomputable, because equality (and hence the comparison x ≥ 0) is undecidable on the reals: a real is given as an infinite stream of digits, and for an input that is exactly 0, no finite prefix can rule out a tiny negative tail later, so the test never terminates.
Differentiability isn't a requirement for homomorphism, I don't think.
A homomorphism just means: say I have a function f: A -> B, a binary operator * on A, and a binary operator *' on B; then f is homomorphic if f(a1 * a2) = f(a1) *' f(a2). Loosely speaking, it "preserves structure".
So if f is my encryption, then I can do *' outside the encryption and know, because f is homomorphic, that the result is identical to doing * inside the encryption. So you need your encryption to be a bijective [1] homomorphism, i.e. an isomorphism [2], and you need an "outside the encryption" variant of any operation you want to do inside the encryption. That is a different requirement from differentiability.
1: bijective means it's a one-to-one correspondence
2: a bijection with the homomorphism property is called an isomorphism, because it makes set A equivalent to set B in our example
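As a concrete instance of that definition, textbook (unpadded) RSA is multiplicatively homomorphic; a toy, deliberately insecure demonstration:

    # E(m1) * E(m2) mod n equals E(m1 * m2): multiplication passes
    # through the encryption. Toy parameters, for illustration only.
    n, e = 3233, 17                      # n = 61 * 53
    encrypt = lambda m: pow(m, e, n)

    m1, m2 = 7, 11
    assert (encrypt(m1) * encrypt(m2)) % n == encrypt(m1 * m2)
    print("multiplicative homomorphism holds for", m1, "and", m2)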
ReLU, commonly used in neural networks, is not differentiable at zero, yet it can still be approximated by expressions that are efficiently FHE-evaluable. If you're being pedantic, you don't truly care about differentiability here.
Very insightful comment, though. LLMs run under FHE (or just fully local LLMs) are a great step forwards for mankind. Everyone should have the right to interact with LLMs privately. That is an ideal to strive for.
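To make the ReLU point concrete: the standard trick is to fit a low-degree polynomial on a bounded interval, since +/× circuits evaluate polynomials directly (CKKS-style schemes do this natively). A small numpy sketch:

    import numpy as np

    x = np.linspace(-4, 4, 400)
    relu = np.maximum(x, 0)

    coeffs = np.polyfit(x, relu, deg=4)   # degree-4 least-squares fit
    approx = np.polyval(coeffs, x)

    print("max abs error on [-4, 4]:", float(np.abs(approx - relu).max()))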
Man, imagine having time to read such papers. I'd genuinely read it, but I know this alone is like 30 hours of study.
I was surprised that for almost 300 pages there were only 26 references listed in the back. Not the end of the world by any means, and clearly a ton of work went into this, but I find it useful to see from the references how a work overlaps with other subjects I may know more about.
Direct link to the book:
https://fhetextbook.github.io/
Is the title broken?
I see “Unified Line and Paragraph Detection by Graph Convolutional Networks (2022)”
Sorry about this. That was my screwup.
There were (at least) two posts from arxiv.org on the front page at the time, and when I was updating the title on the other one I must have applied it to this one instead. I've fixed it now and re-upped it onto the front page so it can have its full exposure with its correct title.
I see the same, and there is a posting with that title (linking to the correct paper) also on the HN front page. I'm wondering what's going on.
You're not alone. I saw that FHE paper earlier, so... what's going on?
Sorry for not responding earlier. This is probably a bug but it's super weird... I just emailed the mods about this.