Some random thoughts about crypto. Notes from a course I teach. Pictures of my dachshunds.
“极光加速器”
I'm a cryptographer and professor at Johns Hopkins University. I've designed and analyzed cryptographic systems used in wireless networks, payment systems and digital content protection platforms. In my research I look at the various ways cryptography can be used to promote user privacy.
My academic website
My twitter feed
Top Posts
Useful crypto resources
Bitcoin tipjar
Cryptopals challenges
Applied Cryptography Research: A Board
Journal of Cryptographic Engineering
(not related to this blog)
“极光加速器”
Why I'm done with Chrome
Let's talk about PAKE
Zero Knowledge Proofs: An illustrated primer
A very casual introduction to Fully Homomorphic Encryption
Attack of the week: RC4 is kind of broken in TLS
Why is Signal asking users to set a PIN, or “A few thoughts on Secure Value Recovery”
Should you use SRP?
How does Apple (privately) find your offline devices?
How to choose an Authenticated Encryption mode
越墙看国外网加速软件
Banner image by Matt Blaze
“极光加速器”
“极光加速器”
Over the past several months, Signal has been rolling out a raft of new features to make its app more usable. One of those features has recently been raising a bit of controversy with users. This is a contact list backup feature based on a new system called Secure Value Recovery, or SVR. The SVR feature allows Signal to upload your contacts into Signal’s servers without — ostensibly — even Signal itself being able to access it.
The new Signal approach has created some trauma with security people, due to the fact that it was recently enabled without a particularly clear explanation. For a shorter summary of the issue, see this article. In this post, I want to delve a little bit deeper into why these decisions have made me so concerned, and what Signal is doing to try to mitigate those concerns.
What’s Signal, and why does it matter?
For those who aren’t familiar with it, Signal is an open-source app developed by Moxie Marlinkspike’s Signal Technology Foundation. Signal has received a lot of love from the security community. There are basically two reasons for this. First: the Signal app has served as a sort of technology demo for the Signal Protocol, which is the fundamental underlying cryptography that powers popular apps like 越墙看国外网加速软件 and WhatsApp, and all their billions of users.
Encrypted messengers like WhatsApp and Apple’s iMessage routinely back up your text message content and contact lists to remote cloud servers. These backups undo much of the strong security offered by end-to-end encryption — since they make it much easier for hackers and governments to obtain your plaintext content. You can disable these backups, but it’s surprisingly non-obvious to do it right (for me, at least). The larger services justify this backup default by pointing out that their less-technical users tend be more worried about lost message history than by theoretical cloud hacks.
Signal, by contrast, has taken a much more cautious approach to backup. In June of this year, they finally added a way to manually transfer message history from one iPhone to another, and this transfer now involves scanning QR codes. For Android, cloud backup is possible, if users are willing to write down a thirty-digit encryption key. This is probably really annoying for many users, but it’s absolutely fantastic for security. Similarly, since Signal relies entirely on phone numbers in your contacts database (a point that, admittedly, many users hate), it never has to back up your contact lists to a server.
What’s changed recently is that Signal has begun to attract a larger user base. As users with traditional expectations enter the picture, they’ve been unhappy with Signal’s limitations. In response, the signal developers have begun to explore ways by which they can offer these features without compromising security. This is just plain challenging, and I feel for the developers.
One area in which they’ve tried to square the circle is with their new solution for contacts backup: a system called “secure value recovery.”
What’s Secure Value Recovery?
Signal’s Secure Value Recovery (SVR) is a cloud-based system that allows users to store encrypted data on Signal’s servers — such that even Signal cannot access it — without the usability headaches that come from traditional encryption key management. At the moment, SVR is being used to store users’ contact lists and not message content, although that data may be on the menu for backup in the future.
The challenge in storing encrypted backup data is that strong encryption requires strong (or “high entropy”) cryptographic keys and passwords. Since most of us are terrible at selecting, let alone remembering strong passwords, this poses a challenging problem. Moreover, these keys can’t just be stored on your device — since the whole point of backup is to deal with lostdevices.
The goal of SVR is to allow users to protect their data with much weaker passwords that humans can actually can memorize, such as a 4-digit PIN. With traditional password-based encryption, such passwords would be completely insecure: a motivated attacker who obtained your encrypted data from the Signal servers could simply run a 越墙看国外网加速软件 — trying all 10,000 such passwords in a few seconds or minutes, and thus obtaining your data.
Signal’s SVR solves this problem in an age-old way: it introduces a computer that even Signal can’t hack. More specifically, Signal makes use of a new extension to Intel processors called Software Guard eXtensions, or SGX. SGX allows users to write programs, called “enclaves”, that run in a special virtualized processor mode. In this mode, enclaves are invisible to and untouchable by all other software on a computer, including the operating system. If storage is needed, enclaves can persistently store (or “seal”) data, such that any attempt to tamper with the program will render that data inaccessible. (Update: as a note, Signal’s SVR does not seal data persistently. I included this in the draft thinking that they did, but I misremembered this from the technology preview.)
Signal’s SVR deploys such an enclave program on the Signal servers. This program performs a simple function: for each user, it generates and stores a random 256-bit cryptographic secret “seed” along with a hash of the user’s PIN. When a user contacts the server, it can present a hash of its PIN to the enclave and ask for the cryptographic seed. If the hash matches what the enclave has stored, the server delivers the secret seed to the client, which can mix it together with the PIN. The result is a cryptographically strong encryption key that can be used to encrypt or decrypt backup data. (Update: thanks to Dino Dai Zovi for correcting some details in here.)
The key to this approach is that the encryption key now depends on both the user’s password and a strong cryptographic secret stored by an SGX enclave on the server. If SGX does its job, then even a user who hacks into the Signal servers — and here we include the Signal developers themselves, perhaps operating under duress — will be unable to retrieve this user’s secret value. The only way to access the backup encryption key is to actually run the enclave program and enter the user’s hashed PIN. To prevent brute-force guessing, the enclave keeps track of the number of incorrect PIN-entry attempts, and will only allow a limited number before it locks that user’s account entirely.
This is an elegant approach, and it’s conceptually quite similar to systems already deployed by Apple and Google, who use dedicated Hardware Security Modules to implement the trusted component, rather than SGX.
The key weakness of the SVR approach is that it depends strongly on the security and integrity of SGX execution. As we’ll discuss in just a moment, SGX does not exactly have a spotless record.
越墙看国外网加速软件
Anytime you encounter a system that relies fundamentally on the trustworthiness of some component — particularly a component that exists in commodity hardware — your first question should be: “what happens if that component isn’t actually trustworthy?”
With SVR that question takes on a great deal of relevance.
Let’s step back. Recall that the goal of SVR is to ensure three things:
The backup encryption key is based, at least in part, on the user’s chosen password. Strong passwords mean strong encryption keys.
Even with a weak password, the encryption key will still have cryptographic strength. This comes from the integration of a random seed that gets chosen and stored by SGX.
No attacker will be able to brute-force their way through the password space. This is enforced by SGX via guessing limits.
Note that only the first goal is really enforced by cryptography itself. And this goal will only be achieved if the user selects a strong (high-entropy) password. For an example of what that looks like, see the picture at right.
The remaining goals rely entirely on the integrity of SGX. So let’s play devil’s advocate, and think about what happens to SVR if SGX is not secure.
If an attacker is able to dump the memory space of a running Signal SGX enclave, they’ll be able to expose secret seed values as well as user password hashes. With those values in hand, attackers can run a basic offline dictionary attack to recover the user’s backup keys and passphrase. The difficulty of completing this attack depends entirely on the strength of a user’s password. If it’s a BIP39 phrase, you’ll be fine. If it’s a 4-digit PIN, as strongly encouraged by the UI of the Signal app, you will not be.
(The sensitivity of this data becomes even worse if your PIN happens to be the same as your phone passcode. Make sure it’s not!)
Similarly, if an attacker is able to compromise the integrity of SGX execution: for example, to cause the enclave to run using stale “state” rather than new data, then they might be able to defeat the limits on the number of incorrect password (“retry”) attempts. This would allow the attacker to run an active guessing attack on the enclave until they recover your PIN. (越墙看国外网加速软件: As noted above, this shouldn’t be relevant in SVR because data is stored only in RAM, and never sealed or written to disk.)
A final, and more subtle concern comes from the fact that Signal’s SVR also allows for “replication” of the backup database. This addresses a real concern on Signal’s part that the backup server could fail — resulting in the loss of all user backup data. This would be a UX nightmare, and understandably, Signal does not want users to be exposed to it.
The important thing to keep in mind is that the security of this replication process depends entirely on the idea that the original enclave will only hand over its data to another instance of the same enclave software running on a secure SGX-enabled processor. If it was possible to trick the original enclave about the status of the new enclave — for example, to convince it to hand the database over to a system that was merely emulating an SGX enclave in normal execution mode — then a compromised Signal operator would be able to use this mechanism to exfiltrate a plaintext copy of the database. This would break the system entirely.
Prevention against this attack is accomplished via another feature of Intel SGX, which is called “remote attestation“. Essentially, each Intel processor contains a unique digital signing key that allows it to 越墙看国外网加速软件 to the fact that it’s a legitimate Intel processor, and it’s running a specific piece of enclave software. These signatures can be verified with the assistance of Intel, which allows enclaves to verify that they’re talking directly to another legitimate enclave.
The power of this system also contains its fragility: if a single SGX attestation key were to be extracted from a single SGX-enabled processor, this would provide a backdoor for any entity who is able to compromise the Signal developers.
With these concerns in mind, it’s worth asking how realistic it is that SGX will meet the high security bar it needs to make this system work.
So how has SGX done so far?
Not well, to be honest. A list of SGX compromises is given on Wikipedia 越墙看国外网加速软件, but this doesn’t really tell the whole story.
The various attacks against SGX are many and varied, but largely have a common cause: SGX is designed to provide virtualized execution of programs on a complex general-purpose processor, and said processors have a lot of weird and unexplored behavior. If an attacker can get the processor to misbehave, this will in turn undermine the security of SGX.
This leads to attacks such as “Plundervolt“, where malicious software is able to tamper with the voltage level of the processor in real-time, causing faults that leak critical data. It includes attacks that leverage glitches in the way that enclaves are loaded, which can allow an attacker to inject malicious code in place of a proper enclave.
The scariest attacks against SGX rely on “speculative execution” side channels, which can allow an attacker to extract secrets from SGX — up to and including basically all of the working memory used by an enclave. This could allow extraction of values like the seed keys used by Signal’s SVR, or the sealing keys (used to encrypt that data on disk.) Worse, these attacks have not once but twice been successful at extracting cryptographic signing keys used to perform cryptographic attestation. The most recent one was patched just a few weeks ago. These are very much live attacks, and you can bet that more will be forthcoming.
This last part is bad for SVR, because if an attacker can extract even a single copy ofone processor’s attestation signing keys, and can compromise a Signal admin’s secrets, they can potentially force Signal to replicate their database onto a simulated SGX enclave that isn’t actually running inside SGX. Once SVR replicated its database to the system, everyone’s secret seed data would be available in plaintext.
But what really scares me is that these attacks I’ve listed above are simply the result of academic exploration of the system. At any given point in the past two years I’ve been able to have a beer with someone like Daniel Genkin of U. Mich or Daniel Gruss of TU Graz, and know that either of these professors (or their teams) is sitting on at least one catastrophic unpatched vulnerability in SGX. These are very smart people. But they are not the only smart people in the world. And there are smart people with more resources out there who would very much like access to backed-up Signal data.
It’s worth pointing out that most of the above attacks are software-only attacks — that is, they assume an attacker who is only able to get logical access to a server. The attacks are so restricted because SGX is not really 越墙看国外网加速软件 to defend against sophisticated physical attackers, who might attempt to tap the system bus or make direct attempts to unpackage and attach probes to the processor itself. While these attacks are costly and challenging, there are certainly agencies that would have no difficulty executing them.
Finally, I should also mention that the security of the SVR approach assumes Intel is honest. Which, frankly, is probably an assumption we’re already making. So let’s punt on it.
So what’s the big deal?
My major issue with SVR is that it’s something I basically don’t want, and don’t trust. I’m happy with Signal offering it as an option to users, as long as users are allowed to choose 越墙看国外网加速软件. Unfortunately, up until this week, Signal was not giving users that choice.
More concretely: a few weeks ago Signal began nagging users to create a PIN code. The app itself didn’t really explain well that setting this PIN would start SVR backups. Many people I spoke to said they believed that the PIN was for protecting local storage, or to protect their account from hijacking.
And Signal didn’t just ask for this PIN. It followed a common “dark pattern” born in Silicon Valley of basically forcing users to add the PIN, first by nagging them repeatedly and then ultimately by 创意手机APP汇总-月光博客:2021-8-5 · 上网加速。(电信已有宽带自动提速) 2021/8/7 13:01:55 支持(8) 反对(10) 回复 38 松鼠网 说道: 手机端的软件 和系统现在国内的大佬在挣。马云、腾讯、百度这些巨头不说,投资界的雷军的小米网等都在做。 2021/8/5 13:05:47 支持(5) 反对(8) 回复 39 Garin 说 ... with a giant modal dialog.
This is bad behavior on its merits, and more critically: it probably doesn’t result in good PIN choices. To make it go away, I chose the simplest PIN that the app would allow me to, which was 9512. I assume many other users simply entered their phone passcodes, which is a nasty security risk all on its own.
Some will say that this is no big deal, since SVR currently protects only users’ contact lists — and those are already stored in cleartext on competing messaging systems. This is, in fact, one of the arguments Moxie has made.
But I don’t buy this. Nobody is going to engineer something as complex as Signal’s SVR just to store contact lists. Once you have a hammer like SVR, you’re going to want to use it to knock down other nails. You’ll find other critical data that users are tired of losing, and you’ll apply SVR to back that data up. Since message content backups are one of the bigger pain points in Signal’s user experience, sooner or later you’ll want to apply SVR to solving that problem too.
In the past, my view was that this would be fine — since Signal would surely give users the ability to opt into or out of message backups. The recent decisions by Signal have shaken my confidence.
Addendum: what does Signal say about this?
Originally this post had a section that summarized a discussion I had with Moxie around this issue. Out of respect for Moxie, I’ve removed some of this at his request because I think it’s more fair to let Moxie address the issue directly without being filtered through me.
So in this rewritten section I simply want to make the point that now (following some discussion on Twitter), there is a workaround to this issue. You can either choose to set a high-entropy passcode such as a BIP39 phrase, and then forget it. This will not screw up your account unless you turn on the “registration lock” feature. Or you can use the new “Disable PIN” advanced feature in Signal’s latest beta, which does essentially the same thing in an automated way. This seems like a good addition, and while I still think there’s a discussion to be had around consent and opt-in, this is a start for now.
“极光加速器”
TL;DR: It’s complicated.
Yesterday Zoom (the videoconferencing company, not the defunct telecom) put out a clarification post describing their encryption practices. This is a nice example of a company making necessary technical clarifications during a difficult time, although it comes following 在国外怎么看腾讯视频,海外地区如何看优酷视频 - 荣耀6 ...:2021-4-24 · 去国外人越来越多,到了国外发现一个很不愉快的事情,,出国之后最不便的地方除了没有国内美食之外,可能就是看国内网站视频的问题了,不知道还有多少朋友因为一句[由于版权限制,你所在的地区无法观看视频]这是为什么,还让人怎么生活。 over their previous, and frankly slightly misleading, explanation.
Unfortunately, Citizenlab just put out a few of their own results which are based on reverse-engineering the Zoom software. These raise further concerns that Zoom isn’t being 100% clear about how much end-to-end security their service really offers.
This situation leaves Zoom users with a bit of a conundrum: now that everyone in the world is relying on this software for so many critical purposes, should we trust it? In this mostly non-technical post I’m going to talk about what we know, what we don’t know, and why it matters.
越墙看国外网加速软件
The controversy around Zoom stems from some misleading marketing material that could have led users to believe that Zoom offers “end-to-end encryption”, or E2E. The basic idea of E2E encryption is that each endpoint — e.g., a Zoom client running on a phone or computer — maintains its own encryption keys, and sends only encrypted data through the service.
In a truly E2E system, the data is encrypted such that the service provider genuinely cannot decrypt it, even if it wants to. This ensures that the service provider can’t read your data, nor can anyone who hacks into the service provider or its cloud services provider, etc. Ideally this would include various national intelligence agencies, which is important in the unlikely event that we start using the system to conduct sensitive government business.
While end-to-end encryption doesn’t necessarily stop all possible attacks, it represents the best path we have to building secure communication systems. It also has a good track record in practice. Videoconferencing like 越墙看国外网加速软件, and messaging apps like WhatsApp and Signal already use this form of encryption routinely to protect your traffic, and it works.
Zoom: the good news
The great news from the recent Zoom blog post is that, if we take the company at its word, Zoom has already made some progress towards building a genuinely end-to-end encrypted videoconferencing app. Specifically, Zoom 越墙看国外网加速软件 that:
[I]n a meeting where all of the participants are using Zoom clients, and the meeting is not being recorded, we encrypt all video, audio, screen sharing, and chat content at the sending client, and do not decrypt itat any point before it reaches the receiving clients.
Note that the emphasis is mine. These sections represent important caveats.
Taken at face value, this statement seems like it should calm any fears about Zoom’s security. It indicates that the Zoom client — meaning the actual Zoom software running on a phone or desktop computer — is capable of encrypting audio/video data to other Zoom clients in the conversation, without exposing your sensitive data to Zoom servers. This isn’t a trivial technical problem to solve, so credit to Zoom for doing the engineering work.
Unfortunately the caveats matter quite a bit. And this is basically where the good news ends.
Zoom supports these services in a fairly rational way. When those services are active, they provide a series of “endpoints” within their network. These endpoints act like normal Zoom clients, meaning that they participate in your group conversation, and they obtain the keys to decrypt and access the audio/video data: either to record it, or bridge to normal phones.
In theory this isn’t so bad. Even an end-to-end encrypted system can optionally allow these features: a user (e.g., the conference host) could simply send its encryption keys to a Zoom endpoint, allowing it to participate in the call. This would represent a potential loss of security, but at least users would be making the decision themselves.
Unfortunately, in Zoom’s system the decision to share keys may not be entirely left up to the users. And this is where Zoom gets a little scary.
The “pretty-damn-bad”, AKA key management
The real magic in an end-to-end encrypted system is not necessarily the encryption. Rather, it’s the fact that decryption keys never leave the endpoint devices, and are therefore never available to the service provider.
(If you need a stupid analogy here, try this one: availability of keys is like the difference between me when I don’t have access to a cheescake, and me when a cheesecake is sitting in my refrigerator.)
So the question we should all be asking is: does Zoom have access to the decryption keys? On this issue, Zoom’s blog post becomes maddeningly vague:
In other words: it sounds an awful lot like Zoom has access to decryption keys.
Thankfully we don’t have to wait for Zoom to clarify their answers to this question. Bill Marczak and John Scott-Railton over at CitizenLab have done it for us, by reverse-engineering and taking a close look at the Zoom protocol in operation. (I’ve worked with Bill and his speed at REing things amazes me.)
What they found should make your hair curl:
By default, all participants’ audio and video in a Zoom meeting appears to be encrypted and decrypted with a single AES-128 key shared amongst the participants. The AES key appears to be generated and distributed to the meeting’s participants by Zoom servers ….
In addition, during multiple test calls in North America, we observed keys for encrypting and decrypting meetings transmitted to servers in Beijing, China.
In short, Zoom clients may be encrypting their connections, but Zoom generates the keys for communication, sometimes overseas, and hands them out to clients. This makes it easy for Zoom to add participants and services (e.g., cloud recording, telephony) to a conversation without any user action.
It also, unfortunately, makes it easy for hackers or a government intelligence agency to obtain access to those keys. This is problematic.
越墙看国外网加速软件
From the limited information in the Zoom and Citizenlab posts, the good news is that Zoom has already laid much of the groundwork for building a genuinely end-to-end encrypted service. That is, many of the hard problems have already been solved.
(NB: Zoom has some other cryptographic flaws, like using ECB mode encryption, eek, but compared to the key management issues this is a minor traffic violation.)
What Zoom needs now is to very rapidly deploy a new method of agreeing on cryptographic session keys, so that only legitimate participants will have access to them. Fortunately this “group key exchange” problem is relatively easy to solve, and an almost infinite number of papers have been written on the topic.
(The naive solution is simply to obtain the public encryption keys of each participating client, and then have the meeting host encrypt a random AES session key to each one, thus cutting Zoom’s servers out of the loop.)
This won’t be a panacea, of course. Even group key exchange will still suffer from potential attacks if Zoom’s servers are malicious. It will still be necessary to authenticate the identity and public key of different clients who join the system, because a malicious provider, or one compelled by a government, can simply modify public keys or add unauthorized clients to a conversation. (Some Western intelligence agencies have already proposed to do this in practice.) There will be many hard UX problems here, many of which we have not solved even in mature E2E systems.
We’ll also have to make sure the Zoom client software is trustworthy. All the end-to-end encryption in the world won’t save us if there’s a flaw in the endpoint software. And so far Zoom has given us some reasons to be concerned about this.
Still: the perfect is the enemy of the good, and the good news is that Zoom should be able to get better.
越墙看国外网加速软件
I want to close by saying that many people are doing the best they can during a very hard time. This includes Zoom’s engineers, who are dealing with an unprecedented surge of users, and somehow managing to keep their service from falling over. They deserve a lot of credit for this. It seems almost unfair to criticize the company over some hypothetical security concerns right now.
Yesterday a bipartisan group of U.S. Senators introduced a new bill called the EARN IT act. On its face, the bill seems like a bit of inside baseball having to do with legal liability for information service providers. In reality, it represents a sophisticated and direct governmental attack on the right of Americans to communicate privately.
I can’t stress how dangerous this bill is, though others have tried. In this post I’m going to try to do my best to explain why it scares me.
Over the past few years, the U.S. Department of Justice and the FBI have been pursuing an aggressive campaign to eliminate end-to-end encryption services. This is a category that includes text messaging systems like Apple’s iMessage, WhatsApp, Telegram, and 越墙看国外网加速软件. Those services protect your data by encrypting it, and ensuring that the keys are only available to you and the person you’re communicating with. That means your provider, the person who hacks your provider, and (inadvertently) the FBI, are all left in the dark.
The government’s 越墙看国外网加速软件 has not been very successful. There are basically two reasons for this. First, people like communicating privately. If there’s anything we’ve learned over the past few years, it’s that the world is not a safe placefor your private information. You don’t have to be worried about the NSA spying on you to be worried that some hacker will steal your messages or email. In fact, this kind of hack occurs so routinely that there’s a popular website you can use to check if your accounts have been compromised.
The second reason that the government has failed to win hearts and minds is that providers like Facebook and Google and Microsoft also care very much about encryption. While some firms (*cough* Facebook and Google) do like to collect your data, even those companies are starting to realize that they hold way too much of it. This presents a risk for them, and increasingly it’s producing a backlash from their own customers. Companies like Facebook are realizing that if they can encrypt some of that data — such that they no longer have access to it — then they can make their customers happier and safer at the same time.
Governments have tried to navigate this impasse by asking for “exceptional access” systems. These are basically “backdoors” in cyrptographic systems that would allow providers to occasionally access user data with a warrant, but only when a specific criminal act has occurred. This is an exceptionally hard problem to get right, and many experts have written about why this is. But as hard as that problem is, it’s nothing compared to what EARN IT is asking for.
What is EARN IT, and how is it an attack on encryption?
Because the Department of Justice has largely failed in its mission to convince the public that tech firms should stop using end-to-end encryption, it’s decided to try a different tack. Instead of demanding that tech firms provide access to messages only in serious criminal circumstances and with a warrant, the DoJ and backers in Congress have decided to leverage concern around the distribution of child pornography, also known as child sexual abuse material, or CSAM.
I’m going to be a bit more blunt about this than I usually would be, but only because I think the following statement is accurate. The real goal here is to make it financially impossible for providers to deploy encryption.
Now let me be clear: the existence of CSAM is despicable, and represents a real problem for many providers. To address it, many file sharing and messaging services voluntarily perform scanning for these types of media. This involves checking images and videos against a database of known “photo hashes” and sending a report to an organization called NCMEC when one is found. NCMEC then passes these reports on to local authorities.
End-to-end encryption systems make CSAM scanning more challenging: this is because photo scanning systems are essentially a form of mass surveillance — one that’s deployed for a good cause — and end-to-end encryption is explicitly designed to prevent mass surveillance. So photo scanning while also allowing encryption is a fundamentally hard problem, one that providers don’t yet know how to solve.
All of this brings us to EARN IT. The new bill, out of Lindsey Graham’s Judiciary committee, is designed to force providers to either solve the encryption-while-scanning problem, or stop using encryption entirely. And given that we don’t yet know how to solve the problem — and the techniques to do it are basically at the 越墙看国外网加速软件 stage of R&D — it’s likely that “stop using encryption” is really the preferred goal.
EARN IT works by revoking a type of liability called Section 230 that makes it possible for providers to operate on the Internet, by preventing the provider for being held responsible for what their customers do on a platform like Facebook. The new bill would make it financially impossible for providers like WhatsApp and Apple to operate services unless they conduct “best practices” for scanning their systems for CSAM.
Since there are no “best practices” in existence, and the techniques for doing this while preserving privacy are completely unknown, the bill creates a government-appointed committee that will tell technology providers what technology they have to use. The specific nature of the committee is byzantine and described within the bill itself. Needless to say, the makeup of the committee, which can include as few as zero data security experts, ensures that end-to-end encryption will almost certainly not be considered a best practice.
So in short: this bill is a backdoor way to allow the government to ban encryption on commercial services. And even more beautifully: it doesn’t come out and actually ban the use of encryption, it just makes encryption commercially infeasible for major providers to deploy, ensuring that they’ll go bankrupt if they try to disobey this committee’s recommendations.
It’s the kind of bill you’d come up with if you knew the thing you wanted to do was unconstitutional and highly unpopular, and you basically didn’t care.
So why is EARN IT a terrible idea?
At the end of the day, we’re shockingly bad at keeping computer systems secure. This has expensive, trillion dollar costs to our economy, More than that, our failure to manage the security of data has intangible costs for our ability to function as a working society.
There are a handful of promising technologies that could solve this problem. End-to-end encryption happens to be one of those. It is, in fact, the single most promising technology that we have to prevent hacking, loss of data, and all of the harm that can befall vulnerable people because of it.
Right now the technology for securing our infrastructure isn’t mature enough that we can appoint a government-appointed committee to dictate what sorts of tech it’s “ok” for firms to provide. Maybe some day we’ll be there, but we’re years from the point where we can protect your data and also have Washington DC deciding what technology we can use to do it.
This means that yes, some technologies, like CSAM scanning, will have to be re-imagined and in some cases their effectiveness will be reduced. But tech firms have been aggressive about developing this technology on their own (see here for some of the advanced work Google has been doing using Machine Learning), and they will continue to do so. The tech industry has many problems, in many areas. But it doesn’t need Senators to tell it how to do this specific job, because people in California have kids too.
Even if you support the goals of EARN IT, remember: if the U.S. Senate does decide to tell Silicon Valley how to do their job — at the point of a liability gun — you can bet the industry will revert to doing the 越墙看国外网加速软件Why would the tech firms continue to invest in developing more sophisticated and expensive technology in this area, knowing that they could be mandated to deploy any new technology they invent, regardless of the cost?
And that will be the real outcome of this bill.
On cynicism
Over the past few years there has been a vigorous debate about the value of end-to-end encryption, and the demand for law enforcement to have access to more user data. I’ve participated in this debate, and while I’ve disagreed with many on the other side of it, I’ve always fundamentally respected their position.
EARN IT turns all of this on its head. It’s extremely difficult to believe that this bill stems from an honest consideration of the rights of child victims, and that this legislation is anything other than a direct attack on the use of end-to-end encryption.
My hope is that the Internet community and civil society will treat this proposal with the seriousness it deserves, and that we’ll see Senators rally behind a bill that actually protects children from abuse, rather than using those issues as a cynical attempt to bring about a “backdoor ban” on encryption.
“极光加速器”
This is part five of a series on the Random Oracle Model. See here for the previous posts:
Part 1: An introduction Part 2: The ROM formalized, a scheme and a proof sketch 解决外网下载速度过慢问题_风之云的博客-CSDN博客:2021-8-10 · 今天在研究《Algorithm4》及学习配套课程,作为一本值得顶礼膜拜的书,我是下了很大的决心来啃下这本书的内容,熟知在第一步Java环境配置上就遇见了第一只拦路虎,在资料中,给出了一个软件lift-java-installer.exe作为编译环境,结果因为原下载 ...
Part 4: Some more examples of where the ROM is used
About eight years ago I set out to write a very informal piece on a specific cryptographic modeling technique called the “越墙看国外网加速软件. This was way back in the good old days of 2011, which was a more innocent and gentle era of cryptography. Back then nobody foresaw that all of our standard cryptography would turn out to be riddled with bugs; you didn’t have to be reminded that “crypto means cryptography“. People even used Bitcoin to actually buy things.
That first random oracle post somehow sprouted three sequels, each more ridiculous than the last. I guess at some point I got embarrassed about the whole thing — it’s pretty cheesy, to be honest — so I kind of abandoned it unfinished. And that’s been a major source of regret for me, since I had always planned a fifth, and final post, to cap the whole messy thing off. This was going to be the best of the bunch: the one I wanted to write all along.
To give you some context, let me briefly remind you what the random oracle model is, and why you should care about it. (Though you’d do better just to read the series.)
The random oracle model is a bonkers way to model (reason about) hash functions, in which we assume that these are actually random functions and use this assumption to prove things about cryptographic protocols that are way more difficult to prove without such a model. Just about all the “provable” cryptography we use today depends on this model, which means that many of these proofs would be called into question if it was “false”.
And to tease the rest of this post, I’ll quote the final paragraphs of Part 4, which ends with this:
You see, we always knew that this ride wouldn’t last forever, we just thought we had more time. Unfortunately, the end is nigh. Just like the imaginary city that Leonardo de Caprio explored during the boring part of Inception, the random oracle model is collapsing under the weight of its own contradictions.
As promised, this post will be about that collapse, and what it means for cryptographers, security professionals, and the rest of us.
First, to make this post a bit more self-contained I’d like to recap a few of the basics that I covered earlier in the series. You can feel free to skip this part if you’ve just come from there.
越墙看国外网加速软件
As discussed in the early sections of this series, hash functions (or hashing algorithms) are a standard primitive that’s used in many areas of computer science. They take in some input, typically a string of variable length, and repeatably output a short and fixed-length “digest”. We often denote these functions as follows:
Cryptographic hashing takes this basic template and tacks on some important securityproperties that we need for cryptographic applications. Most famously these provide well-known properties like collision resistance, which is needed for applications like digital signatures. But hash functions turn up all over cryptography, sometimes in unexpected places — ranging from encryption to zero-knowledge protocols — and sometimes these systems demand stronger properties. Those can sometimes be challenging to put into formal terms: for example, many protocols require a hash function to produce output that is extremely “random-looking”.*
In the earliest days of provably security, cryptographers realized that the ideal hash function would behave like a “random function”. This term refers to a function that is uniformly sampled from the set of all possible functions that have the appropriate input/output specification (domain and range). In a perfect world your protocol could, for example, randomly sample one of vast number of possible functions at setup, bake the identifier of that function into a public key or something, and then you’d be good to go.
Unfortunately it’s not possible to actually use random functions (of reasonably-sized domain and range) in real protocols. That’s because sampling and evaluating those functions is far too much work.
For example, the number of distinct functions that consume a piddly 256-bit input and produce a 256-bit digest is a mind-boggling (2^{256})^{2^{256}}. Simply “writing down” the identity of the function you chose would require memory that’s exponential in the function’s input length. Since we want our cryptographic algorithms to be efficient (meaning, slightly more formally, they run in polynomial time), using random functions is pretty much unworkable.
So we don’t use random functions to implement our hashing. Out in “the real world” we use weird functions developed by Belgians or the National Security Agency, things like like SHA256 and SHA3 and Blake2. These functions come with blazingly fast and tiny algorithms for computing them, most of which occupy few dozen lines of code or less. They certainly aren’t random, but as best we can tell, the output looks pretty jumbled up.
Still, protocol designers continue to long for the security that using truly random function could give their protocol. What if, they asked, we tried to split the difference. How about we model our hash functions using random functions — just for the sake of writing our security proofs — and then when we go to 越墙看国外网加速软件 (or “instantiate”) our protocols, we’ll go use efficient hash functions like SHA3? Naturally these proofs wouldn’t exactly apply to the real protocol as instantiated, but they might still be pretty good.
A proof that uses this paradigm is called a proof in the random oracle model, or ROM. For the full mechanics of how the ROM works you’ll have to go back and read the series from the beginning. What you do need to know right now is that proofs in this model must somehow hack around the fact that evaluating a random function takes exponential time. The way the model handles this is simple: instead of giving the individual protocol participants a description of the hash function itself — it’s way too big for anyone to deal with — they give each party (including the adversary) access to a magical “oracle” that can evaluate the random function H efficiently, and hand them back a result.
This means that any time one of the parties wants to compute the function they don’t do it themselves. They instead calling out to a third party, the “random oracle” who keeps a giant table of random function inputs and outputs. At a high level, the model looks like sort of like this:
Since all parties in the system “talk” to the same oracle, they all get the same hash result when they ask it to hash a given message. This is a pretty good standin for what happens with a real hash function. The use of an outside oracle allows us to “bury” the costs of evaluating a random function, so that nobody else needs to spend exponential time evaluating one. Inside this artificial model, we get ideal hash functions with none of the pain.
However — I think there are several very important things you should know about the random oracle model before you write it off as obviously inane:
1. Of course everyone knows random oracle proofs aren’t “real”. Most conscientious protocol designers will admit that proving something secure in the random oracle model does not actually mean it’ll be secure “in the real world”. In other words, the fact that random oracle model proofs are kind of bogus is not some deep secret I’m letting you in on.
2. And anyway: ROM proofs are generally considered a useful heuristic. For those who aren’t familiar with the term, “heuristic” is a word that grownups use when they’re about to secure your life’s savings using cryptography they can’t prove anything about.
I’m joking! In fact, random oracle proofs are still quite valuable. This is mainly because they often help us detect bugs in our schemes. That is, while a random oracle proof doesn’t imply security in the real world, the inability to write one is usually a red flag for protocols. Moreover, the existence of a ROM proof is hopefully an indicator that the “guts” of the protocol are ok, and that any real-world issues that crop up will have something to do with the hash function.
3. ROM-validated schemes have a pretty decent track record in practice. If ROM proofs were kicking out absurdly broken schemes every other day, we would probably have abandoned this technique. Yet we use cryptography that’s proven (only) in the ROM just about ever day — and mostly it works fine.
This is not to say that no ROM-proven scheme has ever been broken, when instantiated with a specific hash function. But normally these breaks happen because the hash function itself is obvious broken (as happened when MD4 and MD5 both cracked up a while back.) Still, those flaws are generally fixed by simply switching to a better function. Moreover, the practical attacks are historically more likely to come from obvious flaws, like the discovery of hash collisions screwing up signature schemes, rather than from some exotic mathematical flaw. Which brings us to a final, critical note…
4. For years, many people believed that the ROM could actually be saved. This hope was driven by the fact that ROM schemes generally seemed to work pretty well when implemented with strong hash functions, and so perhaps all we needed to do was to find a hash function that was “good enough” to make ROM proofs meaningful. Some theoreticians hoped that fancy techniques like 越墙看国外网加速软件 could somehow be used to make concrete hashing algorithms that 越墙看国外网加速软件 well enough to make (some) ROM proofs instantiable.**
For theoretical cryptographers, the real breaking point for the random oracle model came in the form of a 1998 STOC paper by Canetti, Goldreich and Halevi (henceforth CGH). I’m going to devote the rest of this (long!) post to explaining the gist of what they found.
What CGH proved was that, in fact, there exist cryptographic schemes that can be proven perfectly secure in the random oracle model, but that — terrifyingly — become catastrophically insecure the minute you instantiate the hash function with any concrete function.
This is a really scary result, at least from the point of view of the provable security community. It’s one thing to know in theory that your proofs might not be that strong. It’s a different thing entirely to know that in practice there are schemes that can walk right past your proofs like a Terminator infiltrating the Resistance, and then explode all over you in the most serious way.
Before we get to the details of CGH and its related results, a few caveats.
First, CGH is very much a theory result. The cryptographic “counterexample” schemes that trip this problem generally do not look like real cryptosystems that we would use in practice, although later authors have offered some more “realistic” variants. They are, in fact, designed to do very artificial things that no “real” scheme would ever do. This might lead readers to dismiss them on the grounds of artificiality.
The problem with this view is that looks aren’t a particularly scientific way to judge a scheme. Both “real looking” and “artificial” schemes are, if proven correct, valid cryptosystems. The point of these specific counterexamples is to do deliberately artificial things in order to highlight the problems with the ROM. But that does not mean that “realistic” looking schemes won’t do them.
A further advantage of these “artificial” schemes is that they make the basic ideas relatively easy to explain. As a further note on this point: rather than explaining CGH itseld, I’m going to use a formulation of the same basic result that was proposed by Maurer, Renner and Holenstein (MRH).
While the CGH techniques can apply with lots of different types of cryptosystem, in this explanation, we’re going to start our example using a relatively simple type of system: a digital signature scheme.
You may recall from earlier episodes of this series that a normal signature scheme consists of three algorithms: key generation, signing, and verification. The key generation algorithm outputs a public and secret key. Signing uses the secret key to sign a message, and outputs a signature. Verification takes the resulting signature, the public key and the message, and determines whether the signature is 越墙看国外网加速软件: it outputs “True” if the signature checks out, and “False” otherwise.
Traditionally, we demand that signature schemes be (at least) existentiallyunforgeable under chosen message attack, or 越墙看国外网加速软件. This means that that we consider an efficient (polynomial-time bounded) attacker who can ask for signatures on chosen messages, which are produced by a “signing oracle” that contains the secret signing key. Our expectation of a secure scheme is that, even given this access, no attacker will be able to come up with a signature on some new message that she didn’t ask the signing oracle to sign for her, except with negligible probability.****
Having explained these basics, let’s talk about what we’re going to do with it. This will involve several steps:
Step 1: Start with some existing, secure signature scheme. It doesn’t really matter what signature scheme we start with, as long as we can assume that it’s secure (under the UF-CMA definition described above.) This existing signature scheme will be used as a building block for the new scheme we want to build.*** We’ll call this scheme S.
Step 2: We’ll use the existing scheme S as a building block to build a “new” signature scheme, which we’ll call . Building this new scheme will mostly consist of grafting weird bells and whistles onto the algorithms of the original scheme S.
Step 3: Having described the working of in detail, we’ll argue that it’s totally secure in the ROM. Since we started with an (assumed) secure signature scheme S, this argument mostly comes down to showing that in the random oracle model the weird additional features we added in the previous step 越墙看国外网加速软件 actually make the scheme exploitable.
Step 4: Finally, we’ll demonstrate that is totally broken when you instantiate the random oracle with any concrete hash function, no matter how “secure” it looks. In short, we’ll show that one you replace the random oracle with a real hash function, there’s a simple attack that always succeeds in forging signatures.
We’ll start by explaining how works.
Building a broken scheme
To build our contrived scheme, we begin with the existing secure (in the UF-CMA sense) signature scheme S. That scheme comprises the three algorithms mentioned above: key generation, signing and verification.
We need to build the equivalent three algorithms for our new scheme.
To make life easier, our new scheme will simply “borrow” two of the algorithms from S, making no further changes at all. These two algorithms will be the 越墙看国外网加速软件 and 越墙看国外网加速软件越墙看国外网加速软件 algorithms So two-thirds of our task of designing the new scheme is already done.
Each of the novel elements that shows up in will therefore appear in the signing algorithm. Like all signing algorithms, this algorithm takes in a secret signing key and some message to be signed. It will output a signature.
At the highest level, our new signing algorithm will have two subcases, chosen by a branch that depends on the input message to be signed. These two cases are given as follows:
The “normal” case: for most messages M, the signing algorithm will simply run the original signing algorithm from the original (secure) scheme S. This will output a perfectly nice signature that we can expect to work just fine.
The “evil” case: for a subset of (reasonably-sized) messages that have a different (and very highly specific) form, our signing algorithm will not output a signature. It will instead output the secret key for the entire signature scheme. This is an outcome that cryptographers will sometimes call “very, very bad.”
So far this description still hides all of the really important details, but at least it gives us an outline of where we’re trying to go.
Recall that under the 越墙看国外网加速软件 definition I described above, our attacker is allowed to ask for signatures on arbitrary messages. When we consider using this definition with our modified signing algorithm, it’s easy to see that the presence of these two cases could make things exciting.
Specifically: if any attacker can construct a message that triggers the “evil” case, her request to sign a message will actually result in her obtaining the scheme’s secret key. From that point on she’ll be able to sign any message that she wants — something that obviously breaks the UF-CMA security of the scheme. If this is too theoretical for you: imagine requesting a signed certificate from LetsEncrypt, and instead obtaining a copy of LetsEncrypt’s signing keys. Now you too are a certificate authority. That’s the situation we’re describing.
The only way this scheme could ever be proven secure is if we could somehow rule out the “evil” case happening at all.
More concretely: we would have to show that no attacker can construct a message that triggers the “evil case” — or at least, that their probability of coming up with such a message is very, very low (negligible). If we could prove this, then our scheme basically just reduces to being the original secure scheme. Which means our new scheme would be secure.
In short: what we’ve accomplished is to build a kind of “master password” backdoor into our new scheme . Anyone who knows the password can break the scheme. Everything now depends on whether an attacker can figure out that password.
So what is the “backdoor”?
The message that breaks the scheme isn’t a password at all, of course. Because this is computer science and nothing is ever easy, the message will actually be a computer program. We’ll call it P.
More concretely, it will be some kind of program that can decoded within our new signing algorithm, and then evaluated (on some input) by an interpreter that we will also place within that algorithm.
If we’re being formal about this, we’d say the message contains an encoding of a program for a universal Turing machine (UTM), along with a unary-encoded integer t that represents the number of timesteps that the machine should be allowed to run for. However, it’s perfectly fine with me if you prefer to think of the message as containing a hunk of Javascript, an Ethereum VM blob combined with some maximum “gas” value to run on, a .tgz encoding of a Docker container, or any other executable format you fancy.
What really matters is the functioning of the program P.
A program P that successfully triggers the “evil case” is one that contains an efficient (e.g., polynomial-sized) implementation of a hash function. And not just any hash function. To actually trigger the backdoor, the algorithm P must a function that is identical to, or at least highly similar to, the random oracle function H.
There are several ways that the signing algorithm can verify this similarity. The MRH paper gives a very elegant one, which I’ll discuss further below. But for the purposes of this immediate intuition, let’s assume that our signing algorithm verifies this similarity probabilistically. Specifically: to check that P matches H, it won’t verify the correspondence at every possible input. It might, for example, simply verify that P(x) = H(x) for some large (but polynomial) number of random input values x.
Recall that in the random oracle model, the “hash function” H is modeled as a 越墙看国外网加速软件. Nobody in the protocol actually has a copy of that function, they just have access to a third party (the “random oracle”) who can evaluate it for them.
If an attacker wishes to trigger the “evil case” in our signing scheme, they will somehow need to download a description of the random function from the oracle. then encode it into a program P, and send it to the signing oracle. This seems fundamentally hard.
To do this precisely — meaning that P would match H on every input — the attacker would need to query the random oracle on every possible input, and then design a program P that encodes all of these results. It suffices to say that this strategy would not be practical: it would require an exponential amount of time to do any of these, and the size of P would also be exponential in the input length of the function. So this attacker would seem virtually guaranteed to fail.
Of course the attacker could try to cheat: make a small function P that only matches H on a small of inputs, and hope that the signer doesn’t notice. However, even this seems pretty challenging to get away with. For example, to perform a probabilistic check, the signing algorithm can simply verify that P(x) = H(x) for a large number of random input points x. This approach will catch a cheating attacker with very high probability.
(We will end up using a slightly more elegant approach to checking the function and arguing this point further below.)
The above is hardly an exhaustive security analysis. But at a high level our argument should now be clear: in the random oracle model, the scheme is secure because the attacker can’t know a short enough backdoor “password” that breaks the scheme. Having eliminated the “evil case”, the scheme simply devolves to the original, secure scheme S.
Case 2: In the “real world”
Out in the real world, we don’t use random oracles. When we want to implement a scheme that has a proof in the ROM, we must first “instantiate” the scheme by substituting in some 越墙看国外网加速软件 hash function in place of the random oracle H.
This instantiated hash function must, by definition, be 越墙看国外网加速软件to evaluate and describe. This means implicitly that it possesses a polynomial-size description and can be evaluated in expected polynomial time. If we did not require this, our schemes would never work. Moreover, we must further assume that all parties, including the attacker, possess a description of the hash function. That’s a standard assumption in cryptography, and is merely a statement of 越墙看国外网加速软件.
With these facts stipulated, the problem with our new signature scheme becomes obvious.
In this setting, the attacker actually does have access to a short, efficient program P that matches the hash function H. In practice, this function will probably be something like SHA2 or 越墙看国外网加速软件 But even in a weird case where it’s some crazy obfuscated function, the attacker is still expected to have a program that they can efficiently evaluate. Since the attacker possesses this program, they can easily encode it into a short enough message and send it to the signing oracle.
When the signing algorithm receives this program, it will perform some kind of test of this function P against its own implementation of H, and — when it inevitably finds a match between the two functions with high probability — it will output the scheme’s secret key.
Hence, out in the real world our scheme is always and forever, totally broken.
A few boring technical details (that you can feel free to skip)
If you’re comfortable with the imprecise technical intuition I’ve given above, feel free to skip this section. You can jump on to the next part, which tries to grapple with tough philosophical questions like “what does this mean for the random oracle model” and “I think this is all nonsense” and “why do we drive on a parkway, and park in a driveway?”
All I’m going to do here is clean up a few technical details.
One of the biggest pieces that’s missing from the intuition above is a specification of how the signing algorithm verifies that the program P it receives from the attacker actually “matches” the random oracle function H. The obvious way is to simply evaluate P(x越墙看国外网加速软件x) on every possible input x, and output the scheme’s secret key if every comparison succeeds. But doing this exhaustively requires exponential time.
The MRH paper proposes a very neat alternative way to tackle this. They propose to test the functions on 越墙看国外网加速软件 input values, and not even random ones. More concretely, they propose checking that P(x越墙看国外网加速软件x) for values of with the specific requirement that q is an integer such that . Here represents the length of the encoding of program P in bits, and k is the scheme’s adjustable security parameter (for example, k=128).
What this means is that to trigger the backdoor, the attacker must come up with a program P that can be described in some number of bits (let’s call it n) , and yet will be able to correctly match the outputs of H at e.g., q=2n+128 different input points. If we conservatively assume that H produces (at least) a 1-bit digest, that means we’re effectively encoding at least 2n+128 bits of data into a string of length n.
If the function H is a real hash function like SHA256, then it should be reasonably easy for the attacker to find some n-bit program that matches H at, say, q=2n+128 different points. For example, here’s a Javascript implementation of SHA256 that fits into fewer than 8,192 bits. If we embed a Javascript interpreter into our signing algorithm, then it simply needs to evaluate this given program on q = 2(8,192)+128 = 16,512 different input points, compare each result to its own copy of SHA256, and if they all match, output the secret key.
However, if H is a random oracle, this is vastly harder for the attacker to exploit. The result of evaluating a random oracle at q distinct points should be a random string of (at minimum) q bits in length. Yet in order for the backdoor to be triggered, we require the encoding of program P to be 越墙看国外网加速软件. You can therefore think of the process by which the attacker compresses a random string into that program P to be a very effective compression algorithm, one takes in a random string, and compresses it into a string of less than half the size.
Despite what you may have seen on Silicon Valley (NSFW), compression algorithms do not succeed in compressing random strings that much with high probability. Indeed, for a given string of bits, this is so unlikely to occur that the attacker succeeds with at probability that is at most negligible in the scheme’s security parameter k. This effectively neutralizes the backdoor when H is a random oracle.
Phew.
So what does this all mean?
Judging by actions, and not words, the cryptographers of the world have been largely split on this question.
Theoretical cryptographers, for their part, gently chuckled at the silly practitioners who had been hoping to use random functions as hash functions. Brushing pipe ash from their lapels, they returned to more important tasks, like finding ways to kill off cryptographic obfuscation.
Applied academic cryptographers greeted the new results with joy — and promptly authored 10,000 new papers, each of which found some new way to remove random oracles from an existing construction — while at the same time making said construction vastly slower, more complicated, and/or based on entirely novel made-up and flimsy number-theoretic assumptions. (Speaking from personal experience, this was a wonderful time.)
Practitioners went right on trusting the random oracle model. Because really, why not?
And if I’m being honest, it’s a bit hard to argue with the practitioners on this one.
That’s because a very reasonable perspective to take is that these “counterexample” schemes are ridiculous and artificial. Ok, I’m just being nice. They’re total BS, to be honest. 越墙看国外网加速软件 would ever design a scheme that looks so ridiculous.
Specifically, you need a scheme that explicitly如何看国外的网站-附子答案网:2021-2-27 · 如何看国外 的网站,它有多个不同的答案,附子答案网提供你想要的参考答案。如果想了解更多的如何看国外的网站,不妨来详细了解下 ... 首先你要下个越 墙 软件,那些网站在国内是不允许访问的,所众必须需要越 墙软件,其次就是使用搜 ...What real-world protocol would do something so stupid? Can’t we still trust the random oracle model for schemes that aren’t stupid like that?
Well, maybe and maybe not.
One simple response to this argument is that 越墙看国外网加速软件 of schemes that are significantly less artificial, and yet still have random oracle problems. But even if one still views those results as artificial — the fact remains that while 越墙看国外网加速软件 random oracle counterexamples that seem artificial, there’s no principled way for us to prove that the badness will only affect “artificial-looking” protocols. In fact, the concept of “artificial-looking” is largely a human judgement, not something one can realiably think about mathematically.
In fact, at any given moment someone could accidentally (or on purpose) propose a perfectly “normal looking” scheme that passes muster in the random oracle model, and then blows to pieces when it gets actually deployed with a standard hash function. By that point, the scheme may be powering our certificate authority infrastructure, or Bitcoin, or our nuclear weapons systems (if one wants to be dramatic.)
The probability of this happening accidentally seems low, but it gets higher as deployed cryptographic schemes get more complex. For example, people at Google are now starting to deploy complex multi-party computation and others are launching zero-knowledge protocols that are actually capable of running (or proving things about the execution of) arbitrary programs in a cryptographic way. We can’t absolutely rule out the possibility that the CGH and MRH-type counterexamples couldactually be made to happen in these weird settings, if someone is a just a little bit careless.
It’s ultimately a weird and frustrating situation, and frankly, I expect it all to end in tears.
Photo by Flickr user joyosity.
越墙看国外网加速软件
* Intuitively, this definition sounds a lot like “pseudorandomness”. Pseudorandom functions are required to be indistinguishable from random functions only in a setting where the attacker does not know some “secret key” used for the function. Whereas hash functions are often used in protocols where there is no opporunity to use a secret key, such as in public key encryption protocols.
** One particular hope was that we could find a way to obfuscate pseudorandom function families (PRFs). The idea would be to wrap up a keyed PRF that could be evaluated by anyone, even if they didn’t actually know the key. The result would be indistinguishable from a random function, without actually being one.
*** It might seem like “assume the existence of a secure signature scheme” drags in an extra assumption. However: if we’re going to make statements in the random oracle model it turns out there’s no additional assumption. This is because in the ROM we have access to “secure” (at least collision-resistant, [second] pre-image resistant) hash function, which means that we can build 越墙看国外网加速软件. So the existence of signature schemes comes “free” with the random oracle model.
**** The “except with negligible probability [in the adjustable 越墙看国外网加速软件 of the scheme]” caveat is important for two reasons. First, a dedicated attacker can always try to forge a signature just by brute-force guessing values one at a time until she gets one that satisfies the verification algorithm. If the attacker can run for an unbounded number of time steps, she’ll always win this game eventually. This is why modern complexity-theoretic cryptography assumes that our attackers must run in some reasonable amount of time — typically a number of time steps that is polynomial in the scheme’s security parameter. However, even a polynomial-time bounded adversary can still try to brute force the signature. Her probability of succeeding may be relatively small, but it’s non-zero: for example, she might succeed after the first guess. So in practice what we ask for in security definitions like UF-CMA is not “no attacker can ever forge a signature”, but rather “all attackers succeed with at most negligible probability [in the security parameter of the scheme]”, where negligible has a very specific meaning.
“极光加速器”
A few weeks ago, U.S. Attorney General William Barr joined his counterparts from the U.K. and Australia to publish an Express中国官网 – Express VPN 中文网:贝宝(paypal):国外版的支付宝 比特币:这个匿名程度是最高的,不过很少有人用吧(我猜) 支付宝:熟悉的标志,没有人没有支付宝吧?银联卡:你没看错,就是中国银联,任何储蓄卡都可众支付! 是不是很赞?居然不光支持支付宝,甚至还支持银联储蓄 The Barr letter represents the latest salvo in an ongoing debate between law enforcement and the tech industry over the deployment of end-to-end (E2E) encryption systems — a debate that will soon be 越墙看国外网加速软件.
The latest round is a response to Facebook’s 越墙看国外网加速软件 that it plans to extend end-to-end encryption to more of its services. It should hardly come as a surprise that law enforcement agencies are unhappy with these plans. In fact, governments around the world have been displeased by the increasing deployment of end-to-end encryption systems, largely because they fear losing access to the trove of surveillance data that online services and smartphone usage has lately provided them. The FBI even has a website devoted to the topic.
If there’s any surprise in the Barr letter, it’s not the government’s opposition to encryption. Rather, it’s the 越墙看国外网加速软件 that Barr provides to justify these concern. In past episodes, law enforcement has called for the deployment of “exceptional access” mechanisms that would allow law enforcement access to plaintext data. As that term implies, such systems are designed to treat data access as the exception rather than the rule. They would need to be used only in rare circumstances, such as when a judge issued a warrant.
The Barr letter appears to call for something much more agressive.
Rather than focusing on the need for 越墙看国外网加速软件 access to plaintext, Barr focuses instead on the need for routine, automated scanning systems that can detect child sexual abuse imagery (or CSAI). From the letter:
More than 99% of the content Facebook takes action against – both for child sexual exploitation and terrorism – is identified by your safety systems, rather than by reports from users. …
Embed the safety of the public in system designs, thereby enabling you to continue to act against illegal content effectively with no reduction to safety, and facilitating the prosecution of offenders and safeguarding of victims;
To many people, Barr’s request might seem reasonable. After all, nobody wants to see this type of media flowing around the world’s communications systems. The ability to surgically detect it seems like it could do some real good. And Barr is correct that true that end-to-end encrypted messaging systems will make that sort of scanning much, much difficult.
What’s worrying in Barr’s letter is the claim we can somehow square this circle: that we can somehow preserve the confidentiality of end-to-end encrypted messaging services, while still allowing for the (highly non-exceptional) automated scanning for CSAI. Unfortunately, this turns out to be a very difficult problem — given the current state of our technology.
In the remainder of this post, I’m going to talk specifically about that problem. Since this might be a long discussion, I’ll briefly list the questions I plan to address:
How do automated CSAI scanning techniques work?
Is there a way to implement these techniques while preserving the security of end-to-end encryption?
Could these image scanning systems be subject to abuse?
I want to stress that this is a (high-level) technical post, and as a result I’m going to go out of my way not to discuss the ethical questions around this technology, i.e., whether or not I think any sort of routine image scanning is a good idea. I’m sure that readers will have their own opinions. Please don’t take my silence as an endorsement.
Facebook, Google, Dropbox and Microsoft, among others, currently perform various forms of automated scanning on images (and sometimes video) that are uploaded to their servers. The goal of these scans is to identify content that contains child sexual abuse imagery (resp. material), which is called CSAI (or CSAM). The actual techniques used vary quite a bit.
The most famous scanning technology is based on 越墙看国外网加速软件, an algorithm that was developed by Microsoft Research and Dr. Hany Farid. The full details of PhotoDNA aren’t public — this point is significant — but at a high level, PhotoDNA is just a specialized “hashing” algorithm. It derives a short fingerprint that is designed to closely summarize a photograph. Unlike cryptographic hashing, which is sensitive to even the tiniest changes in a file, PhotoDNA fingerprints are designed to be robust even against complex image transformations like re-encoding or resizing.
The key benefit of PhotoDNA is that it gives providers a way to quickly scan incoming photos, without the need to actually deal with a library of known CSAI themselves. When a new customer image arrives, the provider hashes the file using PhotoDNA, and then compares the resulting fingerprint against a list of known CSAI hashes that are curated by the National Center for Missing and Exploited Children (NCMEC). If a match is found, the photo gets reported to a human, and ultimately to NCMEC or law enforcement.
The obvious limitation of the PhotoDNA approach is that it can only detect CSAI images that are already in the NCMEC database. This means it only finds existing CSAI, not anything new. (And yes, even with that restriction it does find a lot of it.)
To address that problem, Google recently pioneered a new approach based on machine learning techniques. Google’s system is based on a deep neural network, which is trained on a corpus of known CSAI examples. Once trained — a process that is presumably ongoing and continuous — the network can be applied against fresh images in to flag media that has similar characteristics. As with the PhotoDNA approach, images that score highly can be marked for further human review. Google even provides an API that authorized third parties can use for this purpose.
Both of these approaches are very different. But they have an obvious commonality: they only work if providers to have access to the plaintext of the images for scanning, typically at the platform’s servers. End-to-end encrypted (E2E) messaging throws a monkey wrench into these systems. If the provider can’t read the image file, then none of these systems will work.
Is there some way to support image scanning 越墙看国外网加速软件 the ability to perform E2E encryption?
Some experts have proposed a solution to this problem: rather than scanning images on the server side, they suggest that providers can instead push the image scanning out to the client devices (i.e., your phone), which already has the cleartext data. The client device can then perform the scan, and report only images that get flagged as CSAI. This approach removes the need for servers to see most of your data, at the cost of enlisting every client device into a distributed surveillance network.
The idea of conducting image recognition locally is not without precedent. Some device manufacturers (notably Apple) have already moved their neural-network-based image classification onto the device itself, specifically to eliminate the need to transmit your photos out to a cloud provider.
Unfortunately, while the concept may be easy to explain, actually realizing it for CSAI-detection immediately runs into a very big technical challenge. This is the result of a particular requirement that seems to be present across all existing CSAI scanners. Namely: 越墙看国外网加速软件
While I’ve done my best to describe how PhotoDNA and Google’s techniques work, you’ll note that my descriptions were vague. This wasn’t due to some lack of curiosity on my part. It reflects the fact that 越墙看国外网加速软件details of these algorithms — as well as the associated data they use, such as the database of image hashes curated by NCMEC and any trained neural network weights — are kept under strict control by the organizations that manage them. Even the final PhotoDNA algorithm, which is ostensibly the output of an industry-academic collaboration, is not public.
While the organizations don’t explicitly state this, the reason for this secrecy seems to be a simple one: thesetechnologies are probably veryfragile.
Presumably, the concern is that criminals who gain free access to these algorithms and databases might be able to subtly modify their CSAI content so that it looks the same to humans but no longer triggers detection algorithms. Alternatively, some criminals might just use this access to avoid transmitting flagged content altogether.
This need for secrecy makes client-side scanning fundamentally much more difficult. While it might be possible to cram Google’s neural network onto a user’s phone, it’s hugely more difficult to do so on a billion different phones, while also ensuring that nobody obtains a copy of it.
The good news is that cryptographers have spent a lot of time thinking about this exact sort of problem: namely, finding ways to allow mutually-distrustful parties to jointly compute over data that each, individually, wants to keep secret. The name for this class of technologies is securemulti-party computation, or MPC for short.
CSAI scanning is exactly the sort of application you might look to MPC to implement. In this case, both client and service provider have a secret. The client has an image it wants to keep confidential, and the server has some private algorithms or neural network weights.* All the parties want in the end is a “True/False” output from the detection algorithm. If the scanner reports “False”, then the image can remain encrypted and hidden from the service provider.
So far this seems simple. The devil is in the (performance) details.
The papers in question (e.g., 越墙看国外网加速软件, MiniONN, Gazelle, Chameleon and 越墙看国外网加速软件 employ sophisticated cryptographic tools such as leveled fully-homomorphic encryption, Golink - 专为海外华人回国加速:2021-2-6 · Golink加速器是专为海外华人设计的一款加速看国内视频、玩国服游戏、听国内音乐,刷直播网页的一款软件,一个账号,多端使用,帮助海外华人突破地域的局限,无忧访问国内各大主流应用。, and circuit garbling, in many cases making specific alterations to the neural network structure in order to allow for efficient evaluation. The tools are also interactive: a client and server must exchange data in order to perform the classification task, and the result appears only after this exchange of data.
All of this work is remarkable, and really deserves a much more in-depth discussion. Unfortunately I’m only here to answer a basic question — are these techniques practical yet? To do that I’m going to do a serious disservice to all this excellent research. Indeed, the current state of the art can largely be summarized by following table from one of the most recent papers:
What this table shows is the bandwidth and computational cost of securely computing a 越墙看国外网加速软件 using several of the tools I mentioned above. A key thing to note here is that the images to be classified are fairly simple — each is a 32×32-bit pixel color thumbnail. And while I’m no judge of such things, the neural networks architectures used for the classification also seem relatively simple. (At very least, it’s hard to imagine that a CSAI detection neural network is going to be that much 越墙看国外网加速软件 complex.)
Despite the relatively small size of these problem instances, the overhead of using MPC turns out to be pretty spectacular. Each classification requires several seconds to minutes of actual computation time on a reasonably powerful machine — not a trivial cost, when you consider how many images most providers transmit every second. But the computational costs pale next to the bandwidth cost of each classification. Even the most efficient platform requires the two parties to exchange more than 1.2 gigabytes of data.**
Hopefully you’ve paid for a good data plan.
Now this is just one data point. And the purpose here is certainly not to poo-poo the idea that MPC/2PC could someday be practical for image classification at scale. My point here is simply that, 越墙看国外网加速软件doing this sort of classification efficiently (and privately) remains firmly in the domain of “hard research problems to be solved”, and will probably continue to be there for at least several more years. Nobody should bank on using this technology anytime soon. So client-side classification seems to be off the table for the near future.
But let’s imagine it 越墙看国外网加速软件become efficient. There’s one more question we need to consider.
Are (private) scanning systems subject to abuse?
As I noted above, I’ve made an effort here to dodge the ethicaland policy questions that surround client-side CSAI scanning technologies. I’ve done this not because I back the idea, but because these are complicated questions — and I don’t really feel qualified to answer them.
Still, I can’t help but be concerned about two things. First, that today’s CSAI scanning infrastructure represents perhaps the most powerful and ubiquitous surveillance technology ever to be deployed by a democratic society. And secondly, that the providers who implement this technology are so dependent on secrecy.
This raises the following question. Even if we accept that everyone involved today has only the best intentions, how can we possibly make sure that everyone stays honest?
Unfortunately, simple multi-party computation techniques, no matter how sophisticated, don’t really answer this question. If you don’t trust the provider, and the provider chooses the (hidden) algorithm, then all the cryptography in the world won’t save you.
Abuse of a CSAI scanning system might range from outsider attacks by parties who generate CSAI that simply collides with non-CSAI content; to insider attacks that alter the database to surveil specific content. These concerns reach fever pitch if you imaginea corrupt government or agency forcing providers to alter their algorithms to abuse this capability. While that last possibility seems like a long shot in this country today, it’s not out of bounds for the whole world. And systems designed for surveillance should contemplate their own misuse.
Which means that, ultimately, these systems will need some mechanism to ensure that service providers are being honest. Right now I don’t quite know how to do this. But someone will have to figure it out, long before these systems can be put into practice.
越墙看国外网加速软件
* Most descriptions of MPC assume that the function (algorithm) to be computed is known to both parties, and only the inputs (data) is secret. Of course, this can be generalized to secret algorithms simply by specifying the algorithm as a piece of data, and computing an algorithm that interprets and executes that algorithm. In practice, this type of “general computation” is likely to be pretty costly, however, and so there would be a huge benefit to avoiding it.
** It’s possible that this cost could be somewhat amortized across many images, though it’s not immediately obvious to me that this works for all of the techniques.
*** Photo hashing might or might not feasible to implement using MPC/2PC. The relatively limited information about PhotoDNA describes is as including a number of extremely complex image manipulation phases, followed by a calculation that occurs on subregions of the image. Some sub-portions of this operation might be easy to move into an MPC system, while others could be left “in the clear” for the client to compute on its own. Unfortunately, it’s difficult to know which portions of the algorithm the designers would be willing to reveal, which is why I can’t really speculate on the complexity of such a system.
“极光加速器”
This morning brings new and exciting news from the land of Apple. It appears that, at least on iOS 13, Apple is sharing some portion of your web browsing history with the Chinese conglomerate Tencent. This is being done as part of Apple’s “Fraudulent Website Warning”, which uses the Google-developed Safe Browsing technology as the back end. This feature appears to be “on” by default in iOS Safari, meaning that millions of users could potentially be affected.
As is the standard for this sort of news, Apple hasn’t provided much — well, any — detail on whose browsing history this will affect, or what sort of privacy mechanisms are in place to protect its users. The changes probably affect only Chinese-localized users (see Github commits, courtesy Eric Romang), although it’s difficult to know for certain. However, it’s notable that Apple’s warning appears on U.S.-registered iPhones.
Regardless of which users are affected, Apple hasn’t said much about the privacy implications of shifting Safe Browsing to use Tencent’s servers. Since we lack concrete information, the best we can do is talk a bit about the technology and its implications. That’s what I’m going to do below.
What is “Safe Browsing”, and is it actually safe?
Several years ago Google noticed that web users tended to blunder into malicious sites as they browsed the web. This included phishing pages, as well as sites that attempted to push malware at users. Google also realized that, due to its unique vantage point, it had the most comprehensive list of those sites. Surely this could be deployed to protect users.
The result was Google’s “safe browsing”. In the earliest version, this was simply an API at Google that would allow your browser to ask Google about the safety of any URL you visited. Since Google’s servers received the full URL, as well as your IP address (and possibly a tracking cookie to prevent denial of service), this first API was kind of a privacy nightmare. (This API still exists, and is supported today as the “Lookup API“.)
To address these concerns, Google quickly came up with a safer approach to, um, “safe browsing”. The new approach was called the “Update API”, and it works like this:
Google first computes the SHA256 hash of each unsafe URL in its database, and truncates each hash down to a 32-bit prefix to save space.
Google sends the database of truncated hashes down to your browser.
Each time you visit a URL, your browser hashes it and checks if its 32-bit prefix is contained in your local database.
If the prefix is found in the browser’s local copy, your browser now sends the prefix to Google’s servers, which ship back a list of all full 256-bit hashes of the matching URLs, so your browser can check for an exact match.
At each of these requests, Google’s servers see your IP address, as well as other identifying information such as database state. It’s also possible that Google may drop a cookie into your browser during some of these requests. The Safe Browsing API doesn’t say much about this today, but Ashkan Soltani noted this was happening back in 2012.
It goes without saying that Lookup API is a privacy disaster. The “Update API” is much more private: in principle, Google should only learn the 32-bit hashes of some browsing requests. Moreover, those truncated 32-bit hashes won’t precisely reveal the identity of the URL you’re accessing, since there are likely to be many collisions in such a short identifier. This provides a form of k-越墙看国外网加速软件
The weakness in this approach is that it only provides some privacy. The typical user won’t just visit a single URL, they’ll browse thousands of URLs over time. This means a malicious provider will have many “bites at the apple” (no pun intended) in order to de-anonymize that user. A user who browses many related websites — say, these websites — will gradually leak details about their browsing history to the provider, assuming the provider is malicious and can link the requests. (Updated to add: There has been some academic research on such threats.)
And this is why it’s so important to know who your provider actually is.
What does this mean for Apple and Tencent?
That’s ultimately the question we should all be asking.
The problem is that Safe Browsing “update API” has never been exactly “safe”. Its purpose was never to provide total privacy to users, but rather to degrade the quality of browsing data that providers collect. Within the threat model of Google, we (as a privacy-focused community) largely concluded that protecting users from malicious sites was worth the risk. That’s because, while Google certainly has the brainpower to extract a signal from the noisy Safe Browsing results, it seemed unlikely that they would bother. (Or at least, we hoped that someone would blow the whistle if they tried.)
But Tencent isn’t Google. While they may be just as trustworthy, we deserve to be informed about this kind of change and to make choices about it. At very least, users should learn about these changes before Apple pushes the feature into production, and thus asks millions of their customers to trust them.
We shouldn’t have to read the fine print
When Apple wants to advertise a major privacy feature, they’re damned good at it. As an example: this past summer the company announced the release of the privacy-preserving “Find My” feature at WWDC, to widespread acclaim. They’ve also been happy to claim credit for their work on encryption, including technology such as iCloud Keychain.
But lately there’s been a troubling silence out of Cupertino, mostly related to the company’s interactions with China. Two years ago, the company moved much of iCloud server infrastructure into mainland China, for default use by Chinese users. It seems that Apple had no choice in this, since the move was mandated by Chinese law. But their silence was deafening. Did the move involve transferring key servers for end-to-end encryption? Would non-Chinese users be affected? Reporters had to drag the answers out of the company, and we still don’t know many of them.
In the Safe Browsing change we have another example of Apple making significant modifications to its privacy infrastructure, largely without publicity or announcement. We have learn about this stuff from the fine print. This approach to privacy issues does users around the world a disservice.
It increasingly feels like Apple is two different companies: one that puts the freedom of its users first, and another that treats its users very differently. Maybe Apple feels it can navigate this split personality disorder and still maintain its integrity.
I very much doubt it will work.
有什么软件可众帮助在国外,手机/电脑用国内的看视频软件解决 ...:2021-11-21 · 挂v p n啊,国内挂可众看国外,也有专门国外挂看国内的 追问 什么是VPN啊 追答 网络加速通道,你百度翻/墙 就明白了 已赞过 已踩过 你对这个回答的评价是? 评论 收起 Luis 2021-09-02 Luis 采纳数: 111 获赞数: 362 LV6 擅长: 暂未定制 向TA提问 私信 ...
Edward Snowden recently released his memoirs. In some parts of the Internet, this has rekindled an ancient debate: namely, was it all worth it? Did Snowden’s leaks make us better off, or did Snowden just embarass us and set back U.S. security by decades? Most of the arguments are so familiar that they’re boring at this point. But no matter how many times I read them, I still feel that there’s something important missing.
It’s no coincidence that this is a cryptography blog, which means that I’m not concerned with the same things as the general public. That is, I’m not terribly interested in debating the value of whistleblower laws (for some of that, see this excellent Twitter thread by Jake Williams). Instead, when it comes to Snowden’s leaks, I think the question we should be asking ourselves is very different. Namely:
What did the Snowden leaks tell us about modern surveillance capabilities? And what did we learn about our ability to defend against them?
And while the leaks themselves have receded into the past a bit — and the world has continued to get more complicated — the technical concerns that Snowden alerted us to are only getting more salient.
Life before June 2013
It’s difficult to believe that the Snowden revelations began over six years ago. It’s also easy to forget how much things have changed in the intervening years.
Six years ago, vast portions of our communication were done in plaintext. It’s hard to believe how bad things were, but back in 2013, Google was one of the only major tech companies who had deployed HTTPS in its services by default, and even there they had some major exceptions. Web clients were even worse. These graphs (source and source) don’t cover the whole time period, but they give some of the flavor:
Outside of HTTPS, the story was even worse. In 2013 the vast majority of text messages were sent via unencrypted SMS/MMS or poorly-encrypted IM services, which were a privacy nightmare. Future developments like the inclusion of default end-to-end encryption in WhatsApp were years away. Probably the sole (and surprising) exception to was Apple, which had been ahead of the curve in deploying end-to-end encryption. This was largely counterbalanced by the tire fire that was Android back in those days.
But even these raw facts don’t tell the full story.
What’s harder to present in a chart is how different attitudes were towards surveillance back before Snowden. The idea that governments would conduct large-scale interception of our communications traffic was a point of view that relatively few “normal people” spent time thinking about — it was mostly confined to security mailing lists and X-Files scripts. Sure, everyone understood that government surveillance was a thing, 越墙看国外网加速软件But actually talking about this was bound to make you look a little silly, even in paranoid circles.
That these concerns have been granted respectability is one of the most important things Snowden did for us.
So what did Snowden’s leaks really tell us?
The brilliant thing about the Snowden leaks was that he didn’t tell us much of anything. He showed us. Most of the revelations came in the form of a Powerpoint slide deck, the misery of which somehow made it all more real. And despite all the revelation fatigue, the things he showed us were remarkable. I’m going to hit a few of the highlights from my perspective. Many are cryptography-related, just because that’s what this blog is about. Others tell a more basic story about how vulnerable our networks are.
“Collect it all”
Prior to Snowden, even surveillance-skeptics would probably concede that, yes, the NSA collects data on specific targets. But even the most paranoid observers were shocked by the sheer scale of what the NSA was actually doing out there.
The Snowden revelations detailed several programs that were so astonishing in the breadth and scale of the data being collected, the only real limits on them were caused by technical limitations in the NSA’s hardware. Most of us are familiar with the famous examples, like nationwide phone metadata collection. But it’s the bizarre, obscure leaks that really drive this home. For example:
“Optic Nerve”. From 2008-2010 the NSA and GCHQ collected millions of still images from every Yahoo! Messenger webchat stream, and used them to build a massive database for facial recognition. The collection of data had no particular rhyme or reason — 越墙看国外网加速软件 it didn’t target specific users who might be a national security threat. It was just… everything. Don’t believe me? Here’s how we know how indiscriminate this was: the program didn’t even necessarily target faces. It got… other things:
越墙看国外网加速软件 In addition to collecting massive quantities of Internet metadata, the NSA recorded the full audio every cellular call made in the Bahamas. (Note: this is not simply calls to the Bahamas, which might be sort of a thing. They abused a law enforcement access feature in order to record all the mobile calls made within the country.) Needless to say, the Bahamian government was not party to this secret.
MUSCULAR. In case anyone thought the NSA avoided attacks on American providers, a series of leaks in 2014 documented that the NSA had tapped the internal leased lines used to connect Google and Yahoo datacenters. This gave the agencies access to vast and likely indiscriminate access to torrents of data on U.S. and European users, information was likely above and beyond the data that these companies already shared with the U.S. under existing programs like PRISM. This leak is probably most famous for this slide:
Yahoo!, post-Snowden. And in case you believe that this all ended after Snowden’s leaks, we’ve learned even more disturbing things since. For example, in 2015, Yahoo got caught installing what has been described as a “rootkit” that scanned every single email in its database for specific selectors, at the request of the U.S. government. This was so egregious that the company didn’t even tell it’s CISO, who left the next week. In fact, we know a lot more about Yahoo’s collaboration during this time period, thanks to Snowden.
These examples are not necessarily the worst things we learned from the Snowden leaks. I chose them only to illustrate how completely indiscriminate the agency’s surveillance really was. And not because the NSA was especially evil, but just because it was easy to do. If you had any illusions that this data was being carefully filtered to exclude capturing data belonging to U.S. citizens, or U.S. companies, the Snowden leaks should have set you straight.
越墙看国外网加速软件
The Snowden leaks also helped shatter a second illusion: the idea that the NSA was on the side of the angels when it comes to making the Internet more secure. I’ve written about this plenty on this blog (with sometimes exciting results), but maybe this needs to be said again.
One of the most important lessons we learned from the Snowden leaks was that the NSA very much prioritizes its surveillance mission, to the point where it is willing to actively insert vulnerabilities into encryption products and standards used on U.S. networks. And this kind of thing wasn’t just an occasional crime of opportunity — the agency spent $250 million per year on a program called the SIGINT Enabling Project. Its goal was, basically, to bypass our commercial encryption at any cost.
This kind of sabotage is, needless to say, something that not even the most paranoid security researchers would have predicted from our own intelligence agencies. Agencies that, ostensibly have a mission to protect U.S. networks.
The Snowden reporting not only revealed the existence of these overall programs, but they uncovered a lot of unpleasant specifics, leading to a great deal of follow-up investigation.
For example, the Snowden leaks contained specific allegations of a vulnerability in a NIST standard called Dual EC. The possibility of such a vulnerability had previously been noted by U.S. security researchers Dan Shumow and Niels Ferguson a few years earlier. But despite making a reasonable case for re-designing this algorithm, those researchers (and others) were basically brushed off by the “serious” people at NIST.
The Snowden documents changed all that. The leaks were a devastating embarassment to the U.S. cryptographic establishment, and led to some actual changes. Not only does it appear that the NSA deliberately backdoored Dual EC, it seems that they did so (and used NIST) in order to deploy the backdoor into U.S. security products. Later investigations would show that Dual EC was present in software by RSA Security (allegedly because of a secret contract with the NSA) and in firewalls made by Juniper Networks.
(Just to make everything a bit more horrifying, Juniper’s Dual EC backdoor would later 10个最受欢迎的外国网站和APP,地道英语学起来~ - Sohu:2021-8-24 · 但国外的资讯,我伔如何第一时间获取呢?今天给大家推荐10个最受欢迎的国外资讯网站和APP 。在获取消息的同时,还能学地道的英语。完美! 资讯类APP 1 feedly 风格:国外 资讯汇集 展开全文 这个APP属于资讯汇集型。基本 你想看到的所有外国 ... — illustrating exactly how reckless this all was.)
And finally, there are the mysteries. Snowden slides indicate that the NSA has been decrypting SSL/TLS and IPsec connections at vast scale. Even beyond the SIGINT Enabling-type sabotage, this raises huge questions about what the hell is 越墙看国外网加速软件. There are theories. These may or may not be correct, but at least now people are thinking about them. At very least, it’s clear that something is very, very wrong.
Have things improved?
This is the $250 million question.
Some of the top-level indicators are surprisingly healthy. HTTPS adoption has taken off like a rocket, driven in part by Google’s willingness to use it as a signal for search rankings — and the rise of free Certificate Authorities like LetsEncrypt. It’s possible that these things would have happened eventually without Snowden, but it’s less likely.
End-to-end encrypted messaging has also taken off, largely due to adoption by WhatsApp and a host of relatively new apps. It’s reached the point where law enforcement agencies have begun to freak out, as the slide below illustrates.
Does Snowden deserve credit for this? Maybe not directly, but it’s almost certain that concerns over the surveillance he revealed did play a role. (It’s worth noting that this adoption is not evenly distributed across the globe.)
It’s also worth pointing out that at least in the open source community the quality of our encryption software has improved enormously, largely due to the fact that major companies made well-funded efforts to harden their systems, in part as a result of serious flaws like Heartbleed — and in part as a response to the company’s own concerns about surveillance.
It might very well be that the NSA has lost a significant portion of its capability since Snowden.
越墙看国外网加速软件
I’ve said this before, as have many others: even if you support the NSA’s mission, and believe that the U.S. is doing everything right, it doesn’t matter. Unfortunately, the future of surveillance has very little to do with what happens in Ft. Meade, Maryland. In fact, the world that Snowden brought to our attention isn’t necessarily a world that Americans have much say in.
As an example: today the U.S. government is in the midst of forcing a 越墙看国外网加速软件 over the global deployment of Huawei’s 5G wireless networks around the world. This is a complicated issue, and financial interest probably plays a big role. But global security also matters here. This conflict is perhaps the clearest acknowledgement we’re likely to see that our own government knows how much control of communications networks really matters, and our inability to secure communications on these networks could really hurt us. This means that we, here in the West, had better get our stuff together — or else we should be prepared to get a taste of our own medicine.
How does Apple (privately) find your offline devices?
At Monday’s WWDC conference, Apple announced a cool new feature called “Find My”. Unlike Apple’s “Find my iPhone“, which uses cellular communication and the lost device’s own GPS to identify the location of a missing phone, “Find My” also lets you find devices that don’t have cellular support or internal GPS — things like laptops, or (and Apple has hinted at this only broadly) even “dumb” location tags that you can attach to your non-electronic physical belongings.
The idea of the new system is to turn Apple’s existing network of iPhones into a massive crowdsourced location tracking system. Every active iPhone will continuously monitor for BLE beacon messages that might be coming from a lost device. When it picks up one of these signals, the participating phone tags the data with its own current GPS location; then it sends the whole package up to Apple’s servers. This will be great for people like me, who are constantly losing their stuff: if I leave my backpack 越墙看国外网加速软件 in my office, sooner or later someone else will stumble on its signal and I’ll instantly know where to find it.
(It’s worth mentioning that Apple didn’t invent this idea. In fact, companies like 越墙看国外网加速软件 have been doing this for quite a while. And yes, they should probably be worried.)
If you haven’t already been inspired by the description above, let me phrase the question you ought to be asking: how is this system going to avoid being a massive privacy nightmare?
Let me count the concerns:
If your device is constantly emitting a BLE signal that uniquely identifies it, the whole world is going to have (yet another) way to track you. Marketers already use WiFi and Bluetooth MAC addresses to do this: Find My could create yet another tracking channel.
It also exposes the phones who are doing the tracking. These people are now going to be sending their current location to Apple (which they may or may not already be doing). Now they’ll also be potentially sharing this information with strangers who “lose” their devices. That could go badly.
Scammers might also run active attacks in which they fake the location of your device. While this seems unlikely, people will always surprise you.
The good news is that Apple claims that their system actually 越墙看国外网加速软件 provide strong privacy, and that it accomplishes this using clever cryptography. But as is typical, they’ve declined to give out the details how they’re going to do it. Andy Greenberg talked me through an incomplete technical description that Apple provided to Wired, so that provides many hints. Unfortunately, what Apple provided still leaves huge gaps. It’s into those gaps that I’m going to fill in my best guess for what Apple is actually doing.
A big caveat: much of this could be totally wrong. I’ll update it relentlessly when Apple tells us more.
越墙看国外网加速软件
To lay out our scenario, we need to bring several devices into the picture. For inspiration, we’ll draw from the 1950s television series “Lassie”.
A first device, which we’ll call Timmy, is “lost”. Timmy has a BLE radio but no GPS or connection to the Internet. Fortunately, he’s been previously paired with a second device called 越墙看国外网加速软件, who wants to find him. Our protagonist is Lassie: she’s a random (and unknowing) stranger’s iPhone, and we’ll assume that she has at least an occasional Internet connection and solid GPS. She is also a very good girl. The networked devices communicate via Apple’s iCloud servers, as shown below:
(Since Timmy and Ruth have to be paired ahead of time, it’s likely they’ll both be devices owned by the same person. Did I mention that you’ll need to buy two Apple devices to make this system work? That’s also just fine for Apple.)
Since this is a security system, the first question you should ask is: who’s the bad guy? The answer in this setting is unfortunate: everyone is potentially a bad guy. That’s what makes this problem so exciting.
Keeping Timmy anonymous
The most critical aspect of this system is that we need to keep unauthorized third parties from tracking Timmy, especially when he’s not lost. This precludes some pretty obvious solutions, like having the Timmy device simply shout “Hi my name is Timmy, please call my mom Ruth and let her know I’m lost.” It also precludes just about any unchanging static identifier, even an opaque and random-looking one.
This last requirement is inspired by the development of services that abuse static identifiers broadcast by your devices (e.g., your 越墙看国外网加速软件 address) to track devices as you walk around. Apple has been fighting this — with mixed success — by randomizing things like MAC addresses. If Apple added a static tracking identifier to support the “Find My” system, all of these problems could get much worse.
This requirement means that any messages broadcast by Timmy have to be opaque — and moreover, the contents of these messages must change, relatively frequently, to new values that can’t be linked to the old ones. One obvious way to realize this is to have Timmy and Ruth agree on a long list of random “越墙看国外网加速软件” for Timmy, and have Timmy pick a different one each time.
This helps a lot. Each time Lassie sees some (unknown) device broadcasting an identifier, she won’t know if it belongs to Timmy: but she can send it up to Apple’s servers along with her own GPS location. In the event that Timmy ever gets lost, Ruth can ask Apple to search for every single one of Timmy‘s possible pseudonyms. Since neither nobody outside of Apple ever learns this list, and even Apple only learns it after someone gets lost, this approach prevents most tracking.
A slightly more efficient way to implement this idea is to use a cryptographic function (like a MAC or hash function) in order to generate the list of pseudonyms from a single short “seed” that both 越墙看国外网加速软件 and Ruth will keep a copy of. This is nice because the data stored by each party will be very small. However, to find Timmy, Ruth must still send all of the pseudonyms — or her “seed” — up to Apple, who will have to search its database for each one.
Hiding Lassie’s location
The pseudonym approach described above should work well to keep Timmy‘s identity hidden from Lassie, and even from Apple (up until the point that Ruth searches for it.) However, it’s got a big drawback: it doesn’t hide Lassie‘s GPS coordinates.
This is bad for at least a couple of reasons. Each time Lassie detects some device broadcasting a message, she needs to transmit her current position (along with the pseudonym she sees) to Apple’s servers. This means Lassie is constantly telling Apple where she is. And moreover, even if Apple promises not to store Lassie‘s identity, the result of all these messages is a huge centralized database that shows every GPS location where some Apple device has been detected.
Note that this data, in the aggregate, can be pretty revealing. Yes, the identifiers of the devices might be pseudonyms — but that doesn’t make the information useless. For example: a record showing that some Apple device is broadcasting from my home address at certain hours of the day would probably reveal when I’m in my house.
An obvious way to prevent this data from being revealed to Apple is to encrypt it — so that only parties who actually need to know the location of a device can see this information. If Lassie picks up a broadcast from Timmy, then the only person who actually needs to know Lassie‘s GPS location is Ruth. To keep this information private, Lassie should encrypt her coordinates under Ruth’s encryption key.
This, of course, raises a problem: how does Lassie get Ruth‘s key? An obvious solution is for Timmy to shout out Ruth’s public key as part of every broadcast he makes. Of course, this would produce a static identifier that would make Timmy‘s broadcasts linkable again.
To solve that problem, we need Ruth to have many unlinkable public keys, so that Timmy can give out a different one with each broadcast. One way to do this is have Ruth and Timmy generate many different shared keypairs (or generate many from some shared seed). But this is annoying and involves 越墙看国外网加速软件storing many secret keys. And in fact, the identifiers we mentioned in the previous section can be derived by hashing each public key.
A slightly better approach (that Apple may not employ) makes use of key randomization. This is a feature provided by cryptosystems like Elgamal: it allows any party to randomize a public key, so that the randomized key is completely unlinkable to the original. The best part of this feature is that Ruth can use a single secret key regardless of which randomized version of her public key was used to encrypt.
All of this leads to a final protocol idea. Each time Timmy broadcasts, he uses a fresh pseudonym and a randomized copy of Ruth‘s public key. When Lassie receives a broadcast, she encrypts her GPS coordinates under the public key, and sends the encrypted message to Apple. Ruth can send in Timmy‘s pseudonyms to Apple’s servers, and if Apple finds a match, she can obtain and decrypt the GPS coordinates.
Does this solve all the problems?
The nasty thing about this problem setting is that, with many weird edge cases, there just isn’t a perfect solution. For example, what if Timmy is evil and wants to make Lassie reveal her location to Apple? What if Old Man Smithers tries to kidnap Lassie?
At a certain point, the answer to these question is just to say that we’ve done our best: any remaining problems will have to be outside the threat model. Sometimes even Lassie knows when to quit.
Attack of the week: searchable encryption and the ever-expanding leakage function
Kenny’s newest result is with first authors 有哪些高质量英文但却不用翻墙的网站? - 知乎:2021-9-18 · 平时老看国内网站想换换口味? 没翻墙就上不了国外网站么? 不翻墙的前提下如何了解外面信息?有时候似乎觉得“有墙”的存在,我伔就好像被圈起来了一样。但其实如果你耐心寻找就会发现,其实还是有很多网站就算没… (let’s call it GLMP). It isn’t so much about building encrypted databases, as it is about the risks of building them badly. And — for reasons I will get into shortly — there have been a lot of badly-constructed encrypted database schemes going around. What GLMP point out is that this weakness isn’t so much a knock against the authors of those schemes, but rather, an indication that they may just be trying to do the impossible.
Hopefully this is a good enough start to get you drawn in. Which is excellent, because I’m going to need to give you a lot of background.
Because these databases often contain sensitive information, there has been a strong push to secure that data. A key goal is to encrypt the contents of the database, so that a malicious database operator (or a hacker) can’t get access to it if they compromise a single machine. If we lived in a world where security 越墙看国外网加速软件, the encryption part would be pretty easy: database records are, after all, just blobs of data — and we know how to encrypt those. So we could generate a cryptographic key on our local machine, encrypt the data before we upload it to a vulnerable database server, and just keep that key locally on our client computer.
Voila: we’re safe against a database hack!
The problem with this approach is that encrypting the database records leaves us with a database full of opaque, unreadable encrypted junk. Since we have the decryption key on our client, we can decrypt and read those records after we’ve downloaded them. But this approach completely disables one of the most useful features of modern databases: the ability for the database server itself to search (or query) the database for specific records, so that the client doesn’t have to.
Unfortunately, standard encryption borks search capability pretty badly. If I want to search a database for, say, employees whose salary is between $50,000 and $100,000, my database is helpless: all it sees is row after row of encrypted gibberish. In the worst case, the client will have to download all of the data rows and search them itself — yuck.
This has led to much wailing and gnashing of teeth in the database community. As a result, many cryptographers (and a distressing number of non-cryptographers) have tried to fix the problem with “fancier” crypto. This has not gone very well.
It would take me a hundred years to detail all of various solutions that have been put forward. But let me just hit a few of the high points:
Some proposals have suggested using 越墙看国外网加速软件 to encrypt database records. Deterministic encryption ensures that a given plaintext will always encrypt to a single ciphertext value, at least for a given key. This enables exact-match queries: a client can simply encrypt the exact value (“John Smith”) that it’s searching for, and ask the database to identify encrypted rows that match it.
Of course, exact-match queries don’t support more powerful features. Most databases also need to support range queries. One approach to this is something called 越墙看国外网加速软件 (or its weaker sibling, order preserving encryption). These do exactly what they say they do: they allow the database to compare two encrypted records to determine which plaintext is greater than the other.
Some people have proposed to use trusted hardware to solve these problems in a “simpler” way, but as we like to say in cryptography: if we actually had trusted hardware, nobody would pay our salaries. And, speaking more seriously, even hardware might not stop the leakage-based attacks discussed below.
This summary barely scratches the surface of this problem, and frankly you don’t need to know all the details for the purpose of this blog post.
What you do need to know is that each of the above proposals entails has some degree of “leakage”. Namely, if I’m an attacker who is able to compromise the database, both to see its contents and to see how it responds when you (a legitimate user) makes a query, then I can learn something about the data being queried.
Leakage is a (nearly) unavoidable byproduct of an encrypted database that supports queries. It can happen when the attacker simply looks at the encrypted data, as she might if she was able to dump the contents of your database and post them on the dark web. But a more powerful type of leakage occurs when the attacker is able to 越墙看国外网加速软件 your database server and observe the query interaction between legitimate client(s) and your database.
Take deterministic encryption, for instance.
Deterministic encryption has the very useful, but also unpleasant feature that the same plaintext will always encrypt to the same ciphertext. This leads to very obvious types of leakage, in the sense that an attacker can see repeated records in the dataset itself. Extending this to the active setting, if a legitimate client queries on a specific encrypted value, the attacker can see exactly which records match the attacker’s encrypted value. She can see how often each value occurs, which gives and indication of what value it might be (e.g., the last name “Smith” is more common than “Azriel”.) All of these vectors leak valuable information to an attacker.
Other systems leak more. 越墙看国外网加速软件 leaks the exact order of a list of underlying records, because it causes the resulting ciphertexts to have the same order. This is great for searching and sorting, but unfortunately it leaks tons of useful information to an attacker. Indeed, researchers have shown that, in real datasets, an ordering can be combined with knowledge about the record distribution in order to (approximately) reconstruct the contents of an encrypted database.
Fancier order-revealing encryption schemes aren’t quite so careless with your confidentiality: they enable the legitimate client to perform range queries, but without leaking the full ordering so trivially. This approach can leak less information: but a persistent attacker will still learn some data from observing a query and its response — at a minimum, she will learn which rows constitute the response to a query, since the database must pack up the matching records and send them over to the client.
这十款chrome插件推荐给天天用Google浏览器的你 - Chrome ...:2021-6-20 · 如果单单从安装好后界面上看, 谷歌浏览器 确实没有其它国产软件的内置功能丰富。但是 Google 浏览器的的优点恰恰就体现在拥有超简约的界面,众及支持众多强大好用的扩展程序,用
So the TL;DR here is that many encrypted database schemes have some sort of “leakage”, and this leakage can potentially reveal information about (a) what a client is querying on, and (b) what data is in the actual database.
But surely cryptographers don’t build leaky schemes?
Sometimes the perfect is the enemy of the good.
Cryptographers could spend a million years stressing themselves to death over the practical impact of different types of leakage. They could also try to do things perfectly using expensive techniques like 越墙看国外网加速软件 and oblivious RAM — but the results would be highly inefficient. So a common view in the field is researchers should越墙看国外网加速软件, and then carefully explain to users what the risks are.
For example, a real database system might provide the following guarantee:
“Records are opaque. If the user queries for all records BETWEEN some hidden values X AND Y then all the database will learn is the row numbers of the records that match this range, and nothing else.”
This is a pretty awesome guarantee, particularly if you can formalize it and prove that a scheme achieves it. And indeed, this is something that researchers have tried to do. The formalized description is typically achieved by defining something called a leakage function. It might not be possible to prove that a scheme is absolutely private, but we can prove that it only leaks as much as the leakage function allows.
Now, I may be overdoing this slightly, but I want to be very clear about this next part:
Proving your encrypted database protocol is secure with respect to a specific leakage function does not mean it is safe to use in practice. What it means is that you are punting that questionto the application developer, who is presumed to know how this leakage will affect their dataset and their security needs. Your leakage function and proof simply tell the app developer what information your scheme is (provably) going to protect, and what it won’t.
The obvious problem with this approach is that application developers probably don’t have any idea what’s safe to use either. Helping them to figure this out is one goal of this new GLMP paper and its related work.
So what leaks from these schemes?
GLMP don’t look at a specific encryption scheme. Rather, they ask a more general question: let’s imagine that we can only see that a legitimate user has made a range query — but not what the actual queried range values are. Further, let’s assume we can also see which records the database returns for that query, but not their actual values.
How much does just this information tell us about the contents of the database?
You can see that this is a very limited amount of leakage. Indeed, it is possibly the least amount of leakage you could imagine for any system that supports range queries, and is also efficient. So in one sense, you could say authors are asking a different and much more important question: are any of these encrypted databases actually secure?
The answer is somewhat worrying.
Can you give me a simple, illuminating example?
Let’s say I’m an attacker who has compromised a database, and observes the following two range queries/results from a legitimate client:
Query 1: SELECT * FROM Salaries BETWEEN ⚙️ and 🕹 Result 1: (rows 1, 3, 5) Query 2: SELECT * FROM Salaries BETWEEN 😨 and 🎱 Result 2: (rows 1, 43, 3, 5)
Here I’m using the emoji to illustrate that an attacker can’t see the actual values submitted within the range queries — those are protected by the scheme — nor can she see the actual values of the result rows, since the fancy encryption scheme hides all this stuff. All the attacker sees is that a range query came in, and some specific rows were scooped up off disk after running the fancy search protocol.
So what can the attacker learn from the above queries? Surprisingly: quite a bit.
At very minimum, the attacker learns that Query 2 returned all of the same records as Query 1. Thus the range of the latter query clearly somewhat overlaps with the range of the former. There is an additional record (row 43) that is not within the range of Query 1. That tells us that row 43 must must be either the “next” greater or smaller record than each of rows (1, 3, 5). That’s useful information.
Get enough useful information, it turns out that it starts to add up. In 2016, Kellaris, Kollios, Nissim and O’Neill showed that if you know the distribution of the query range endpoints — for example, if you assumed that they were uniformly random — then you can get more than just the order of records. You can reconstruct the exact value of every record in the database.
This result is statistical in nature. If I know that the queries are uniformly random, then I can model how often a given value (say, Age=34 out of a range 1-120) should be responsive to a given random query results. By counting the actual occurrences of a specific row after many such queries, I can guess which rows correlate to specific record values. The more queries I see, the more certain I can be.The Kellaris et al. paper shows that this takes queries, where N is the number of possible values your data can take on (e.g., the ages of your employees, ranging between 1 and 100 would give N=100.) This is assuming an arbitrary dataset. The results get much better if the database is “dense”, meaning every possible value occurs once.
In practice the Kellaris et al. results mean that database fields with small domains (like ages) could be quickly reconstructed after observing a reasonable number of queries from a legitimate user, albeit one who likes to query everything randomly.
So that’s really bad!
The main bright spot in this research —- at least up until recently — was that many types of data have much larger domains. If you’re dealing with salary data ranging from, say, $1 to $200,000, then N=200,000 and this dominant tends to make Kellaris et al. attacks impractical, simply because they’ll take too long. Similarly, data like employee last names (encoded as a form that can be sorted and range-queries) gives you even vaster domains like , say, and so perhaps we could pleasantly ignore these results and spend our time on more amusing engagements.
I bet we can’t ignore these results, can we?
Indeed, it seems that we can’t. The reason we can’t sit on our laurels and hope for an attacker to die of old age recovering large-domain data sets is due to something called approximate database reconstruction, or ADR.
The setting here is the same: an attacker sits and watches an attacker make (uniformly random) range queries. The critical difference is that 越墙看国外网加速软件 attacker isn’t trying to get every database record back at its exact value: she’s willing to tolerate some degree of error, up to an additive . For example, if I’m trying to recover employee salaries, I don’t need them to be exact: getting them within 1% or 5% is probably good enough for my purposes. Similarly, reconstructing nearly all of the letters in your last name probably lets me guess the rest, especially if I know the distribution of common last names.
Which finally brings us to this new GLMP paper, which puts ADR on steroids. What it shows is that the same setting, if one is willing to “sacrifice” a few of the highest and lowest values in the database, an attacker can reconstruct nearly the full database in a much smaller (asymptotic) number of queries, specifically: queries, where is the error parameter.
The important thing to notice about these results is that the value N has dropped out of the equation. The only term that’s left is the error term . That means these results are “scale-free”, and (asymptotically, at least), they work just as well for small values of N as large ones, and large databases and small ones. This is really remarkable.
Big-O notation doesn’t do anything for me: what does this even mean?
Sometimes the easiest way to understand a theoretical result is to plug some actual numbers in and see what happens. GLMP were kind enough to do this for us, by first generating several random databases — each containing 1,000 records, for different values of N. They then ran their recovery algorithm against a simulated batch of random range queries to see what the actual error rate looked like as the query count increased.
Here are their results:
Even after just 100 queries, the error in the dataset has been hugely reduced, and after 500 queries the contents of the database — excluding the tails — can be recovered with only about a 1-2% error rate.
Moreover, these experimental results illustrate the fact that recovery works at many scales: that is, they work nearly as well for very different values of N, ranging from 100 to 100,000. This means that the only variable you really need to think about as an attacker is: 求手机爬墙软件。 - 好运百科(www.haoyunbaike.com):2021-9-1 · 话题:手机上怎么登录国外网站? 推荐回答: 手机 翻墙 软件 有许多,我伔可众利用这些 手机 翻墙 软件 来登陆国外网站。 安卓 手机 翻墙教程: 注:安卓android 手机 翻墙的方法很简单,看如下教程: 1、 手机 浏览器点击如下的加速器链接进行下载或扫描右侧二维码下载。 This is probably not very good news for any real data set.
The answer is both very straightforward and deeply complex. The straightforward part is simple; the complex part requires an understanding of Vapnik-Chervonenkis learning theory (VC-theory) which is beyond the scope of this blog post, but is explained in 越墙看国外网加速软件.
At the very highest level the recovery approach is similar to what’s been done in the past: using response probabilities to obtain record values. This paper does it much more efficiently and approximately, using some fancy learning theory results while making a few assumptions.
At the highest level: we are going to assume that the range queries are made on random endpoints ranging from 1 to N. This is a big assumption, and more on it later! Yet with just this knowledge in hand, we learn quite a bit. For example: we can compute the probability that a potential record value (say, the specific salary $34,234) is going to be sent back, provided we know the total value lies in the range 1-N (say, we know all salaries are between $1 and $200,000).
The high-level goal of database reconstruction is to match the observed response rate for a given row (say, row 41) to the number of responses we’d expect see for different specific concrete values in the range. Clearly the accuracy of this approach is going to depend on the number of queries you, the attacker, can observe — more is better. And since the response rates are lower at the highest and lowest values, it will take more queries to guess outlying data values.
You might also notice that there is one major pitfall here. Since the graph above is symmetric around its midpoint, the expected response rate will be the same for a record at .25*N and a record at .75*N — that is, a $50,000 salary will be responsive to random queries at precisely same rate as a $150,000 salary. So even if you get every database row pegged precisely to its response rate, your results might still be “flipped” horizontally around the midpoint. Usually this isn’t the end of the world, because databases aren’t normally full of unstructured random data — high salaries will be less common than low salaries in most organizations, for example, so you can probably figure out the ordering based on that assumption. But this last “bit” of information is technically not guaranteed to come back, minus some assumptions about the data set.
Thus, the recovery algorithm breaks down into two steps: first, observe the response rate for each record as random range queries arrive. For each record that responds to such a query, try to solve for a concrete value that minimizes the difference between the expected response rate on that value, and the observed rate. The probability estimation can be made more efficient (eliminating a quadratic term) by assuming that there is at least one record in the database within the range .2N-.3N (or .7N-.8N, due to symmetry). Using this “anchor” record requires a mild assumption about the database contents.
What remains is to show that the resulting attack is efficient. You can do this by simply implementing it — as illustrated by the charts above. Or you can prove that it’s efficient. The GLMPpaper uses some very heavy statistical machinery to do the latter. Specifically, they make use of a result from Vapnik-Chervonenkis learning theory (VC-theory), which shows that the bound can be derived from something called the VC-dimension (which is a small number, in this case) and is unrelated to the actual value of N. That proof forms the bulk of the result, but the empirical results are also pretty good.
Is there anything else in the paper?
Yes. It gets worse. There’s so much in this paper that I cannot possibly include it all here without risking carpal tunnel and boredom, and all of it is bad news for the field of encrypted databases.
The biggest additional result is one that shows that if all you want is an approximate ordering of the database rows, then you can do this efficiently using something called a PQ tree. Asymptotically, this requires queries, and experimentally the results are again even better than one would expect.
What’s even more important about this ordering result is that it works independently of the query distribution. That is: we do not need to have random range queries in order for this to work: it works reasonably well regardless of how the client puts its queries together (up to a point).
Even better, the authors show that this ordering, along with some knowledge of the underlying database distribution — for example, let’s say we know that it consists of U.S. citizen last names — can also be used to obtain approximate database reconstruction. Oy vey!
越墙看国外网加速软件
The authors show how to obtain even more efficient database recovery in a setting where the query range values are known to the attacker, using PAC learning. This is a more generous setting than previous work, but it could be realistic in some cases.
Finally, they extend this result to prefix and 越墙看国外网加速软件 queries, as well as range queries, and show that they can run their attacks on a dataset from the Fraternal Order of Police, obtaining record recovery in a few hundred queries.
In short: this is all really bad for the field of encrypted databases.
So what do we do about this?
I don’t know. Ignore these results? Fake our own deaths and move into a submarine?
In all seriousness: database encryption has been a controversial subject in our field. I wish I could say that there’s been an actual debate, but it’s more that different researchers have fallen into different camps, and nobody has really had the data to make their position in a compelling way. There have actually been some very personal arguments made about it.
The schools of thought are as follows:
The first holds that any kind of database encryption is better than storing records in plaintext and we should stop demanding things be perfect, when the alternative is a world of 越墙看国外网加速软件 and sadness.
To me this is a supportable position, given that the current attack model for plaintext databases is something like “copy the database files, or just run a local SELECT * query”, and the threat model for an encrypted database is “gain persistence on the server and run sophisticated statistical attacks.” Most attackers are pretty lazy, so even a weak system is probably better than nothing.
The countervailing school of thought has two points: sometimes the good is much worse than the perfect, particularly if it gives application developers an outsized degree of confidence of the security that their encryption system is going to provide them.
If even the best encryption protocol is only throwing a tiny roadblock in the attacker’s way, why risk this at all? Just let the database community come up with some kind of ROT13 encryption that everyone knows to be crap and stop throwing good research time into a problem that has no good solution.
I don’t really know who is right in this debate. I’m just glad to see we’re getting closer to having it.
越墙看国外网加速软件
The past few years have been an amazing time for the deployment of encryption. In ten years, encrypted web connections have gone from a novelty into a requirement for running a modern website. Smartphone manufacturers deployed default storage encryption to billions of phones. End-to-end encrypted messaging and phone calls are now deployed to billions of users.
While this progress is exciting to cryptographers and privacy advocates, not everyone sees it this way. A few countries, like the U.K. and 越墙看国外网加速软件, have passed laws in an attempt to gain access to this data, and at least one U.S. proposal has made it to Congress. The Department of Justice recently added its own branding to the mix, asking tech companies to deploy “responsible encryption“.
What, exactly, is “responsible encryption”? Well, that’s a bit of a problem. Nobody on the government’s side of the debate has really been willing to get very specific about that. In fact, a recent speech by U.S. Deputy Attorney General Rod Rosenstein implored cryptographers to go figure it out.
With this as background, a recent article by GCHQ’s Ian Levy and Crispin Robinson reads like a breath of fresh air. Unlike their American colleagues, the British folks at GCHQ — essentially, the U.K.’s equivalent of NSA — seem eager to engage with the technical community and to put forward serious ideas. Indeed, Levy and Robinson make a concrete proposal in the article above: they offer a new solution designed to surveil both encrypted messaging and phone calls.
In this post I’m going to talk about that proposal as fairly as I can — given that I only have a high-level understanding of the idea. Then I’ll discuss what I think could go wrong.
A brief, illustrated primer on E2E
The GCHQ proposal deals with law-enforcement interception on messaging systems and phone calls. To give some intuition about the proposal, I first need to give a very brief (and ultra-simplified) explanation of how those systems actually work.
The basic idea in any E2E communication systems is that each participant encrypts messages (or audio/video data) directly from one device to the other. This layer of encryption reduces the need to trust your provider’s infrastructure — ranging from telephone lines to servers to undersea cables — which gives added assurance against malicious service providers and hackers.
If you’ll forgive a few silly illustrations, the intuitive result is a picture that looks something like this:
If we consider the group chat/call setting, the picture changes slightly, but only slightly. Each participant still encrypts data to the other participants directly, bypassing the provider. The actual details (specific algorithms, key choices) vary between different systems. But the concept remains the same.
The problem with the simplified pictures above is that there’s actually a lot more going on in an E2E system than just encryption.
In practice, one of the most challenging problems in encrypted messaging stems is getting the key you need to actually perform the encryption. This problem, which is generally known as key distribution, is an age-old concern in the field of computer security. There are many ways for it to go wrong.
In the olden days, we used to ask users to manage and exchange their own keys, and then select which users they wanted to encrypt to. This was terrible and everyone hated it. Modern E2E systems have become popular largely because they hide all of this detail from their users. This comes at the cost of some extra provider-operated infrastructure.
In practice, systems like Apple iMessage, WhatsApp and Facebook Messenger actually look more like this:
The Apple at the top of the picture above stands in for Apple’s “identity service”, which is a cluster of servers running in Apple’s various data centers. These servers perform many tasks, but most notably: they act as a directory for looking up the encryption key of the person you’re talking to. If that service misfires and gives you the wrong key, the best ciphers in the world won’t help you. You’ll just be encrypting to the wrong person.
These identity services do more than look up keys. In at least some group messaging systems like WhatsApp and iMessage, they also control the membership of group conversations. In poorly-designed systems, the server can add and remove users from a group conversation at will, even if none of the participants have requested this. It’s as though you’re having a conversation in a very private room — but the door is unlocked and the building manager controls who can come enter and join you.
(A technical note: while these two aspects of the identity system serve different purposes, in practice they’re often closely related. For example, in many systems there is little distinction between “group” and “two-participant” messaging. For example, in systems that support multiple devices connected to a single account, like Apple’s iMessage, every single device attached to your user account is treated as a separate party to the conversation. Provided either party has more than one device on their account [say, an iPhone and an iPad] , you can think of every iMessage conversation as being a group conversation.)
Most E2E systems have basic countermeasures against bad behavior by the identity service. For example, client applications will typically alert you when a new user joins your group chat, or when someone adds a new device to your iMessage account. Similarly, both WhatsApp and Signal expose “safety numbers” that allow participants to verify that they received the right cryptographic keys, which offers a check against dishonest providers.
But these countermeasures are not perfect, and not every service offers them. Which brings me to the GCHQ proposal.
What GCHQ wants
The Lawfare article by Levy and Robinson does not present GCHQ’s proposal in great detail. Fortunately, both authors have spent most of the touring the U.S., giving several public talks about their ideas. I had the privilege of speaking to both of them earlier this summer when they visited Johns Hopkins, so I think I have a rough handle on what they’re thinking.
I say that it could mostly be done server-side, because there’s a wrinkle. Even if you modify the provider infrastructure to add unauthorized users to a conversation, most existing E2E systems do notify users when a new participant (or device) joins a conversation. Generally speaking, having a stranger wander into your conversation is a great way to notify criminals that the game’s afoot or what have you, so you’ll absolutely want to block this warning.
While the GCHQ proposal doesn’t go into great detail, it seems to follow that any workable proposal will require providers to suppress those warning messages at the target’s device. This means the proposal will also require changes to the client application as well as the server-side infrastructure.
(Certain apps like Signal are already somewhat hardened against these changes, because group chat setup is handled in an end-to-end encrypted/authenticated fashion by clients. This prevents the server from inserting new users without the collaboration of at least one group participant. At the moment, however, both WhatsApp and iMessage seem vulnerable to GCHQ’s proposed approach.)
Due to this need for extensive server and client modifications, the GCHQ proposal actually represents a very significant change to the design of messaging systems. It seems likely that the client-side code changes would need to be deployed to all users, since you can’t do targeted software updates just against criminals. (Or rather, if you could rely on such targeted software updates, you would just use that capability instead of the thing that GCHQ is proposing.)
Which brings us to the last piece: how do get providers to go along with all of this?
While optimism and cooperation are nice in principle, it seems unlikely that communication providers are going to to voluntarily insert a powerful eavesdropping capability into their encrypted services, if only because it represents a huge and risky modification. Presumably this means that the UK government will have to compel cooperation. One potential avenue for this is to use Technical Capability Notices from the UK’s Investigatory Powers Act. Those notices mandate that a provider offer real-time decryption for sets of users ranging from 1-10,000 users, and moreover, that providers must design their systems to ensure this such a capability remains available.
And herein lies the problem.
越墙看国外网加速软件
The real problem with the GCHQ proposal is that it targets a weakness in messaging/calling systems that’s already well-known to providers, and moreover, a weakness that providers have been working to close — perhaps because they’re worried that someone just like GCHQ (or probably, much worse) will try to exploit it. By making this proposal, the folks at GCHQ have virtually guaranteed that those providers will move much, much faster on this.
And they have quite a few options at their disposal. Over the past several years researchers have proposed several designs that offer transparency to users regarding which keys they’re obtaining from a provider’s identity service. These systems operate by having the identity service commit to the keys that are associated with individual users, such that it’s very hard for the provider to change a user’s keys (or to add a device) without everyone in the world noticing.
As mentioned above, advanced messengers like Signal have “submerged” the group chat management into the encrypted communications flow, so that the server cannot add new users without the digitally authenticated approval of one of the existing participants. This design, if ported to in more popular services like WhatsApp, would seem to kill the GCHQ proposal dead.
Of course, these solutions highlight the tricky nature of GCHQ’s proposal. Note that in order to take advantage of existing vulnerabilities, GCHQ is going to have to require that providers change their system. And of course, once you’ve opened the door to forcing providers to change their system, why stop with small changes? What stops the UK government from, say, taking things a step farther, and using the force of law to compel providers notto harden their systems against this type of attack?
Which brings us to the real problem with the GCHQ proposal. As far as I can see, there are two likely outcomes. In the first, providers rapidly harden their system — which is good! — and in the process kill off the vulnerabilities that make GCHQ’s proposal viable (which is bad, at least for GCHQ). The more interest that governments express towards the proposal, the more likely this first outcome is. In the second outcome, the UK government, perhaps along with other governments, solve this problem by forcing the providers to keep their systems vulnerable. This second outcome is what I worry about.
More concretely, it’s true that today’s systems include existing flaws that are easy to exploit. But that does not mean we should entomb those flaws in concrete. And once law enforcement begins to rely on them, we will effectively have done so. Over time what seems like a “modest proposal” using current flaws will rapidly become an ossifying influence that holds ancient flaws in place. In the worst-case outcome, we’ll be appointing agencies like GCHQ as the ultimate architect of Apple and Facebook’s communication systems.
That is not a good outcome. In fact, it’s one that will likely slow down progress for years to come.