Sign Everything

Why every piece of content should be digitally signed
andrew
April 30, 2024

Authenticity is becoming increasingly scarce in the digital world. A piece of data is authentic if you know where or whom it came from. The Internet gave us universal information distribution, and AI has given us unbounded content creation, but the tools to ensure that we can trust the data we receive have not caught up. Digital signatures are a battle-tested cryptographic primitive that addresses this authenticity crisis, and the need to widely implement digital signatures has never been more pressing. This article will go over digital signatures, what they are and aren't good at, and explore some of the more interesting types of data that can be digitally signed.

What are Digital Signatures?

Digital signatures are a cryptographic primitive that powers much of the modern Internet. If you are already familiar with them, feel free to skip to the next section.

Say you want to perform a simple interaction online, like getting your account balance from your bank's website. There is one fundamental question - when you visit www.bank.com, how do you know that you're actually talking to your bank? The problem here is that the Internet was not designed with a permanent notion of identity - participants use IP addresses which can change over time. Yet users should always know how to reliably communicate with important websites such as their bank, government page, or favorite social media app. So what can we do to ensure authenticity?

The answer is digital signatures! Digital signatures are a tool of public key cryptography - to start, you generate a keypair which consists of a public key and a private key (both of these are just represented as byte strings). You keep your private key secret and broadcast your public key to the world, i.e. telling everyone that "Bob has public key 0xabcdef". You are now able to "sign" a piece of data using your private key, which will generate a string of bytes known as the signature. Anyone in the world can now put together the original piece of data, this signature, and your public key in order to "verify" that the signature was indeed produced by the private key matching your public key.
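To make the generate / sign / verify flow concrete, here is a minimal sketch in Python. It uses a Lamport one-time signature, a hash-based scheme chosen only because it fits in the standard library; it is not the scheme used by TLS or passkeys, and each Lamport key must sign only a single message. The function names `generate_keypair`, `sign`, and `verify` are illustrative, not taken from any particular library.

```python
import hashlib
import secrets

BITS = 256  # one pair of secrets per bit of a SHA-256 digest


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _digest_bits(message: bytes) -> list:
    """Return the 256 bits of SHA-256(message), most significant bit first."""
    n = int.from_bytes(_h(message), "big")
    return [(n >> (BITS - 1 - i)) & 1 for i in range(BITS)]


def generate_keypair():
    """Private key: 256 pairs of random secrets. Public key: the hashes of those secrets."""
    private_key = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(BITS)]
    public_key = [(_h(zero), _h(one)) for zero, one in private_key]
    return private_key, public_key


def sign(private_key, message: bytes) -> list:
    """Reveal one secret per digest bit. A Lamport key must sign only ONE message."""
    return [private_key[i][bit] for i, bit in enumerate(_digest_bits(message))]


def verify(public_key, message: bytes, signature: list) -> bool:
    """Check that every revealed secret hashes to the public value for that bit."""
    if len(signature) != BITS:
        return False
    bits = _digest_bits(message)
    return all(_h(part) == public_key[i][bit] for i, (bit, part) in enumerate(zip(bits, signature)))


private_key, public_key = generate_keypair()
message = b"Bob's balance is $1,200"
signature = sign(private_key, message)

print(verify(public_key, message, signature))                     # True
print(verify(public_key, b"Bob's balance is $9,999", signature))  # False: the data was changed
```

Note that verification needs only the message, the signature, and the public key. Changing even one byte of the message changes its digest, so the old signature no longer verifies.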

Note that this solves both the identity and authenticity problems of the internet - your identity is now just your public key! In addition, as long as your private key stays secret, anyone else can assume that everything that verifies against your public key originated from you.

Now you can trust that you are actually talking to your bank online. In reality, there remains the question of how you can get your bank's public key in the first place - you can't trust your bank to just tell you its public key since you don't know who your bank is in the first place! For this, we rely on public key infrastructure (PKI) to create a mapping from www.bank.com to your bank's public key. PKI is out of scope for this discussion, but you can learn more about it here.

Why Digital Signatures?

Now we understand why authenticity is so important in the digital world, but why use digital signatures in particular? The answer is threefold: self-sufficiency, unforgeability, and composability.

  • Self-sufficiency

    • Given any piece of data, a signature, and a public key, you can verify the signature. You do not need to know anything specific about where the data came from, but only the signature algorithm being used - the signature speaks for itself. In particular, verification is a portable and offline operation which does not require access to any third party.

  • Unforgeability

    • Fake signatures cannot be created without knowing the private key, and a signature over a piece of data is tied to that data for the rest of time. If data is modified or corrupted, the original signature is no longer valid and the holder of the private key must generate a new signature for the updated data. There is an element of accountability here - once a party has signed some data, they are effectively attesting to that data forever. They can sign a statement retracting a previous signature, but they can never delete that signature once it is given out.

  • Composability

    • Signatures are an open system. As long as there is a similarly open system for public key distribution, anyone can register a public key and begin signing and verifying data themselves. This also means that anyone can build on top of existing digital signature infrastructure, repurposing signatures and even utilizing new cryptographic primitives based on the signatures, for example generating zero knowledge proofs for privacy.

A Case Example: National Digital Identity

Let's visit an example to see how these properties of digital signatures compare to alternatives, and when digital signatures would be preferable. Say the government of Democratia wants to issue a digital identity card for its citizens. They want to choose between two options: a) issue a card to each citizen which contains the citizen's basic information along with a digital signature and is kept in their digital wallet or b) set up accounts for each citizen and allow applications to request information about a citizen using an OAuth-like flow.

What are the advantages of using digital signatures? First off, because signatures are self-sufficient, you can present your digital identity information to anyone without sending a request to Democratia's server or going through any authentication flow. One can imagine that if every country had its own digital identity card, then an application would need to support logging into each individual government website for compatibility. Digital signatures create a lightweight abstraction layer for authentication where logins to different databases are replaced by possession of a signature. Secondly, the unforgeability of digital signatures can lead to better security tradeoffs compared to a central server. Notably, using digital signatures allows for the private key management and signing process to happen in a separate, highly secure server. Democratia can still run a "hot" website which receives a lot of traffic and even displays identity information to users, but even if this website is hacked, an attacker will still not be able to generate fake signatures of users' identity data. Finally, because digital signatures are composable, any developer can use Democratia's digital identity system for their own application. For example, I can require users to prove they have a Democratia digital identity in order to log in to my application, and use a derivative of their Democratia credentials as their account identifier. By doing this, my application can utilize the system of trust that Democratia has built.

What are the disadvantages of digital signatures? Democratia may want to know every time someone's digital identity is being used, and keep track of what it is being used for. Because digital signatures are self-sufficient, the government would have no way of knowing how a signature is used once it is in the hands of a citizen. In addition, any developer can choose to build their own applications on top of a digital signature without permission, meaning Democratia won't be able to restrict potentially malicious applications from requiring a digital identity to log in. This is an important concern when it comes to ensuring data privacy, and some authorization consent screen combined with privacy preserving cryptography such as zero knowledge proofs would likely be necessary to ensure citizens know what information they are giving away. Lastly, because a signature is bound to the exact data that was signed, it is more difficult for Democratia to update a user's digital identity - they would need to re-sign the new identity, distribute the new signature, and have some method for revoking the old one.

Digital signatures are not a silver bullet for all problems related to authentication and authorization. They are not as flexible as restricting access through a central server, and do not provide any privacy guarantees by themselves. These are both issues that can be addressed, with public revocation lists for credential updates and deletes, and zero knowledge proofs with selective disclosure for privacy. But the main opposition to the adoption of digital signatures is likely to come from their self-sufficiency. Being unable to track the use of a signature and allowing users to present data without the consent of the issuing party is antithetical to how much of the data economy works today.

What Can We Sign?

So far, digital signatures have found widespread adoption in TLS certificates, email, peer to peer payments, and more. But this is just the tip of the iceberg - every single piece of data that originates from a specific person or organization could be digitally signed. In particular, here are some important types of data that would benefit the most from having digital signatures.

Government and Identity Documents

Passports, driver's licenses, residency cards, and social security cards are perfect candidates for digital signatures because authenticity is of paramount importance. In fact, many passports and certain residency cards are already digitally signed today! Adding a digital signature to either a physical or digital card has a low marginal cost and greatly increases resiliency against forgery. The main challenge with signatures on identity documents is revocation and credential updates. A solution here would likely require online revocation lists or short expiry periods, but this is a challenge faced by existing identity documents regardless.
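As a rough illustration of how expiry and revocation could sit alongside signature verification, here is a minimal Python sketch. The `Credential` fields, the `is_credential_valid` helper, and the caller-supplied `verify_signature` function are all hypothetical names for this example; any real signature scheme could be plugged in for the verifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    credential_id: str  # hypothetical unique identifier assigned by the issuer
    payload: bytes      # the identity data itself, e.g. a serialized name and birth date
    expires_at: int     # Unix timestamp after which the credential is no longer accepted
    signature: bytes    # issuer's signature over signed_bytes()

    def signed_bytes(self) -> bytes:
        # Length-prefix the id and use a fixed-width expiry so that the id, expiry,
        # and payload cannot be shifted into one another without changing the bytes.
        cid = self.credential_id.encode("utf-8")
        return len(cid).to_bytes(4, "big") + cid + self.expires_at.to_bytes(8, "big") + self.payload


def is_credential_valid(cred: Credential, verify_signature, revoked_ids: set, now: int) -> bool:
    """Accept a credential only if it is authentic, unexpired, and not revoked."""
    if not verify_signature(cred.signed_bytes(), cred.signature):
        return False  # forged or modified
    if now >= cred.expires_at:
        return False  # a short expiry period limits how long a stale credential is usable
    return cred.credential_id not in revoked_ids  # revocation list published by the issuer
```

The signature and expiry checks work offline, while the revocation check is only as fresh as the verifier's copy of the list. This is the tradeoff between short expiry periods and online revocation lists.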

Financial and Legal Documents

Digital signatures can be used in financial and legal environments as a system of accountability. Data communication in the current legal framework offers plenty of room for plausible deniability - physical documents can be forged, and authenticity is reliant upon physical signatures which themselves are quite unreliable. Oftentimes legal evidence is based on email or on subpoenas served to other communication platforms, but this is costly and reliant on third parties. Digital signatures offer an efficient mechanism to keep all parties accountable for the information they communicate.

Art, Music, and Creative Work

Authenticity is what fuels the creative industry - artists have names and reputations based on the work attributed to them. In an age where AI allows anyone to create and copy high quality media, and the Internet allows anyone to distribute such media at little to no cost, the authenticity of creative work has never been in greater peril. One part of the solution is for artists to digitally sign every piece they produce and have trusted registries for artist public keys. In addition, we could have a content timestamp notary, an entity that will take a piece of creative work and sign the content along with a timestamp of when it was created. More advanced techniques are possible, for example creating perceptual hashes of videos to automatically block copyrighted content, but at the minimum one should be able to present a piece of creative work with a signed timestamp and let the public decide who deserves credit for it. In fact, large publishers are already trying to address this problem, with the FOX Corporation recently launching Verify, a tool for media companies to digitally sign their content and upload it to the blockchain so that its provenance can be verified.

Photography, Video, and Audio Recordings

Similar to creative work, it is becoming increasingly clear that the authenticity of content produced by hardware in the physical world is in jeopardy. This is most relevant to the sectors of news and public information, as photos, videos, and audio used in the news are susceptible to forgery and parody by generative models. A possible solution here is attested hardware, cameras and recorders that sign the data they produce with a tamper-resistant private key embedded in hardware. Even more ambitious, the hardware could sign the timestamp and location where the content was produced, although the latter is far more challenging. Photos, audio, and video could be transformed in zero knowledge, allowing information sources to release censored or modified content that was originally produced by trusted hardware.

Physical Check-ins and Interactions

As the cost of forging and imitating digital interactions goes down, the value of physical interactions goes up. Brands, creators, and artists care about loyal engagement from fans, and more often than not the most devout engagement occurs in the physical world. Trusted hardware can attest to human interactions in the real world, and this can be as simple as tapping an NFC card with your phone when you shop at a store, attend a fan meetup, or go to a concert. These interactions are especially valuable because they are verifiably produced by humans who are willing to put in time and effort, and they allow creators to identify the fans who care the most.

APIs

So much of the data that is transferred on the web goes through APIs, and standardizing digital signatures across data returned by APIs would be a huge step towards authenticity on the Internet. A simple example: I want to prove my bank balance to you. I can take a screenshot of my balance, but then you might just say it was photoshopped. Instead, if my bank had a signed API, I could query this API and send you my balance with a signature proving its veracity. Signed APIs excel when privacy is involved. Instead of sending you my balance in the clear, I could instead generate a zero knowledge proof that my balance is above a certain number, say greater than $1000, and have this claim be rooted in the trust of the bank's original signed API response. However, in practice few APIs are signed, in part because companies depend on restricting access to their APIs to make money. The widespread signing of APIs will likely have to accompany business models that support an open data economy.
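One practical detail a signed API has to settle is which exact bytes get signed, since the same JSON object can be serialized with different key orders and whitespace. The Python sketch below shows one simple convention (sorted keys, no optional whitespace); it is an assumption for illustration rather than a standard, and real systems may prefer a formal scheme such as the JSON Canonicalization Scheme (RFC 8785). The field names `account` and `balance_cents` are hypothetical.

```python
import hashlib
import json


def canonical_bytes(response: dict) -> bytes:
    """Serialize with sorted keys and no optional whitespace so the bytes are reproducible."""
    return json.dumps(response, sort_keys=True, separators=(",", ":"), ensure_ascii=True).encode("utf-8")


def signing_digest(response: dict) -> bytes:
    """The SHA-256 digest the bank would sign and a verifier would recompute."""
    return hashlib.sha256(canonical_bytes(response)).digest()


# The same data in a different key order yields identical bytes, and so an identical digest.
from_bank = {"account": "bob", "balance_cents": 120000}
reparsed = {"balance_cents": 120000, "account": "bob"}

print(canonical_bytes(from_bank))                              # b'{"account":"bob","balance_cents":120000}'
print(signing_digest(from_bank) == signing_digest(reparsed))   # True
```

Using integer cents rather than a floating point balance avoids number formatting differences between the signer's and the verifier's JSON libraries.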

HTTP Responses

We could go one step deeper in the abstraction stack and choose to sign HTTP responses. This would allow us to capture not just the data that is available via an API, but the content of any webpage on the Internet. One use case here would be a verifiable web archive, where news sources could be held accountable for the information that they publish, even if they choose to edit it later on. TLS Notary and Reclaim Protocol are bootstrapped solutions that allow us to get verifiable data from HTTP using a third party, but servers signing responses would remove the need for these workarounds.

Data Requiring a Trusted Timestamp or Location

The list of possible data that could be digitally signed is boundless, but one category is particularly interesting - any type of data where a trusted timestamp or geolocation is necessary. We've already discussed the timestamp notary, a system to which you could send a piece of content and get it back signed with a timestamp. But another interesting direction is the location notary, a setup where some device can request to get its location at a particular time notarized. A possible implementation could involve aggregating signatures from nearby devices, or even getting signed GPS responses, but locations are far harder to pin down since they can be mocked by a network of devices spread around the world.
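The timestamp notary can be sketched in a few lines of Python. This is one possible design, not a description of any existing service: the notary signs a statement containing the SHA-256 hash of the content and a Unix timestamp, so it never needs to see the content itself. The `notarize` and `check_receipt` names are hypothetical, and the `sign` and `verify` functions are supplied by the caller from whatever signature scheme is in use.

```python
import hashlib
import json
import time


def notarize(content: bytes, sign, now=None) -> dict:
    """Notary side: bind a content hash to a timestamp and sign the pair."""
    timestamp = int(time.time()) if now is None else int(now)
    statement = json.dumps(
        {"sha256": hashlib.sha256(content).hexdigest(), "timestamp": timestamp},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return {"statement": statement, "signature": sign(statement)}


def check_receipt(content: bytes, receipt: dict, verify):
    """Verifier side: return the notarized timestamp, or None if the receipt is invalid."""
    if not verify(receipt["statement"], receipt["signature"]):
        return None  # the statement was not signed by the notary, or was altered
    claimed = json.loads(receipt["statement"])
    if claimed["sha256"] != hashlib.sha256(content).hexdigest():
        return None  # the receipt belongs to different content
    return claimed["timestamp"]
```

A receipt only shows that the content existed no later than the signed timestamp. It says nothing about who created it, which is why it works best alongside the creator's own signature.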

What Can't Signatures Do?

Digital signatures are great, but they don't solve everything. In particular, here are some things that digital signatures cannot do on their own:

  • Guarantee privacy
    • Digital signatures do not offer any privacy guarantees. They are separate from encryption and do not offer any confidential communication. In order to verify a digital signature you must present the original data in plaintext, which reveals everything. The good news is that one can combine digital signatures with a privacy preserving cryptographic primitive such as zero knowledge proofs and verify the digital signature in zero knowledge to ensure that certain information is kept hidden.
  • Handle key management and distribution
    • The utility and adoption of digital signatures depends on there being a widely available registry of trusted public keys, where given an entity I can look up its public key reliably. This leads to a few issues:
      • First, there is the question of how I can trust that entity X is the "owner" of public key Y. Public key infrastructure is our current solution here, where we trust a few select parties to tell us the public keys of different entities. We may similarly have to rely on trusted parties to bootstrap public key infrastructure for other use cases, for example a platform that registers the keys of artists or a manufacturer that publicizes the keys of their trusted hardware. One important thing to note here is that digital signatures do not create trust in and of themselves; they must be rooted in another source of trust. For example, I download a song from TikTok that sounds like Beyonce, but I don't know if it's AI generated or if it's actually her original song. I see that the downloaded song is digitally signed by public key 0x123456 and I can verify this signature, but I still need to know who 0x123456 belongs to. Anyone can create a mapping from 0x123456 to Beyonce, but this isn't enough. Rather, I need 0x123456 to be rooted in some representation of Beyonce's identity that I trust - it could be her Instagram page, Spotify profile, or a third party website that displays an image of Beyonce's face with a blue checkmark next to it which claims this key belongs to Beyonce.
      • Second, managing private keys becomes a big challenge, especially for entities that do not have the appropriate infrastructure. In many current technological systems, we have human fallbacks for edge cases. If you lose your bank account password, you can call someone at the bank and they will help you recover it. Digital signatures offer no such reprieve - if your key is stolen, someone else can pretend to be you, and if you lose it, you can no longer sign as yourself. A possible remedy here is to have entities that specialize in managing keys which provide stricter security guarantees while at the same time offering human recovery methods such as trustless email-based recovery via ZK Email. And of course, we will always have social consensus as the base layer, where Beyonce could go on X and say her private key got stolen.
  • Provide quantum resistance
    • Our most widely deployed digital signature schemes are not quantum resistant, meaning that as quantum computing continues to develop, at some point a sufficiently powerful quantum computer will be able to forge signatures. Luckily, this is an area of active development and there are multiple plausibly post-quantum secure signature schemes going through the standardization process (although these schemes have been subject to multiple attacks). Interestingly enough, quantum computing can give us access to new signature schemes that weren’t previously possible, such as one-shot signatures which destroy the private key after signing one piece of data.

Why Now?

Digital signatures were first conceived in the 1970s and have already been implemented widely around the Internet, so why should we care about them even more today? In short, the need has never been greater, the UX for key management has vastly improved, and new cryptographic primitives have been developed to address the shortcomings of digital signatures.

  • The authenticity crisis
    • As it becomes easier to manufacture text, speech, art, and other forms of media through the use of generative models, it becomes correspondingly more important to know where content originated. Creative work is predicated upon the idea of attribution, where an artist who is first to introduce an idea should receive credit for that idea. But modern media makes it extremely easy for ideas to be stolen and copied, and the only recourse an artist has is appealing to the masses with whatever platform they have. Naturally, the less popular artists are worse off in this setting. Speaking more broadly about information in general, humans are only capable of processing so much of the ever increasing quantity of information available in the world. When anyone can write articles that sound like the New York Times, generate photos that belong in National Geographic, or make audio recordings that sound like they came from the president, it becomes necessary that we know when a party is truly responsible for a piece of data. Otherwise, what can we trust?
  • Passkeys, secure elements, and biometrics
    • One of the biggest challenges in digital signature adoption is key management. Keys can be unintuitive to work with yet require the utmost security to ensure authenticity. Luckily, one of the largest movements in digital authentication is underway with the adoption of passkeys. Passkeys are replacements for passwords that actually use digital signatures themselves. A private key is stored in a tamper-resistant, secure element on your device, and in order to use a passkey you must submit an approval using biometrics. This replaces passwords, which are single points of failure that present a clear target for attackers, with something more difficult to compromise, namely possession of a device + biometrics. The adoption of passkeys and more generally biometric-based credential management is great for a world full of digital signatures. Anyone can store their signing keys just as they would a passkey, and use intuitive UX approvals such as Touch ID or Face ID in order to sign data. Moving forward, passkeys can even offer an efficient way to associate everyone's account on a platform like X or Instagram with a public key because you are already given a digital signature keypair when you use passkeys.
  • Zero knowledge proofs and privacy
    • Digital signatures by themselves are not sufficient for many use cases, especially those that involve private data such as identity documents. Recent developments in cryptographic primitives such as zero knowledge proofs allow digital signatures to be verified without revealing the original data that was signed, but rather only properties about that data. This is a fundamental building block that is necessary for digital signatures to gain adoption, as it enables us to work with signatures over data that is supposed to remain confidential. In fact, many of the cryptographic techniques needed to privately verify signatures have only been developed in the past few years, and now we are able to perform such verifications in roughly one second!

Looking Forward

Authenticity is of paramount importance in today's digital age, and with modern technology it is becoming increasingly difficult to know where information comes from. Digital signatures offer a lightweight, robust, open system to provide authenticity, and we're only beginning to scratch the surface on the types of data that can be digitally signed. Now is a better time than ever to push for their adoption.

P.S. And of course, this piece is digitally signed. I used libhalo from ARX Research to sign the Markdown of this article with a FIDO2-compliant NFC card; you can try it for yourself at nfcsign.me.

P.P.S. The title of this article, Sign Everything, references a previous blog post from Fred Wilson.