Paper 2023/1661

Publicly-Detectable Watermarking for Language Models

Jaiden Fairoze, University of California, Berkeley
Sanjam Garg, University of California, Berkeley
Somesh Jha, University of Wisconsin–Madison
Saeed Mahloujifar, FAIR, Meta
Mohammad Mahmoody, University of Virginia
Mingyuan Wang, University of California, Berkeley
Abstract

We present a highly detectable, trustless watermarking scheme for LLMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LLM output using rejection sampling. We prove that our scheme is cryptographically correct, sound, and distortion-free. We make novel uses of error-correction techniques to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and make empirical measurements over open models in the 2.7B to 70B parameter range. Our experiments suggest that our formal claims are met in practice.

Metadata
Available format(s)
PDF
Category
Applications
Publication info
Preprint.
Keywords
public-detectability, watermarking, large language models, cryptographic protocols, provable security, machine learning
Contact author(s)
fairoze @ berkeley edu
sanjamg @ berkeley edu
jha @ cs wisc edu
saeedm @ meta com
mohammad @ virginia edu
mingyuan @ berkeley edu
History
2024-05-16: revised
2023-10-26: received
Short URL
https://ia.cr/2023/1661
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2023/1661,
      author = {Jaiden Fairoze and Sanjam Garg and Somesh Jha and Saeed Mahloujifar and Mohammad Mahmoody and Mingyuan Wang},
      title = {Publicly-Detectable Watermarking for Language Models},
      howpublished = {Cryptology ePrint Archive, Paper 2023/1661},
      year = {2023},
      note = {\url{https://eprint.iacr.org/2023/1661}},
      url = {https://eprint.iacr.org/2023/1661}
}