Paper 2021/029

EPISODE: Efficient Privacy-PreservIng Similar Sequence Queries on Outsourced Genomic DatabasEs

Thomas Schneider and Oleksandr Tkachenko

Abstract

Nowadays, genomic sequencing has become much more affordable for many people and, thus, many people own their genomic data in a digital format. Having paid for genomic sequencing, they want to make use of their data for different tasks that are possible only using genomics, and they share their data with third parties to achieve these tasks, e.g., to find their relatives in a genomic database. As a consequence, more genomic data get collected worldwide. The upside of the data collection is that unique analyses on these data become possible. However, this raises privacy concerns because the genomic data uniquely identify their owner, contain sensitive data about his/her risk for getting particular diseases, and even sensitive information about his/her family members. In this paper, we introduce EPISODE - a highly efficient privacy-preserving protocol for Similar Sequence Queries (SSQs), which can be used for finding genetically similar individuals in an outsourced genomic database, i.e., securely aggregated from data of multiple institutions. Our SSQ protocol is based on the edit distance approximation by Asharov et al. (PETS'18), which we further optimize and extend to the outsourcing scenario. We improve their protocol by using more efficient building blocks and achieve a 5-6x run-time improvement compared to their work in the same two-party scenario. Recently, Cheng et al. (ASIACCS'18) introduced protocols for outsourced SSQs that rely on homomorphic encryption. Our new protocol outperforms theirs by more than factor 24000x in terms of run-time in the same setting and guarantees the same level of security. In addition, we show that our algorithm scales for practical database sizes by querying a database that contains up to a million short sequences within a few minutes, and a database with hundreds of whole-genome sequences containing 75 million alleles each within a few hours.

Metadata
Available format(s)
PDF
Category
Applications
Publication info
Published elsewhere. Minor revision. ASIACCS'19
Keywords
medical privacyprivacy-enhancing technologiesgenomic researchedit distancesecure computationoutsourcing
Contact author(s)
schneider @ encrypto cs tu-darmstadt de
tkachenko @ encrypto cs tu-darmstadt de
History
2021-01-12: received
Short URL
https://ia.cr/2021/029
License
Creative Commons Attribution
CC BY

BibTeX

@misc{cryptoeprint:2021/029,
      author = {Thomas Schneider and Oleksandr Tkachenko},
      title = {EPISODE: Efficient Privacy-PreservIng Similar Sequence Queries on Outsourced Genomic DatabasEs},
      howpublished = {Cryptology ePrint Archive, Paper 2021/029},
      year = {2021},
      note = {\url{https://eprint.iacr.org/2021/029}},
      url = {https://eprint.iacr.org/2021/029}
}
Note: In order to protect the privacy of readers, eprint.iacr.org does not use cookies or embedded third party content.