Towards Safe and Secure Generative AI Systems
Date
2025
Authors
Liu, Hongbin
Advisors
Abstract
As generative AI rapidly advances, its safety and security risks grow in tandem. A key factor underlying these risks is large-scale pre-training data, the cornerstone of powerful generative AI systems such as ChatGPT and Gemini. Model providers (e.g., OpenAI) often collect Internet-scale pre-training data from numerous data owners (e.g., Wikipedia). This dissertation investigates the risks posed by unauthorized and untrusted pre-training data, as well as corresponding countermeasures.
In the first part of this dissertation, we introduce EncoderMI, a pre-training data-use auditing method for data owners. Specifically, model providers may use Internet data for pre-training without proper authorization from data owners, raising significant concerns about data privacy and misuse. EncoderMI addresses these concerns by allowing a data owner to verify whether their data was included in the pre-training data of a foundation model, given only black-box access to the model.
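At a high level, a black-box audit of this kind can be illustrated with a similarity-based membership signal: an encoder pre-trained on an input tends to map that input's augmented views to nearby embeddings. The sketch below is illustrative only and is not the exact EncoderMI procedure; the `embed` callable (the black-box encoder API), the augmentation function, and the decision threshold are all assumptions.

```python
import numpy as np

def membership_score(x, embed, augment, n_views=10):
    """Average pairwise cosine similarity among embeddings of augmented views of x.

    embed: black-box function mapping a batch of inputs to embedding vectors.
    augment: function returning a randomly augmented copy of x.
    A higher score suggests x behaves like a pre-training (member) input.
    """
    views = np.stack([augment(x) for _ in range(n_views)])
    emb = embed(views)                                   # shape: (n_views, d)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = emb @ emb.T                                   # pairwise cosine similarities
    off_diag = sims[~np.eye(n_views, dtype=bool)]
    return off_diag.mean()

def audit(x, embed, augment, threshold):
    """Flag x as a likely pre-training member if its score exceeds a threshold
    calibrated on inputs known not to be in the pre-training data."""
    return membership_score(x, embed, augment) >= threshold

if __name__ == "__main__":
    # Hypothetical stand-ins so the sketch runs end to end.
    rng = np.random.default_rng(0)
    proj = rng.normal(size=(32 * 32 * 3, 128))
    embed = lambda batch: batch.reshape(len(batch), -1) @ proj   # toy "encoder"
    augment = lambda x: np.clip(x + rng.normal(0, 0.05, x.shape), 0, 1)
    x = rng.random((32, 32, 3))
    print(audit(x, embed, augment, threshold=0.8))
```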
In the second part, we show how untrusted pre-training data can lead to severe vulnerabilities by presenting PoisonedEncoder, a data-poisoning attack that injects carefully crafted "poisoning" inputs into the unlabeled pre-training data of a foundation model. As a result, downstream classifiers built on top of the poisoned foundation model are highly likely to misclassify arbitrary clean inputs into attacker-chosen classes across multiple classification tasks.
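As a rough illustration of how unlabeled pre-training data can be poisoned, one plausible strategy is to combine a target input with a reference input from the attacker-chosen class into a single unlabeled image, so that random crops taken during contrastive pre-training cover each half and the encoder is pulled toward embedding them near each other. The helper below shows only that combination step; the image layout, sizes, and function names are assumptions, not the dissertation's exact construction.

```python
import numpy as np

def make_poisoning_input(target_img, reference_img, axis=1):
    """Concatenate a target input and a reference input (from the attacker-chosen
    class) side by side into one unlabeled "poisoning" image.

    Intuition (illustrative only): two random crops of the same image are treated
    as a positive pair during contrastive pre-training, so crops covering the
    target half and the reference half get pushed close in embedding space.
    """
    assert target_img.shape == reference_img.shape
    return np.concatenate([target_img, reference_img], axis=axis)

def craft_poisoning_set(targets, references, n_poisons, seed=0):
    """Build n_poisons unlabeled poisoning inputs from pools of target and
    reference images (all arrays of identical shape)."""
    rng = np.random.default_rng(seed)
    poisons = []
    for _ in range(n_poisons):
        t = targets[rng.integers(len(targets))]
        r = references[rng.integers(len(references))]
        poisons.append(make_poisoning_input(t, r))
    return poisons

if __name__ == "__main__":
    # Toy 32x32 RGB arrays standing in for real target/reference inputs.
    rng = np.random.default_rng(1)
    targets = rng.random((5, 32, 32, 3))
    references = rng.random((5, 32, 32, 3))
    poisons = craft_poisoning_set(targets, references, n_poisons=3)
    print(poisons[0].shape)  # (32, 64, 3): target and reference side by side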
Finally, we propose a post-training patching defense called Mudjacking to remove backdoors from foundation models after an attack is detected. Because a compromised foundation model can undermine every intelligent application built on it, Mudjacking is crucial for safeguarding the broader ecosystem of generative AI. In particular, Mudjacking removes backdoors while preserving the utility of the foundation model.
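Conceptually, patching a backdoored encoder can be framed as fine-tuning with two competing objectives: make the embedding of a trigger-embedded "bug" input match that of its clean counterpart, while keeping embeddings of clean inputs close to those of the original model so utility is preserved. The PyTorch sketch below illustrates that trade-off with a toy encoder and hypothetical loss weights; it is not Mudjacking's actual objective or training procedure.

```python
import copy
import torch
import torch.nn as nn

def patch_encoder(encoder, bug_input, clean_reference, clean_batch,
                  steps=200, lr=1e-4, utility_weight=1.0):
    """Fine-tune `encoder` so the bug input's embedding matches the clean
    reference's embedding, while clean inputs keep their original embeddings.

    bug_input:       trigger-embedded input that exposes the backdoor (1 x ...)
    clean_reference: its clean counterpart (1 x ...)
    clean_batch:     clean inputs used to preserve utility (N x ...)
    """
    frozen = copy.deepcopy(encoder).eval()      # reference copy of the pre-patch model
    for p in frozen.parameters():
        p.requires_grad_(False)

    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        # Effectiveness: the bug input should embed like its clean counterpart.
        fix_loss = mse(encoder(bug_input), frozen(clean_reference))
        # Utility: clean inputs should embed as they did before patching.
        keep_loss = mse(encoder(clean_batch), frozen(clean_batch))
        loss = fix_loss + utility_weight * keep_loss
        loss.backward()
        opt.step()
    return encoder

if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))  # toy encoder
    bug = torch.rand(1, 3, 32, 32)          # stand-in for a trigger-embedded input
    clean_ref = torch.rand(1, 3, 32, 32)    # stand-in for its clean counterpart
    clean_batch = torch.rand(16, 3, 32, 32)
    patch_encoder(encoder, bug, clean_ref, clean_batch, steps=10)
```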
Type
Dissertation
Department
Description
Provenance
Subjects
Permalink
https://hdl.handle.net/10161/32703
Citation
Liu, Hongbin (2025). Towards Safe and Secure Generative AI Systems. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/32703.
Collections
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.