Applications of Predictive Analytics for Machine Learning as a Service

Limited Access
This item is unavailable until 2027-01-13.

Date

2024

Abstract

Machine Learning as a Service (MLaaS) has emerged as a pivotal framework for democratizing access to advanced Machine Learning (ML) and generative models, enabling users to leverage cloud-based computational resources and models without the need for substantial local infrastructure. The rapid expansion of cloud-based ML applications and inference services, combined with increasing demands for computational resources, data privacy, and system efficiency, highlights the necessity for scalable, secure, and optimized cloud resource management solutions. This dissertation explores optimizations in cloud resource management and inference serving for ML and generative models, respectively, addressing the challenges of improving performance and scalability in MLaaS environments. Under the purview of MLaaS, this dissertation also explores Healthcare as a Service (HaaS), addressing the challenges of developing and deploying state-of-the-art models for cloud services and enhancing user trust in model predictions through Uncertainty Quantification (UQ), particularly in the absence of ground truth. Finally, this dissertation proposes strategies to advance security research on models deployed in the MLaaS setting. By addressing these core challenges, this work enhances the scalability of computational resources, the reliability of model predictions, and the security of ML models in the MLaaS setting.

Efficient cloud resource orchestration is essential to ensuring that multi-tenant MLaaS platforms can scale effectively while maintaining system-level objectives and preserving user privacy. Kubernetes-based schedulers operate at coarse granularity, ensuring only the valid placement of pods onto nodes; they underutilize bare-metal resources and access sensitive user metadata to make scheduling decisions. To address these limitations, we propose a privacy-preserving job scheduler integrated with the Argo workflow engine and built in conjunction with Kubernetes. It enables fine-grained scheduling by selecting among competing jobs within a single pod, thereby optimizing resource utilization and preserving user privacy while satisfying system-level objectives.
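
To make the fine-grained, privacy-preserving selection concrete, the sketch below packs competing jobs into a single pod's resource budget using only opaque job identifiers and resource requests, never user metadata. It is an illustrative stand-in, not the scheduler described in the dissertation; the Job fields, capacities, and greedy objective shown are hypothetical.

# Illustrative sketch (not from the dissertation): co-scheduling competing jobs
# into one pod using only resource requests, so decisions stay privacy-preserving.
from dataclasses import dataclass

@dataclass
class Job:
    job_id: str          # opaque identifier; no user-identifying metadata
    cpu_millicores: int
    memory_mib: int

def pack_jobs_into_pod(jobs, pod_cpu_millicores, pod_memory_mib):
    """Greedily select competing jobs that fit a single pod's resource budget.

    The greedy rule stands in for the system-level objective (here: maximize
    utilization); the dissertation's scheduler integrates with Argo/Kubernetes.
    """
    selected, cpu_used, mem_used = [], 0, 0
    # Consider jobs with the largest footprint first (best-fit-decreasing style).
    for job in sorted(jobs, key=lambda j: j.cpu_millicores + j.memory_mib, reverse=True):
        if (cpu_used + job.cpu_millicores <= pod_cpu_millicores and
                mem_used + job.memory_mib <= pod_memory_mib):
            selected.append(job)
            cpu_used += job.cpu_millicores
            mem_used += job.memory_mib
    return selected

if __name__ == "__main__":
    queue = [Job("a", 500, 1024), Job("b", 1500, 2048), Job("c", 1000, 512)]
    print([j.job_id for j in pack_jobs_into_pod(queue, pod_cpu_millicores=2000, pod_memory_mib=3072)])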

Cloud-based models process trillions of inference queries daily, posing challenges due to varied application needs, execution environments, and model types. Unlike traditional ML models, generative models such as Stable Diffusion require iterative denoising steps, increasing computational demands at the inference stage. Inference serving platforms for MLaaS are not optimized for such resource-intensive workloads, exemplified by Stable Diffusion as a Service (SDaaS). This dissertation proposes a model-less, privacy-preserving inference framework for SDaaS that improves performance and satisfies user-defined system objectives.
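
As a rough illustration of what model-less dispatch for SDaaS could look like, the sketch below picks a diffusion configuration that meets a user-declared latency objective without exposing model selection to the user. The profile names, step counts, and latency figures are hypothetical and are not taken from the dissertation.

# Illustrative sketch (not from the dissertation): "model-less" dispatch that
# selects a diffusion configuration to satisfy a user-declared latency target.
PROFILES = [
    # (variant, denoising_steps, estimated_latency_s) -- hypothetical numbers
    ("sd-distilled", 20, 1.2),
    ("sd-base",      30, 2.5),
    ("sd-base",      50, 4.0),
]

def select_profile(latency_slo_s):
    """Return the highest-fidelity profile whose estimated latency meets the SLO."""
    feasible = [p for p in PROFILES if p[2] <= latency_slo_s]
    if not feasible:
        return PROFILES[0]                        # fall back to the fastest configuration
    return max(feasible, key=lambda p: p[1])      # more denoising steps ~ higher fidelity

if __name__ == "__main__":
    print(select_profile(latency_slo_s=3.0))      # -> ('sd-base', 30, 2.5)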

Turning to cloud-based healthcare services, predictive models must meet standards of clinical and analytical accuracy, latency, and efficiency. This dissertation presents a cloud-based solution for managing Type-1 diabetes through precise blood glucose prediction, which is essential for insulin administration. We introduce a neural architecture search framework driven by deep reinforcement learning to create patient-specific models, and we explore model pruning for edge deployments with computational offloading between edge and cloud in an Internet of Things setting, aiming to optimize real-time diabetes management in a HaaS context.
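
The sketch below illustrates one small piece of such an edge/cloud pipeline: deciding whether a glucose prediction runs on the edge device (pruned model) or is offloaded to the cloud (full model) under a latency budget. It is a toy example; all timings and the decision rule are assumptions, not the dissertation's actual offloading policy.

# Illustrative sketch (not from the dissertation): edge vs. cloud offloading
# for a glucose prediction under a latency budget. Numbers are hypothetical.
def choose_execution_site(edge_infer_ms, cloud_infer_ms, network_rtt_ms, latency_budget_ms):
    """Prefer the cloud's full model when the round trip still meets the budget."""
    cloud_total_ms = cloud_infer_ms + network_rtt_ms
    if cloud_total_ms <= latency_budget_ms:
        return "cloud"   # full model, typically more accurate
    if edge_infer_ms <= latency_budget_ms:
        return "edge"    # pruned model keeps prediction on-device and on time
    return "edge"        # degrade gracefully rather than miss the deadline

if __name__ == "__main__":
    print(choose_execution_site(edge_infer_ms=40, cloud_infer_ms=15,
                                network_rtt_ms=120, latency_budget_ms=100))  # -> "edge"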

UQ is vital in safety-critical domains like healthcare, ensuring that model predictions remain reliable in the absence of ground truth. In digital pathology, UQ methods often address distribution shifts between training and test data, typically employing image registration techniques that assume reference images closely match the original training data. In a HaaS setting, these techniques may be constrained by privacy regulations. To address this, we propose a novel latent-space perturbation method using autoencoders for image segmentation. The method is validated on glomeruli segmentation in frozen kidney donor sections, a key task in assessing kidney transplant viability.
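
A minimal sketch of the general idea, assuming a toy linear autoencoder and a thresholding segmenter in place of the real networks: the latent code of an input image is perturbed repeatedly, each perturbed reconstruction is segmented, and per-pixel disagreement across the resulting segmentations serves as an uncertainty map. This is only an illustration of latent-space perturbation, not the dissertation's method.

# Illustrative sketch (not from the dissertation): segmentation uncertainty via
# autoencoder latent-space perturbation. Encoder, decoder, and segmenter are toys.
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(size=(64, 16))        # toy linear encoder: 8x8 image -> 16-d latent
W_dec = np.linalg.pinv(W_enc)            # toy linear decoder

def segment(image):
    return (image > image.mean()).astype(float)   # toy segmenter: global threshold

def uncertainty_map(image, n_samples=20, sigma=0.1):
    z = image.reshape(-1) @ W_enc                             # encode
    masks = []
    for _ in range(n_samples):
        z_pert = z + rng.normal(scale=sigma, size=z.shape)    # perturb the latent code
        recon = (z_pert @ W_dec).reshape(image.shape)         # decode
        masks.append(segment(recon))                          # segment each reconstruction
    return np.stack(masks).var(axis=0)                        # per-pixel variance = uncertainty

if __name__ == "__main__":
    img = rng.random((8, 8))
    print(uncertainty_map(img).round(2))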

Building a cloud-deployable model involves significant investment in data curation, architecture selection, and hyperparameter tuning, making the model a critical part of its owner's Intellectual Property (IP). In MLaaS environments, shared models are vulnerable to theft. To protect IP, model owners embed watermarks (undisclosed trigger inputs and labels) into neural networks. If a model is stolen, these watermarks allow verification of ownership, but adversaries often alter model weights to remove them, degrading model utility. This dissertation introduces two attack strategies that breach model IP while preserving model performance and evading detection at the inference stage.
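
For context on the mechanism under attack, the sketch below shows trigger-set watermark verification in principle: ownership is claimed when a suspect model reproduces the secret trigger labels at a rate well above chance. The predict callable, trigger set, and threshold are hypothetical, and the dissertation's attack strategies themselves are not reproduced here.

# Illustrative sketch (not from the dissertation): trigger-set watermark
# verification. A model owner keeps secret (trigger, label) pairs and checks
# how often a suspect model reproduces the secret labels.
def verify_watermark(predict, trigger_set, threshold=0.8):
    """predict: callable mapping a trigger input to a predicted label."""
    matches = sum(1 for x, y_secret in trigger_set if predict(x) == y_secret)
    agreement = matches / len(trigger_set)
    return agreement >= threshold, agreement

if __name__ == "__main__":
    triggers = [((i,), i % 3) for i in range(10)]    # toy secret trigger/label pairs
    watermarked_model = lambda x: x[0] % 3           # reproduces the secret labels
    clean_model = lambda x: 0                        # unrelated model
    print(verify_watermark(watermarked_model, triggers))  # -> (True, 1.0)
    print(verify_watermark(clean_model, triggers))        # -> (False, 0.4)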

Subjects

Computer engineering, Biomedical engineering, Artificial intelligence

Citation

Ray, Aritra (2024). Applications of Predictive Analytics for Machine Learning as a Service. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/32627.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.