A COST-SENSITIVE, SEMI-SUPERVISED, AND ACTIVE LEARNING APPROACH FOR PRIORITY OUTLIER INVESTIGATION

Date

2023

Abstract

This master’s thesis presents a novel approach to balancing the cost of investigating suspected cases against the potential gain of detecting an outlier, particularly in the context of fraud detection. The proposed approach is a cost-sensitive, semi-supervised, active-learning model for priority outlier investigation that aims to identify the top-k unlabeled cases that maximize the overall expected gain.

The approach builds on a comprehensive review of related work on cost-sensitive and active learning for outlier detection. We formulate the task as a maximization problem in which kernel density estimation is used to estimate probabilities for the unlabeled cases. To improve the model’s accuracy and efficiency, we employ a graph representation that captures the similarities and relationships among cases, and we restrict kernel density estimation to each case’s neighborhood to keep it efficient. The approach is evaluated on both synthetic data and a real-world credit card fraud detection dataset.
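To make the idea of ranking unlabeled cases by expected gain more concrete, the following is a minimal sketch and not the thesis's actual formulation. It assumes a Gaussian kernel with a fixed bandwidth, an equal prior for outliers and normal cases, a per-case gain and a flat investigation cost, and the scoring rule probability times gain minus cost. All function names, parameters, and data values are hypothetical. The graph representation and neighborhood restriction described above are omitted.

```python
import numpy as np

def gaussian_kde_scores(train, query, bandwidth):
    """Gaussian kernel density estimate of `train`, evaluated at each query point."""
    diffs = query[:, None, :] - train[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=2)
    dim = train.shape[1]
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1) / norm

def outlier_probability(normal, outlier, unlabeled, bandwidth=1.0, prior_outlier=0.5):
    """Posterior outlier probability from two class-conditional density estimates.

    The prior of 0.5 is an illustrative assumption, not a value from the thesis.
    """
    p_out = gaussian_kde_scores(outlier, unlabeled, bandwidth) * prior_outlier
    p_norm = gaussian_kde_scores(normal, unlabeled, bandwidth) * (1.0 - prior_outlier)
    return p_out / (p_out + p_norm + 1e-300)  # tiny constant avoids division by zero

def top_k_by_expected_gain(prob, gain, cost, k):
    """Rank cases by (assumed) expected gain: probability * gain - investigation cost."""
    expected = prob * gain - cost
    order = np.argsort(-expected, kind="stable")[:k]
    return order, expected

# Hypothetical labeled and unlabeled cases in a 2-D feature space.
normal = np.array([[0.0, 0.0], [0.5, -0.3], [-0.4, 0.2], [0.1, 0.6]])
outlier = np.array([[5.0, 5.0], [5.5, 4.6]])
unlabeled = np.array([[0.0, 0.0], [5.0, 5.0], [4.5, 5.2], [0.2, -0.1]])
gain = np.array([100.0, 100.0, 50.0, 100.0])  # e.g. amount recovered if fraud is confirmed
cost = 10.0                                   # flat cost of one investigation

prob = outlier_probability(normal, outlier, unlabeled)
chosen, expected = top_k_by_expected_gain(prob, gain, cost, k=2)
print(chosen, expected[chosen])
```

In this toy setup the two unlabeled cases near the labeled outliers are selected for investigation, and the higher-gain case is ranked first. In an active-learning loop, the investigated cases would then be labeled and added to the labeled sets before the next round of selection.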

The contributions of this thesis include effective and efficient outlier investigation strategies with practical applications in various domains, particularly fraud detection. The proposed approach offers a promising way to weigh the cost of investigating suspected cases against the potential gain of detecting an outlier.

Citation

Song, Xinran (2023). A COST-SENSITIVE, SEMI-SUPERVISED, AND ACTIVE LEARNING APPROACH FOR PRIORITY OUTLIER INVESTIGATION. Master's thesis, Duke University. Retrieved from https://hdl.handle.net/10161/27823.

Duke's student scholarship is made available to the public under a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.