Beta-CoRM: A Bayesian approach for n-gram profiles analysis

Date

2025-02-01

Abstract

n-gram profiles have been successfully and widely used to analyse long sequences of potentially differing lengths for clustering or classification. Machine learning algorithms have mainly been used for this purpose but, despite their predictive performance, these methods cannot discover hidden structures or provide a full probabilistic representation of the data. To address this, a novel class of Bayesian generative models has been designed for n-gram profiles used as binary attributes. The flexibility of the proposed modelling allows a straightforward approach to feature selection to be incorporated into the generative model. Furthermore, a slice sampling algorithm is derived for fast inference; it is applied to synthetic and real data scenarios, and the results show that feature selection can improve classification accuracy.
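To make "n-gram profiles used as binary attributes" concrete, the sketch below shows one common way such profiles can be built: each sequence is reduced to the set of contiguous n-grams it contains, and that set is encoded as a 0/1 vector over a shared vocabulary, so sequences of different lengths map to vectors of equal length. This is a minimal illustration under assumptions, not the authors' preprocessing or the Beta-CoRM model itself; the function names `ngrams` and `binary_ngram_profiles`, the choice of `n`, and the shell-command tokens in the usage example are all hypothetical.

```python
from itertools import chain


def ngrams(seq, n):
    """Return the set of contiguous n-grams (as tuples) in a sequence.

    Hypothetical helper for illustration. A sequence shorter than n
    yields an empty set.
    """
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def binary_ngram_profiles(sequences, n):
    """Encode each sequence as a 0/1 vector over the shared n-gram vocabulary.

    Hypothetical helper for illustration. Entry j is 1 if the j-th
    vocabulary n-gram occurs at least once in the sequence and 0 otherwise,
    so only presence/absence is kept, not counts.
    """
    grams_per_seq = [ngrams(s, n) for s in sequences]
    # Vocabulary: every n-gram seen in any sequence, in a fixed sorted order.
    vocab = sorted(set(chain.from_iterable(grams_per_seq)))
    profiles = [[1 if g in grams else 0 for g in vocab] for grams in grams_per_seq]
    return vocab, profiles


if __name__ == "__main__":
    # Two token sequences of different lengths (hypothetical command logs).
    seqs = [["ls", "cd", "ls"], ["cd", "ls", "cat", "pwd"]]
    vocab, profiles = binary_ngram_profiles(seqs, n=2)
    print(vocab)
    print(profiles)
```

With bigrams (`n=2`) the two sequences above share the vocabulary `[('cat', 'pwd'), ('cd', 'ls'), ('ls', 'cat'), ('ls', 'cd')]` and receive the profiles `[0, 1, 0, 1]` and `[1, 1, 1, 0]`. A Bayesian generative model of the kind described in the abstract would then be placed over such binary vectors; the model and its slice sampler are not reproduced here.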

Subjects

Bayesian statistics, Cyber security, Feature selection, Labelled data, n-Grams

Citation

Published Version (Please cite this version)

10.1016/j.csda.2024.108056

Publication Info

Perusquía, JA, JE Griffin and C Villa (2025). Beta-CoRM: A Bayesian approach for n-gram profiles analysis. Computational Statistics and Data Analysis, 202. pp. 108056–108056. 10.1016/j.csda.2024.108056 Retrieved from https://hdl.handle.net/10161/33546.

This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.

Scholars@Duke

Cristiano Villa

Associate Professor of Statistics at Duke Kunshan University

Prof. Cristiano Villa's main research area is Bayesian statistics, with a particular interest in objective methods. His work has been published in several peer-reviewed journals and presented at international conferences such as the ISBA International Conference, the O-Bayes conference, and the ERCIM conference. In addition to his research, Prof. Villa is deeply committed to teaching and enjoys interacting with students. His teaching interests include probability, statistics, linear modelling, and risk management. Before joining Duke Kunshan University (DKU), Prof. Villa was a member of Newcastle University (UK) and the University of Kent (UK). Prior to joining academia in 2014, he worked as an auditor and as an advisor for KPMG in several countries, including Italy, the UK, New Zealand, and Singapore. He holds an M.Sc. and a Ph.D. from the University of Kent, UK.


Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.