Beta-CoRM: A Bayesian approach for n-gram profiles analysis

dc.contributor.author

Perusquía, JA

dc.contributor.author

Griffin, JE

dc.contributor.author

Villa, C

dc.date.accessioned

2025-11-29T08:15:21Z

dc.date.available

2025-11-29T08:15:21Z

dc.date.issued

2025-02-01

dc.description.abstract

n-gram profiles have been widely and successfully used to analyse long sequences of potentially differing lengths for clustering or classification. Machine learning algorithms have mainly been used for this purpose but, despite their predictive performance, these methods cannot discover hidden structures or provide a full probabilistic representation of the data. To address this, a novel class of Bayesian generative models is proposed for n-gram profiles used as binary attributes. The flexibility of the proposed modelling allows a straightforward approach to feature selection within the generative model. Furthermore, a slice sampling algorithm is derived for fast inference; it is applied to synthetic and real data scenarios, and the results show that feature selection can improve classification accuracy.

dc.identifier.issn

0167-9473

dc.identifier.issn

1872-7352

dc.identifier.uri

https://hdl.handle.net/10161/33546

dc.language

en

dc.publisher

Elsevier BV

dc.relation.ispartof

Computational Statistics and Data Analysis

dc.relation.isversionof

10.1016/j.csda.2024.108056

dc.rights.uri

https://creativecommons.org/licenses/by-nc/4.0

dc.subject

Bayesian statistics

dc.subject

Cyber security

dc.subject

Feature selection

dc.subject

Labelled data

dc.subject

n-Grams

dc.title

Beta-CoRM: A Bayesian approach for n-gram profiles analysis

dc.type

Journal article

duke.contributor.orcid

Villa, C|0000-0002-2670-2954

pubs.begin-page

108056

pubs.end-page

108056

pubs.organisational-group

Duke

pubs.organisational-group

Affiliate

pubs.organisational-group

Duke Kunshan University

pubs.organisational-group

DKU Faculty

pubs.organisational-group

DKU Studies

pubs.publication-status

Published

pubs.volume

202

Files

Original bundle

Name: CSDA - 2025.pdf
Size: 3.51 MB
Format: Adobe Portable Document Format