# Browsing by Author "Banks, David L"

Item Open Access A Bayesian Strategy to the 20 Question Game with Applications to Recommender Systems (2017) Suresh, Sunith Raj

In this paper, we develop an algorithm that utilizes a Bayesian strategy to determine a sequence of questions to play the 20 Question game. The algorithm is motivated with an application to active recommender systems. We first develop an algorithm that constructs a sequence of questions where each question inquires only about a single binary feature. We test the performance of the algorithm utilizing simulation studies, and find that it performs relatively well under an informed prior. We modify the algorithm to construct a sequence of questions where each question inquires about 2 binary features via AND conjunction. We test the performance of the modified algorithm via simulation studies, and find that it does not significantly improve performance.
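As a rough illustration of the single-binary-feature setting described in this abstract, the sketch below plays the game greedily: it keeps a belief over candidate items and asks about the feature whose "yes" probability is closest to 1/2, which maximizes the expected information from a noise-free yes/no answer. This is not the thesis's algorithm; the function names, the toy item table, and the noise-free answer model are all assumptions made for illustration.

```python
def p_yes(belief, items, feature):
    """Current probability that the answer to `feature` is yes."""
    return sum(p for name, p in belief.items() if items[name][feature] == 1)

def best_question(belief, items, asked):
    """Pick the unasked feature whose yes-probability is closest to 1/2."""
    n_features = len(next(iter(items.values())))
    candidates = [f for f in range(n_features) if f not in asked]
    return min(candidates, key=lambda f: abs(p_yes(belief, items, f) - 0.5))

def update(belief, items, feature, answer):
    """Bayes update under the (assumed) noise-free answer model."""
    weights = {name: (p if items[name][feature] == answer else 0.0)
               for name, p in belief.items()}
    total = sum(weights.values())  # positive if the target has prior mass
    return {name: w / total for name, w in weights.items()}

def play(items, prior, target, max_questions=20):
    """Ask questions until one item has all the mass or questions run out."""
    n_features = len(items[target])
    belief, asked = dict(prior), []
    while len(asked) < min(max_questions, n_features) and max(belief.values()) < 1.0:
        f = best_question(belief, items, asked)
        asked.append(f)
        belief = update(belief, items, f, items[target][f])
    return max(belief, key=belief.get), asked

# Hypothetical toy data: three binary features per item.
items = {"cat": (1, 1, 0), "dog": (1, 0, 0), "eagle": (0, 1, 1), "trout": (0, 0, 0)}
prior = {name: 0.25 for name in items}
guess, asked = play(items, prior, "eagle")
```

Replacing the uniform `prior` with an informed one changes which questions are asked first, which is the kind of effect the simulation studies in the abstract examine.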

Item Open Access Latent Space Diffusion (2015) Fisher, Jacob Charles

Social networks represent two different facets of social life: (1) stable paths for diffusion, or the spread of something through a connected population, and (2) random draws from an underlying social space, which indicate the relative positions of the people in the network to one another. The dual nature of networks creates a challenge - if the observed network ties are a single random draw, is it realistic to expect that diffusion only follows the observed network ties? This study takes a first step towards integrating these two perspectives by introducing a social space diffusion model. In the model, network ties indicate positions in social space, and diffusion occurs proportionally to distance in social space. Practically, the simulation occurs in two parts: positions are estimated using a latent space model, and then the predicted probabilities of a tie from that model - representing the distances in social space - or a series of networks drawn from those probabilities - representing routine churn in the network - are used as weights in a weighted averaging framework. Using a school friendship network, I show that the model is more consistent and, when probabilities are used, the model converges faster than diffusion following only the observed network ties.
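The second stage described above, using predicted tie probabilities as weights in a weighted averaging framework, might look something like the following sketch. It assumes a DeGroot-style update in which each person's value becomes the weighted mean of their own value and everyone else's; the abstract does not specify the exact update rule, so `diffuse`, `self_weight`, and the toy probability matrix are hypothetical.

```python
def diffuse(values, tie_prob, steps=1, self_weight=1.0):
    """Repeatedly replace each value with a tie-probability-weighted average.

    values   : list of floats, one per person
    tie_prob : square matrix; tie_prob[i][j] is the predicted probability
               of a tie between i and j (assumed to come from a fitted
               latent space model, which is not shown here)
    """
    n = len(values)
    current = list(values)
    for _ in range(steps):
        updated = []
        for i in range(n):
            num = self_weight * current[i]
            den = self_weight
            for j in range(n):
                if j != i:
                    num += tie_prob[i][j] * current[j]
                    den += tie_prob[i][j]
            updated.append(num / den)
        current = updated
    return current

# Hypothetical three-person example: persons 0 and 1 are close in social
# space, person 2 is far from both.
tie_prob = [[0.0, 0.9, 0.1],
            [0.9, 0.0, 0.1],
            [0.1, 0.1, 0.0]]
after = diffuse([1.0, 0.0, 0.0], tie_prob, steps=50)
```

Because every pair has a nonzero weight, the values drift toward a common level even between people with no observed tie, which is the contrast with diffusion restricted to the observed network.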

Item Open Access Mining Political Blogs With Network Based Topic Models (2014) Liang, Jiawei

We develop a Network Based Topic Model (NBTM), which integrates a Random Graph model with the Latent Dirichlet Allocation (LDA) model. The NBTM assumes that the topic proportion of a document has a fixed variance across the document corpus with author differences treated as random effects. It also assumes that the links between documents are binary variables whose probabilities depend upon the author random effects. We fit the model to political blog posts during the calendar year 2012 that mention Trayvon Martin. This paper presents the topic extraction results and posterior prediction results for hidden links within the blogosphere.

Item Open Access Momentum Scale Estimation Using Maximum Likelihood Template Fitting (2010) Zeng, Yu

A maximum likelihood template fitting procedure is performed by using $\Upsilon \to \mu^+\mu^-$ events to extract the momentum scale, a scale factor applied to measured momentum, of the CDF detector at Fermilab. The constructed invariant mass spectrum from data events is compared with the invariant mass spectrum from Monte Carlo simulated events, with the momentum scale varying as a free parameter in the simulation. The invariant mass spectrum from simulation which best matches the data spectrum gives the maximum likelihood estimation of the momentum scale. We find the momentum scale is $dp/p = (-1.330 \pm 0.028\,(\text{stat}) \pm 0.099\,(\text{syst})) \times 10^{-3}$.

Item Open Access Problems in Computational Advertising (2021) Guo, Yi

Computational advertising is a multi-billion-dollar industry, yet it has gotten little attention from academic statisticians. Despite this, the performance of this collection of pricing models, keyword auctions, A/B testing, and recommender systems is largely reliant on statistical technique in almost every element of its design and implementation.

Online ad auctions and e-commerce logistics are two of the major components of computational advertising. In a real-time bidding scenario, the objective for the former is to maximize expected utilities. The latter is concerned with the development of statistical modeling for dynamic continuous flows. In turn, this leads to a range of issues, three of which are discussed in this thesis.

Chapter 1 briefly introduces the topics of online advertising and computational advertising. Chapter 2 proposes a new method, the Backwards Indifference Derivation (BID) algorithm, to numerically approximate the pure strategy Nash equilibrium (PSNE) bidding functions in asymmetric first-price auctions. The classic PSNE solution assumes that all parties agree on the type distribution for each participant, and all know that this information is held in common. This common knowledge assumption is strong and often unrealistic. Chapter 3 addresses that gap by providing two alternative solutions, each based upon an adversarial risk analysis (ARA) perspective. Chapter 4 extends the previous methodology for Bayesian dynamic flow models of discrete data to real-valued and positive flows. Finally, Chapter 5 presents some concluding remarks and briefly discusses other problems in computational advertising.

Item Open Access Statistical Inference Utilizing Agent Based Models (2014) Heard, Daniel Philip

Agent-based models (ABMs) are computational models used to simulate the behaviors, actions and interactions of agents within a system. The individual agents each have their own set of assigned attributes and rules, which determine their behavior within the ABM system. These rules can be deterministic or probabilistic, allowing for a great deal of flexibility. ABMs allow us to observe how the behaviors of the individual agents affect the system as a whole and if any emergent structure develops within the system. Examining rule sets in conjunction with corresponding emergent structure shows how small-scale changes can affect large-scale outcomes within the system. Thus, we can better understand and predict the development and evolution of systems of interest.

ABMs have become ubiquitous; they are used in business (virtual auctions to select electronic ads for display), atmospheric science (weather forecasting), and public health (to model epidemics). But there is limited understanding of the statistical properties of ABMs. Specifically, there are no formal procedures for calculating confidence intervals on predictions, nor for assessing goodness-of-fit, nor for testing whether a specific parameter (rule) is needed in an ABM.

Motivated by important challenges of this sort, this dissertation focuses on developing methodology for uncertainty quantification and statistical inference in a likelihood-free context for ABMs.

Chapter 2 of the thesis develops theory related to ABMs, including procedures for model validation, assessing model equivalence and measuring model complexity.

Chapters 3 and 4 of the thesis focus on two approaches for performing likelihood-free inference involving ABMs, which is necessary because of the intractability of the likelihood function due to the variety of input rules and the complexity of outputs.

Chapter 3 explores the use of Gaussian Process emulators in conjunction with ABMs to perform statistical inference. This draws upon a wealth of research on emulators, which find smooth functions on lower-dimensional Euclidean spaces that approximate the ABM. Emulator methods combine observed data with output from ABM simulations, using these to fit and calibrate Gaussian-process approximations.

Chapter 4 discusses Approximate Bayesian Computation for ABM inference, the goal of which is to obtain an approximation of the posterior distribution of some set of parameters given some observed data.
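For readers unfamiliar with Approximate Bayesian Computation, a minimal rejection-sampling sketch is given below. It is not taken from the thesis: the "simulator" is a toy stand-in for an ABM run (each of `n` agents independently adopts a state with probability `p`), the prior is assumed Uniform(0, 1), and the summary statistic and tolerance are arbitrary illustrative choices.

```python
import random

def simulate(p, n, rng):
    """Toy stand-in for one ABM run: number of agents (out of n) in a state."""
    return sum(rng.random() < p for _ in range(n))

def abc_rejection(observed, n, draws, tolerance, rng):
    """Keep prior draws whose simulated output lands near the observed data.

    No likelihood is evaluated; the accepted draws approximate the
    posterior of p given the observed count.
    """
    accepted = []
    for _ in range(draws):
        p = rng.random()  # draw from the assumed Uniform(0, 1) prior
        if abs(simulate(p, n, rng) - observed) <= tolerance:
            accepted.append(p)
    return accepted

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = abc_rejection(observed=30, n=100, draws=5000, tolerance=2, rng=rng)
posterior_mean = sum(samples) / len(samples)
```

Shrinking `tolerance` tightens the approximation but lowers the acceptance rate, which is the basic trade-off when the simulator is an expensive ABM.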

The final chapters of the thesis demonstrate the approaches for inference in two applications. Chapter 5 presents an application that models the spread of HIV based on detailed data on a social network of men who have sex with men (MSM) in southern India. Use of an ABM will allow us to determine which social/economic/policy factors contribute to the transmission of the disease. We aim to estimate the effect that proposed medical interventions will have on the spread of HIV in this community.

Chapter 6 examines the function of a heroin market in the Denver, Colorado metropolitan area. Extending an ABM developed from ethnographic research, we explore a procedure for reducing the model, as well as estimating posterior distributions of important quantities based on simulations.

Item Open Access Statistical Issues in Quantifying Text Mining Performance (2017) Chai, Christine Peijinn

Text mining is an emerging field in data science because text information is ubiquitous, but analyzing text data is much more complicated than analyzing numerical data. Topic modeling is a commonly-used approach to classify text documents into topics and identify key words, so the text information of interest is distilled from the large corpus sea. In this dissertation, I investigate various statistical issues in quantifying text mining performance, and Chapter 1 is a brief introduction.

Chapter 2 addresses adequate pre-processing of text data. For example, words of the same stem (e.g. "study" and "studied") should be assigned the same token because they share the exact same meaning. In addition, specific phrases such as "New York" and "White House" should be retained because many topic classification models focus exclusively on words. Statistical methods, such as conditional probability and p-values, are used as an objective approach to discover these phrases.
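One simple way to use conditional probability for phrase discovery is sketched below: a bigram is kept as a phrase when the second word follows the first often enough, both in absolute count and as a fraction of the first word's occurrences. The thresholds `min_count` and `min_cond_prob` and the function name are illustrative assumptions, not values or code from the dissertation, and the p-value step mentioned in the abstract is not shown.

```python
from collections import Counter

def find_phrases(tokens, min_count=2, min_cond_prob=0.8):
    """Return bigrams (w1, w2) with a high estimated P(w2 follows | w1)."""
    first_counts = Counter(tokens[:-1])              # times w1 starts a bigram
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    phrases = {}
    for (w1, w2), count in bigram_counts.items():
        cond_prob = count / first_counts[w1]
        if count >= min_count and cond_prob >= min_cond_prob:
            phrases[(w1, w2)] = cond_prob
    return phrases

tokens = "i visited new york and then new york again".split()
phrases = find_phrases(tokens)
```

In this toy sentence, "new" is always followed by "york", so that pair is the only bigram retained; every other bigram occurs just once and fails the count threshold.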

Chapter 3 starts the quantification of text mining performance; this measures the improvement of topic modeling results from text pre-processing. Retaining specific phrases increases their distinctiveness because the "signal" of the most probable topic becomes stronger (i.e., the maximum probability is higher) than the "signal" generated by either of the two words separately. Therefore, text pre-processing helps recover semantic information at word level.

Chapter 4 quantifies the uncertainty of a widely-used topic model, latent Dirichlet allocation (LDA). A synthetic text dataset was created with known topic proportions, and I tried several methods to determine the appropriate number of topics from the data. Currently, the pre-set number of topics is important to the topic model results because LDA tends to utilize all topics allotted, so that each topic has about equal representation.

Last but not least, Chapter 5 explores a few selected text models as extensions, such as supervised latent Dirichlet allocation (sLDA), survey data application, sentiment analysis, and the infinite Gaussian mixture model.

Item Open Access Topics in Computational Advertising (2014) Au, Timothy Chun-Wai

Computational advertising is an emerging scientific discipline that incorporates tools and ideas from fields such as statistics, computer science, and economics. Although a consequence of the rapid growth of the Internet, computational advertising has since helped transform the online advertising business into a multi-billion dollar industry.

The fundamental goal of computational advertising is to determine the "best" online ad to display to any given user. This "best" ad, however, changes depending upon the specific context that is under consideration. This leads to a variety of different problems, three of which are discussed in this thesis.

Chapter 1 briefly introduces the topics of online advertising and computational advertising. Chapter 2 proposes a numerical method to approximate the pure strategy Nash equilibrium bidding functions in an independent private value first-price sealed-bid auction where bidders draw their types from continuous and atomless distributions, a setting in which solutions cannot generally be analytically derived, despite the fact that they are known to exist and be unique. Chapter 3 proposes a cross-domain recommender system that is a multiple-domain extension of the Bayesian Probabilistic Matrix Factorization model. Chapter 4 discusses some of the tools and challenges of text mining by using the Trayvon Martin shooting incident as a case study in analyzing the lexical content and network connectivity structure of the political blogosphere. Finally, Chapter 5 presents some concluding remarks and briefly discusses other problems in computational advertising.

Item Open Access Two Applications of Adversarial Risk Analysis (2011) Wang, Shouqiang

Adversarial risk analysis (ARA) attempts to apply statistical methodology to game-theoretic problems and provides an alternative to the solution concepts in traditional game theory. Specifically, it uses a Bayesian model for the decision-making processes of one's opponents to develop a subjective distribution over their actions, enabling the application of traditional risk analysis to maximize the expected utility. This thesis applies the ARA framework to network routing problems in adversarial contexts and a range of simple Borel gambling games.