Computational Journalism: from Answering Question to Questioning Answers and Raising Good Questions

dc.contributor.advisor

Yang, Jun

dc.contributor.advisor

Agarwal, Pankaj Kumar

dc.contributor.author

Wu, You

dc.date.accessioned

2015-09-01T20:05:54Z

dc.date.available

2016-08-13T04:30:04Z

dc.date.issued

2015

dc.department

Computer Science

dc.description.abstract

Our media is saturated with claims of "facts" made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, e.g., is a claim "cherry-picking"? This dissertation proposes a Query Response Surface (QRS) based framework that models claims based on structured data as parameterized queries. A key insight is that we can learn a lot about a claim by perturbing its parameters and seeing how its conclusion changes. This framework lets us formulate practical fact-checking tasks, namely reverse-engineering vague claims and countering questionable claims, as computational problems and tackle them. Within the QRS-based framework, we go one step further and propose a problem, along with efficient algorithms, for finding high-quality claims of a given form from data in the first place, i.e., raising good questions. This is achieved by using a limited number of high-valued claims to represent high-valued regions of the QRS. Beyond this general-purpose problem of finding high-quality claims, lead-finding can be tailored towards specific claim quality measures, which are also defined within the QRS framework. As an example, we present uniqueness-based lead-finding for "one-of-the-few" claims, which yields interpretable high-quality claims as well as an adjustable mechanism for ranking objects (e.g., NBA players) based on the claims that can be made about them. Finally, we study visualization as a powerful way of conveying the results of a large number of claims. We propose an efficient two-stage sampling algorithm that generates the input for a 2D scatter plot with a heatmap by evaluating only a limited amount of data, while preserving two essential visual features, namely outliers and clusters. For all of these problems, we present real-world examples and experiments that demonstrate the power of our model, the efficiency of our algorithms, and the usefulness of their results.

dc.identifier.uri

https://hdl.handle.net/10161/10526

dc.subject

Computer science

dc.subject

Journalism

dc.subject

computational journalism

dc.subject

Fact checking

dc.subject

lead finding

dc.subject

Sensitivity analysis

dc.title

Computational Journalism: from Answering Question to Questioning Answers and Raising Good Questions

dc.type

Dissertation

duke.embargo.months

11

Files

Original bundle

Name: Wu_duke_0066D_13115.pdf
Size: 3.68 MB
Format: Adobe Portable Document Format
