R Markdown: Integrating A Reproducible Analysis Tool into Introductory Statistics

Abstract

Nolan and Temple Lang argue that “the ability to express statistical computations is an essential skill.” A key related capacity is the ability to conduct and present data analysis in a way that another person can understand and replicate. The copy-and-paste workflow that is an artifact of antiquated user-interface design makes reproducibility of statistical analysis more difficult, especially as data become increasingly complex and statistical methods become increasingly sophisticated. R Markdown is a new technology that makes creating fully-reproducible statistical analysis simple and painless. It provides a solution suitable not only for cutting edge research, but also for use in an introductory statistics course. We present experiential and statistical evidence that R Markdown can be used effectively in introductory statistics courses, and discuss its role in the rapidly-changing world of statistical computation.

Scholars@Duke

Mine Cetinkaya-Rundel

Professor of the Practice of Statistical Science

I am a Professor of the Practice and the Director of Undergraduate Studies in the Department of Statistical Science and an affiliated faculty member in the Computational Media, Arts, and Cultures program at Duke University. My work focuses on innovation in statistics and data science pedagogy, with an emphasis on computing, reproducible research, student-centered learning, and open-source education. I work on integrating computation into the undergraduate statistics curriculum, using reproducible research methodologies and analysis of real and complex datasets.

I am an educator who is passionate about meeting learners where they are and understanding how they learn so that I can build better resources, pedagogy, and tooling to support their learning. My main teaching and research interest is statistics and data science education, particularly using R. I have been at Duke University since 2011 and I had a brief stint at the University of Edinburgh in 2019-2021. Prior to Duke, I received my PhD in Statistics at UCLA in 2011, under the advisement of Jan de Leeuw, and my BS in Actuarial Science at NYU’s Stern School of Business in 2004. In between undergraduate and graduate degrees, I worked as a consulting actuary for two years in New York.

Statistical Science at Duke

You can find out everything you need to know about majoring in Statistical Science at Duke here. If you would like to meet to discuss degree options in the department, you can book a time to meet with me here or send an email to dus@stat.duke.edu.

Statistics and data science education

I primarily work on developing open educational resources and software for modern statistics and data science education, as well as pedagogies for enhancing the student experience in data science and statistics courses. I also work on research projects that aim to assess the effectiveness of these approaches with respect to learning and retention. My computing language of choice is R, though I’m always interested in learning about how educators teaching different languages approach the same challenges. At any given point I have numerous projects active in this area. If you’re a student wanting to work with me or a potential collaborator, I’d love to hear from you.

Open educational resources

I believe in building open-source, open-access resources for education. I have co-authored four open-source statistics textbooks as part of the OpenIntro project at the introductory college and advanced high school level. I am also the creator and maintainer of Data Science in a Box and I have been developing and teaching various massive open online courses, including the popular Statistics with R specialization on Coursera. Materials for all courses and workshops I’ve taught are also openly licensed. You can find them on my teaching page.

ASA DataFest

I co-lead the international effort to organize ASA DataFest, a two-day competition held annually at over fifty institutions across the globe, in which teams of undergraduate students work to reveal insights into a rich and complex data set.

Consulting and training

I enjoy working with research and industry teams on solving challenges (particularly those related to R) and providing training. Previous talks and workshops I’ve delivered can be found here and here, respectively. If you’re interested in setting up a consulting or a training session with me, send me an email here.


Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.