Predicting Student Performance Using Discussion Forums' Participation Data

dc.contributor.advisor

Younes, Rabih

dc.contributor.author

Gray, McCullough Joseph

dc.date.accessioned

2024-06-06T13:50:06Z

dc.date.available

2024-06-06T13:50:06Z

dc.date.issued

2024

dc.department

Electrical and Computer Engineering

dc.description.abstract

Education lacks reliable mechanisms for the early detection of potentially at-risk students. An earlier prediction of student performance gives instructors ample time to meet with and assist under-achieving students. As with any predictive modeling problem, there are many candidate predictors to choose from when formulating a model. Previous related works have shown limited success in predicting course performance from students' personal and socioeconomic traits. Students learn by asking clarifying questions, which is one reason discussion boards have been a staple of learning at the university level for years.

This research aims to utilize participation in discussion forums to predict final student performance. Using students' course grades at roughly the halfway point in the term and various discussion forum predictors, our model predicts the students' final percentage score. Using the model's prediction, instructors can speak with at-risk students and discuss ways to improve. The student grades and discussion board participation datasets are gathered from graduate-level Electrical and Computer Engineering (ECE) courses at Duke University. Various classical machine learning models are explored, with random forest yielding the highest accuracy. This random forest model, trained on discussion forum participation data, surpasses other similarly trained state-of-the-art models.

Furthermore, related research attempts the classification problem of predicting which discrete letter grade a student will earn. A letter grade does not accurately represent a student's performance; therefore, we instead address the regression problem of predicting the exact percentage a student will earn. A significant finding of this research is that our random forest model can predict student performance with an average error of approximately 2.3%. Additionally, our random forest model can generalize to a different graduate-level course and make performance predictions with an average error of 3.3%.

The final important finding is that a model that includes discussion board predictors outperforms a model whose sole predictor is the students' halfway-point grade. This indicates that discussion forum participation holds significant value for predicting final performance. We envision that the knowledge from our findings and our optimal random forest model can enable instructors to identify and support potentially at-risk students preemptively.
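The abstract does not state which metric underlies the "average error" figures it quotes. As a minimal sketch, assuming the metric is mean absolute error between actual and predicted final percentages, the calculation could look like the following. The function name and the four example scores are hypothetical and are not data from the thesis.

```python
from statistics import fmean


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted percentage scores."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one score")
    return fmean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical final percentages for four students (NOT data from the thesis).
actual_scores = [91.0, 84.5, 77.0, 95.5]
predicted_scores = [89.0, 86.5, 80.0, 94.5]

mae = mean_absolute_error(actual_scores, predicted_scores)
print(f"Mean absolute error: {mae:.1f}")
```

Under this assumption, a lower value means the predicted percentages sit closer to the grades students actually earned, which is how a regression model and a baseline using only the halfway-point grade could be compared on the same students.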

dc.identifier.uri

https://hdl.handle.net/10161/31029

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Artificial intelligence

dc.subject

Education

dc.subject

Computer engineering

dc.subject

At-risk Students

dc.subject

Discussion Forum

dc.subject

Ed Discussion

dc.subject

Machine Learning

dc.subject

Random Forest

dc.subject

Student Performance

dc.title

Predicting Student Performance Using Discussion Forums' Participation Data

dc.type

Master's thesis

Files

Original bundle

Name: Gray_duke_0066N_17924.pdf
Size: 567.33 KB
Format: Adobe Portable Document Format
