Should Learning Analytics Models Include Sensitive Attributes? Explaining the Why

dc.contributor.author: Deho, O.B.
dc.contributor.author: Joksimovic, S.
dc.contributor.author: Li, J.
dc.contributor.author: Zhan, C.
dc.contributor.author: Liu, J.
dc.contributor.author: Liu, L.
dc.date.issued: 2023
dc.description.abstract: Many educational institutions use predictive models to derive actionable insights from student data and drive student success. A common task is predicting which students are at risk of dropping out so that the necessary interventions can be made. However, concerns have recently been raised that these predictive models discriminate on the basis of students' protected attributes. A frequently asked question is: should protected attributes be excluded from learning analytics (LA) models in order to ensure fairness? In this article, we ask: if protected attributes are excluded from LA models, does the exclusion ensure fairness, as it supposedly should? Does the exclusion affect the performance of the LA model? If so, why? We answer these questions and explain the reasons behind the answers. We built machine learning models and performed empirical evaluations using three years of dropout data for a particular program at a large Australian university. We found that excluding or including the protected attributes had only a marginal effect on predictive performance and fairness. Perhaps not surprisingly, our findings suggest that the effect of including or excluding protected attributes depends on how they relate to the prediction outcome. More specifically, if a protected attribute is correlated with the target label and proves to be an important feature, then its inclusion or exclusion affects performance and fairness; otherwise, it has little effect. Our findings provide insights that relevant stakeholders can use to make well-informed decisions.
dc.description.statementofresponsibility: Oscar Blessed Deho, Srecko Joksimovic, Jiuyong Li, Chen Zhan, Jixue Liu, and Lin Liu
dc.identifier.citation: IEEE Transactions on Learning Technologies, 2023; 16(4):560-572
dc.identifier.doi: 10.1109/TLT.2022.3226474
dc.identifier.issn: 1939-1382
dc.identifier.orcid: Zhan, C. [0000-0002-4794-8339]
dc.identifier.uri: https://hdl.handle.net/2440/143955
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.grant: http://purl.org/au-research/grants/arc/DP200101210
dc.rights: © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
dc.source.uri: https://doi.org/10.1109/tlt.2022.3226474
dc.subject: Dropout prediction; explainable AI; fairness in education; learning analytics (LA)
dc.title: Should Learning Analytics Models Include Sensitive Attributes? Explaining the Why
dc.type: Journal article
pubs.publication-status: Published
