Please use this identifier to cite or link to this item:
https://hdl.handle.net/2440/129742
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Quiroz, J.C. | - |
dc.contributor.author | Laranjo, L. | - |
dc.contributor.author | Tufanaru, C. | - |
dc.contributor.author | Kocaballi, A.B. | - |
dc.contributor.author | Rezazadegan, D. | - |
dc.contributor.author | Berkovsky, S. | - |
dc.contributor.author | Coiera, E. | - |
dc.date.issued | 2020 | - |
dc.identifier.citation | International Journal of Medical Informatics, 2020; 145:104324-1-104324-9 | - |
dc.identifier.issn | 1386-5056 | - |
dc.identifier.issn | 1872-8243 | - |
dc.identifier.uri | http://hdl.handle.net/2440/129742 | - |
dc.description.abstract | BACKGROUND: Bayesian modelling and statistical text analysis rely on informed probability priors to encourage good solutions. OBJECTIVE: This paper empirically analyses whether text in medical discharge reports follows Zipf's law, a commonly assumed statistical property of language where word frequency follows a discrete power-law distribution. METHOD: We examined 20,000 medical discharge reports from the MIMIC-III dataset. Methods included splitting the discharge reports into tokens, counting token frequency, fitting power-law distributions to the data, and testing whether alternative distributions (lognormal, exponential, stretched exponential, and truncated power-law) provided superior fits to the data. RESULT: Discharge reports are best fit by the truncated power-law and lognormal distributions. Discharge reports appear to be near-Zipfian, as the truncated power-law provides superior fits over a pure power-law. CONCLUSION: Our findings suggest that Bayesian modelling and statistical text analysis of discharge report text would benefit from using truncated power-law and lognormal probability priors and non-parametric models that capture power-law behavior. | - |
dc.description.statementofresponsibility | Juan C Quiroz, Liliana Laranjo, Catalin Tufanaru, Ahmet Baki Kocaballi, Dana Rezazadegan, Shlomo Berkovsky, Enrico Coiera | - |
dc.language.iso | en | - |
dc.publisher | Elsevier | - |
dc.rights | © 2020 Elsevier B.V. All rights reserved. | - |
dc.source.uri | http://dx.doi.org/10.1016/j.ijmedinf.2020.104324 | - |
dc.subject | Data mining | - |
dc.subject | MIMIC-III dataset | - |
dc.subject | Machine learning | - |
dc.subject | Maximum likelihood estimation | - |
dc.subject | Power-law with exponential cut-off | - |
dc.subject | Statistical distributions | - |
dc.title | Empirical analysis of Zipf's law, power law, and lognormal distributions in medical discharge reports | - |
dc.type | Journal article | - |
dc.identifier.doi | 10.1016/j.ijmedinf.2020.104324 | - |
dc.relation.grant | http://purl.org/au-research/grants/nhmrc/1134919 | - |
pubs.publication-status | Published | - |
dc.identifier.orcid | Tufanaru, C. [0000-0002-3457-8770] | - |
Appears in Collections: | Aurora harvest 4; Computer Science publications |
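
The first steps named in the abstract (splitting reports into tokens, counting token frequency, and fitting a power law) might be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the regex tokenizer, the function names `token_frequencies` and `powerlaw_alpha_mle`, the toy report strings, and the choice of `xmin` are all assumptions. The exponent estimate uses the well-known continuous approximation to the discrete power-law maximum likelihood estimator, which is only rough for small `xmin`. The comparison against lognormal, exponential, stretched exponential, and truncated power-law fits is not shown.

```python
import math
import re
from collections import Counter


def token_frequencies(texts):
    """Split each text into lowercase alphanumeric tokens and count them.

    The regex tokenizer is an assumption; the record does not say
    which tokenizer the study used.
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def powerlaw_alpha_mle(frequencies, xmin=1):
    """Approximate MLE of alpha for a discrete power law p(x) ~ x^(-alpha).

    Uses alpha ~= 1 + n / sum(ln(x_i / (xmin - 0.5))) over all x_i >= xmin.
    The approximation is rough when xmin is small.
    """
    tail = [x for x in frequencies if x >= xmin]
    log_sum = sum(math.log(x / (xmin - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum


# Toy stand-ins for discharge reports (hypothetical, not MIMIC-III data).
reports = [
    "Patient admitted with chest pain. Patient discharged home.",
    "Patient stable. Pain resolved. Discharged with follow up.",
]

counts = token_frequencies(reports)
alpha = powerlaw_alpha_mle(counts.values(), xmin=1)

# Rank-frequency view: most common tokens first, as in a Zipf plot.
for rank, (token, freq) in enumerate(counts.most_common(5), start=1):
    print(rank, token, freq)
print("estimated alpha:", round(alpha, 3))
```

On a real corpus, `xmin` would normally be chosen by a goodness-of-fit criterion rather than fixed by hand, and the fit would be compared against the alternative distributions with likelihood-ratio tests.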