Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/44079
Full metadata record
DC Field | Value | Language
dc.contributor.author | Allison, A. | -
dc.contributor.author | Pearce, C. | -
dc.contributor.author | Abbott, D. | -
dc.contributor.editor | Kertesz, J. | -
dc.contributor.editor | Bornholdt, S. | -
dc.contributor.editor | Mantegna, R.N. | -
dc.date.issued | 2007 | -
dc.identifier.citation | Noise and Stochastics in Complex Systems and Finance, Florence / János Kertész, Stefan Bornholdt, Rosario N. Mantegna (eds.): 660113-1-660113-12 | -
dc.identifier.isbn | 0819467383 | -
dc.identifier.isbn | 9780819467386 | -
dc.identifier.issn | 0277-786X | -
dc.identifier.uri | http://hdl.handle.net/2440/44079 | -
dc.description | Copyright © 2007 SPIE - The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only. Copyright 2007 Society of Photo-Optical Instrumentation Engineers. This paper was published in Noise and Stochastics in Complex Systems and Finance, edited by János Kertész, Stefan Bornholdt, Rosario N. Mantegna, Proc. of SPIE Vol. 6601, 660113 and is made available as an electronic reprint with permission of SPIE. One print or electronic copy may be made for personal use only. Systematic or multiple reproduction, distribution to multiple locations via electronic or other means, application of any material in this paper for a fee or for commercial purposes, or modification of the content of the paper are prohibited. | -
dc.description.abstract | The amount of text stored on the Internet, and in our libraries, continues to expand at an exponential rate. There is a great practical need to locate relevant content. This requires quick automated methods for classifying textual information, according to subject. We propose a quick statistical approach, which can distinguish between 'keywords' and 'noisewords', like 'the' and 'a', without the need to parse the text into its parts of speech. Our classification is based on an F-statistic, which compares the observed Word Recurrence Interval (WRI) with a simple null hypothesis. We also propose a model to account for the observed distribution of WRI statistics and we subject this model to a number of tests. | -
dc.description.statementofresponsibility | Andrew G. Allison, Charles E. M. Pearce and Derek Abbott | -
dc.language.iso | en | -
dc.publisher | SPIE | -
dc.relation.ispartofseries | Proceedings of SPIE ; 660113 | -
dc.source.uri | http://dx.doi.org/10.1117/12.724655 | -
dc.title | Finding keywords amongst noise: Automatic text classification without parsing | -
dc.type | Conference paper | -
dc.contributor.conference | SPIE: Noise and Stochastics in Complex Systems and Finance (2007 : Florence, Italy) | -
dc.identifier.doi | 10.1117/12.724655 | -
dc.publisher.place | www | -
pubs.publication-status | Published | -
dc.identifier.orcid | Allison, A. [0000-0003-3865-511X] | -
dc.identifier.orcid | Abbott, D. [0000-0002-0945-2674] | -
Appears in Collections: Aurora harvest 6
Electrical and Electronic Engineering publications
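
Illustrative note on the abstract: the record does not give the exact form of the F-statistic, so the following Python sketch is only one plausible reading, not the authors' method. It assumes a null hypothesis in which a word occurs independently at each position with probability p, so that recurrence intervals are geometric with variance (1 - p) / p^2, and it scores each word by the ratio of its observed interval variance to that null variance. The function names (`recurrence_intervals`, `variance_ratio`, `score_words`), the tokenising regex, and the `min_count` threshold are all hypothetical choices made for this example.

```python
import re
from collections import defaultdict
from statistics import variance


def recurrence_intervals(tokens):
    """Map each repeated word to the gaps between its successive positions."""
    last_seen = {}
    intervals = defaultdict(list)
    for pos, word in enumerate(tokens):
        if word in last_seen:
            intervals[word].append(pos - last_seen[word])
        last_seen[word] = pos
    return intervals


def variance_ratio(gaps, n_tokens, count):
    """Observed interval variance divided by the geometric-null variance.

    Assumption (not from the source): under the null, the word appears
    independently at each position with probability p = count / n_tokens.
    """
    p = count / n_tokens
    null_var = (1 - p) / p ** 2
    return variance(gaps) / null_var


def score_words(text, min_count=5):
    """Score every sufficiently frequent word; no part-of-speech parsing."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    scores = {}
    for word, gaps in recurrence_intervals(tokens).items():
        count = len(gaps) + 1
        # Need at least two gaps for a sample variance, and p < 1.
        if count < min_count or len(gaps) < 2 or count == n:
            continue
        scores[word] = variance_ratio(gaps, n, count)
    return scores
```

Under these assumptions, a word that occurs in bursts (as a topical keyword might) has a ratio above 1, while a word spread evenly through the text has a ratio at or below 1. The threshold separating the two classes, and the significance test applied to it, would have to come from the paper itself.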

Files in This Item:
hdl_44079.pdf, 753.36 kB, Publisher's PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.