DSpace Repository

Towards Smart Data Management of Scientific Literature: Addressing Polysemy and Aberrant Decoding in Author Names

dc.contributor.author Khalid S.
dc.contributor.author Hassan S.-U.
dc.contributor.author Sitthi A.
dc.date.accessioned 2022-03-10T13:17:03Z
dc.date.available 2022-03-10T13:17:03Z
dc.date.issued 2021
dc.identifier.issn 22138684
dc.identifier.other 2-s2.0-85116411238
dc.identifier.uri https://ir.swu.ac.th/jspui/handle/123456789/17429
dc.identifier.uri https://www.scopus.com/inward/record.uri?eid=2-s2.0-85116411238&doi=10.1007%2f978-3-030-84311-3_40&partnerID=40&md5=7fe1025d2a742aa5c5805317a38f0282
dc.description.abstract In digital libraries, ambiguous author names may occur due to the existence of multiple authors with the same name (polysemes) or different name variations for the same author (synonyms). We attempt to disambiguate the authors of scientific publications based on attributes that define the similarity among publications belonging to a unique author. We apply two supervised machine-learning approaches, namely Support Vector Machine and Naïve Bayes, to train the classifier with features commonly available in bibliography databases, such as author affiliation, subject area, journal title, city, references, and keywords. We deliberately exclude author contact details, such as phone numbers or email addresses, which are usually not available in bibliography databases. Furthermore, we test our model on an extremely ambiguous dataset, which consists of Chinese authors with identical names, affiliated with the same institute, and even having the same research area. The dataset, downloaded from Scopus, contains 5180 publication records by nine different authors who share the same name, “Zhang Wei.” Our model achieves an accuracy of 95.68% using the Naïve Bayes classifier, and the Support Vector Machine with the polynomial kernel performs about 3% better on the same dataset. Overall, the implications of this research are not limited to improving data management for scholarly search systems; other databases that require name disambiguation may also benefit from the proposed technique. © 2021, The Author(s), under exclusive license to Springer Nature Switzerland AG.
dc.language en
dc.title Towards Smart Data Management of Scientific Literature: Addressing Polysemy and Aberrant Decoding in Author Names
dc.type Conference Paper
dc.rights.holder Scopus
dc.identifier.bibliograpycitation Springer Proceedings in Complexity. Vol , No. (2021), p.435-445
dc.identifier.doi 10.1007/978-3-030-84311-3_40
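
The abstract describes training a Naïve Bayes classifier on bibliographic features to decide which real-world author a publication belongs to. The following is a minimal sketch of that general idea, not the authors' implementation: it is a hand-written multinomial Naïve Bayes with Laplace smoothing, and the records, field names, and labels (`zhang_wei_1`, `zhang_wei_2`) are hypothetical toy data invented for illustration, not taken from the Scopus dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(record):
    """Turn a record (dict of field -> text) into field-prefixed tokens.

    Prefixing keeps the same word in different fields as distinct features.
    """
    return [
        f"{field}={token}"
        for field, value in record.items()
        for token in value.lower().split()
    ]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, records, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for record, label in zip(records, labels):
            tokens = tokenize(record)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, record):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n / total)  # log prior
            for token in tokenize(record):
                if token in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy records: two distinct people who share one name.
train = [
    ({"affiliation": "Institute of Physics", "subject": "condensed matter",
      "journal": "Physical Review B", "keywords": "graphene transport"}, "zhang_wei_1"),
    ({"affiliation": "Institute of Physics", "subject": "condensed matter",
      "journal": "Applied Physics Letters", "keywords": "graphene devices"}, "zhang_wei_1"),
    ({"affiliation": "School of Medicine", "subject": "oncology",
      "journal": "Cancer Research", "keywords": "tumor imaging"}, "zhang_wei_2"),
    ({"affiliation": "School of Medicine", "subject": "oncology",
      "journal": "Oncology Letters", "keywords": "tumor markers"}, "zhang_wei_2"),
]

model = NaiveBayes().fit([r for r, _ in train], [y for _, y in train])

query = {"affiliation": "Institute of Physics", "subject": "condensed matter",
         "journal": "Nano Letters", "keywords": "graphene"}
print(model.predict(query))
```

Contact details such as email addresses are left out of the toy records on purpose, mirroring the abstract's choice to rely only on fields that bibliography databases usually provide.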

