Please use this identifier to cite or link to this item: https://ir.swu.ac.th/jspui/handle/123456789/12535
Full metadata record
DC Field | Value | Language
dc.contributor.author | Kunnakorntammanop S. |
dc.contributor.author | Thepwuttisathaphon N. |
dc.contributor.author | Thaicharoen S. |
dc.date.accessioned | 2021-04-05T03:04:00Z | -
dc.date.available | 2021-04-05T03:04:00Z | -
dc.date.issued | 2019 |
dc.identifier.issn | 18650929 |
dc.identifier.other | 2-s2.0-85076109761 |
dc.identifier.uri | https://ir.swu.ac.th/jspui/handle/123456789/12535 | -
dc.identifier.uri | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85076109761&doi=10.1007%2f978-981-15-0399-3_17&partnerID=40&md5=c7d44580309df72a12d696418c18ce98 |
dc.description.abstract | Many real-world data are not only large in volume but also heterogeneous and rapidly generated. This type of data, known as big data, typically cannot be analyzed using traditional software tools and techniques. Although an open-source software project, Apache Hadoop, has been successfully developed and used for handling big data, the complexity of its setup and configuration, together with the need to learn additional related tools, has hindered non-technical researchers and educators from actually entering the area of big data analytics. To support the big data community, this paper describes procedures and experiences gained from building a big data analytics framework, and demonstrates its usage on a popular case study, Twitter sentiment analysis. The framework comprises a cluster of four commodity computers run by Cloudera CDH 6.0.1 and RapidMiner Studio 9.3 with the Text Processing, Hive Connector, and Radoop extensions. According to the study results, setting up a big data analytics framework on a cluster of computers does not require advanced computer knowledge, but it does require meticulous system configuration to satisfy system installation and software integration requirements. Once all setup and configuration are correctly done, data analysis can be readily performed using the visual workflow designers provided by RapidMiner. Finally, the framework is further evaluated on a large data set of 185 million records, the “TalkingData AdTracking Fraud Detection” data set. The outcome is very satisfactory and shows that the framework is easy to use and can practically be deployed for big data analytics. © 2019, Springer Nature Singapore Pte Ltd. |
dc.subject | Big data |
dc.subject | Computer aided software engineering |
dc.subject | Data Analytics |
dc.subject | Open source software |
dc.subject | Open systems |
dc.subject | Sentiment analysis |
dc.subject | Soft computing |
dc.subject | Apache Hadoop |
dc.subject | Cloudera CDH |
dc.subject | Computer clusters |
dc.subject | Open source software projects |
dc.subject | RapidMiner Radoop |
dc.subject | Software integration |
dc.subject | Software Tools and Techniques |
dc.subject | System configurations |
dc.subject | Data handling |
dc.title | An Experience Report on Building a Big Data Analytics Framework Using Cloudera CDH and RapidMiner Radoop with a Cluster of Commodity Computers |
dc.type | Conference Paper |
dc.rights.holder | Scopus |
dc.identifier.bibliograpycitation | Communications in Computer and Information Science. Vol 1100, (2019), p.208-222 |
dc.identifier.doi | 10.1007/978-981-15-0399-3_17 |
Appears in Collections: Scopus 1983-2021

Files in This Item:
There are no files associated with this item.


Items in SWU repository are protected by copyright, with all rights reserved, unless otherwise indicated.