An Experience Report on Building a Big Data Analytics Framework Using Cloudera CDH and RapidMiner Radoop with a Cluster of Commodity Computers

Kunnakorntammanop S.; Thepwuttisathaphon N.; Thaicharoen S.

Please use this identifier to cite or link to this item: https://ir.swu.ac.th/jspui/handle/123456789/12535

Full metadata record

DC Field	Value	Language
dc.contributor.author	Kunnakorntammanop S.
dc.contributor.author	Thepwuttisathaphon N.
dc.contributor.author	Thaicharoen S.
dc.date.accessioned	2021-04-05T03:04:00Z	-
dc.date.available	2021-04-05T03:04:00Z	-
dc.date.issued	2019
dc.identifier.issn	18650929
dc.identifier.other	2-s2.0-85076109761
dc.identifier.uri	https://ir.swu.ac.th/jspui/handle/123456789/12535	-
dc.identifier.uri	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85076109761&doi=10.1007%2f978-981-15-0399-3_17&partnerID=40&md5=c7d44580309df72a12d696418c18ce98
dc.description.abstract	Much real-world data is not only large in volume but also heterogeneous and rapidly generated. This type of data, known as big data, typically cannot be analyzed with traditional software tools and techniques. Although the open-source software project Apache Hadoop has been successfully developed and used for handling big data, the complexity of its setup and configuration, together with the need to learn additional related tools, has hindered non-technical researchers and educators from entering the area of big data analytics. To support the big data community, this paper describes the procedures followed and the experience gained in building a big data analytics framework, and demonstrates its use on a popular case study, Twitter sentiment analysis. The framework comprises a cluster of four commodity computers running Cloudera CDH 6.0.1 and RapidMiner Studio 9.3 with the Text Processing, Hive Connector, and Radoop extensions. According to the study results, setting up a big data analytics framework on a cluster of computers does not require advanced computer knowledge, but it does require meticulous system configuration to satisfy the system installation and software integration requirements. Once setup and configuration are completed correctly, data analysis can be readily performed using the visual workflow designers provided by RapidMiner. Finally, the framework is further evaluated on a large data set of 185 million records, the “TalkingData AdTracking Fraud Detection” data set. The outcome is very satisfactory and shows that the framework is easy to use and can practically be deployed for big data analytics. © 2019, Springer Nature Singapore Pte Ltd.
dc.subject	Big data
dc.subject	Computer aided software engineering
dc.subject	Data Analytics
dc.subject	Open source software
dc.subject	Open systems
dc.subject	Sentiment analysis
dc.subject	Soft computing
dc.subject	Apache Hadoop
dc.subject	Cloudera CDH
dc.subject	Computer clusters
dc.subject	Open source software projects
dc.subject	RapidMiner Radoop
dc.subject	Software integration
dc.subject	Software Tools and Techniques
dc.subject	System configurations
dc.subject	Data handling
dc.title	An Experience Report on Building a Big Data Analytics Framework Using Cloudera CDH and RapidMiner Radoop with a Cluster of Commodity Computers
dc.type	Conference Paper
dc.rights.holder	Scopus
dc.identifier.bibliograpycitation	Communications in Computer and Information Science. Vol 1100, (2019), p.208-222
dc.identifier.doi	10.1007/978-981-15-0399-3_17
Appears in Collections:	Scopus 1983-2021

