Title: An Experience Report on Building a Big Data Analytics Framework Using Cloudera CDH and RapidMiner Radoop with a Cluster of Commodity Computers
Authors: Kunnakorntammanop S.
Thepwuttisathaphon N.
Thaicharoen S.
Keywords: Big data
Computer aided software engineering
Data Analytics
Open source software
Open systems
Sentiment analysis
Soft computing
Apache Hadoop
Cloudera CDH
Computer clusters
Open source software projects
RapidMiner Radoop
Software integration
Software Tools and Techniques
System configurations
Data handling
Issue Date: 2019
Abstract: Much real-world data is not only large in volume but also heterogeneous and rapidly generated. This type of data, known as big data, typically cannot be analyzed with traditional software tools and techniques. Apache Hadoop, an open-source software project, has been successfully developed and used for handling big data. However, its complex setup and configuration, including the need to learn additional related tools, has kept non-technical researchers and educators from entering the area of big data analytics. To support the big-data community, this paper describes the procedures followed and the experience gained in building a big data analytics framework. It also demonstrates the framework on a popular case study, Twitter sentiment analysis. The framework comprises a cluster of four commodity computers running Cloudera CDH 6.0.1, together with RapidMiner Studio 9.3 and its Text Processing, Hive Connector, and Radoop extensions. According to the study results, setting up a big data analytics framework on a cluster of computers does not require advanced computer knowledge. It does, however, require meticulous system configuration to satisfy system installation and software integration requirements. Once all setup and configuration are done correctly, data analysis can be readily performed using the visual workflow designers provided by RapidMiner. Finally, the framework is further evaluated on the “TalkingData AdTracking Fraud Detection” data set, which contains 185 million records. The outcome is very satisfactory and demonstrates that the framework is easy to use and can practically be deployed for big data analytics. © 2019, Springer Nature Singapore Pte Ltd.
URI: https://ir.swu.ac.th/jspui/handle/123456789/12535
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85076109761&doi=10.1007%2f978-981-15-0399-3_17&partnerID=40&md5=c7d44580309df72a12d696418c18ce98
ISSN: 1865-0929
Appears in Collections: Scopus 1983-2021



Items in SWU repository are protected by copyright, with all rights reserved, unless otherwise indicated.