An Experience Report on Building a Big Data Analytics Framework Using Cloudera CDH and RapidMiner Radoop with a Cluster of Commodity Computers

Kunnakorntammanop S.; Thepwuttisathaphon N.; Thaicharoen S.

Please use this identifier to cite or link to this item: https://ir.swu.ac.th/jspui/handle/123456789/12535

Title:	An Experience Report on Building a Big Data Analytics Framework Using Cloudera CDH and RapidMiner Radoop with a Cluster of Commodity Computers
Authors:	Kunnakorntammanop S.; Thepwuttisathaphon N.; Thaicharoen S.
Keywords:	Big data; Computer aided software engineering; Data Analytics; Open source software; Open systems; Sentiment analysis; Soft computing; Apache Hadoop; Cloudera CDH; Computer clusters; Open source software projects; RapidMiner Radoop; Software integration; Software Tools and Techniques; System configurations; Data handling
Issue Date:	2019
Abstract:	Many real-world data sets are not only large in volume but also heterogeneous and rapidly generated. This type of data, known as big data, typically cannot be analyzed using traditional software tools and techniques. Although an open-source software project, Apache Hadoop, has been successfully developed and used for handling big data, its setup and configuration complexity, together with the need to learn additional related tools, has hindered non-technical researchers and educators from entering the area of big data analytics. To support the big data community, this paper describes the procedures followed and the experience gained in building a big data analytics framework, and demonstrates its usage on a popular case study, Twitter sentiment analysis. The framework comprises a cluster of four commodity computers running Cloudera CDH 6.0.1 and RapidMiner Studio 9.3 with the Text Processing, Hive Connector, and Radoop extensions. According to the study results, setting up a big data analytics framework on a cluster of computers does not require advanced computer knowledge but does need meticulous system configuration to satisfy system installation and software integration requirements. Once all setup and configuration are done correctly, data analysis can be readily performed using the visual workflow designers provided by RapidMiner. Finally, the framework is further evaluated on a large data set of 185 million records, the “TalkingData AdTracking Fraud Detection” data set. The outcome is very satisfactory and shows that the framework is easy to use and can practically be deployed for big data analytics. © 2019, Springer Nature Singapore Pte Ltd.
URI:	https://ir.swu.ac.th/jspui/handle/123456789/12535 https://www.scopus.com/inward/record.uri?eid=2-s2.0-85076109761&doi=10.1007%2f978-981-15-0399-3_17&partnerID=40&md5=c7d44580309df72a12d696418c18ce98
ISSN:	1865-0929
Appears in Collections:	Scopus 1983-2021
