Undergraduate Thesis (Skripsi)
ANALISA BIG DATA PADA CLUSTER KOMPUTER MENGGUNAKAN KOMPUTASI TERDISTRIBUSI (Big Data Analysis on a Computer Cluster Using Distributed Computing)
With the development of globalization, technology is now widely used across industrial sectors, and data accumulates so quickly that it grows into large-scale data known as big data. The volume and complexity of big data make optimization problems harder to formulate, so a parallel and distributed computer cluster architecture is needed. Several frameworks support parallel and distributed data processing, including MPI (Message Passing Interface), OpenMP (Open Multi-Processing), Hadoop, and Spark. Data structures in big data also tend to be more complex, higher-dimensional, and larger. This study uses the parallelization provided by the Apache Spark framework to build a distributed computer cluster for big data processing. The results show that the distributed cluster on Spark reads big data effectively: in a word count experiment on 31,788,324 rows of data, Spark was faster, with a time difference of 84.6 seconds. Spark's machine learning library, MLlib, was then used for more advanced big data processing in the form of classification and recommendation system experiments. Of the 6 classification models used, the best achieved an accuracy of 94.95%, an F1-score of 95%, a recall of 95.18%, and a precision of 94.77%. The recommendation system, built with the ALS (Alternating Least Squares) algorithm, obtained an RMSE of 0.46 from 5 experiments with different tuning parameters.
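The word count experiment mentioned in the abstract follows a partition, count, and merge pattern. The sketch below illustrates that pattern in plain Python only; it is not the thesis's code and does not use the Spark API. The function names, the round-robin partitioning, and the sample lines are all assumptions made for illustration.

```python
from collections import Counter
from functools import reduce


def partition(lines, num_partitions):
    """Split lines into num_partitions chunks (round-robin, illustrative only)."""
    return [lines[i::num_partitions] for i in range(num_partitions)]


def count_partition(lines):
    """Map step: count words inside one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def word_count(lines, num_partitions=4):
    """Reduce step: merge the per-partition counts into a single total."""
    partials = [count_partition(p) for p in partition(lines, num_partitions)]
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical sample input, standing in for the rows of a large text file.
sample = [
    "big data on a cluster",
    "spark reads big data",
    "a cluster of computers",
]
totals = word_count(sample, num_partitions=2)
print(totals.most_common(3))
```

In a real cluster each partition would be counted on a different worker, and only the small per-partition counts would be sent over the network to be merged. The total is the same regardless of how many partitions are used.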
Inventory Code | Barcode | Call Number | Location | Status |
---|---|---|---|---|
2307006728 | T130876 | T1308762023 | Central Library (Reference) | Available |
No other version available