Data analysis and RHadoop: case studies

The era of “Big Data” is here. Rapid growth in data volume and in the application of analytical algorithms has created massive opportunities for data scientists. From Facebook to small businesses, organizations rely on big data for business forecasting. With data firmly in hand, and with the ability that big data technologies give us to store and analyze it effectively, we can predict and work to optimize many aspects of our behavior. Amazon, for example, can know every book we have ever bought or viewed by analyzing data gathered over the years. With the advent of many digital modalities, this data has grown into big data and is still on the rise. Ultimately, big data technologies exist to improve decision making and to provide greater insight faster when needed, at the cost of a loss of data privacy.

This project has two phases. In the initial phase, the main goal is to understand how RHadoop works for data analysis. Hadoop worked well within its own Java-based environment, but we needed more flexibility and more data analysis capability, and this constraint created the need for something new. Data analysts use R for data analysis, and its use is increasing rapidly. Our intention is to combine the power of Hadoop and R for data analysis, and the solution is RHadoop. It uses dedicated packages such as rmr2, rhdfs, plyrmr and rhbase to access HDFS files and run MapReduce jobs. The second phase investigates MapReduce jobs on RHadoop. A MapReduce job contains a mapper function and a reducer function. In some case studies we used both functions to understand how they work together. At the start, the system assigns an ID to the overall MapReduce job and then handles the mapper and the reducer in turn. The MapReduce processing approach is used in most big data analysis, so it is important to understand it.

Hadoop and R are a natural match and are quite complementary for the visualization and analysis of big data. This work focuses mainly on RHadoop and its operational features for data analysis.

RAHMAN, KH.EHSANUR

2015/2016

Advisor: GRIBAUDO, MARCO
School / Dept.: ING - Scuola di Ingegneria Industriale e dell'Informazione
Date: 27-Jul-2016
Academic year: 2015/2016
Document type: Master's degree thesis (Tesi di laurea Magistrale)
Appears in document types: Master's degree thesis (Tesi di laurea Magistrale)

Attached files

File	Size	Format
Kh.Ehsanur Rahman_833342.pdf (accessible on the internet to everyone; description: DATA ANALYSIS AND RHADOOP: CASE STUDIES)	2.57 MB	Adobe PDF

Documents in POLITesi are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10589/123768