Big Data Final project
Analysis of the Million Songs Data Set
Summer 2016 – Big Data Final project
The goal of this project assignment is to gain experience in building applications using
Build a Hadoop cluster with more than one instance on any Linux distribution.
- Download the Million Songs metadata from the repository below and load it into HDFS.
One good source for downloading the data set:
- https://drive.google.com/open?id=0B4qvMVe-iB-eWGI1X29FNDYwVXc
Filter the input data by the letter assigned to you while reading the input source, and use the resulting subset for the project work.
For example, if the letter ‘K’ has been assigned to you, consider only the records whose second-column value starts with the letter ‘K’ or ‘k’.
In this assignment, my letter is ‘k’.
The first row of the spreadsheet contains the column names.
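You can prototype the filtering step in plain Python before porting it to Hive, MapReduce, or Pig. This is a minimal sketch. It assumes the data is a comma-separated file with the column names in its first row. The handout does not name the second column, so the sample column names (id, name, year) are hypothetical. The comparison is case-insensitive, which matches the ‘K’/‘k’ example.

```python
import csv
import io


def filter_by_letter(csv_text, letter):
    """Return (header, rows), keeping only rows whose 2nd column starts with `letter`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first row holds the column names
    prefix = letter.lower()
    rows = [row for row in reader
            if len(row) > 1 and row[1].lower().startswith(prefix)]  # case-insensitive match
    return header, rows


# Hypothetical sample; the real column names come from the data set's header row.
sample = "id,name,year\nTR1,Kids,2008\nTR2,Hello,1999\nTR3,karma,1997\n"
header, rows = filter_by_letter(sample, "k")
print(header)                    # ['id', 'name', 'year']
print([row[0] for row in rows])  # ['TR1', 'TR3']
```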
Submit the jps output, the ifconfig output, the cluster details, and the total number of files in HDFS.
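To get the file count, one option is hdfs dfs -count <path>. For each path it prints the directory count, the file count, the content size in bytes, and the path name. The helper below is a small, hypothetical parser for one such output line. The sample line and the /user/hadoop/msd path are made up for illustration.

```python
def parse_hdfs_count(line):
    """Split one line of `hdfs dfs -count` output into
    (dir_count, file_count, content_size, path)."""
    # maxsplit=3 keeps a path containing spaces intact in the last field
    dir_count, file_count, content_size, path = line.split(None, 3)
    return int(dir_count), int(file_count), int(content_size), path


# Hypothetical output line for an assumed /user/hadoop/msd directory.
line = "           1            3          123456 /user/hadoop/msd"
dirs, files, size, path = parse_hdfs_count(line)
print(files)  # 3
```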
Once the data is loaded successfully into HDFS, compute the analytical metrics below using Hive, MapReduce, or Pig Latin, and submit the results.
- Analyze the duration of songs for each year.
Submit the calculated results and a corresponding bar graph or pie chart.
- Analyze the number of songs grouped by the last digit of their digital ID.
Submit the calculated results and a corresponding bar graph or pie chart.
- Analyze the number of artists by the first letter of their name, OR
analyze the familiarity of songs for each year.
Submit the calculated results and a corresponding bar graph or pie chart.
- Analyze the range of tempo or loudness for each year.
Submit the calculated results and a corresponding bar graph or pie chart.
- Analyze the songs that share the same key value.
Submit the calculated results.
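Before writing the Hive, MapReduce, or Pig jobs, it can help to prototype the analyses above in plain Python on a handful of rows, so you have results to compare against. The sketch below is only a prototype; it does not replace the required tools. It makes several assumptions, so check each one against the header row of your data:
- The column names (year, duration, digital_id, artist_name, tempo, key) are hypothetical.
- Year 0 marks an unknown year.
- Duration is in seconds.
- Digital IDs are integer strings.
- The key column uses the 0 to 11 pitch-class encoding, where 0 = C.

```python
from collections import Counter, defaultdict

# Assumption: key uses the 0-11 pitch-class encoding (0 = C, 1 = C#, ...).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def known_years(rows):
    """Yield (year, row) pairs, skipping year 0 (assumed to mean 'unknown')."""
    for row in rows:
        year = int(row["year"])
        if year != 0:
            yield year, row


def duration_by_year(rows):
    """{year: (song_count, average_duration)}; duration assumed to be in seconds."""
    totals, counts = defaultdict(float), defaultdict(int)
    for year, row in known_years(rows):
        totals[year] += float(row["duration"])
        counts[year] += 1
    return {y: (counts[y], totals[y] / counts[y]) for y in sorted(counts)}


def songs_by_last_digit(rows):
    """{last digit of digital_id: song count}; IDs assumed to be integer strings."""
    counts = Counter()
    for row in rows:
        digital_id = row["digital_id"].strip()
        if digital_id[-1:].isdigit():  # also skips empty IDs
            counts[digital_id[-1]] += 1
    return dict(sorted(counts.items()))


def artists_by_first_letter(rows):
    """{first letter: number of distinct artist names starting with it}."""
    names = defaultdict(set)
    for row in rows:
        name = row["artist_name"].strip()
        if name:
            names[name[0].upper()].add(name)
    return {letter: len(group) for letter, group in sorted(names.items())}


def range_by_year(rows, column="tempo"):
    """{year: (min, max)} of `column`; pass column="loudness" for loudness."""
    ranges = {}
    for year, row in known_years(rows):
        value = float(row[column])
        low, high = ranges.get(year, (value, value))
        ranges[year] = (min(low, value), max(high, value))
    return dict(sorted(ranges.items()))


def songs_by_key(rows):
    """{key name: song count}, ordered by pitch class."""
    counts = Counter(int(row["key"]) for row in rows)
    return {PITCH_CLASSES[k]: n for k, n in sorted(counts.items())}


# Hypothetical sample rows; real values come from the filtered data set.
FIELDS = ("year", "duration", "digital_id", "artist_name", "tempo", "key")
rows = [dict(zip(FIELDS, values)) for values in [
    ("2005", "200.0", "8516", "Kilo", "120.0", "7"),
    ("2005", "300.0", "2406", "Kite Band", "95.5", "7"),
    ("0", "180.0", "77", "Kilo", "130.0", "0"),
    ("1999", "240.5", "913", "Arrow", "100.0", "11"),
]]

print(duration_by_year(rows))         # {1999: (1, 240.5), 2005: (2, 250.0)}
print(songs_by_last_digit(rows))      # {'3': 1, '6': 2, '7': 1}
print(artists_by_first_letter(rows))  # {'A': 1, 'K': 2}
print(range_by_year(rows))            # {1999: (100.0, 100.0), 2005: (95.5, 120.0)}
print(songs_by_key(rows))             # {'C': 1, 'G': 2, 'B': 1}
```

Each function is a group-by. The grouping value (year, last digit, first letter, or key) plays the role of the map output key. The count, average, or min/max is what a reducer or a Hive GROUP BY query would compute.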