An Exploratory Data Analysis of the Forbes Richest 50 - 2016!
Forbes Richest 50 - 2016
Gopinath Subbegowda
November 28, 2017
Note: This analysis is for educational purposes only and may therefore contain mistakes in the authenticity and/or correctness of the data and its interpretation.
This document is created using R Markdown
Dynamic Documents for R : http://rmarkdown.rstudio.com/
The Forbes 50 richest people by country
Recently, when I was browsing the net for datasets, I came across this link -
forbesListR - R Package
This is a package created by Alex Bresler that provides an R wrapper for the Forbes list API.
forbesListR - An easy way to access the data contained in lists maintained by the fine folks at Forbes in R
Using this package, I downloaded the richest-50 lists for China, India, Japan and Korea, and also for Australia and Africa. I could get the 2016 list for every country but, for some reason, not for Africa, so I make do with the 2015 list for Africa, purely for reference alongside the other countries.
## Classes 'tbl_df', 'tbl' and 'data.frame': 300 obs. of 6 variables:
## $ rank : Factor w/ 50 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ net_worth.millions: num 8800 8500 6900 5000 3600 3500 3200 2800 1950 1900 ...
## $ age : int 67 63 84 87 57 50 83 80 61 87 ...
## $ gender : Factor w/ 2 levels "F","M": 1 1 2 2 2 2 2 2 2 2 ...
## $ source_industry : chr "media" "mining" "property" "shopping malls" ...
## $ country : chr "Australia" "Australia" "Australia" "Australia" ...
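The listing above is the kind of output str() prints. The code that built the data frame is not shown in this post, so the following is only a minimal sketch: it recreates the first four rows visible in the listing (the object name rich is an assumption) and stores rank and gender as factors.

```r
# First four rows as they appear in the str() listing; the full data has 300 rows.
# The object name `rich` is hypothetical.
rich <- data.frame(
  rank               = c(1, 2, 3, 4),
  net_worth.millions = c(8800, 8500, 6900, 5000),
  age                = c(67L, 63L, 84L, 87L),
  gender             = c("F", "F", "M", "M"),
  source_industry    = c("media", "mining", "property", "shopping malls"),
  country            = "Australia",
  stringsAsFactors   = FALSE
)

# Rank and gender are categorical, so store them as factors.
rich$rank   <- factor(rich$rank)
rich$gender <- factor(rich$gender, levels = c("F", "M"))

# Print the column types and the first few values of each column.
str(rich)
```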
Exploratory Data Analysis (EDA)
I tried various plots for the Exploratory Data Analysis of the Forbes 50 Richest Asians dataset.
##       rank     net_worth.millions      age        gender
##  13     :  8   Min.   :  330      Min.   :36.0   F: 26
##  27     :  8   1st Qu.:  950      1st Qu.:54.0   M:274
##  17     :  7   Median : 2200      Median :65.0
##  31     :  7   Mean   : 3707      Mean   :64.2
##  39     :  7   3rd Qu.: 4825      3rd Qu.:72.0
##  42     :  7   Max.   :33000      Max.   :95.0
##  (Other):256                      NA's   :18
##  source_industry      country
##  Length:300         Length:300
##  Class :character   Class :character
##  Mode  :character   Mode  :character
As we can see above, there are two factor variables (rank and gender), two numerical variables (net_worth.millions and age) and two character variables (source_industry and country).
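The summary also reports 18 NA's, and missing values matter for every statistic and plot that follows. As a minimal sketch, using made-up toy values rather than the Forbes data (the object name toy is an assumption), this is one way to count the missing values per column and to compute a statistic that skips them:

```r
# Made-up toy values (NOT the Forbes data), with one age deliberately missing.
toy <- data.frame(
  net_worth.millions = c(8800, 2200, 950, 330),
  age                = c(67L, NA, 54L, 36L),
  gender             = factor(c("F", "M", "M", "M"))
)

# Number of missing values in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Most summary functions return NA unless told to drop missing values.
median_age <- median(toy$age, na.rm = TRUE)
print(median_age)
```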
Frankly, until I finished this post, I had little idea of how essential graphs and other visuals are to data analysis. Visuals enhance the analysis and help unearth insights that would otherwise go unnoticed.
Visualizations :
ggplot2 Package - Star of visual data exploration
Here we can clearly see the relation between “age” and net worth (net_worth.millions); a slightly more detailed analysis will unearth how wealth trends with age among the richest.
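The plotting code is not shown in this post, so the following is only a sketch of how an age versus net worth scatter plot can be drawn with ggplot2. The toy values are made up, and the axis labels and the object names toy and p are assumptions.

```r
library(ggplot2)

# Made-up toy values (NOT the Forbes data); column names follow the post.
toy <- data.frame(
  age                = c(36, 50, 57, 63, 67, 84),
  net_worth.millions = c(950, 3500, 3600, 8500, 8800, 6900)
)

# One point per person: age on the x axis, net worth on the y axis.
p <- ggplot(toy, aes(x = age, y = net_worth.millions)) +
  geom_point() +
  labs(x = "Age", y = "Net worth (millions)")

print(p)
```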
Another ggplot2 visualization - Is money “Manly” or “Motherly” !!?? :-)
Now the richest and the Gender gap!
## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).
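These two warnings mean that ggplot2 silently dropped 18 rows because a plotted value was missing, which matches the 18 NA's in the summary output. A minimal sketch of making that step explicit is to filter the incomplete rows before plotting. The toy values are made up, and the linear smoother (method = "lm") is an assumption, since the post does not say which smoother was used.

```r
library(ggplot2)

# Made-up toy values (NOT the Forbes data), with two ages deliberately missing.
toy <- data.frame(
  age                = c(36, 50, NA, 63, 67, 84, 45, NA),
  net_worth.millions = c(950, 3500, 1200, 8500, 8800, 6900, 2200, 330),
  gender             = c("M", "M", "F", "F", "F", "M", "M", "M")
)

# Keep only rows where both plotted variables are present,
# and record how many rows were dropped.
complete  <- toy[!is.na(toy$age) & !is.na(toy$net_worth.millions), ]
n_dropped <- nrow(toy) - nrow(complete)

# Points coloured by gender, with one straight-line trend per gender.
p <- ggplot(complete, aes(x = age, y = net_worth.millions, colour = gender)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(p)
```

Filtering first keeps the number of dropped rows visible in the analysis instead of leaving it buried in a warning.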
Here is a density plot made with qplot; it gives us an idea of how the riches are spread within each country. The second plot below is the same density plot drawn with ggplot2 and a color fill.
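As a sketch of the filled density plot described above (the original code is not shown), the following draws one density curve per country. It uses ggplot() rather than qplot(), because qplot() is deprecated in recent ggplot2 releases. The toy values, the alpha setting and the object names are assumptions.

```r
library(ggplot2)

# Made-up toy values (NOT the Forbes data); column names follow the post.
toy <- data.frame(
  net_worth.millions = c(330, 950, 2200, 4800, 1200, 3700, 9100, 33000),
  country            = rep(c("Japan", "China"), each = 4)
)

# One semi-transparent density curve per country, so overlaps stay visible.
p <- ggplot(toy, aes(x = net_worth.millions, fill = country)) +
  geom_density(alpha = 0.4)

print(p)
```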
Here is a dot plot made with qplot, with color representing the country and dot size representing the relative size of the riches.
We can clearly see the ‘Reach of the Riches!!’ here: China’s rich have the longest reach!!
We can also see that India is just behind in reaching out to the top!!
## Warning: Removed 18 rows containing missing values (geom_point).
The next one is the density distribution of the riches among the Genders!
The next visualization gives us a window view (grid view) of the richest, with a separate grid for each country, depicting the gender gap with color!
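A grid view like this can be sketched with ggplot2 facets: one panel per country, with gender mapped to color. The original code is not shown, so the choice of facet_wrap(), the plotted axes, the toy values and the object names are all assumptions.

```r
library(ggplot2)

# Made-up toy values (NOT the Forbes data); column names follow the post.
toy <- data.frame(
  age                = c(45, 62, 58, 71, 66, 80),
  net_worth.millions = c(33000, 4800, 2200, 950, 3700, 1200),
  gender             = c("M", "F", "M", "M", "M", "F"),
  country            = rep(c("China", "India", "Japan"), each = 2)
)

# One panel per country; point colour shows gender.
p <- ggplot(toy, aes(x = age, y = net_worth.millions, colour = gender)) +
  geom_point() +
  facet_wrap(~ country)

print(p)
```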
I will extend this EDA study to regression analysis and predictive modelling using the data from previous years.
Please feel free to post your comments for improvement of this study. Thanks in advance.
I can be reached at - gopinath.subbegowda@gmail.com
Phone : +91 9945057234