An Exploratory Data Analysis of the Forbes Richest 50 - 2016!

Forbes Richest 50 - 2016

Note: This analysis is for educational purposes only and may contain errors in the authenticity or correctness of the data and in its interpretation.

This document is created using R Markdown

Dynamic Documents for R : http://rmarkdown.rstudio.com/

The Forbes 50 richest people by country

Recently, while browsing the net for datasets, I came across this link -

https://github.com/gopinathsubbegowda/forbesListR

forbesListR - R Package

This package, created by Alex Bresler, provides an R wrapper for the Forbes list API.

forbesListR - An easy way to access the data contained in lists maintained by the fine folks at Forbes in R

Using this package, I downloaded the lists of the 50 richest people for four Asian countries (China, India, Japan and Korea) plus Australia and Africa, 300 records in all. I could get the 2016 list for every country but, for some reason, not for Africa. Hence I make do with the 2015 Africa list, just for reference alongside the other countries.
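As a rough sketch of how per-country lists might be stacked into a single table: this example does not call forbesListR at all. The two small data frames, the `Placeholder` country and the helper name `add_country` are hypothetical stand-ins for the downloaded lists; only the three Australian net worth values come from the dataset itself.

```r
# Hypothetical stand-ins for two downloaded per-country lists.
# "Placeholder" and its values are invented purely for illustration.
lists <- list(
  Australia   = data.frame(rank = 1:3, net_worth.millions = c(8800, 8500, 6900)),
  Placeholder = data.frame(rank = 1:3, net_worth.millions = c(3000, 2000, 1000))
)

# Hypothetical helper: tag every row of a list with its country name
add_country <- function(df, name) {
  df$country <- name
  df
}

tagged <- Map(add_country, lists, names(lists))

# Stack all lists into one data frame and reset the row names
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL

str(combined)
```

Keeping the country as an explicit column is what later allows plots to be coloured or split by country.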

## Classes 'tbl_df', 'tbl' and 'data.frame':    300 obs. of  6 variables:
##  $ rank              : Factor w/ 50 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ net_worth.millions: num  8800 8500 6900 5000 3600 3500 3200 2800 1950 1900 ...
##  $ age               : int  67 63 84 87 57 50 83 80 61 87 ...
##  $ gender            : Factor w/ 2 levels "F","M": 1 1 2 2 2 2 2 2 2 2 ...
##  $ source_industry   : chr  "media" "mining" "property" "shopping malls" ...
##  $ country           : chr  "Australia" "Australia" "Australia" "Australia" ...

Exploratory Data Analysis (EDA)

I tried various plots for the Exploratory Data Analysis of the Forbes 50 Richest Asians dataset.

##       rank     net_worth.millions      age       gender 
##  13     :  8   Min.   :  330      Min.   :36.0   F: 26  
##  27     :  8   1st Qu.:  950      1st Qu.:54.0   M:274  
##  17     :  7   Median : 2200      Median :65.0          
##  31     :  7   Mean   : 3707      Mean   :64.2          
##  39     :  7   3rd Qu.: 4825      3rd Qu.:72.0          
##  42     :  7   Max.   :33000      Max.   :95.0          
##  (Other):256                      NA's   :18            
##  source_industry      country         
##  Length:300         Length:300        
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 

As we can see above, there are two factor variables (rank, gender), two numeric variables (net_worth.millions, age) and two character variables (source_industry, country).
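The variable types can also be checked programmatically. Here is a minimal sketch on a four-row sample rebuilt from the first values in the `str()` output above; the name `rich` is an assumption, not the name used in the original analysis.

```r
# Four-row sample rebuilt from the first values in the str() output
# (the data frame name `rich` is hypothetical)
rich <- data.frame(
  rank = factor(1:4),
  net_worth.millions = c(8800, 8500, 6900, 5000),
  age = c(67L, 63L, 84L, 87L),
  gender = factor(c("F", "F", "M", "M")),
  source_industry = c("media", "mining", "property", "shopping malls"),
  country = "Australia",
  stringsAsFactors = FALSE
)

# Class of each column
col_classes <- sapply(rich, class)
print(col_classes)

# Count the columns of each broad type
n_factor    <- sum(sapply(rich, is.factor))
n_numeric   <- sum(sapply(rich, is.numeric))   # counts integer and double columns
n_character <- sum(sapply(rich, is.character))
print(c(factor = n_factor, numeric = n_numeric, character = n_character))
```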

Frankly, until I finished this post, I had little idea how essential graphs and other visuals are to data analysis. Visuals enhance the analysis and help unearth insights that would otherwise go unnoticed.

Visualizations:

ggplot2 package - star of visual data exploration

Here we can examine the relation between age and net worth (net_worth.millions); a more detailed analysis would reveal how wealth trends with age among the richest.
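A plot of this kind could be sketched roughly as follows. The data are the first ten rows shown in the `str()` output (all from Australia); the linear trend line, the axis labels and the object names are my assumptions and not necessarily what the original figure used.

```r
library(ggplot2)

# First ten rows as shown in the str() output (names are hypothetical)
rich <- data.frame(
  net_worth.millions = c(8800, 8500, 6900, 5000, 3600, 3500, 3200, 2800, 1950, 1900),
  age = c(67L, 63L, 84L, 87L, 57L, 50L, 83L, 80L, 61L, 87L),
  gender = factor(c("F", "F", rep("M", 8)), levels = c("F", "M"))
)

# Scatter plot of net worth against age with a straight-line trend
# (the choice of a linear fit is an assumption)
p_age <- ggplot(rich, aes(x = age, y = net_worth.millions)) +
  geom_point(aes(colour = gender), size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Age (years)", y = "Net worth (millions)", colour = "Gender")

print(p_age)
```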

Another ggplot2 visualization - is money “manly” or “motherly”!? :-)

Now, the richest and the gender gap!

## Warning: Removed 18 rows containing non-finite values (stat_smooth).
## Warning: Removed 18 rows containing missing values (geom_point).
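The two warnings above report 18 dropped rows, which matches the 18 NA's listed for age in the summary. Below is a minimal sketch of counting such rows and dropping them before plotting. The five-row data frame is illustrative only: its NA positions are invented and are not from the Forbes lists.

```r
# Illustrative data frame with two missing ages (invented NA positions)
rich <- data.frame(
  age = c(67L, NA, 84L, NA, 57L),
  net_worth.millions = c(8800, 8500, 6900, 5000, 3600)
)

# How many rows lack an age?
n_missing <- sum(is.na(rich$age))
print(n_missing)

# Keep only the rows where age is known
rich_complete <- rich[!is.na(rich$age), ]
print(nrow(rich_complete))
```

Alternatively, passing `na.rm = TRUE` to a layer such as `geom_point()` removes the missing rows silently instead of with a warning.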

Here is a density plot made with qplot; it gives an idea of how wealth is spread within individual countries. The second one below is the same density plot with color fill, drawn with ggplot2.
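A filled density plot per country might look roughly like this. The net worth values are simulated with `rlnorm()` and are not Forbes figures; the three countries, the log-scaled x axis and all object names are assumptions. Recent ggplot2 releases mark `qplot()` as deprecated, so the sketch uses `ggplot()` directly.

```r
library(ggplot2)

# Simulated stand-in data: NOT actual Forbes figures
set.seed(2016)
countries <- c("Australia", "China", "India")
sim <- data.frame(
  country = rep(countries, each = 50),
  net_worth.millions = rlnorm(150, meanlog = 7.5, sdlog = 0.9)
)

# One semi-transparent density curve per country
p_density <- ggplot(sim, aes(x = net_worth.millions, fill = country)) +
  geom_density(alpha = 0.4) +
  scale_x_log10() +   # assumption: a log scale keeps the long right tail readable
  labs(x = "Net worth (millions, log scale)", y = "Density", fill = "Country")

print(p_density)
```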

Here is a dot plot made with qplot, with color representing the country and dot size representing the relative size of the fortune.

We can clearly see the ‘Reach of the Riches!!’, and China’s rich have a particularly long reach!!

We can also see that India is just behind in reaching for the top!!

## Warning: Removed 18 rows containing missing values (geom_point).
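The dot plot described above could be approximated as follows. The data are simulated, not Forbes figures, and putting age on the x axis is my guess, based only on the warning about 18 missing values. All object names are hypothetical.

```r
library(ggplot2)

# Simulated stand-in data: NOT actual Forbes figures
set.seed(2016)
countries <- c("Australia", "China", "India")
sim <- data.frame(
  country = rep(countries, each = 50),
  age = sample(36:95, 150, replace = TRUE),   # age range taken from the summary
  net_worth.millions = rlnorm(150, meanlog = 7.5, sdlog = 0.9)
)

# Colour encodes the country, dot size encodes the fortune
p_dots <- ggplot(sim, aes(x = age, y = net_worth.millions,
                          colour = country, size = net_worth.millions)) +
  geom_point(alpha = 0.6, na.rm = TRUE) +
  labs(x = "Age (years)", y = "Net worth (millions)",
       colour = "Country", size = "Net worth (millions)")

print(p_dots)
```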

The next one is the density distribution of riches by gender!

The next visualization gives us a window view (grid view) of the richest, with a separate grid for each country depicting the gender gap with color!

I will extend this EDA study to regression analysis and predictive modelling using the data from previous years.
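A grid view like this is typically built with facets. Here is a minimal sketch using `facet_wrap()` on simulated data. The values are not Forbes figures; the roughly 10% share of women is a loose nod to the 26 F versus 274 M in the summary, and the choice of axes is an assumption.

```r
library(ggplot2)

# Simulated stand-in data: NOT actual Forbes figures
set.seed(2016)
countries <- c("Australia", "China", "India")
sim <- data.frame(
  country = rep(countries, each = 50),
  age = sample(36:95, 150, replace = TRUE),
  gender = factor(sample(c("F", "M"), 150, replace = TRUE, prob = c(0.1, 0.9)),
                  levels = c("F", "M")),
  net_worth.millions = rlnorm(150, meanlog = 7.5, sdlog = 0.9)
)

# One panel per country, points coloured by gender
p_grid <- ggplot(sim, aes(x = age, y = net_worth.millions, colour = gender)) +
  geom_point(na.rm = TRUE) +
  facet_wrap(~ country) +
  labs(x = "Age (years)", y = "Net worth (millions)", colour = "Gender")

print(p_grid)
```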


Please feel free to post your comments for improvement of this study. Thanks in advance.

I can be reached at - gopinath.subbegowda@gmail.com

Phone : +91 9945057234

