Data Analysis and Visualization using Python

Akkal Bahadur Bist (akkalbist55@gmail.com), Data Scientist/AI Researcher, Sikhar Tech Pvt. Ltd.

Dependencies

Install Python

$ sudo apt-get install python3.5

Install Anaconda

https://conda.io/docs/user-guide/install/index.html
https://www.anaconda.com/download/

Create an environment “ai” with all must-have libraries.

$ conda create -n ai python=3.5
$ source activate ai
$ conda install numpy pandas matplotlib seaborn plotly jupyter notebook scipy scikit-learn nltk

Run jupyter notebook

$ jupyter notebook

FIFA 2018 Worldcup Player Dataset

More than 17,000 players with 70+ attributes each, extracted from FIFA 18. Source: https://www.kaggle.com/thec03u5/fifa-18-demo-player-dataset (extended version).

Dataset includes:

Player personal attributes (Nationality, Club, Photo, Age, Value etc.)
Player performance attributes (Overall, Potential, Aggression, Agility etc.)
Player preferred position and ratings at all positions.

Attributes:

Name: name of a player
Age: age of a player
Photo: picture of player
Nationality: player nationality
Flag
Overall
Potential
Club: The international club for which the player plays
Club Logo
Value
Wage
Special
Acceleration
Aggression
Agility
Balance
Ball control
Composure
Crossing
Curve
Dribbling
Finishing
Free kick accuracy
GK diving
GK handling
GK kicking
GK positioning
GK reflexes
Heading accuracy
Interceptions
Jumping
Long passing
Long shots
Marking
Penalties
Positioning
Reactions
Short passing
Shot power
Sliding tackle
Sprint speed
Stamina
Standing tackle
Strength
Vision
Volleys
CAM: Center Attacking Midfielder
CB: Center Back
CDM: Center Defensive Midfielder
CF: Center Forward
CM: Center Midfielder
ID: Player's ID in FIFA18
LAM: Left Attacking Midfielder
LB: Left Back
LCB: Left Center Back
LCM: Left Center Midfielder
LDM: Left Defensive Midfielder
LF: Left Forward
LM: Left Midfielder
LS: Left Striker
LW: Left Wing
LWB: Left Wing Back
Preferred Positions: Player's Preferred Position
RAM: Right Attacking Midfielder
RB: Right Back
RCB: Right Center Back
RCM: Right Center Midfielder
RDM: Right Defensive Midfielder
RF: Right Forward
RM: Right Midfielder
RS: Right Striker
RW: Right Wing
RWB: Right Wing Back
ST: Striker

Import libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# import plotly modules
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go

from IPython.display import display

%matplotlib inline

Read dataset

dataset = pd.read_csv('CompleteDataset.csv')

/home/akkal/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2785: DtypeWarning:

Columns (23,35) have mixed types. Specify dtype option on import or set low_memory=False.
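
The DtypeWarning above means that pandas found more than one kind of value in some columns while reading the file in chunks. As a minimal sketch of the two remedies the warning itself suggests, the snippet below uses a tiny in-memory CSV as a stand-in for CompleteDataset.csv. The column name `CAM` and the sample values are assumptions chosen only for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for CompleteDataset.csv: the 'CAM' column mixes
# plain numbers, a text value and a missing value.
csv_text = "Name,CAM\nPlayer A,89\nPlayer B,70+2\nPlayer C,\n"

# Option 1: read the whole file before inferring each column's type.
df_whole = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Option 2: state the type of the problematic column explicitly.
df_typed = pd.read_csv(io.StringIO(csv_text), dtype={'CAM': str})

print(df_whole.dtypes)
print(df_typed['CAM'].tolist())
```

For the real file, the same keyword arguments can be passed to `pd.read_csv('CompleteDataset.csv', ...)`. Which columns need an explicit type depends on the file, so it is worth inspecting the columns named in the warning first.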

Quick Overview

display(dataset.head())

	Unnamed: 0	Name	Age	Photo	Nationality	Flag	Overall	Potential	Club	Club Logo	...	RB	RCB	RCM	RDM	RF	RM	RS	RW	RWB	ST
0	0	Cristiano Ronaldo	32	https://cdn.sofifa.org/48/18/players/20801.png	Portugal	https://cdn.sofifa.org/flags/38.png	94	94	Real Madrid CF	https://cdn.sofifa.org/24/18/teams/243.png	...	61.0	53.0	82.0	62.0	91.0	89.0	92.0	91.0	66.0	92.0
1	1	L. Messi	30	https://cdn.sofifa.org/48/18/players/158023.png	Argentina	https://cdn.sofifa.org/flags/52.png	93	93	FC Barcelona	https://cdn.sofifa.org/24/18/teams/241.png	...	57.0	45.0	84.0	59.0	92.0	90.0	88.0	91.0	62.0	88.0
2	2	Neymar	25	https://cdn.sofifa.org/48/18/players/190871.png	Brazil	https://cdn.sofifa.org/flags/54.png	92	94	Paris Saint-Germain	https://cdn.sofifa.org/24/18/teams/73.png	...	59.0	46.0	79.0	59.0	88.0	87.0	84.0	89.0	64.0	84.0
3	3	L. Suárez	30	https://cdn.sofifa.org/48/18/players/176580.png	Uruguay	https://cdn.sofifa.org/flags/60.png	92	92	FC Barcelona	https://cdn.sofifa.org/24/18/teams/241.png	...	64.0	58.0	80.0	65.0	88.0	85.0	88.0	87.0	68.0	88.0
4	4	M. Neuer	31	https://cdn.sofifa.org/48/18/players/167495.png	Germany	https://cdn.sofifa.org/flags/21.png	92	92	FC Bayern Munich	https://cdn.sofifa.org/24/18/teams/21.png	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

5 rows × 75 columns

Select only the columns of interest

int_col = [
    'Name', 
    'Age', 
    'Photo', 
    'Nationality', 
    'Overall', 
    'Potential', 
    'Club', 
    'Value', 
    'Wage', 
    'Preferred Positions'
]

dataset = pd.DataFrame(dataset, columns=int_col)

display(dataset.head())

	Name	Age	Photo	Nationality	Overall	Potential	Club	Value	Wage	Preferred Positions
0	Cristiano Ronaldo	32	https://cdn.sofifa.org/48/18/players/20801.png	Portugal	94	94	Real Madrid CF	€95.5M	€565K	ST LW
1	L. Messi	30	https://cdn.sofifa.org/48/18/players/158023.png	Argentina	93	93	FC Barcelona	€105M	€565K	RW
2	Neymar	25	https://cdn.sofifa.org/48/18/players/190871.png	Brazil	92	94	Paris Saint-Germain	€123M	€280K	LW
3	L. Suárez	30	https://cdn.sofifa.org/48/18/players/176580.png	Uruguay	92	92	FC Barcelona	€97M	€510K	ST
4	M. Neuer	31	https://cdn.sofifa.org/48/18/players/167495.png	Germany	92	92	FC Bayern Munich	€61M	€230K	GK

Preprocessing

# Drop NA values
dataset = dataset.dropna()

# Convert the Value and Wage strings (e.g. '€95.5M', '€565K') to numbers
def str2number(amount):
    # Drop the leading currency symbol and scale by the unit suffix
    if amount[-1] == 'M':
        return float(amount[1:-1]) * 1000000
    elif amount[-1] == 'K':
        return float(amount[1:-1]) * 1000
    else:
        return float(amount[1:])

dataset['ValueNum'] = dataset['Value'].apply(str2number)
dataset['WageNum'] = dataset['Wage'].apply(str2number)

display(dataset.head())

	Name	Age	Photo	Nationality	Overall	Potential	Club	Value	Wage	Preferred Positions	ValueNum	WageNum
0	Cristiano Ronaldo	32	https://cdn.sofifa.org/48/18/players/20801.png	Portugal	94	94	Real Madrid CF	€95.5M	€565K	ST LW	95500000.0	565000.0
1	L. Messi	30	https://cdn.sofifa.org/48/18/players/158023.png	Argentina	93	93	FC Barcelona	€105M	€565K	RW	105000000.0	565000.0
2	Neymar	25	https://cdn.sofifa.org/48/18/players/190871.png	Brazil	92	94	Paris Saint-Germain	€123M	€280K	LW	123000000.0	280000.0
3	L. Suárez	30	https://cdn.sofifa.org/48/18/players/176580.png	Uruguay	92	92	FC Barcelona	€97M	€510K	ST	97000000.0	510000.0
4	M. Neuer	31	https://cdn.sofifa.org/48/18/players/167495.png	Germany	92	92	FC Bayern Munich	€61M	€230K	GK	61000000.0	230000.0
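
The row-by-row conversion above can also be written with pandas string methods, which avoids calling a Python function once per row. This is only a sketch of an alternative, not the approach used in this post. The helper name `amounts_to_numbers` and the sample values are assumptions for illustration.

```python
import pandas as pd

def amounts_to_numbers(series):
    """Convert strings such as '€95.5M', '€565K' or '€0' to floats."""
    cleaned = series.str.lstrip('€')
    # Map the trailing unit letter to a multiplier; no letter means 1.
    multiplier = cleaned.str[-1].map({'M': 1e6, 'K': 1e3}).fillna(1.0)
    # Remove the unit letter so only the digits remain.
    digits = cleaned.str.rstrip('MK')
    return digits.astype(float) * multiplier

sample = pd.Series(['€95.5M', '€565K', '€0'])
print(amounts_to_numbers(sample).tolist())
```

On the real data this would be called as `amounts_to_numbers(dataset['Value'])`, assuming every entry follows the same currency-symbol, digits, optional-suffix pattern.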

# Categorical columns of Value and Wage
max_value = float(dataset['ValueNum'].max() + 1)
max_wage = float(dataset['WageNum'].max() + 1)

print("Max value:", max_value, "Max_wage:", max_wage)

# Supporting function for creating category columns 'ValueCategory' and 'WageCategory'
def mappingAmount(x, max_amount):
    for i in range(0, 10):
        if x >= max_amount/10*i and x < max_amount/10*(i+1):
            return i
        
dataset['ValueCategory'] = dataset['ValueNum'].apply(lambda x: mappingAmount(x, max_value))
dataset['WageCategory'] = dataset['WageNum'].apply(lambda x: mappingAmount(x, max_wage))

Max value: 123000001.0 Max_wage: 565001.0

display(dataset.head())

	Name	Age	Photo	Nationality	Overall	Potential	Club	Value	Wage	Preferred Positions	ValueNum	WageNum	ValueCategory	WageCategory
0	Cristiano Ronaldo	32	https://cdn.sofifa.org/48/18/players/20801.png	Portugal	94	94	Real Madrid CF	€95.5M	€565K	ST LW	95500000.0	565000.0	7	9
1	L. Messi	30	https://cdn.sofifa.org/48/18/players/158023.png	Argentina	93	93	FC Barcelona	€105M	€565K	RW	105000000.0	565000.0	8	9
2	Neymar	25	https://cdn.sofifa.org/48/18/players/190871.png	Brazil	92	94	Paris Saint-Germain	€123M	€280K	LW	123000000.0	280000.0	9	4
3	L. Suárez	30	https://cdn.sofifa.org/48/18/players/176580.png	Uruguay	92	92	FC Barcelona	€97M	€510K	ST	97000000.0	510000.0	7	9
4	M. Neuer	31	https://cdn.sofifa.org/48/18/players/167495.png	Germany	92	92	FC Bayern Munich	€61M	€230K	GK	61000000.0	230000.0	4	4
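
The ten equal-width categories built by `mappingAmount` can likely be reproduced with `pd.cut`. The sketch below uses four of the values shown in the table above and is an alternative, not the code this post ran. The variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

values = pd.Series([95500000.0, 105000000.0, 123000000.0, 61000000.0])
max_amount = values.max() + 1

# Eleven edges define ten equal-width bins over [0, max_amount).
edges = np.linspace(0, max_amount, 11)

# right=False makes each bin closed on the left, like the loop above;
# labels=False returns the bin index (0 to 9) instead of an interval.
categories = pd.cut(values, bins=edges, right=False, labels=False)
print(categories.tolist())
```

Adding 1 to the maximum, as the original code does, keeps the largest value strictly inside the last bin.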

# Add binary flags (0 or 1) indicating whether a player's value/wage is higher than the mean
mean_value = float(dataset["ValueNum"].mean())
mean_wage = float(dataset["WageNum"].mean())

# Supporting function for creating category columns 'OverMeanValue' and 'OverMeanWage'
def overValue(x, limit):
    if x > limit:
        return 1
    else:
        return 0
    
dataset['OverMeanValue'] = dataset['ValueNum'].apply(lambda x: overValue(x, mean_value))
dataset['OverMeanWage'] = dataset['WageNum'].apply(lambda x: overValue(x, mean_wage))

display(dataset.head())

	Name	Age	Photo	Nationality	Overall	Potential	Club	Value	Wage	Preferred Positions	ValueNum	WageNum	ValueCategory	WageCategory	OverMeanValue	OverMeanWage
0	Cristiano Ronaldo	32	https://cdn.sofifa.org/48/18/players/20801.png	Portugal	94	94	Real Madrid CF	€95.5M	€565K	ST LW	95500000.0	565000.0	7	9	1	1
1	L. Messi	30	https://cdn.sofifa.org/48/18/players/158023.png	Argentina	93	93	FC Barcelona	€105M	€565K	RW	105000000.0	565000.0	8	9	1	1
2	Neymar	25	https://cdn.sofifa.org/48/18/players/190871.png	Brazil	92	94	Paris Saint-Germain	€123M	€280K	LW	123000000.0	280000.0	9	4	1	1
3	L. Suárez	30	https://cdn.sofifa.org/48/18/players/176580.png	Uruguay	92	92	FC Barcelona	€97M	€510K	ST	97000000.0	510000.0	7	9	1	1
4	M. Neuer	31	https://cdn.sofifa.org/48/18/players/167495.png	Germany	92	92	FC Bayern Munich	€61M	€230K	GK	61000000.0	230000.0	4	4	1	1
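
A comparison on a whole column returns a boolean Series, so the same 0/1 flag can be produced without a helper function. A small sketch with made-up wage values (the numbers are assumptions for illustration only):

```python
import pandas as pd

wages = pd.Series([565000.0, 280000.0, 1000.0, 2000.0])
mean_wage = wages.mean()

# True/False per row, then cast to 1/0.
over_mean = (wages > mean_wage).astype(int)
print(over_mean.tolist())
```

Because a few very high earners pull the mean upward, most rows in a skewed column like this one end up flagged 0.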

# Potential points
dataset['PotentialPoints'] = dataset['Potential'] - dataset['Overall']
display(dataset.head())

	Name	Age	Photo	Nationality	Overall	Potential	Club	Value	Wage	Preferred Positions	ValueNum	WageNum	ValueCategory	WageCategory	OverMeanValue	OverMeanWage	PotentialPoints
0	Cristiano Ronaldo	32	https://cdn.sofifa.org/48/18/players/20801.png	Portugal	94	94	Real Madrid CF	€95.5M	€565K	ST LW	95500000.0	565000.0	7	9	1	1	0
1	L. Messi	30	https://cdn.sofifa.org/48/18/players/158023.png	Argentina	93	93	FC Barcelona	€105M	€565K	RW	105000000.0	565000.0	8	9	1	1	0
2	Neymar	25	https://cdn.sofifa.org/48/18/players/190871.png	Brazil	92	94	Paris Saint-Germain	€123M	€280K	LW	123000000.0	280000.0	9	4	1	1	2
3	L. Suárez	30	https://cdn.sofifa.org/48/18/players/176580.png	Uruguay	92	92	FC Barcelona	€97M	€510K	ST	97000000.0	510000.0	7	9	1	1	0
4	M. Neuer	31	https://cdn.sofifa.org/48/18/players/167495.png	Germany	92	92	FC Bayern Munich	€61M	€230K	GK	61000000.0	230000.0	4	4	1	1	0

# Preferred position
# To keep things simple, treat the first position in the list as the preferred one

dataset['Position'] = dataset['Preferred Positions'].str.split().str[0]
dataset['PositionNum'] = dataset['Preferred Positions'].apply(lambda x: len(x.split()))

display(dataset.head())

	Name	Age	Photo	Nationality	Overall	Potential	Club	Value	Wage	Preferred Positions	ValueNum	WageNum	ValueCategory	WageCategory	OverMeanValue	OverMeanWage	PotentialPoints	Position	PositionNum
0	Cristiano Ronaldo	32	https://cdn.sofifa.org/48/18/players/20801.png	Portugal	94	94	Real Madrid CF	€95.5M	€565K	ST LW	95500000.0	565000.0	7	9	1	1	0	ST	2
1	L. Messi	30	https://cdn.sofifa.org/48/18/players/158023.png	Argentina	93	93	FC Barcelona	€105M	€565K	RW	105000000.0	565000.0	8	9	1	1	0	RW	1
2	Neymar	25	https://cdn.sofifa.org/48/18/players/190871.png	Brazil	92	94	Paris Saint-Germain	€123M	€280K	LW	123000000.0	280000.0	9	4	1	1	2	LW	1
3	L. Suárez	30	https://cdn.sofifa.org/48/18/players/176580.png	Uruguay	92	92	FC Barcelona	€97M	€510K	ST	97000000.0	510000.0	7	9	1	1	0	ST	1
4	M. Neuer	31	https://cdn.sofifa.org/48/18/players/167495.png	Germany	92	92	FC Bayern Munich	€61M	€230K	GK	61000000.0	230000.0	4	4	1	1	0	GK	1
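
To see how the two string operations behave, here is a small sketch on made-up position strings. The sample values, including the trailing spaces, are assumptions for illustration. `str.split()` with no argument splits on any whitespace and ignores leading or trailing spaces.

```python
import pandas as pd

preferred = pd.Series(['ST LW ', 'RW ', 'CDM CM CB'])

# First listed position for each player.
first_position = preferred.str.split().str[0]

# Number of listed positions, computed without a Python-level lambda.
position_count = preferred.str.split().str.len()

print(first_position.tolist())
print(position_count.tolist())
```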

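# Continent

# Match a country to its continent
def find_continent(x, continents_list):
    # Iterate over the continent names (the dictionary keys)
    for key in continents_list:
        if x in continents_list[key]:
            return key
    return np.nan

# NOTE: `continents` must be a dict mapping each continent name to a list of
# countries; its definition is not shown in this post.
dataset['Continent'] = dataset['Nationality'].apply(lambda x: find_continent(x, continents))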
display(dataset.head())

	Name	Age	Photo	Nationality	Overall	Potential	Club	Value	Wage	Preferred Positions	ValueNum	WageNum	ValueCategory	WageCategory	OverMeanValue	OverMeanWage	PotentialPoints	Position	PositionNum	Continent
0	Cristiano Ronaldo	32	https://cdn.sofifa.org/48/18/players/20801.png	Portugal	94	94	Real Madrid CF	€95.5M	€565K	ST LW	95500000.0	565000.0	7	9	1	1	0	ST	2	Europe
1	L. Messi	30	https://cdn.sofifa.org/48/18/players/158023.png	Argentina	93	93	FC Barcelona	€105M	€565K	RW	105000000.0	565000.0	8	9	1	1	0	RW	1	South America
2	Neymar	25	https://cdn.sofifa.org/48/18/players/190871.png	Brazil	92	94	Paris Saint-Germain	€123M	€280K	LW	123000000.0	280000.0	9	4	1	1	2	LW	1	South America
3	L. Suárez	30	https://cdn.sofifa.org/48/18/players/176580.png	Uruguay	92	92	FC Barcelona	€97M	€510K	ST	97000000.0	510000.0	7	9	1	1	0	ST	1	South America
4	M. Neuer	31	https://cdn.sofifa.org/48/18/players/167495.png	Germany	92	92	FC Bayern Munich	€61M	€230K	GK	61000000.0	230000.0	4	4	1	1	0	GK	1	Europe
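
The cell above relies on a `continents` dictionary that is not shown in this post. The sketch below uses a hypothetical, deliberately incomplete dictionary to show the expected shape (continent name mapped to a list of countries) and one possible alternative lookup that inverts it into a country-to-continent map. The country lists here are assumptions and would need to be completed for the full dataset.

```python
import pandas as pd

# Hypothetical, incomplete mapping; the original dictionary is not shown.
continents = {
    'Europe': ['Portugal', 'Germany', 'Spain', 'France'],
    'South America': ['Argentina', 'Brazil', 'Uruguay'],
}

# Invert to country -> continent so each lookup is a single dict access.
country_to_continent = {
    country: continent
    for continent, countries in continents.items()
    for country in countries
}

nationalities = pd.Series(
    ['Portugal', 'Argentina', 'Brazil', 'Uruguay', 'Germany', 'Atlantis']
)

# Countries missing from the mapping become NaN.
continent = nationalities.map(country_to_continent)
print(continent.tolist())
```

Counting the NaN entries afterwards, for example with `continent.isna().sum()`, is a quick way to find countries that still need to be added to the dictionary.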

Dataset Visualization: Age/Overall/Potential

Grouping players by Nationality - D3 chart

Reference: Zoomable Circle Packing via D3.js in IPython

To be continued.
