Data Analysis and Visualization using Python

Akkal Bahadu Bist akkalbist55@gmail.com Sikhar Tech Pvt. Ltd. (Data Scientist/AI Researcher)
$ sudo apt-get install python3.5
$ conda create -n ai python=3.5
$ source activate ai
$ conda install pandas matplotlib seaborn plotly jupyter notebook scipy scikit-learn nltk
$ conda install -c conda-forge wordcloud
$ jupyter notebook
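Before opening the notebook, it can help to confirm that every library imported later in this post is actually installed in the active environment. Below is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and note that scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Hypothetical helper: return the top-level import names that cannot be
# found in the currently active environment.
def missing_packages(names):
    return [name for name in names if importlib.util.find_spec(name) is None]

# Import names used in this post (scikit-learn is imported as "sklearn").
required = ["numpy", "pandas", "seaborn", "matplotlib", "wordcloud",
            "plotly", "scipy", "sklearn", "nltk"]
print("Missing:", missing_packages(required))
```

An empty list means the `ai` environment is ready; anything listed still needs a `conda install`.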
The dataset contains 17k+ players and 70+ attributes extracted from FIFA 18. Source: https://www.kaggle.com/thec03u5/fifa-18-demo-player-dataset (extended version).

Dataset includes:
- Player personal attributes (Nationality, Club, Photo, Age, Value etc.)
- Player performance attributes (Overall, Potential, Aggression, Agility etc.)
- Player preferred position and ratings at all positions.
Attributes:
- Name: name of a player
- Age: age of a player
- Photo: picture of the player
- Nationality: player nationality
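- Flag: flag of the player's nationality
- Overall: current overall rating
- Potential: potential rating
- Club: the club the player plays for
- Club Logo: logo of the player's club
- Value: market value of the player, stored as text such as €95.5M or €565K
- Wage: wage of the player, stored as text in the same format as Value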
- Special
- Acceleration
- Aggression
- Agility
- Balance
- Ball control
- Composure
- Crossing
- Curve
- Dribbling
- Finishing
- Free kick accuracy
- GK diving
- GK handling
- GK kicking
- GK positioning
- GK reflexes
- Heading accuracy
- Interceptions
- Jumping
- Long passing
- Long shots
- Marking
- Penalties
- Positioning
- Reactions
- Short passing
- Shot power
- Sliding tackle
- Sprint speed
- Stamina
- Standing tackle
- Strength
- Vision
- Volleys
- CAM: Center Attacking Midfielder
- CB: Center Back
- CDM: Center Defensive Midfielder
- CF: Center Forward
- CM: Center Midfielder
- ID: Player's ID in FIFA18
- LAM: Left Attacking Midfielder
- LB: Left Back
- LCB: Left Center Back
- LCM: Left Center Midfielder
- LDM: Left Defensive Midfielder
- LF: Left Forward
- LM: Left Midfielder
- LS: Left Striker
- LW: Left Wing
- LWB: Left Wing Back
- Preferred Positions: Player's Preferred Position
- RAM: Right Attacking Midfielder
- RB: Right Back
- RCB: Right Center Back
- RCM: Right Center Midfielder
- RDM: Right Defensive Midfielder
- RF: Right Forward
- RM: Right Midfielder
- RS: Right Striker
- RW: Right Wing
- RWB: Right Wing Back
- ST: Striker
Import libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from wordcloud import WordCloud
# import plotly modules
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
from IPython.display import display
%matplotlib inline
dataset = pd.read_csv('CompleteDataset.csv')
Quick Overview
display(dataset.head())
int_col = [
'Name',
'Age',
'Photo',
'Nationality',
'Overall',
'Potential',
'Club',
'Value',
'Wage',
'Preferred Positions'
]
dataset = pd.DataFrame(dataset, columns=int_col)
display(dataset.head())
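Instead of loading all 70+ columns and subsetting afterwards, `pd.read_csv` can keep only the columns of interest while parsing, through its `usecols` parameter. A minimal sketch; the tiny in-memory CSV below is made-up sample data standing in for `CompleteDataset.csv`.

```python
import io
import pandas as pd

# Made-up sample standing in for CompleteDataset.csv
csv_text = """Name,Age,Nationality,Overall,Potential,Club,Value,Wage,Acceleration
Player A,25,Spain,80,85,Club X,€20M,€50K,70
Player B,30,Brazil,75,75,Club Y,€500K,€10K,65
"""

wanted = ["Name", "Age", "Overall", "Value"]
# usecols discards every other column during parsing
df = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(df.columns.tolist())
```

Note that `usecols` keeps the columns in the order they appear in the file, not in the order of the list; `df[wanted]` reorders them if needed.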
# Drop NA values
dataset = dataset.dropna()
# Numeric columns of Value and Wage
def str2number(amount):
    # amount is a string such as '€95.5M', '€565K' or '€0':
    # strip the leading currency symbol and scale by the suffix
    if amount[-1] == 'M':
        return float(amount[1:-1])*1000000
    elif amount[-1] == 'K':
        return float(amount[1:-1])*1000
    else:
        return float(amount[1:])

dataset['ValueNum'] = dataset['Value'].apply(str2number)
dataset['WageNum'] = dataset['Wage'].apply(str2number)
display(dataset.head())
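The same conversion can also be written with pandas string methods, which avoids calling a Python function once per row. A minimal sketch, assuming every amount starts with a one-character currency symbol and ends in `M`, `K` or a digit, as in the examples above; the function name `amounts_to_numbers` is hypothetical.

```python
import pandas as pd

def amounts_to_numbers(amounts):
    """Vectorized alternative to str2number for strings like '€95.5M', '€565K', '€0'."""
    s = amounts.str[1:]                  # drop the leading currency symbol
    suffix = s.str[-1]                   # last character: 'M', 'K' or a digit
    # Map the suffix to a multiplier; plain numbers get a multiplier of 1
    multiplier = suffix.map({"M": 1000000, "K": 1000}).fillna(1)
    # Strip the suffix only where one is present
    digits = s.where(~suffix.isin(["M", "K"]), s.str[:-1])
    return digits.astype(float) * multiplier

values = pd.Series(["€95.5M", "€565K", "€0"])
print(amounts_to_numbers(values).tolist())
```

Applied to the dataset, this would read `dataset['ValueNum'] = amounts_to_numbers(dataset['Value'])`.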
# Categorical columns of Value and Wage
# Adding 1 makes the maximum fall strictly inside the last bin
max_value = float(dataset['ValueNum'].max() + 1)
max_wage = float(dataset['WageNum'].max() + 1)
print("Max value:", max_value, "Max wage:", max_wage)

# Supporting function for creating category columns 'ValueCategory' and 'WageCategory':
# splits [0, max_amount) into 10 equal-width bins and returns the bin index 0-9
def mappingAmount(x, max_amount):
    for i in range(0, 10):
        if x >= max_amount/10*i and x < max_amount/10*(i+1):
            return i

dataset['ValueCategory'] = dataset['ValueNum'].apply(lambda x: mappingAmount(x, max_value))
dataset['WageCategory'] = dataset['WageNum'].apply(lambda x: mappingAmount(x, max_wage))
display(dataset.head())
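The ten equal-width bins of `mappingAmount` correspond to what `pd.cut` does with explicit bin edges. A minimal sketch; the function name `to_decile_category` is hypothetical, and values sitting exactly on a bin edge may occasionally land differently because the edges are computed in floating point in a slightly different way.

```python
import numpy as np
import pandas as pd

def to_decile_category(amounts, max_amount):
    """Bin amounts into 10 equal-width categories 0..9 over [0, max_amount)."""
    edges = np.linspace(0, max_amount, 11)   # 11 edges -> 10 bins
    # right=False makes each bin [left, right), matching the >= / < test in mappingAmount
    return pd.cut(amounts, bins=edges, right=False, labels=False)

amounts = pd.Series([0.0, 5.0, 10.0, 95.0, 99.9])
print(to_decile_category(amounts, 100.0).tolist())
```

With the dataset this would be `to_decile_category(dataset['ValueNum'], max_value)`; amounts at or above `max_amount` come back as NaN, which is why the `+ 1` on the maximum still matters.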
# Add two binary columns (0/1) indicating whether a player's value/wage is higher than the mean
mean_value = float(dataset["ValueNum"].mean())
mean_wage = float(dataset["WageNum"].mean())

# Supporting function for creating category columns 'OverMeanValue' and 'OverMeanWage'
def overValue(x, limit):
    if x > limit:
        return 1
    else:
        return 0

dataset['OverMeanValue'] = dataset['ValueNum'].apply(lambda x: overValue(x, mean_value))
dataset['OverMeanWage'] = dataset['WageNum'].apply(lambda x: overValue(x, mean_wage))
display(dataset.head())
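Because `overValue` is just a comparison, the same 0/1 flags can be produced by comparing the whole column at once and casting the booleans to integers. A minimal sketch on a made-up three-player frame:

```python
import pandas as pd

# Made-up sample values
df = pd.DataFrame({"ValueNum": [1e6, 5e6, 20e6], "WageNum": [1e3, 2e4, 9e4]})

# The comparison gives a boolean Series; astype(int) turns True/False into 1/0
df["OverMeanValue"] = (df["ValueNum"] > df["ValueNum"].mean()).astype(int)
df["OverMeanWage"] = (df["WageNum"] > df["WageNum"].mean()).astype(int)
print(df)
```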
# Potential points
dataset['PotentialPoints'] = dataset['Potential'] - dataset['Overall']
display(dataset.head())
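PotentialPoints measures how much a player can still improve: a player rated 70 Overall with 85 Potential has 15 points of headroom. A minimal sketch on made-up players, using `nlargest` to list those with the most room to grow:

```python
import pandas as pd

# Made-up sample players
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Overall": [70, 85, 60],
    "Potential": [85, 86, 80],
})
df["PotentialPoints"] = df["Potential"] - df["Overall"]   # 15, 1, 20

# Players with the most room to improve
print(df.nlargest(2, "PotentialPoints")[["Name", "PotentialPoints"]])
```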
# Preferred position
# To keep things simple, take the first listed position as the preferred one
dataset['Position'] = dataset['Preferred Positions'].str.split().str[0]
# Number of positions listed for the player
dataset['PositionNum'] = dataset['Preferred Positions'].apply(lambda x: len(x.split()))
display(dataset.head())
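Both columns rely on `split()` without arguments, which splits on any whitespace and ignores leading or trailing spaces. The position count can also stay inside the `.str` accessor instead of using a lambda. A minimal sketch on made-up position strings:

```python
import pandas as pd

# Made-up sample values, including a trailing space
positions = pd.Series(["ST LW", "CB", "CM CDM RM "])
first = positions.str.split().str[0]      # first listed position
count = positions.str.split().str.len()   # number of listed positions
print(first.tolist(), count.tolist())
```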
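# Continent
# Function matching a country to its continent.
# Assumes `continents` is a dict mapping each continent name to a list of
# country names; its definition is not shown in this post.
def find_continent(x, continents_list):
    # Iterate over the continents and return the first one whose list contains x
    for key in continents_list:
        if x in continents_list[key]:
            return key
    # Country not found in any list
    return np.nan

dataset['Continent'] = dataset['Nationality'].apply(lambda x: find_continent(x, continents))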
display(dataset.head())
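`find_continent` needs a `continents` dictionary, and scanning every list for every player is slow. A minimal sketch of an alternative: invert the mapping once into country → continent and use `Series.map`. The `continents` dict below is hypothetical and deliberately partial; a real one would have to cover every nationality in the dataset.

```python
import pandas as pd

# Hypothetical, deliberately partial mapping
continents = {
    "Europe": ["Spain", "France", "Germany", "England"],
    "South America": ["Brazil", "Argentina", "Uruguay"],
    "Africa": ["Egypt", "Senegal"],
}

# Invert once: country -> continent, so each lookup is a single dict access
country_to_continent = {
    country: continent
    for continent, countries in continents.items()
    for country in countries
}

nationality = pd.Series(["Spain", "Brazil", "Egypt", "Japan"])
# map() returns NaN for countries missing from the mapping, like find_continent
continent = nationality.map(country_to_continent)
print(continent.tolist())
```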