Beta
import pandas as pd
df= pd.read_csv('pokemon_data.csv')
#print (df.tail(3))
SQL (Importing a Dataframe and CSV file)
Unknown integration
DataFrameavailable as
df
variable
select *
from df
limit 100
This query is taking long to finish...Consider adding a LIMIT clause or switching to Query mode to preview the result.
Importing pandas and files (CSV, xlsx and txt file)
#another function of pandas is that you can insert xlsx files
df_xlsx = pd.read_excel('pokemon_data.xlsx')
df_xlsx
# lastly, you can add text files but you need to add a delimiter function '\t' to get columns as seen as in the CSV.
df_text=pd.read_csv('pokemon_data.txt', delimiter='\t')
df_text
Reading Data in Pandas
## Read headers
#df.columns
## Read a specific column
#df[['Name','Type 1', 'Speed']]
## Reach each row
#df.head()
#df.iloc[1]
#df.iloc[0:4]
##Read a specific location, integer based (R,C): Venusaur
#df.iloc[2,1]
##Read a specific location, non-integer based (R,C)
#df.loc[df['Type 1']]
##to specify for ALL outputs of columns/row use function:
#for index, row in df.iterrows():
#print(index, row[['Name','Type 1']])
for index, row in df.iterrows():
print(index, row['Name'])
##Read a specific location, non-integer based (R,C)
df.head()
#conditional statements, for this location function, I am only interested in values where there is Fire in Type 1
df.loc[df['Type 1'] =='Grass']
Sorting and describing the data
##quick stats (count, mean, std, min, 25%, 50%, 75%, max)
df.describe()
##I am only interested in a specific column ('Name') and I would like them descending. In order to get them descending, I need to attribute a False statement to ascending
##descending values for 'Name' column
#df.sort_values('Name', ascending=False)
##ascending values for 'Name' column
df.sort_values('Name', ascending=True)
Making changes to our data (adding columns/ rearranging and dropping columns)
## helps visualize our headers/columns
#df.head(5)
##add a column
#df['Total']= df['HP'] + df['Attack']+ df['Sp. Atk']
#df.head(5)
##summing multiple columns to create a new column
#axis=1: add horizontally, axis=0: add vertically
df['Total']= df.iloc[:,4:10].sum(axis=1)
#df.head()
##double checking our values are good - always check!
#45+49+49+65+65+45 = 318, but columns 4:9 = 273!
#you need to include the next column for column 9's value to count
##re-arranging columns
cols=list(df.columns.values)
df_new=df[cols[0:4]+ [cols[-1]]+ cols[4:12]]
df_new
##dropping columns
#df_drop=df.drop(columns='Attack')
#df_drop.head()
Saving our Data (CSV, Excel, TXT)
##saving as a CSV file
##index=False gets rid of the index column
df_new.to_csv('modified.csv', index=False)
##saving to excel file
df_new.to_excel('modified.xlsx', index=False)
df_new.to_csv('modified.txt', index=False)