Workspace
Víctor Manuel Barceló/

Python Data Science Toolbox (Part 1)



Run the hidden code cell below to import the data used in this course.

Take Notes

Add notes about the concepts you've learned and code cells with code you want to keep.


###### ITERATORS
# iter(iterable) -> iterator
# print(*iterator) # prints all remaining elements and exhausts the iterator
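
A minimal sketch of the iterator notes above, using a made-up list (`letters` is not from the course data). It shows `iter()` wrapping an iterable, `next()` pulling one item, and `*` unpacking consuming whatever is left.

```python
# Hypothetical sample data, for illustration only
letters = ['a', 'b', 'c']

letters_it = iter(letters)      # iterable -> iterator
first = next(letters_it)        # pull a single element
rest = [*letters_it]            # unpacking consumes the remaining elements
leftover = list(letters_it)     # the iterator is exhausted, so nothing is left

print(first, rest, leftover)
```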

###### ITERABLE: ZIP
# zip() -> pairs up lists or tuples element-wise (firsts, seconds, thirds, etc.) and returns a zip object (an iterator)
# since zip yields tuples, you can "unzip" with *:
# value1, value2 = zip(*my_zip)
# rs_dict = dict(zipped_lists)    # only works if zipped_lists has not been consumed yet
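
The zip notes above can be sketched as follows. The lists `names` and `scores` are invented for the example; the point is that a zip object is single-use, so it is rebuilt (or stored as a list) before being used a second time.

```python
# Hypothetical sample data
names = ['ann', 'bob', 'cat']
scores = [1, 2, 3]

pairs = list(zip(names, scores))                # materialize so it can be reused
unzipped_names, unzipped_scores = zip(*pairs)   # "unzip" back into two tuples
lookup = dict(zip(names, scores))               # a fresh zip feeds dict()

used_up = zip(names, scores)
_ = list(used_up)                               # consume the zip object once
empty_dict = dict(used_up)                      # nothing left the second time
```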

###### LIST COMPREHENSIONS
# squares = [i**2 for i in range(10)]
# Create a 5 x 5 matrix using a list of lists: matrix
# matrix = [[col for col in range(5)] for row in range(5)]
# [num ** 2 if num % 2 == 0 else 0 for num in range(10)]
# >>>[0, 0, 4, 0, 16, 0, 36, 0, 64, 0]

# nums_list2 = [*range(1, 12, 2)]    # unpacks the built-in range iterable into a list in one go: all odd numbers from 1 to 11
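
A short sketch contrasting the two places an `if` can appear in a list comprehension, since the notes above only show the conditional-expression form. A trailing `if` filters items out, while `if ... else` in the output expression keeps every position.

```python
# Trailing "if" filters: odd numbers are dropped entirely
evens_squared = [num ** 2 for num in range(10) if num % 2 == 0]

# "if ... else" in the output expression: every position is kept
squares_or_zero = [num ** 2 if num % 2 == 0 else 0 for num in range(10)]

# Nested comprehension: 5 rows, each row holding the column indices 0..4
matrix = [[col for col in range(5)] for row in range(5)]

# Unpacking a range directly into a list
nums_list2 = [*range(1, 12, 2)]
```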

###### DICT COMPREHENSIONS
# Create dict comprehension: new_fellowship
# new_fellowship = {member: len(member) for member in fellowship}
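
To make the dict comprehension above concrete, here is a small sketch with an assumed `fellowship` list (the real list comes from the course exercise). It also adds a filtered variant to show that a trailing `if` works the same way as in list comprehensions.

```python
# Assumed sample list; the course supplies its own
fellowship = ['frodo', 'samwise', 'merry', 'aragorn']

# Map each member to the length of their name
new_fellowship = {member: len(member) for member in fellowship}

# Keep only entries whose name has at least 7 characters
long_names = {m: n for m, n in new_fellowship.items() if n >= 7}
```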

###### GENERATORS
# lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']
# lengths = (len(person) for person in lannister)
###### GENERATOR FUNCTION
# def get_lengths(input_list):
#     """Generator function that yields the
#     length of the strings in input_list."""
# 
#     # Yield the length of a string
#     for person in input_list:
#         yield len(person)
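
A runnable version of the two generator notes above, showing that both forms are lazy and single-pass: values are produced only when requested, and a generator that has been read once cannot be rewound.

```python
def get_lengths(input_list):
    """Generator function that yields the length of each string in input_list."""
    for person in input_list:
        yield len(person)

lannister = ['cersei', 'jaime', 'tywin', 'tyrion', 'joffrey']

lengths = (len(person) for person in lannister)   # generator expression
first_length = next(lengths)                      # computed on demand
remaining = list(lengths)                         # the rest, consumed once

from_function = list(get_lengths(lannister))      # same values via the function
```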

###### DATA FRAMES filtering
# df_pop_ceb = df_urb_pop[df_urb_pop['CountryCode'] == 'CEB']

################### WRITING EFFICIENT PYTHON CODE ###################

###### TESTING AND OPTIMIZATION
# %timeit -r <runs> -n <loops> <one-liner>
# %%timeit -r <runs> -n <loops>
# <first line of the multi-line code>
#   <indented line of loop or conditional>
#   <indented line of loop or conditional>
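
The `%timeit` magics only work inside IPython or a notebook. As a rough equivalent in a plain script, the standard-library `timeit` module can be used; in this sketch `repeat` plays the role of `-r` and `number` the role of `-n`. The statement being timed is an arbitrary example.

```python
import timeit

# Time a one-liner: 3 runs (like -r 3) of 1000 loops each (like -n 1000)
times = timeit.repeat('[i ** 2 for i in range(100)]', repeat=3, number=1000)

# Each entry is the total time in seconds for one run of 1000 loops
best = min(times)
print('best of 3 runs: {:.6f} s per 1000 loops'.format(best))
```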

# pip install line_profiler
# >>>%load_ext line_profiler
# >>>%lprun -f <function name, no '()'> <function call>
# >>>?%lprun

# pip install memory_profiler
# >>>%load_ext memory_profiler
# from your_file import your_function    (module name without the .py extension)
# >>>%mprun -f <function name, no '()'> <function call>    the function MUST be defined in a file and imported
# >>>?%mprun

###### COLLECTIONS
# from collections import Counter
###### SETS
# set() .union(), .difference(), .intersection(), .symmetric_difference()
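
A small sketch of `Counter` and the four set methods listed above. The sample values are invented for illustration.

```python
from collections import Counter

# Hypothetical sample data
types = ['grass', 'fire', 'water', 'fire', 'grass', 'fire']
type_counts = Counter(types)                 # counts each distinct value
most_common = type_counts.most_common(1)     # the single most frequent value

ash = {'abra', 'pikachu', 'zubat'}
misty = {'pikachu', 'zubat', 'eevee'}

both = ash.intersection(misty)               # in both sets
either = ash.union(misty)                    # in at least one set
only_ash = ash.difference(misty)             # in ash but not in misty
not_shared = ash.symmetric_difference(misty) # in exactly one of the two sets
```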


###### ITERTOOLS
# from itertools import combinations
# Collect all possible combinations of 4 Pokémon directly into a list
# combos_4 = [*combinations(pokemon, 4)]

# top_3 = sorted(poke_list_np, key=lambda x: x[1], reverse=True)[:3]
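
The two itertools notes above can be tried with made-up data. This sketch assumes a plain list of `(name, value)` tuples in place of the course's `poke_list_np` array, so it runs without NumPy.

```python
from itertools import combinations

# Hypothetical sample data
pokemon = ['Geodude', 'Cubone', 'Lickitung', 'Persian', 'Diglett']

# All combinations of 4, unpacked straight into a list
combos_4 = [*combinations(pokemon, 4)]

# Stand-in for poke_list_np: (name, value) pairs
poke_list = [('Geodude', 40), ('Cubone', 50), ('Lickitung', 90), ('Persian', 65)]

# Sort by the second field, largest first, and keep the top three
top_3 = sorted(poke_list, key=lambda x: x[1], reverse=True)[:3]
```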


###### PANDAS EFFICIENCIES (DATAFRAMES and series)
# df.info() can give similar results to R's str() "structure" method
# rangers_df.describe().transpose() similar to R's summary()

# len(DATAFRAME) -> # of rows

# .iterrows() -> tuple(index, pandas series)
# for row_tuple in team_wins_df.iterrows():
#     print(row_tuple[1]['Team'])

# .itertuples() -> namedtuple with fields accessible via attribute lookup, e.g.:
# for row_namedtuple in team_wins_df.itertuples():
#     print(row_namedtuple)
#     print(row_namedtuple.Index)
#     print(row_namedtuple.Team)  # etc.

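# df.apply(function, axis=0)    # applies function to each column; use axis=1 to apply it to each row
# df.apply(lambda row: function(row['col_a'], row['col_b']), axis=1)    # col_a, col_b are placeholder column names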

# df['column'].values -> a NumPy array capable of broadcasting
# win_perc_preds_np = predict_win_perc(baseball_df['RS'].values, baseball_df['RA'].values)
# baseball_df['WP_preds'] = win_perc_preds_np


# import inspect: inspect.getdoc(func) returns the function's cleaned-up docstring; inspect.signature(func) returns its call signature.
# Sphinx and pydoc can automatically generate documentation from your docstrings.
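
A brief sketch of reading a function's documentation with `inspect`. The function `count_letter` is a hypothetical example written for this note; `inspect.signature` is used here to look at the arguments.

```python
import inspect

def count_letter(content, letter):
    """Count occurrences of `letter` in `content`.

    The comparison is case-insensitive.
    """
    return content.lower().count(letter.lower())

doc = inspect.getdoc(count_letter)       # docstring with indentation cleaned up
sig = inspect.signature(count_letter)    # the function's call signature

print(doc)
print(sig)
```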

# Use the "stock('NVDA')" context manager
# with stock('NVDA') as nvda:
#     # Open "NVDA.txt" for writing as f_out
#     with open("NVDA.txt", "w") as f_out:
#         for _ in range(10):
#             value = nvda.price()
#             print('Logging ${:.2f} for NVDA'.format(value))
#             f_out.write('{:.2f}\n'.format(value))
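
The `stock` context manager above is not defined in these notes. As a rough illustration of how such a context manager could be built, here is a sketch using `contextlib.contextmanager`; the name `tracked` and its behavior are assumptions for the example, not the course's implementation.

```python
from contextlib import contextmanager

@contextmanager
def tracked(name, log):
    """Hypothetical context manager that records setup and teardown in log."""
    log.append('open ' + name)        # setup: runs when the with block is entered
    try:
        yield name.upper()            # the value bound by "as"
    finally:
        log.append('close ' + name)   # teardown: runs even if the block raises

events = []
with tracked('nvda', events) as resource:
    events.append('using ' + resource)
```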


######################### DECORATORS #########################
# from functools import wraps
# def print_before_and_after(func):
#     @wraps(func)    # preserves the wrapped function's metadata (__name__, docstring)
#     def wrapper(*args, **kwargs):
#         print('Before {}'.format(func.__name__))
#         # Call the function being decorated with *args and **kwargs
#         result = func(*args, **kwargs)
#         print('After {}'.format(func.__name__))
#         return result    # pass the decorated function's return value through
#     # Return the nested function
#     return wrapper
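
A usage sketch for the decorator above, with a hypothetical `multiply` function. In this version the wrapper returns the result of the decorated function so the caller still receives it, and the checks at the end show what `@wraps` preserves.

```python
from functools import wraps

def print_before_and_after(func):
    @wraps(func)                          # copy __name__ and __doc__ onto wrapper
    def wrapper(*args, **kwargs):
        print('Before {}'.format(func.__name__))
        result = func(*args, **kwargs)    # call the decorated function
        print('After {}'.format(func.__name__))
        return result                     # hand the result back to the caller
    return wrapper

@print_before_and_after
def multiply(a, b):
    """Return the product of a and b."""
    return a * b

product = multiply(5, 10)
print(product, multiply.__name__, multiply.__doc__)
```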

# For a decorator to accept arguments, write a function that takes those arguments and returns a decorator:

# def returns(return_type):
#     def decorator(func):
#         def wrapper(*args, **kwargs):
#             result = func(*args, **kwargs)    # call the decorated function
#             assert type(result) == return_type
#             return result
#         return wrapper
#     return decorator
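
A runnable sketch of the `returns` decorator factory in use. The decorated functions `make_counts` and `broken` are hypothetical; the second one deliberately returns the wrong type to show the assertion firing.

```python
from functools import wraps

def returns(return_type):
    """Decorator factory: check that the decorated function returns return_type."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)        # call the decorated function
            assert type(result) == return_type    # fail loudly on a wrong type
            return result
        return wrapper
    return decorator

@returns(dict)
def make_counts(words):
    return {word: len(word) for word in words}

@returns(dict)
def broken():
    return [1, 2, 3]    # a list, not a dict

counts = make_counts(['a', 'bb'])

try:
    broken()
    caught = False
except AssertionError:
    caught = True
```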

Explore Datasets

Use the DataFrame imported in the first cell to explore the data and practice your skills!

  • Write a function that takes a timestamp (see column timestamp_ms) and returns the text of any tweet published at that timestamp. Additionally, make it so that users can pass column names as flexible arguments (*args) so that the function can print out any other columns users want to see.
  • In a filter() call, write a lambda function to return tweets created on a Tuesday. Tip: look at the first three characters of the created_at column.
  • Make sure to add error handling on the functions you've created!
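
One possible starting point for the Tuesday exercise, following the tip about the first three characters of `created_at`. This sketch works on a plain list of strings rather than the DataFrame, and the sample timestamps are invented, not taken from the dataset.

```python
def tuesday_tweets(created_at_values):
    """Return the entries whose first three characters are 'Tue'."""
    try:
        return list(filter(lambda ts: ts[:3] == 'Tue', created_at_values))
    except TypeError:
        # Raised when an entry is not a string (or the input is not iterable)
        raise TypeError('created_at values must be an iterable of strings')

# Invented sample values, for illustration only
sample = [
    'Tue Mar 29 23:40:17 +0000 2016',
    'Wed Mar 30 01:02:03 +0000 2016',
    'Tue Apr 05 10:00:00 +0000 2016',
]

tuesdays = tuesday_tweets(sample)

try:
    tuesday_tweets([20160329])    # not a string: triggers the error handling
    handled = False
except TypeError:
    handled = True
```

With the real data, the same lambda could be passed to `filter()` over the `created_at` column.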