Python For Finance: Algorithmic Trading

    This Python for Finance tutorial introduces you to algorithmic trading, and much more.

    Technology has become an asset in finance: financial institutions are evolving into technology companies rather than staying occupied with only the financial aspects of their business. Technology brings about innovation and speed, and it can help to gain a competitive advantage. On top of that, the rate and frequency of financial transactions, together with the large data volumes involved, have increased financial institutions’ attention for technology over the years, to the point that technology has become the main enabler in finance.

    Among the hottest programming languages for finance, you’ll find R and Python, alongside languages such as C++, C#, and Java. In this tutorial, you’ll learn how to get started with Python for finance. The tutorial will cover the following:

    • The basics that you need to get started: for those who are new to finance, you’ll first learn more about stocks and trading strategies, what time series data is, and what you need to set up your workspace.
    • An introduction to time series data and some of the most common financial analyses, such as moving windows, volatility calculation, … with the Python package Pandas.
    • The development of a simple momentum strategy: you’ll first go through the development process step-by-step and start by formulating and coding up a simple algorithmic trading strategy.
    • Next, you’ll backtest the formulated trading strategy with Pandas, zipline and Quantopian.
    • Afterward, you’ll see how you can do optimizations to your strategy to make it perform better, and you’ll eventually evaluate your strategy’s performance and robustness.

    Download the Jupyter notebook of this tutorial here.

    Getting Started With Python for Finance


    Before you go into trading strategies, it’s a good idea to get the hang of the basics first. This first part of the tutorial will focus on explaining the Python basics that you need to get started. This does not mean, however, that you’ll start entirely from zero: you should have at least done DataCamp’s free Intro to Python for Data Science course, in which you learned how to work with Python lists, packages, and NumPy. Additionally, it helps to already know the basics of Pandas, the popular Python data manipulation package, but this is not a requirement.

    Then I would suggest you take DataCamp’s Intro to Python for Finance course to learn the basics of finance in Python. If you then want to apply your new 'Python for Data Science' skills to real-world financial data, consider taking the Importing and Managing Financial Data in Python course.

    Stocks & Trading

    When a company wants to grow and undertake new projects or expand, it can issue stocks to raise capital. A stock represents a share in the ownership of a company and is issued in return for money. Stocks are bought and sold: buyers and sellers trade existing, previously issued shares. The price at which stocks are sold can move independently of the company’s success: the prices instead reflect supply and demand. This means that whenever a stock is considered ‘desirable’, due to success, popularity, … the stock price will go up.

    Note that stocks are not the same as debt: companies can also raise money through borrowing, either as a loan from a bank or by issuing bonds.

    As you just read, buying and selling, or trading, is essential when you’re talking about stocks, but trading is certainly not limited to stocks: it is the act of buying or selling an asset, which could be a financial security, like a stock or a bond, or a tangible product, such as gold or oil.

    Stock trading is then the process in which the cash that is paid for the stocks is converted into a share in the ownership of a company, which can be converted back to cash by selling, hopefully at a profit. Now, to achieve a profitable return, you either go long or short in markets: you either buy shares thinking that the stock price will go up so that you can sell at a higher price in the future, or you sell shares (typically borrowed ones), expecting that you can buy them back at a lower price and realize a profit. When you follow a fixed plan to go long or short in markets, you have a trading strategy.
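
    To make the difference between going long and going short concrete, here is a minimal sketch of the arithmetic. The function names and prices are hypothetical, and commissions, borrowing fees, and dividends are ignored for simplicity:

```python
def long_profit(buy_price, sell_price, shares):
    """Profit from buying first and selling later."""
    return (sell_price - buy_price) * shares


def short_profit(sell_price, buy_back_price, shares):
    """Profit from selling first and buying the shares back later."""
    return (sell_price - buy_back_price) * shares


# Hypothetical prices, for illustration only
long_result = long_profit(buy_price=100.0, sell_price=110.0, shares=10)
short_result = short_profit(sell_price=100.0, buy_back_price=90.0, shares=10)

print(long_result)   # 100.0: the price rose after you bought
print(short_result)  # 100.0: the price fell after you sold
```

    Note that both positions lose money when the price moves the other way: the same functions return a negative number in that case.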

    Developing a trading strategy is something that goes through a couple of phases, just like when you, for example, build machine learning models: you formulate a strategy and specify it in a form that you can test on your computer, you do some preliminary testing or backtesting, you optimize your strategy and lastly, you evaluate the performance and robustness of your strategy.

    Trading strategies are usually verified by backtesting: you reconstruct, with historical data, trades that would have occurred in the past using the rules that are defined with the strategy that you have developed. This way, you can get an idea of the effectiveness of your strategy, and you can use it as a starting point to optimize and improve your strategy before applying it to real markets. Of course, this all relies heavily on the underlying theory or belief that any strategy that has worked out well in the past will likely also work out well in the future, and, that any strategy that has performed poorly in the past will probably also do badly in the future.
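
    As a rough sketch of what “reconstructing trades with historical data” means, the snippet below replays a deliberately simple rule on a handful of made-up prices. The rule (hold the stock for a day if the previous day’s return was positive) and the price values are assumptions for illustration only, not the strategy that this tutorial develops later:

```python
import pandas as pd

# Hypothetical daily closing prices; a real backtest would use historical market data
prices = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 107.0, 104.0],
    index=pd.date_range("2012-01-02", periods=6, freq="B"),
)

daily_returns = prices.pct_change()

# Illustrative rule: be long for a day if the previous day's return was positive.
# shift(1) makes sure each position only uses information that was already known.
position = (daily_returns > 0).astype(int).shift(1).fillna(0)

# Return earned on each day, and the compounded result over the whole period
strategy_returns = (position * daily_returns).fillna(0)
cumulative_return = (1 + strategy_returns).prod() - 1

print(position)
print(cumulative_return)
```

    The shift is the important design choice here: without it, the rule would decide on a day’s position using that same day’s return, which is information you would not have had in time to trade on.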

    Time Series Data

    A time series is a sequence of numerical data points taken at successive equally spaced points in time. In investing, a time series tracks the movement of the chosen data points, such as the stock price, over a specified period of time with data points recorded at regular intervals. If you’re still in doubt about what this would exactly look like, take a look at the following example:

    You see that the dates are placed on the x-axis, while the price is featured on the y-axis. The “successive equally spaced points in time” in this case means that the days that are featured on the x-axis are 14 days apart: note the difference between 3/7/2005 and the next point, 3/31/2005, and 4/5/2005 and 4/19/2005.
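
    If you want to see what equally spaced points look like in code, the following sketch builds a small series whose observations are 14 days apart. The prices are hypothetical; only the spacing matters here:

```python
import pandas as pd

# Six observation dates, each 14 calendar days after the previous one
dates = pd.date_range(start="2005-04-05", periods=6, freq="14D")

# Hypothetical prices, for illustration only
prices = pd.Series([20.0, 20.5, 19.8, 21.1, 21.4, 22.0], index=dates, name="price")

# Every gap between consecutive observations is the same
gaps = prices.index.to_series().diff().dropna()

print(prices)
print(gaps.unique())
```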

    However, what you’ll often see when you’re working with stock data is not just two columns that contain period and price observations. Most of the time, you’ll have five columns that contain observations of the period and the opening, high, low and closing prices of that period. This means that, if your period is set at a daily level, the observations for that day will give you an idea of the opening and closing price for that day and the extreme high and low price movement for a particular stock during that day.

    By now, you have an idea of the basic concepts that you need to know to go through this tutorial. These concepts will come back soon enough, and you’ll learn more about them later on in this tutorial.
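
    The sketch below shows how such open, high, low and close columns relate to the underlying prices, assuming you had a handful of intraday prices to start from. The timestamps and values are made up for illustration:

```python
import pandas as pd

# Hypothetical intraday prices over two days
ticks = pd.Series(
    [10.0, 10.4, 9.8, 10.1, 10.2, 10.9, 10.5, 10.7],
    index=pd.to_datetime([
        "2012-01-03 09:30", "2012-01-03 11:00", "2012-01-03 14:00", "2012-01-03 16:00",
        "2012-01-04 09:30", "2012-01-04 11:00", "2012-01-04 14:00", "2012-01-04 16:00",
    ]),
)

# One row per day with open, high, low and close columns
daily = ticks.resample("D").ohlc()

print(daily)
```

    Each daily row keeps the first price as the open, the last price as the close, and the extremes as the high and low.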

    Setting Up The Workspace

    Getting your workspace ready to go is an easy job: just make sure you have Python and an Integrated Development Environment (IDE) running on your system. However, there are some ways in which you can get started that are maybe a little easier when you’re just starting out.

    Take, for instance, Anaconda, a high-performance distribution of Python and R that includes over 100 of the most popular Python, R and Scala packages for data science. Additionally, installing Anaconda will give you access to over 720 packages that can easily be installed with conda, the package, dependency and environment manager that is included in Anaconda. And, besides all that, you’ll get the Jupyter Notebook and Spyder IDE with it.

    That sounds like a good deal, right?

    You can install Anaconda from here and don’t forget to check out how to set up your Jupyter Notebook in DataCamp’s Jupyter Notebook Tutorial: The Definitive Guide.

    Of course, Anaconda is not your only option: you can also check out the Canopy Python distribution (which doesn’t come free), or try out the Quant Platform.

    The latter offers you a couple of additional advantages over using, for example, Jupyter or the Spyder IDE, since it provides you everything you need specifically to do financial analytics in your browser! With the Quant Platform, you’ll gain access to GUI-based Financial Engineering, interactive and Python-based financial analytics and your own Python-based analytics library. What’s more, you’ll also have access to a forum where you can discuss solutions or questions with peers!
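
    Whichever option you choose, a quick check like the one below can tell you whether the packages used in this tutorial are available in your environment. This is only a sketch: the list of package names is an assumption based on the imports that appear later on.

```python
import importlib.util
import sys

# Packages that this tutorial imports later on (assumed list)
required = ["pandas", "numpy", "matplotlib", "pandas_datareader"]

# find_spec returns None when a top-level package cannot be found
missing = [name for name in required if importlib.util.find_spec(name) is None]

print("Python version:", sys.version.split()[0])
print("Missing packages:", missing or "none")
```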

    Python Basics For Finance: Pandas

    When you’re using Python for finance, you’ll often find yourself using the data manipulation package Pandas. Other packages, such as NumPy, SciPy and Matplotlib, will also come up once you start digging deeper.

    For now, let’s focus on Pandas and using it to analyze time series data. This section will explain how you can import data, explore and manipulate it with Pandas. On top of all of that, you’ll learn how you can perform common financial analyses on the data that you imported.

    Importing Financial Data Into Python

    The pandas-datareader package allows for reading in data from sources such as Google, World Bank, … If you want an updated list of the data sources that this package makes available, go to the documentation. You used to be able to access data from Yahoo! Finance directly, but that access has since been deprecated; to access Yahoo! Finance data, check out this video by Matt Macarty that shows a workaround. For this tutorial, you will use the package to read in data from Yahoo! Finance. Make sure to install the latest release version of the package first via pip, with pip install pandas-datareader.

    Tip: if you want to install the latest development version or if you experience any issues, you can read up on the installation instructions here.

    %%capture
    # Notebook cell: install the tutorial's dependencies without printing the output
    !pip install -r requirements.txt
    import datetime
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import pandas_datareader as pdr
    # Download daily AAPL data from Yahoo! Finance for the given date range
    aapl = pdr.get_data_yahoo('AAPL', 
                              start=datetime.datetime(2006, 10, 1), 
                              end=datetime.datetime(2012, 1, 1))

    At this moment, there is a lot going on in the open-source community because of the changes to the Yahoo! Finance API. That's why you use not only the pandas_datareader package but also a custom fix, fix_yahoo_finance, to get your data:

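    import datetime
    from pandas_datareader import data as pdr
    import fix_yahoo_finance  # early releases patch pdr.get_data_yahoo on import
    
    # Later releases may need an explicit fix_yahoo_finance.pdr_override() call; check the package README
    aapl = pdr.get_data_yahoo('AAPL', 
                              start=datetime.datetime(2006, 10, 1), 
                              end=datetime.datetime(2012, 1, 1))
    # Show the first rows of the downloaded data
    aapl.head()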

    Note that the Yahoo API endpoint has recently changed. If you already want to start working with the library on your own, you’ll need to install a temporary fix before you can pull in data from Yahoo! Finance with pandas-datareader, until the patch has been merged into the master branch. Make sure to read up on the issue here before you start on your own!

    No worries, though, for this tutorial, the data has been loaded in for you so that you don’t face any issues while learning about finance in Python with Pandas.
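
    If you work outside of the tutorial environment and want a similar safety net, one option is to keep a local copy of the data and load it from disk. The sketch below assumes a small, made-up DataFrame in place of the downloaded data and a hypothetical file name, aapl_ohlc.csv:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for downloaded price data
aapl_sample = pd.DataFrame(
    {"Open": [10.0, 10.2], "High": [10.5, 10.4], "Low": [9.9, 10.0], "Close": [10.3, 10.1]},
    index=pd.to_datetime(["2006-10-02", "2006-10-03"]),
)
aapl_sample.index.name = "Date"

# Save a local copy (a temporary directory is used here to keep the example self-contained)
csv_path = os.path.join(tempfile.mkdtemp(), "aapl_ohlc.csv")
aapl_sample.to_csv(csv_path)

# Read it back, restoring the dates as a DatetimeIndex
aapl_loaded = pd.read_csv(csv_path, header=0, index_col="Date", parse_dates=True)

print(aapl_loaded)
```

    Passing index_col and parse_dates makes sure that the dates end up in the index as timestamps rather than as plain strings.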

    It’s wise to consider, though, that even though pandas-datareader offers a lot of options to pull data into Python, it isn’t the only package that you can use to pull in financial data. If you don't want to make use of this package, you can also use a library such as Quandl to retrieve data:

    # import quandl 
    # aapl = quandl.get("WIKI/AAPL", start_date="2006-10-01", end_date="2012-01-01")
    # aapl.head()

    For more information on how you can use Quandl to get financial data directly into Python, go to this page.

    Lastly, if you’ve already been working in finance for a while, you probably use Excel most often to manipulate your data. In that case, you should know that you can integrate Python with Excel.

    Check out DataCamp’s Python Excel Tutorial: The Definitive Guide for more information.

    Working With Time Series Data

    The first thing that you want to do when you finally have the data in your workspace is getting your hands dirty. However, now that you’re working with time series data, this might not seem as straightforward, since your index now contains DateTime values.

    No worries, though! Let’s start step-by-step and explore the data first with some functions that you might already know if you have some prior programming experience with R or if you’ve previously worked with Pandas.

    Either way, you’ll see it’s pretty straightforward!

    As you saw in the code chunk above, you have used pandas_datareader to import data into your workspace. The resulting object aapl is a DataFrame, which is a 2-dimensional labeled data structure with columns of potentially different types. Now, one of the first things that you probably do when you have a regular DataFrame on your hands is run the head() and tail() functions to take a peek at the first and the last rows of your DataFrame. Luckily, this doesn’t change when you’re working with time series data!

    Tip: also make sure to use the describe() function to get some useful summary statistics about your data.
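
    As a quick illustration of these three functions on a DataFrame with a date index, here is a sketch that uses a small, made-up DataFrame called sample rather than the aapl data itself:

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices on ten business days
sample = pd.DataFrame(
    {"Close": np.linspace(50.0, 59.0, 10)},
    index=pd.date_range("2006-10-02", periods=10, freq="B"),
)

print(sample.head())      # first five rows
print(sample.tail(3))     # last three rows
print(sample.describe())  # count, mean, std, min, quartiles and max

# The date index also allows label-based selection by date
first_week = sample.loc["2006-10-02":"2006-10-06"]
print(first_week)
```

    Slicing a date index with .loc includes both endpoints, so first_week contains all five business days of that week.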

    Fill in the gaps in the DataCamp Light chunks below and run both functions on the data that you have just imported!