For a while I have been wanting to brush up on my programming skills.
When I enrolled at university back in 2005, the message I got from my parents and the outside world was that computing could not be a serious profession. Programming was at most a hobby, mostly associated with playing video games. The world has changed, though: nowadays, in the internet era, programming skills have gained prominence in the labour market. The availability of data has grown exponentially, and the ability to efficiently connect, process and transmit data can give one a competitive edge in the workplace. Programming is an essential part of that.
The first question I asked myself was which programming language to choose. A high-level programming language is essentially a set of logical instructions organized in a human-readable syntax, which a compiler or an interpreter translates into machine language that the computer can execute.
I had some experience with Pascal at high school, and with Visual Basic for Applications (the macro language built into Excel) at a previous job. However, I wanted to try a fully object-oriented and more versatile language.
Why Python?
I decided that my entry into programming would be through a popular, widely adopted, general-purpose programming language. For a beginner, I think it is important to start with a popular language, for essentially four reasons:
- There are more learning resources available, both on the web and in print
- More sample code is publicly available on the internet (for example, on the public code repository github.com)
- Experienced programmers are more likely to have written ready-made libraries for it
- There is more help available on coders’ forums (I like stackoverflow.com, for example)
To gauge the popularity and trends of programming languages, I looked at the TIOBE index below, which ranks programming languages by the number of search engine hits. I also looked at GitHub code statistics on githut.info.
Ultimately, I chose to start learning Python. It ranks fifth by popularity on the TIOBE index, and third by number of repositories on GitHub. Moreover, a friend had given me good feedback about Python.
At the outset, I had little idea of how good my choice was. It was, really, a bit of a blind dive. However, as time passes and I better understand Python’s features, I think it was a really good pick.
Python’s key qualities for a beginner
Python has an interesting story to begin with: as Wikipedia reports, Guido van Rossum, a Dutch programmer, was looking for a “hobby” programming project to keep himself occupied during the Christmas holiday in December 1989. Being in a slightly irreverent mood (and a big fan of Monty Python’s Flying Circus), he chose Python as a working title for the project. The rest is history.
I also greatly appreciate that Python is free and open-source, with a community-based development model.
Aside from an interesting story and an inspiring open-source philosophy, however, I think Python’s key strengths as a first language lie in its architecture and breadth of adoption. Note that the list below is based on my still limited knowledge of software programming, so apologies if I make mistakes along the way.
-
Python is dynamically typed and checks types at run time.
Broadly speaking, a programming language can be either statically or dynamically typed. With static typing, you have to declare the type of your variables in advance: you need to tell the machine to expect, say, a string of characters or a number. In Python, by contrast, dynamic typing means that a variable is just a name bound to a value, and the interpreter works out the value’s type automatically at run time.
I have heard some programmers say that this is actually a weakness of the language, especially when a project’s codebase becomes very large: static typing lets certain bugs be caught early, before the code even runs. Nonetheless, I love the fact that one does not have to spend time on type declarations in Python: the type is recognised automatically by the interpreter, and some operators even work across different types!
For example, in Python there is no need whatsoever to write a declaration like the following (in C-style syntax):
```
int x;
x = 3;
```
…instead, the easier
```
>>> x = 3
```
will suffice.
Also, as I mentioned already, several built-in operators recognise the type of a value and handle it dynamically. For example, the multiplication operator works on both integers…
```
>>> x = 3  # this is an integer
>>> x * 2
6
```
…and string types.
```
>>> x = "spam"  # this is a string
>>> x * 2
'spamspam'
```
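This flexibility has limits, and that is what run-time type checking means in practice. Below is a minimal sketch (the `describe` helper is hypothetical, written only for illustration): the same name can be rebound to a value of another type, `isinstance` lets code inspect a type while the programme runs, and mixing incompatible types such as a string and an integer with `+` raises a `TypeError` instead of silently guessing.

```python
x = 3
print(type(x).__name__)   # the name x currently refers to an int
x = "spam"
print(type(x).__name__)   # the same name now refers to a str


def describe(value):
    """Hypothetical helper: classify a value by checking its type at run time."""
    if isinstance(value, bool):
        return "boolean"      # checked first, because bool is a subclass of int
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "something else"


print(describe(3), describe(2.5), describe("spam"), describe([1, 2]))

# Python is dynamically but strongly typed: it will not guess how to add
# a string to an integer, so the operation fails at run time.
try:
    result = "3" + 3
except TypeError as exc:
    result = None
    print("TypeError:", exc)
```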
-
Python forces good coding style.
Python is one of the few programming languages that use indentation to delimit blocks of logic, and it enforces this strictly. This feature is extremely helpful for a beginner: even in the crappiest text editor, a Python file comes out properly laid out, making it easier to read and follow its logic. The interpreter actually raises an error if the code is not indented properly!
For example, while the below is valid Python code…
```
>>> x = 8
>>> y = 7
>>> if x > y:
...     print(x)  # properly indented
...
8
```
…the code below raises an error, because the instruction under the if statement is not indented:
```
>>> x = 8
>>> y = 7
>>> if x > y:
... print(x)
IndentationError: expected an indented block
```
Nonetheless, indentation requires some care, as a misplaced line can change what the programme does. For example, the script below prints 8: the first print call is indented, so it belongs to the if block, which is skipped because x is not greater than y; the second call is not indented, so it sits outside the if logic and always runs.
```python
x = 7
y = 8
if x > y:
    print(x)  # indented: runs only if x > y
print(y)      # not indented: outside the if, always runs

# Output:
# 8   <- is that what you wanted?
```
-
Python has good cross-platform portability
I run code on both Windows and Linux, so I need a programming language that executes my code smoothly across these operating systems, which do not share the same internal architecture. Python ticks this box: not only the core language but also many external libraries are built with cross-platform compatibility in mind. While Java or JavaScript may be a better choice in terms of portability, Python gives satisfactory results too. To check cross-platform compatibility on Linux and Windows, I ran tests on a few simple scripts of mine that use three external libraries. The tests were run with continuous integration services such as Travis CI and AppVeyor, which run your code on several platform environments at once and report the test results. As you can see below, the same piece of code works seamlessly on 4 versions of Python and on both machines (Windows and Linux).
Graphical libraries are also very portable in Python: PyQt and Tkinter are great libraries for building portable graphical user interfaces.
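Part of writing portable code is not hard-coding operating-system details such as path separators. Here is a minimal sketch using the standard-library `pathlib` and `platform` modules; the folder and file names are hypothetical placeholders, not part of my actual scripts.

```python
import os
import platform
from pathlib import Path

# Hypothetical folder and file names, for illustration only.
# The / operator joins path parts with the right separator for the current OS.
data_dir = Path.home() / "trading" / "data"
prices_file = data_dir / "eurusd.csv"

print("Running on:", platform.system())  # e.g. 'Windows', 'Linux' or 'Darwin'
print("Separator:", os.sep)              # '\\' on Windows, '/' on Linux
print("Prices file:", prices_file)

# Creating the folder would look the same on every OS (left commented out
# so that running the sketch has no side effects):
# data_dir.mkdir(parents=True, exist_ok=True)
```

The same script, unchanged, prints a Windows-style path on Windows and a POSIX-style path on Linux.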
-
Python has a highly expressive syntax and powerful iteration tools
Programs spend a lot of time iterating over data and applying logic through “if/then/else” control statements. Python’s expressiveness means that such logic can be written in a syntax very close to plain English. Moreover, a single statement can often encapsulate a large amount of logic (low verbosity).
Have a look at the example below: you have two lists of numbers and you want to combine them into pairs (x, y), skipping the pairs where both numbers are the same. A more traditional way of expressing this in code is something along these lines:
```
>>> combs = []
>>> for x in [1,2,3]:
...     for y in [3,1,4]:
...         if x != y:
...             combs.append((x, y))
...
>>> combs
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
```
…which takes several lines of code (two nested for loops and an if). In Python, however, one can use the so-called list comprehension syntax (a similar syntax exists for dictionaries) to obtain the same result nicely and clearly in one single statement:
```
>>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
```
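Two related idioms are sketched below, reusing the same two lists. `itertools.product` from the standard library generates the same pairs as the two nested loops (outer list first, inner list second), and a dictionary comprehension builds a dict in one statement the same way a list comprehension builds a list. The `squares` mapping is just an illustrative example.

```python
from itertools import product

first = [1, 2, 3]
second = [3, 1, 4]

# product(first, second) yields (x, y) in the same order as the nested loops
pairs = [(x, y) for x, y in product(first, second) if x != y]

# Dictionary comprehension: map each number in `first` to its square
squares = {x: x ** 2 for x in first}

print(pairs)    # [(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
print(squares)  # {1: 1, 2: 4, 3: 9}
```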
-
Python is highly modular and favours code reuse.
Once you write a function or an object in Python, it can be reused (yes, forever!) in other files with a single import statement. This is called “modular programming”: every Python file is a module whose functions, classes and variables can be imported into other programmes. In an ideal Python world, I would never need to code a custom function twice, but could always reuse the module I wrote earlier.
Modular programming also facilitates community collaboration. Python comes with a package manager called pip, which lets me download Python packages from the internet (mainly from the Python Package Index, PyPI) so that I can use their modules in my programmes. Even with little programming knowledge, I can access a vast array of tools that are not part of core Python but were written for it by its large community of users.
Let’s see with a very simplistic example how modular programming works.
I first create a new file, massi.py:
```python
my_var = 8
```
I then import it in the interactive interpreter (started from the same folder as massi.py), and that way I can access its objects and functions:
```
>>> import massi
>>> print(massi.my_var)
8
```
An import can also target a single object only. In the example below, I only import my_var from the massi module.
```
>>> from massi import my_var
>>> my_var
8
```
I can also rename the imported object as I please. Obviously, my_var keeps its name in the original file; the new name applies only to the current interpreter session:
```
>>> from massi import my_var as another_name
>>> another_name
8
```
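Modules usually hold functions as well as variables. The sketch below is a hypothetical extended version of massi.py (the `double` function is made up for illustration) showing the common `if __name__ == "__main__":` guard, which lets the same file work both as a script and as an importable module.

```python
# massi.py (hypothetical extended version of the module above)
my_var = 8


def double(value):
    """Return twice the given value; works on numbers and strings alike."""
    return value * 2


if __name__ == "__main__":
    # Runs only when the file is executed directly (python massi.py),
    # not when another programme does `import massi`.
    print(double(my_var))
```

With this file in the working folder, `from massi import double` followed by `double(21)` returns 42, and the `print` under the guard does not run on import.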
-
Python is widely used in quantitative finance and for data analysis
Python is very widely adopted in the field of quantitative finance and algorithmic trading, partly for the many reasons I listed above. Again, it seems that adoption triggers a positive feedback loop. The more people adopt the language, the more resources are made available for that language, the easier it is for beginners to learn, and so on… After all, a programming language is just like a normal spoken language. If no-one else spoke English apart from me, why would I learn English? 🙂
Python’s built-in data types (especially lists and tuples) are already well suited to data analysis. In addition, there are brilliant data analysis and mathematical libraries that extend Python’s capabilities and make it as good for data science as R and Matlab, which, unlike Python, are not general-purpose programming languages. Major Python libraries for working with data include pandas, numpy, scipy and matplotlib, plus scikit-learn (good and getting better) for machine learning. These libraries are also extensively documented: it takes some time to learn them and fully harness their power, but it is an investment worth making.
From your computer’s command prompt, run:
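To illustrate the point about built-in data types, here is a minimal sketch that needs no external library: a list of (date, price) tuples and the standard-library `statistics` module are enough for simple summaries. The prices are made-up sample values for illustration only, not real market data.

```python
import statistics

# Hypothetical (date, closing price) pairs, made up for illustration only
closes = [
    ("2016-09-01", 1.1150),
    ("2016-09-02", 1.1155),
    ("2016-09-05", 1.1150),
    ("2016-09-06", 1.1255),
]

# Unpack the tuples into a plain list of prices
prices = [price for _, price in closes]

# Simple daily returns: price today / price yesterday - 1
returns = [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]

# Date of the highest close, found by comparing the second tuple element
best_day = max(closes, key=lambda row: row[1])[0]

print("mean close:", round(statistics.mean(prices), 5))
print("highest close on:", best_day)
print("daily returns:", [round(r, 5) for r in returns])
```

For larger datasets, pandas and numpy do the same kind of work far more efficiently, but the underlying ideas carry over.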
```
c:\> pip install numpy
c:\> python
Python 3.5.2 (default, Jul 5 2016, 12:43:10)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
```
and you will be able to access all the functionality of the numpy library. The caveat is that when you ship your code (unless you freeze it into a standalone executable), users will also need to have the libraries installed on their machines in order to run it.
There are also Python distributions that pre-bundle all these scientific libraries for data analysis and offer additional packages for optional installation. Anaconda claims to be the leading open data science platform powered by Python, and contains over 100 of the most popular Python, R and Scala packages for data science. A key part of the Anaconda distribution is Spyder, an integrated development environment that includes a code editor. Additionally, one has access to over 720 packages that can be installed with conda, Anaconda’s package manager. Anaconda is also fairly easy to install and set up, which is why many programmers prefer it.
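Because users need the same libraries installed, a script can check for them up front and print a helpful message instead of crashing on the first import. A minimal sketch using the standard-library `importlib.util.find_spec`; the dependency list is an example to adjust to your own script.

```python
import importlib.util

# Example dependency list; adjust it to the libraries your script imports
REQUIRED = ["numpy", "pandas"]


def missing_packages(names):
    """Return the top-level package names that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Please run: pip install " + " ".join(missing))
else:
    print("All dependencies are installed.")
```

A common complement is to list the dependencies in a requirements.txt file so that users can install them all with `pip install -r requirements.txt`.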
-
API Libraries are widely available for Python
So we have seen that Python is a capable language for data processing and analysis. However, the data needs to be sourced from somewhere. A powerful tool for sourcing data from financial markets is an API (Application Programming Interface). Through an API, one can connect to static data or live data streams provided by a broker or a data provider. The availability of an API library (a “wrapper”) for your language makes the whole process particularly easy.
Python API libraries are available for all major brokers and data providers. This is, for example, how I connect to the forex streaming API of my broker OANDA. I rely on a Python API wrapper called oandapy. Using oandapy, I can query OANDA’s servers and download or stream the information I require to my PC:
```
c:\> pip install oandapy
c:\> python
>>> import oandapy
>>> import os
>>> token = os.environ.get('OANDA_ACCESS_TOKEN', None)
>>> oanda = oandapy.API(environment="live", access_token=token)
```
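Wrappers such as oandapy hide the underlying HTTP calls. The sketch below is not oandapy’s code; it is a minimal illustration, using only the standard library, of the kind of authenticated request a REST API wrapper typically sends. The base URL, the `/prices` path, its query parameter and the `API_ACCESS_TOKEN` environment variable are all hypothetical placeholders: check your provider’s documentation for the real endpoint and authentication scheme.

```python
import json
import os
import urllib.request

# Hypothetical endpoint: replace with the URL documented by your provider
BASE_URL = "https://api.example.com/v1"


def build_request(path, token):
    """Build (but do not send) an authenticated GET request."""
    req = urllib.request.Request(BASE_URL + path, method="GET")
    # Many REST APIs accept a bearer token in the Authorization header
    req.add_header("Authorization", "Bearer " + token)
    req.add_header("Accept", "application/json")
    return req


def fetch_json(req, timeout=10):
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Keep secrets out of the source code: read the token from the environment
token = os.environ.get("API_ACCESS_TOKEN", "")  # hypothetical variable name
request = build_request("/prices?instrument=EUR_USD", token)
print(request.get_method(), request.full_url)
# data = fetch_json(request)  # uncomment once BASE_URL points at a real API
```

Reading the token from an environment variable, as the oandapy example above also does, keeps credentials out of files you might share or publish.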
This is just one example. There are tons of APIs available on the web, and again tons of Python API wrappers. And not only for financial data… See, for example, the Twitter API and its related Python wrapper, Tweepy.
So… how do I actually learn Python?
Good question. A few things are necessary, in my opinion:
- The latest version of Python at the time of writing, Python 3.5.2. There is no point in going back to the 2.x releases: the language has evolved, and as a beginner you will be better off starting with a recent version rather than an old one
- A good text editor, with syntax highlighting and – possibly – autocompletion features. I rely on Sublime Text and Atom.
- A good manual. I am actually not a big fan of online courses or so-called coding academies. I would suggest you do it the traditional way: arm yourself with patience, buy a good book and follow it line by line, trying to understand the examples, replicating them on your own and playing with them by adding basic functionality to the code presented in the text. In my opinion, the must-have books are:
  - Learning Python, 5th Edition: an introductory book covering data types, object-oriented programming and basic logic construction
  - Programming Python, 4th Edition: an intermediate/advanced book covering databases, inter-process communication, graphical user interfaces and network programming
  - The Hitchhiker’s Guide to Python: written by the community as a living book, it deals with the most common beginners’ queries and is kept up to date with the latest advances
  - Automate the Boring Stuff with Python: an entertaining book on using Python to automate daily tasks in an office environment
  - The full language reference documentation, which will be useful once you have acquired the basics
My first Python projects
As I learn the language and delve deeper into its capabilities, I have already started working on some projects that are likely to keep my mornings, evenings and weekends busy for the near future. If anyone is interested in collaborating, please drop me an email at info@massimilianoterzi.it. These projects are:
- Connecting to my brokers’ APIs and processing their data. I have already set up two nice automated email reminders for Kraken and Oanda, running from my VPS.
- Working on a small algorithmic trading robot
- Working on time-series data analysis
- Working on machine learning techniques for image processing
- …and more to follow
Keep following my blog for further updates, and enjoy discovering Python!