Introduction V - Introduction to Python - I¶
Before we get started…¶
We’re going to be exploring the very basics of the Python programming language in this session. As an interface, we’ll continue to rely on Jupyter Notebooks.
If you’ve followed the Setup and installed conda and jupyter, you can open a notebook yourself by either:
A. Opening the Anaconda application and selecting the Jupyter Notebooks tile
B. Or opening a terminal/shell, typing jupyter notebook and hitting enter. If you’re not automatically directed to a webpage, copy the URL (https://....) printed in the terminal and paste it into your browser
Note on interactive Mode¶
As this website is built on Jupyter Notebooks, you can also click on the small rocket at the top of this website, select Live code (and wait a bit) and this site will become interactive.
You can then try to run the code cells below by clicking on the “run” button that appears beneath them.
Some functionality of these notebooks can’t be demonstrated using the live code implementation. You can therefore either download the course content or open this notebook via Binder, i.e. by clicking on the rocket and selecting Binder. This will open an online Jupyter-lab session where you can find this notebook by following the folder structure that will be opened on the right hand side via lecture -> content and clicking on intro_jupyter.ipynb.
Goals📍¶
Learn basic and efficient usage of the Python programming language
What is Python & how to utilize it
Building blocks of & operations in Python
Roadmap¶
What we will do in this section of the course is a short introduction to Python to help beginners get familiar with this programming language.
It is divided into the following parts:
What is Python?¶
Python is a programming language
It’s free and open source
Specifically, it’s a widely used/very flexible, high-level, general-purpose, dynamic programming language
That’s a mouthful! Let’s explore each of these points in more detail.
Widely-used¶
Python is the fastest-growing major programming language
Top 3 overall (with JavaScript, Java)
The incredible growth of python, by David Robinson, 6.09.2017
Looking at newer numbers, we see that Python is among the top 4 most popular languages, named by more than 40% of professional developers. Only around 3% named Matlab or R as their most popular language. To put this further into perspective: Python is only outranked by HTML, JavaScript and SQL, the main languages that keep the internet and most data flows around the world running.
You can follow the results here: 2022 Stackoverflow Developer Survey: most popular languages
2022 Stackoverflow Developer Survey: most popular languages for professionals
Among people learning to code, Python is even more popular, with more than 50% voting for Python. Matlab and R are again far behind, at around 6%.
2022 Stackoverflow Developer Survey: most popular languages for learners
High-level¶
Python features a high level of abstraction, meaning it does a lot of work for you:
Many operations that are explicit in lower-level languages (e.g., C/C++) are implicit in Python
E.g., memory allocation, garbage collection, etc.
Python lets you write code faster and requires less technical skill
For example, below you’ll find code that reads a file in Java:
File reading in Java¶
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadFile {
    public static void main(String[] args) throws IOException {
        String fileContents = readEntireFile("./foo.txt");
    }

    private static String readEntireFile(String filename) throws IOException {
        FileReader in = new FileReader(filename);
        StringBuilder contents = new StringBuilder();
        char[] buffer = new char[4096];
        int read = 0;
        do {
            contents.append(buffer, 0, read);
            read = in.read(buffer);
        } while (read >= 0);
        return contents.toString();
    }
}
Compared to this single line of Python code, which does the exact same thing:
File-reading in Python¶
open(filename).read()
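The one-liner leaves closing the file to the interpreter. A slightly more careful version could look like the sketch below, which uses a with block so the file is closed automatically. The file name foo.txt and its content are made up for illustration, and the file is created in a temporary folder so the example runs on its own.

```python
import tempfile
from pathlib import Path

# Create a small example file in a temporary folder (hypothetical content)
with tempfile.TemporaryDirectory() as tmp_dir:
    filename = Path(tmp_dir) / "foo.txt"
    filename.write_text("One ring to rule them all\n", encoding="utf-8")

    # The with block closes the file automatically, even if an error occurs
    with open(filename, encoding="utf-8") as f:
        file_contents = f.read()

print(file_contents)
```

This is still far shorter than the Java version; the extra lines only create the example file.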
General purpose¶
You can truly do almost everything in Python
Widely used in many areas of software development (web, dev-ops, data science, etc.), so learning Python for research already provides the groundwork for switching to different kinds of fields
Dynamic¶
Code is interpreted at [runtime](https://en.wikipedia.org/wiki/Runtime_(program_lifecycle_phase)), meaning you hit run and the code is processed:
No separate compilation process (Python compiles to bytecode automatically behind the scenes); code is read line-by-line when executed
Eliminates delays between development and execution
The downside: poorer performance compared to compiled languages (like C), but this difference is often of little importance in research
Why do programming/science in Python?¶
Let’s go through some advantages of the Python programming language.
Easy to learn¶
Readable, explicit syntax, close to natural language
Most packages are very well documented
e.g., scikit-learn’s documentation is widely held up as a model
A huge number of tutorials, guides, and other educational materials
Comprehensive standard library¶
Python offers a comprehensive standard library containing a huge number of high-quality modules (meaning that you can do most things by relying purely on standard Python and don’t have to import functions from third parties)
When in doubt, check the standard library first before you write your own tools!
For example:
os: operating system tools
re: regular expressions
collections: useful data structures
multiprocessing: simple parallelization tools
pickle: serialization
json: reading and writing JSON
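To give a feeling for how far the standard library gets you, here is a small sketch that touches four of the modules listed above. The example sentence and the dictionary content are made up for illustration.

```python
import json
import os
import re
from collections import Counter

text = "three rings for the elven kings, seven for the dwarf lords"

# re: find all words with a regular expression
words = re.findall(r"[a-z]+", text)

# collections: count how often each word occurs
word_counts = Counter(words)
print(word_counts.most_common(2))

# json: turn a dictionary into a JSON string and read it back
encoded = json.dumps({"rings": 3, "owner": "elves"})
decoded = json.loads(encoded)
print(decoded["rings"])

# os: build a file path in a platform-independent way
print(os.path.join("lecture", "content"))
```

None of this requires installing anything beyond Python itself.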
Exceptional external libraries¶
Python has very good (often best-in-class) external packages
Packages are installable via a universal package manager, giving access to an enormous ecosystem of third-party packages
For almost everything you could need to do in science, there is a Python package out there
Particularly important for “data science”, which draws on a very broad toolkit
Package, dependency and environment management is easy via seamless integration with the conda system
Example packages:
Web development: flask, Django
Database ORMs: SQLAlchemy, Django ORM (w/ adapters for all major DBs)
Scraping/parsing text/markup: beautifulsoup, scrapy
Natural language processing (NLP): nltk, gensim, textblob
Numerical computation and data analysis: numpy, scipy, pandas, xarray, statsmodels, pingouin
Machine learning: scikit-learn, Tensorflow, keras
Image processing: pillow, scikit-image, OpenCV
Audio processing: librosa, pyaudio
Plotting: matplotlib, seaborn, altair, ggplot, Bokeh
GUI development: PyQt, wxPython
Testing: pytest
Etc. etc. etc.
(Relatively) good performance¶
Python is a high-level dynamic language; this comes at a performance cost
For many (not all!) use cases, performance is irrelevant most of the time
In general, the less Python code you write yourself, the better your performance will be
Much of the standard library consists of Python interfaces to C functions
Numpy, scikit-learn, etc. all rely heavily on C/C++ or Fortran
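The point about writing less Python yourself can be tried out with a rough, informal comparison. The sketch below sums the same numbers once with a hand-written loop and once with the built-in sum function; the helper name python_loop_sum is made up for this example, and the measured times will differ from machine to machine.

```python
import timeit

numbers = list(range(100_000))


def python_loop_sum(values):
    """Add up all values with an explicit Python loop."""
    total = 0
    for value in values:
        total += value
    return total


# Both approaches give the same result ...
loop_result = python_loop_sum(numbers)
builtin_result = sum(numbers)
print(loop_result == builtin_result)

# ... but the built-in, which is implemented in C, is usually faster
loop_time = timeit.timeit(lambda: python_loop_sum(numbers), number=20)
builtin_time = timeit.timeit(lambda: sum(numbers), number=20)
print(f"loop: {loop_time:.4f} s, built-in sum: {builtin_time:.4f} s")
```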
Widely used¶
The Python community is extremely large and well-connected, so it’s relatively easy to find solutions to common problems.
It’s much more likely that you’ll encounter Python code when trying to collaborate with other scientists or looking for implementations of e.g. code for an analysis you want to try out.
It’s great for your CV, while the market for people using other programming languages popular for scientific work, like R and Matlab, is much smaller.
Python vs. other data science languages¶
Python competes for mind share with many other languages
Most notably, R
To a lesser extent, Matlab, Mathematica, SAS, Julia, Java, Scala, etc.
R¶
R is widespread in traditional statistics and some fields of science
Has attracted many SAS, SPSS, and Stata users
Exceptional statistics support; hundreds of best-in-class libraries
Designed to make data analysis and visualization as easy as possible
Slow
Language quirks drive many experienced software developers crazy
Less support for most things non-data-related
MATLAB¶
A proprietary numerical computing language used in engineering and science
Good performance and very active development, but expensive
Closed ecosystem, relatively few third-party libraries
There is an open-source port (Octave)
Not suitable for use as a general-purpose language
So, why Python?¶
Why choose Python over other languages?
Arguably none of these offers the same combination of readability, flexibility, libraries, and performance
Leading packages in modern science, including AI and machine learning, but also neuroimaging
Python is sometimes described as “the second best language for everything”
Doesn’t mean you should always use Python
Depends on your needs, community, etc.
You can have your cake and eat it!¶
Roadmap¶
Modules
Now that we’ve talked extensively about what Python is and why you should consider learning it, let’s introduce the basics of Python programming.
Modules¶
Most of the functionality in Python is provided by modules. To use a module in a Python program it first has to be imported. A module can be imported using the import statement.
For example, to import the module math, which contains many standard mathematical functions, we can do:
Note: You can run the following lines of code via the live code implementation by clicking on the rocket at the top of the page and selecting Live code. If you’ve opened this page via Binder as described above or downloaded the course materials, follow along with the next couple of exercises. You can also open a new notebook as described above and copy the following code cells into your notebook.
import math
This includes the whole module and makes it available for use later in the program. For example, we can do:
import math
x = math.cos(2 * math.pi)
print(x)
1.0
where math.cos() is what is understood as a function (i.e. a block of code which runs only when it is called). Functions can usually be identified by the parentheses () directly following the function name.
Typing the module name in front of every function can become tedious. As an alternative to the previous method, we can also choose to import only a few selected functions from a module by explicitly listing which ones we want to import. Python still loads the whole module in the background, but only the listed names become directly available:
from math import cos, pi
x = cos(2 * pi)
print(x)
1.0
You can make use of tab again to get a list of functions/classes/etc. for a given module. Try it out by placing the cursor behind the import statement and pressing tab (running the incomplete line as it stands results in a SyntaxError):
from math import
Cell In [4], line 1
from math import
^
SyntaxError: invalid syntax
Similarly, you can also use the help function (one of a number of Python’s built-in functions) to find out more about a given module:
import math
help(math)
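When tab completion is not available (e.g. in a plain script), the built-in dir function gives a comparable overview. The following is a minimal sketch; the variable name public_names is made up for this example.

```python
import math

# dir() returns the names defined in a module as a list of strings;
# names starting with an underscore are internal and filtered out here
public_names = [name for name in dir(math) if not name.startswith("_")]

print(public_names[:5])
print("cos" in public_names)
```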
It is also possible to give an imported module or symbol your own access name with the additional as keyword:
import numpy as np
from math import pi as number_pi
x = np.rad2deg(number_pi)
print(x)
You can provide any name (given it follows Python/coding conventions), but focusing on intelligibility won’t be the worst idea.
Remember: “Readability counts”
import matplotlib as pineapple
pineapple.
Exercise 1.1¶
Import the max function from numpy and find out what it does.
Note: the # character tells Python not to run the rest of that line. It is therefore used to embed comments in our code while avoiding errors. You can try deleting the # and see what happens.
# write your solution in this code cell
Exercise 1.2¶
Import the scipy package, assign it the access name middle_earth and check its functions.
# write your solution in this code cell
Exercise 1.3¶
What happens when we try to import a module that is either misspelled or doesn’t exist in our environment or at all?
Python provides us a hint that the module name might be misspelled
We’ll get an error telling us that the module doesn’t exist
Python automatically searches for the module and if it exists downloads/installs it
import welovethiscourse
A module consists of a Python file (.py) containing code that will automatically run on import, but also allows us to access functions specified in the module file. In the Introduction folder you’ll find a file called hello_statement.py
import hello_statement
Let’s have a look at the functions contained
help(hello_statement)
and call them as usual
hello_statement.some_func()
hello_statement.some_other_func()
hello_statement.some_other_other_func()
Namespaces and imports¶
Python is very serious about maintaining orderly namespaces (i.e. unique identifiers)
If you want to use some code outside the current scope, you need to explicitly “import” it
Python’s import system often annoys beginners, but it substantially increases code clarity
Almost completely eliminates naming conflicts and confusion
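A small sketch of why namespaces help: the standard library modules math and cmath both provide a function called sqrt, yet the two never get mixed up because each lives in its own namespace. The variable names real_root and complex_root are made up for this example.

```python
import cmath
import math

# Both modules define sqrt, but the module prefix keeps them apart
real_root = math.sqrt(4)       # square root for real numbers
complex_root = cmath.sqrt(-4)  # square root that also handles negative input

print(real_root)
print(complex_root)
```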
Help and Descriptions¶
Using the function help we can get a description of almost all functions.
help(math.log)
math.log(10)
math.log(10, 2)
Variables and data types¶
In programming, variables are things that store values
In Python, we declare a variable by assigning it a value with the = sign
name = value
Code variables != math variables
In mathematics, = refers to equality (statement of truth), e.g. y = 10x + 2
In coding, = refers to assignment, e.g. x = x + 1 (instead we use == to test for equality)
Variables are pointers, not data stores!
Python supports a variety of data types and structures:
booleans
numbers (ints, floats, etc.)
strings
lists
dictionaries
many others!
We don’t specify a variable’s type at assignment
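The statement that variables are pointers can be made concrete with a short sketch. It uses a list (one of the structures mentioned above) with made-up content: two names that point to the same list see the same changes.

```python
hobbits = ["Frodo", "Sam"]

# This does not copy the list; both names now point to the same object
same_hobbits = hobbits
same_hobbits.append("Merry")

print(hobbits)                  # ['Frodo', 'Sam', 'Merry']
print(hobbits is same_hobbits)  # True
```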
Variables and types¶
Symbol names¶
Variable names in Python can contain alphanumerical characters a-z, A-Z, 0-9 and the underscore _. Normal variable names must start with a letter or an underscore.
By convention, variable names start with a lower-case letter, and Class names start with a capital letter.
In addition, there are a number of Python keywords that cannot be used as variable names. These keywords are:
False, None, True, and, as, assert, async, await, break, class, continue, def, del, elif, else, except, finally, for, from, global, if, import, in, is, lambda, nonlocal, not, or, pass, raise, return, try, while, with, yield
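Since the exact set of keywords depends on the Python version, it can be safer to ask Python itself. The following sketch uses the keyword module from the standard library; the variable names are made up for this example.

```python
import keyword

# List of all keywords of the Python version you are running
print(keyword.kwlist)

# Check individual names
lambda_is_keyword = keyword.iskeyword("lambda")
print_is_keyword = keyword.iskeyword("print")

print(lambda_is_keyword)  # True
print(print_is_keyword)   # False in Python 3, where print is a function
```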
Assignment¶
(Not your homework assignment but the operator in Python.)
The assignment operator in Python is =. Python is a dynamically typed language, so we do not need to specify the type of a variable when we create one.
Assigning a value to a new variable creates the variable:
# variable assignment
x = 1.0
Again, this does not mean that x equals 1.0 but that the variable x contains the value 1.0. The variable x is now stored in the respective namespace:
x
1.0
This means that we can directly utilize the value of our variable:
x + 3
4.0
Although not explicitly specified, a variable does have a type associated with it. The type is derived from the value it was assigned.
type(x)
float
If we assign a new value to a variable, its type can change.
x = 1
type(x)
int
This outlines one more very important characteristic of Python (and many other programming languages): variables can be directly overwritten by assigning them a new value.
We don’t get an error like “This name is already taken.” Thus, always keep track of which names were already used to avoid unintentionally overwriting values and introducing errors (reproducibility/replicability much?).
ring_bearer = 'Bilbo'
ring_bearer
'Bilbo'
ring_bearer = 'Frodo'
ring_bearer
'Frodo'
If we try to use a variable that has not yet been defined, we get a NameError:
fellowship
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In [12], line 1
----> 1 fellowship
NameError: name 'fellowship' is not defined
Note for later sessions: in the notebooks we will use try/except blocks to handle exceptions, so the notebook doesn’t stop. The code below tries to execute the print function; if a NameError occurs, the error message is printed instead. Any other kind of error is not caught and will still be raised. You will learn more about exception handling later.
try:
    print(Peeeeeer)
except NameError as err:
    # the variable was never defined, so report the error and carry on
    print("NameError", err)
NameError name 'Peeeeeer' is not defined
Variable names:
Can include letters (A-Z), digits (0-9), and underscores ( _ )
Cannot start with a digit
Are case sensitive (question: where did “lower/upper case” originate?)
This means that, for example:
shire0 is a valid variable name, whereas 0shire is not
shire and Shire are different variables
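These naming rules can be checked directly. The sketch below uses the string method isidentifier, which reports whether a piece of text would be accepted as a name; the variable names and values are made up for this example.

```python
# Would these be accepted as variable names?
valid_name = "shire0".isidentifier()
invalid_name = "0shire".isidentifier()
print(valid_name)    # True
print(invalid_name)  # False

# Names are case sensitive: these are two separate variables
shire = "home of the hobbits"
Shire = "a different variable"
print(shire == Shire)  # False
```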
Exercise 2.1¶
Create the following variables: n_elves, n_dwarfs, n_humans with the respective values 3, 7.0 and nine.
# write your solution here
Exercise 2.3¶
Consider the following lines of code.
ring_bearer = 'Gollum'
ring_bearer
ring_bearer = 'Bilbo'
ring_bearer
What is the final output?
'Bilbo'
'Gollum'
neither, the variable got deleted
Fundamental types & data structures¶
Most code requires more complex structures built out of basic data types
data type refers to the kind of value that is assigned to a variable
Python provides built-in support for many common structures
Many additional structures can be found in the collections module
Most of the time you’ll encounter the following data types
integers (e.g. 1, 42, 180)
floating-point numbers (e.g. 1.0, 42.42, 180.90)
strings (e.g. "Rivendell", "Weathertop")
Boolean (True, False)
If you’re unsure about the data type of a given variable, you can always use the type() function.
Integers¶
Let’s check out the different data types in more detail, starting with integers. Integers are whole numbers that can be signed (e.g. 1, 42, 180, -1, -42, -180).
x = 1
type(x)
int
n_nazgul = 9
type(n_nazgul)
int
remaining_rings = -1
type(remaining_rings)
int
Floating-point numbers¶
So what’s the difference to floating-point numbers? Floating-point numbers are numbers with a decimal point that can be signed (e.g. 1.0, 42.42, 180.90, -1.0, -42.42, -180.90).
x_float = 1.0
type(x_float)
float
n_nazgul_float = 9.0
type(n_nazgul_float)
float
remaining_rings_float = -1.0
type(remaining_rings_float)
float
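Integers and floats can be mixed freely, and Python picks the resulting type for you. A minimal sketch with made-up variable names:

```python
n_riders = 9
n_riders_float = 9.0

# An integer and a float can hold the same numeric value
same_value = n_riders == n_riders_float
print(same_value)  # True

# Mixing an integer with a float gives a float
mixed = n_riders + 1.0
print(type(mixed))  # <class 'float'>

# Dividing two integers with / also gives a float
half = n_riders / 2
print(half)  # 4.5
```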
Strings¶
Next up: strings.
Strings are basically text elements: from letters to words to sentences, all are encoded as strings in Python. In order to define a string, Python needs quotation marks; more precisely, strings start and end with quotation marks, e.g. "Rivendell". You can choose between " and ' as both will work (NB: Python will put ' around strings in its output even if you specified "). However, it is recommended to decide on one and be consistent.
location = "Weathertop"
type(location)
str
abbreviation = 'LOTR'
type(abbreviation)
str
book_one = "The fellowship of the ring"
type(book_one)
str
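One practical reason for having two kinds of quotation marks is that one kind can appear inside a string delimited by the other. A small sketch with made-up example strings:

```python
# An apostrophe inside double quotes, and double quotes inside single quotes
possessive = "Frodo's ring"
title = 'The "One Ring"'
print(possessive)
print(title)

# Python itself displays strings with single quotes by default
displayed = repr("Rivendell")
print(displayed)  # 'Rivendell'
```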
Booleans¶
How about some Booleans? At this point it gets a bit more “abstract”. While there are many possible numbers and strings, a Boolean can only have one of two values: True or False. That is, a Boolean says something about whether something is the case or not. It’s easier to understand with some examples. First try the type() function with a Boolean as an argument.
True
True
b1 = True
type(b1)
bool
b2 = False
type(b2)
bool
lotr_is_awesome = True
type(lotr_is_awesome)
bool
Interestingly, True and False also have numeric values! True has an integer value of 1 and False has a value of 0.
True + True
2
wrongs = False + False
print(wrongs)
type(wrongs)
0
int
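The numeric values of Booleans are handy for counting. As a minimal sketch with made-up data, adding up a list of Booleans tells us how many of them are True:

```python
# Did each of four (hypothetical) characters carry the ring?
carried_ring = [True, False, False, True]

# Each True counts as 1 and each False as 0
n_ring_bearers = sum(carried_ring)
print(n_ring_bearers)  # 2

# Booleans are in fact a special kind of integer
print(isinstance(True, int))  # True
```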
Converting data types¶
As mentioned before, the data type is not set explicitly when assigning a value to a variable but is determined from the value itself. Additionally, the data type of a given value can be changed via a set of functions.
int() -> convert the value of a variable to an integer
float() -> convert the value of a variable to a floating-point number
str() -> convert the value of a variable to a string
bool() -> convert the value of a variable to a Boolean
int("4")
4
float(3)
3.0
str(2)
'2'
bool(1)
True
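Conversions have a few details that are easy to trip over. The following sketch, with made-up variable names, shows some of them: decimals are cut off rather than rounded, almost every non-empty value converts to True, and text that is not a number cannot be converted at all.

```python
# Converting a string that looks like a number works ...
n_hobbits = int("4")
print(n_hobbits + 1)  # 5

# ... but converting a float to an integer cuts off the decimals
truncated = int(3.9)
print(truncated)  # 3

# Only "empty" values convert to False; any non-empty string is True
print(bool(0), bool(""), bool("False"))  # False False True

# Text that is not a number raises a ValueError
try:
    int("nine")
except ValueError as err:
    conversion_error = str(err)
    print("ValueError:", conversion_error)
```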
Exercise 3.1¶
Define the following variables with the respective values and data types: fellowship_n_humans with a value of two as a float, fellowship_n_hobbits with a value of four as a string and fellowship_n_elves with a value of one as an integer.
# write your solution here
Exercise 3.2¶
What outcome would you expect based on the following lines of code?
True - False
type(True)
# write your solution here
Exercise 3.3¶
Define two variables, fellowship_n_dwarfs with a value of one as a string and fellowship_n_wizards with a value of one as a float.
Subsequently, change the data type of fellowship_n_dwarfs to integer and the data type of fellowship_n_wizards to string.
# write your solution here
# write your solution here
The core Python “data science” stack¶
The Python ecosystem contains tens of thousands of packages
Several are very widely used in data science applications:
Jupyter: interactive notebooks
Numpy: numerical computing in Python
pandas: data structures for Python
Scipy: scientific Python tools
Matplotlib: plotting in Python
scikit-learn: machine learning in Python
We’ll cover the first three very briefly here
Other tutorials will go into greater detail on most of the others
The core “Python for psychology” stack¶
The Python ecosystem contains tens of thousands of packages
Several are very widely used in psychology research:
Jupyter: interactive notebooks
Numpy: numerical computing in Python
pandas: data structures for Python
Scipy: scientific Python tools
Matplotlib: plotting in Python
seaborn: plotting in Python
scikit-learn: machine learning in Python
statsmodels: statistical analyses in Python
pingouin: statistical analyses in Python
psychopy: running experiments in Python
nilearn: brain imaging analyses in Python
mne: electrophysiology analyses in Python
Except scikit-learn, nilearn and mne, we’ll cover all very briefly in this course
There are many free tutorials online that will go into greater detail and also cover the other packages
Acknowledgments¶
Most of what you’ll see within this lecture was prepared by Ross Markello, Michael Notter and Peer Herholz and further adapted for this course by Peer Herholz, Michael Ernst & Felix Körber
based on Tal Yarkoni’s “Introduction to Python” lecture at Neurohackademy 2019
based on http://www.stavros.io/tutorials/python/ & http://www.swaroopch.com/notes/python
based on https://github.com/oesteban/biss2016 & https://github.com/jvns/pandas-cookbook
Michael Ernst
PhD student - Fiebach Lab, Neurocognitive Psychology at Goethe-University Frankfurt
Peer Herholz (he/him)
Research affiliate - NeuroDataScience lab at MNI/MIT
Member - BIDS, ReproNim, Brainhack, Neuromod, OHBM SEA-SIG, UNIQUE
@peerherholz