Data Science

  1. Course description in Course Catalogue (Studiegids)
  2. Schedule on Datanose
  3. Blackboard
  4. Lecturer, times
  5. Literature
    1. Lecture notes
    2. Software
  6. Course Objectives
  7. Experiences of previous students
  8. Examination
  9. Fraud
  10. Week by Week/Course Plan
  11. Site Overview


Lecturer, times

  1. Lecturer: Maarten Marx
  2. Assistant(s): Alexandra Arkut, Victor Milewski, Ruben Blom
  3. For times, see Datanose


Literature

  1. Davenport, Thomas H., and D. J. Patil. "Data scientist." Harvard Business Review 90.5 (2012): 70-76.
  2. Vasant Dhar. 2013. Data science and prediction. Commun. ACM 56, 12 (December 2013), 64-73. DOI:
  3. The Case For Python in Scientific Computing, Hugo Bowne Anderson, 2017, Datacamp.
  4. Chapters 2 and 3 from Discovering knowledge in data : an introduction to data mining by Daniel T. Larose. 2005, Wiley and Sons.
  5. Python Data Science Handbook. This O'Reilly book is also freely available as a set of notebooks. Download them and read the book interactively.
Recommended but not obligatory literature
  1. Learning IPython for Interactive Computing and Data Visualization
  2. IPython Interactive Computing and Visualization Cookbook
  3. Python for Data Analysis, O’Reilly Media - Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language.
  4. Tutorials and articles available on the web.
    1. See the lecture notes.
    2. Shen, Helen. Interactive notebooks: Sharing the code. Nature 515, 151–152 (06 November 2014) doi:10.1038/515151a
  5. NumPy & SciPy: Stefan van der Walt, S. Chris Colbert and Gaël Varoquaux. The NumPy Array: A Structure for Efficient Numerical Computation, Computing in Science & Engineering, 13, 22-30 (2011), DOI:10.1109/MCSE.2011.37
  6. Matplotlib: John D. Hunter. Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering, 9, 90-95 (2007), DOI:10.1109/MCSE.2007.55
More pointers to literature
  1. Overview of data science cheat sheets at DataCamp
  2. An overview of introduction tutorials
  3. LowClass Python: Style Guide for Data Scientists


Lecture notes

Each lecture will be accompanied by lecture notes, and often slides too. These notes are typically IPython Notebooks or Markdown files. The lecture notes point to the relevant literature and define the basic requirements of what you are supposed to know. They are excellent material for mastering the course and for knowing what to study for the exam.

Lecture notes folder


Software

Becoming fluent in a number of software tools that help you organize your work is a key objective of this course. These tools and languages help you focus on the content instead of the form. They also serve as a kind of logbook that helps you keep your work together. In the end it is very easy to produce a good-looking report (or thesis) out of these logs. You may even be able to hand in the notebook itself as a final report.

You need to install:

  1. IPython and IPython notebook. See Installing IPython and the recommended packages from Learning IPython. See for packages.
  2. Several Python packages (many already come with Anaconda)
  3. Have a look at this nice list of top 15 Data Science Python Modules. And use them!
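
A quick way to check whether the packages are already installed is to ask Python to locate them. The sketch below is only an illustration: the helper name `missing_packages` is hypothetical, and the package list is an assumption based on the tools named on this page, so adjust it to what you actually need.

```python
from importlib.util import find_spec

# Assumed list of import names; adjust it to the packages you actually need.
# Note that scikit-learn is imported under the name "sklearn".
COURSE_PACKAGES = ["numpy", "pandas", "seaborn", "sklearn", "matplotlib", "IPython"]


def missing_packages(names):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(COURSE_PACKAGES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed packages are installed.")
```

Anything reported as not installed can then be added with your package manager of choice.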

We strongly recommend that you also install the following, as you will most likely need them when writing your thesis anyway:

  1. A good LaTeX editor.
  2. Git and a private GitHub account
Programming languages

You will learn the following languages:

  1. IPython Notebook specifics.
  2. Markdown: the key language for adding comments to your notebooks.
  3. LaTeX
  4. nbconvert
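
Notebooks combine the first two items: a notebook file is a JSON document whose cells hold either Markdown text or code. The following is a minimal sketch of that structure using only the standard library. The helper `build_notebook` and the file name `report.ipynb` are hypothetical, and in practice you create notebooks in the notebook interface rather than by hand.

```python
import json


def build_notebook(markdown_text, code_text):
    """Build a minimal notebook (format version 4) as a plain dictionary."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            # A Markdown cell holds the commentary.
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            # A code cell holds the code; it has not been run yet,
            # so it has no execution count and no outputs.
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": code_text,
            },
        ],
    }


notebook = build_notebook("# Week 1\nSome *Markdown* commentary.", "print(1 + 1)")
text = json.dumps(notebook, indent=1)
# Writing `text` to a file such as report.ipynb (hypothetical name)
# gives a file that the notebook interface can open.
```

Once saved to a file, such a notebook can be turned into a report with nbconvert, for example `jupyter nbconvert --to html report.ipynb`.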

You will become familiar with several data science tools. Some are command-line tools; others are Python packages such as pandas, seaborn, scikit-learn and numpy.


If you would like to work in the cloud, try the Google Cloud service (at the time of writing you get a $300 free trial). It is quite easy to set up a Jupyter Anaconda environment at Google, see

Course Objectives

See Course description in Course Catalogue (Studiegids)

Experiences of previous students



Examination

We assess progress in this course through weekly graded assignments and two exams. Assignments and the project can only be done in groups of two.

Exams are individual.

Attendance at the werkcolleges (tutorial sessions) is obligatory and is registered in DataNose only at the beginning of each werkcollege.

You may only miss one of the werkcolleges.

Link to the week by week schedule for the assignments and exams

Model answers to the assignments are given during the first hour of the werkcollege.

Assignment grading

We grade the notebook you handed in on Blackboard. During the werkcollege we ask you to demonstrate your notebook and explain the code. Both members of the group must be able to explain and run all code. If you are not able to explain what you have programmed, your grade may be lowered.

When and how to hand in assignments?

Some more rules for the assignments:

Fraud, plagiarism, sharing answers

Site Overview