Data Science

  1. Course description in Course Catalogue (Studiegids)
  2. Schedule on DataNose
  3. Blackboard
  4. Lecturer, times
  5. Literature
    1. Lecture notes
    2. Software
  6. Course Objectives
  7. Experiences of previous students
  8. Examination
  9. Fraud
  10. Week by Week/Course Plan
  11. Site Overview

Lecturer

  1. Lecturer: Maarten Marx, http://maartenmarx.nl
  2. Assistant(s): Alexandra Arkut, Victor Milewski, Ruben Blom
  3. For times, see DataNose

Literature

  1. Davenport, Thomas H., and D. J. Patil. "Data scientist." Harvard Business Review 90.5 (2012): 70-76.
  2. Vasant Dhar. 2013. Data science and prediction. Commun. ACM 56, 12 (December 2013), 64-73. DOI: http://dx.doi.org/10.1145/2500499
  3. Chapters 2 and 3 from Discovering Knowledge in Data: An Introduction to Data Mining by Daniel T. Larose. 2005, Wiley and Sons. Required reading.
  4. Python for Data Analysis, O’Reilly Media. This book is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. It covers the parts of the Python language and libraries you’ll need to effectively solve a broad set of data analysis problems. It is not an exposition on analytical methods using Python as the implementation language.
  5. Fernando Pérez and Brian E. Granger. IPython: A System for Interactive Scientific Computing, Computing in Science & Engineering, 9, 21-29 (2007), DOI:10.1109/MCSE.2007.53
  6. Tutorials and articles available on the web.
    1. See the lecture notes.
    2. Shen, Helen. Interactive notebooks: Sharing the code. Nature 515, 151–152 (06 November 2014) doi:10.1038/515151a

Recommended literature

  1. Learning IPython for Interactive Computing and Data Visualization
  2. IPython Interactive Computing and Visualization Cookbook
  3. NumPy & SciPy: Stefan van der Walt, S. Chris Colbert and Gaël Varoquaux. The NumPy Array: A Structure for Efficient Numerical Computation, Computing in Science & Engineering, 13, 22-30 (2011), DOI:10.1109/MCSE.2011.37
  4. Matplotlib: John D. Hunter. Matplotlib: A 2D Graphics Environment, Computing in Science & Engineering, 9, 90-95 (2007), DOI:10.1109/MCSE.2007.55

More pointers to literature

  1. An overview of introductory tutorials

Lecture notes

Each lecture is accompanied by lecture notes, and often by slides as well. The notes are typically IPython notebooks or Markdown files. They contain valuable pointers to the literature and define the basic requirements of what you are expected to know, so they are a good guide both for mastering the course and for deciding what to study for the exam.

Lecture notes folder

Software

Becoming fluent in a number of software tools that help you organize your work is a key objective of this course. These tools and languages let you focus on content rather than form. They also serve as a kind of logbook that keeps your work together. In the end, it is easy to produce a good-looking report (or thesis) from these logs; you may even be able to hand in the notebook itself as your final report.

You need to install:

  1. IPython and the IPython Notebook. See Installing IPython and the recommended packages from Learning IPython; for packages, see http://ipython.org/install.html#i-am-getting-started-with-python.
  2. Several Python packages (many already come with Anaconda).
  3. Have a look at this list of the top 15 Data Science Python Modules, and use them!
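After installing, it can be useful to check which packages Python can actually find. The following is a minimal sketch, not an official course script: the function name `find_missing` and the package list are illustrative assumptions (IPython comes from the list above; pandas, scipy and numpy are named further down this page). It relies only on `importlib.util.find_spec` from the standard library.

```python
import importlib.util

# Illustrative list of top-level package names; adapt it to what the course asks for.
PACKAGES = ["IPython", "pandas", "scipy", "numpy"]


def find_missing(packages):
    """Return the names in `packages` that Python cannot locate for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(PACKAGES)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed packages are installed.")
```

Any name reported as not installed can then be added with your package manager of choice (for example the one that ships with Anaconda).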

We strongly recommend that you also install the following, as you will most probably need them when writing your thesis anyway:

  1. A good LaTeX editor.
  2. Git and a private GitHub account.

Programming languages

You will learn the following languages and tools:

  1. IPython Notebook specifics.
  2. Markdown: the key language for writing comments in your notebooks.
  3. LaTeX
  4. nbconvert

You will get familiar with several data science tools. Some of them are command-line tools; others are Python packages such as pandas, scipy and numpy.
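As an illustration of a command-line tool driven from Python, the sketch below builds and runs an nbconvert command that turns a notebook into a report. It assumes a recent Jupyter installation, where the tool is invoked as `jupyter nbconvert --to FORMAT NOTEBOOK`; older IPython installations used a different command name. The helper names `nbconvert_command` and `convert` are hypothetical, not part of nbconvert itself.

```python
import os
import shutil
import subprocess


def nbconvert_command(notebook, to="html"):
    """Build the command line that converts `notebook` to the format `to`."""
    return ["jupyter", "nbconvert", "--to", to, notebook]


def convert(notebook, to="html"):
    """Run nbconvert if possible; return True when the conversion was run."""
    cmd = nbconvert_command(notebook, to)
    # Skip quietly when the tool or the notebook file is not available.
    if shutil.which("jupyter") is None or not os.path.exists(notebook):
        print("Skipping; would run:", " ".join(cmd))
        return False
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return True
```

For example, `convert("assignment1.ipynb", to="latex")` would produce a LaTeX version of a notebook with that (hypothetical) file name.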

Course Objectives

See Course description in Course Catalogue (Studiegids)

Experiences of previous students

See http://maartenmarx.nl/teaching/DataScience/CoursePlan/oordelen_van_studenten_over_het_vak_data_science.pdf

Examination

We assess progress in this course through graded weekly assignments, a mid-course exam, and a final mini project. Assignments and the project can only be done in groups of two.

The exam is individual.

Attendance at the tutorial sessions (werkcolleges) is mandatory and is registered in DataNose only at the beginning of each session.

You may miss only one of the tutorial sessions.

The rules for passing the exam are:

Link to the week-by-week schedule for the assignments and exams

Model answers to the assignments are given during the first hour of the tutorial session (werkcollege).

Assignment grading

We grade the notebook you hand in on Blackboard. During the tutorial session (werkcollege) we ask you to demonstrate your notebook and explain the code. Both members of the group must be able to explain and run all of the code. If you cannot explain what you have programmed, your grade will be lowered.

When and how to hand in assignments?

Some more rules for the assignments:

Fraud, plagiarism, sharing answers


Site Overview