Privacy in Statistics and Machine Learning
Spring 2023
Course Overview: How can we learn from a data set of sensitive information while providing meaningful privacy to the individuals whose information it contains? The course explores this question, starting from the problems faced by straightforward solutions and moving on to rigorous state-of-the-art solutions using differential privacy. The class will focus on foundations, but will also delve into some applied work and into some of the social, ethical, and legal context for the subject. Students will be required to complete some mathematical assignments, some light programming assignments, and a final course project.
These two MinutePhysics videos (1, 2) give a light introduction to the course topics in the context of the US decennial census.
Instructor | Contact | Office Hours |
---|---|---|
Adam Smith | (Piazza) | Tuesdays 2:00-3:00pm in CDS 1038 |
Based on materials co-developed with Jonathan Ullman.
Links:
- Piazza
- Gradescope (entry code WV668E)
- Google Drive
Time and Place:
- Tuesday/Thursday, 11:00 AM - 12:15 PM in CAS 216 (725 Commonwealth Ave, Boston, MA)
Flipped Classroom Lectures: For certain lectures, the instructor(s) will be recording lecture videos, which will be made available via the course website. Scheduled class time will be used for discussion of the lecture material and collaboratively solving related problems. You are expected to watch the recorded lectures, come to scheduled class times, and participate actively in the discussion. Lectures will be posted at least 48 hours ahead of time, generally more.
Course Topics: The exact set of topics will evolve as the course proceeds, but a representative list includes:
- Attacks on statistical data privacy
- What does “privacy” mean in learning and statistics?
- Defining privacy: differential privacy and its variants
- Achieving privacy: algorithmic tools for differential privacy
- Applications of differential privacy
- Legal and ethical frameworks relating to privacy
- Connections to other areas of computer science and statistics
Please see the schedule tab for the most up-to-date information about the course topics.
Textbook: There is no official textbook for the class. Some good (and free) resources for the material are
- C. Dwork and A. Roth. The Algorithmic Foundations of Differential Privacy. 2014.
- S. Vadhan. The Complexity of Differential Privacy. 2017.
- K. Ligett, K. Nissim, V. Shmatikov, A. Smith, J. Ullman. Differential Privacy: From Theory to Practice. 7th Bar Ilan University Winter School on Cryptography. 2017.
- J. Near, C. Abuah. Programming Differential Privacy.
Prerequisites: Students should have a solid grounding in probability and statistics, linear algebra, basic vector calculus, and algorithms. Students should be comfortable reading and writing mathematical proofs involving algorithms and probability.
Coursework and Grading: The grade will be based on:
- a significant course project (45%)
- written assignments (45%)
- participation in class and on Piazza (10%)
Information about the assignments and final project can be found on the assignments tab.
Auditors are welcome: In particular, students from other universities are welcome to attend and participate in discussions. If you're interested in auditing, please contact the instructor to introduce yourself!
Collaboration and academic conduct: You may discuss homework assignments and projects with classmates, but you are solely responsible for what you turn in. Collaboration in the form of discussion is allowed, but all forms of cheating (copying parts of a classmate's assignment, plagiarism from papers or old posted solutions) are NOT allowed. A rough rule of thumb: you should be able to walk away from a discussion of a homework problem with no notes at all and write your solution on your own. See also the BU GRS academic conduct code.
Late-work policy: In order to help you deal with unexpected problems or bursty work deadlines, we are giving everyone a budget of six late days to distribute as they see fit, no questions asked. You may use these late days on any assignment, including project milestones, except for the final project report. You may only use integer numbers of late days. For example, turning in an assignment 25 hours late counts as two late days. Additional extensions beyond your allocated late days will be granted only in rare circumstances.
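As an informal illustration of the late-day arithmetic, the sketch below rounds lateness up to whole days, which matches the 25-hour example. The function names and the treatment of edge cases (for instance, that exactly 24 hours late counts as one day) are assumptions made for illustration, not official course policy.

```python
import math

# Budget of late days per student, as stated in the late-work policy.
LATE_DAY_BUDGET = 6


def late_days_used(hours_late: float) -> int:
    """Return the whole number of late days charged for a submission.

    Assumption (hypothetical reading of the policy): any fraction of a
    day rounds up, and on-time or early submissions cost nothing.
    """
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def remaining_late_days(hours_late_per_assignment: list[float]) -> int:
    """Return the late days left after the given submissions.

    A negative result means the budget would be exceeded.
    """
    used = sum(late_days_used(h) for h in hours_late_per_assignment)
    return LATE_DAY_BUDGET - used
```

For example, `late_days_used(25)` gives 2, and a student who submitted assignments 25, 0, and 3 hours late would have `remaining_late_days([25, 0, 3])` equal to 3.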
Course atmosphere, diversity and inclusion: We intend to provide a positive and inclusive atmosphere in classes (in-person or remote) and on the associated virtual platforms. Students from a wide range of backgrounds and with a diverse set of perspectives are welcome. We ask that students treat each other with thoughtfulness and respect, and do their part to make all their peers feel welcome. Your suggestions are encouraged and appreciated. Please let us know ways to improve the effectiveness of the course for you personally or for other students or student groups.