Instructor | David Rosenberg |
---|---|
Lecture | Wednesdays 7:10pm–9pm, Warren Weaver Hall 109 |
Lab | Thursday 7:10pm–8pm, Warren Weaver Hall 109 |
Office Hours | Instructor: Thursday 8pm–9pm, Warren Weaver Hall 109; TA: Wednesdays 2pm–3pm, Warren Weaver Hall 605; Graders: Tuesdays 2pm–4pm in the CDS common area |
This course covers a wide variety of topics in machine learning and statistical modeling. While mathematical methods and theoretical aspects will be covered, the primary goal is to provide students with the tools and principles needed to solve the data science problems found in practice. This course also serves as a foundation on which more specialized courses and further independent study can build.
This course was designed as part of the core curriculum for the Center for Data Science's Master's degree in Data Science. Other interested students who satisfy the prerequisites are welcome to take the class as well. Note that this class is intended as a continuation of DS-GA-1001 Intro to Data Science, which covers some important, fundamental data science topics that may not be explicitly covered in this DS-GA class (e.g. data cleaning, cross-validation, and sampling bias).
This term we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates, the TA, graders, and the instructor. Rather than emailing questions to the teaching staff, you are encouraged to post your questions on Piazza. If you have any problems or feedback for the developers, email team@piazza.com.
Other information:
Homework (40%) + One-Hour Test (15%) + Two-Hour Test (25%) + Project (20%)
Many homework assignments will have problems designated as “optional”. At the end of the semester, strong performance on these problems may lift the final course grade by up to half a letter grade (e.g. B+ to A- or A- to A), especially for borderline grades. You should view the optional problems primarily as a way to engage with more material, if you have the time. Along with the performance on optional problems, we will also consider significant contributions to Piazza and in-class discussions for boosting a borderline grade.
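As a rough illustration of how the grading weights above (40/15/25/20) combine, here is a minimal Python sketch. It assumes each component is scored as a percentage from 0 to 100; the component names, the `weighted_score` helper, and the example scores are hypothetical. It does not model the optional-problem adjustment or any mapping from percentages to letter grades, since neither is specified here.

```python
# Weights taken from the grading breakdown; the key names are illustrative.
WEIGHTS = {
    "homework": 0.40,
    "one_hour_test": 0.15,
    "two_hour_test": 0.25,
    "project": 0.20,
}


def weighted_score(scores):
    """Combine per-component percentages (0-100) into one weighted percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {
    "homework": 90,
    "one_hour_test": 80,
    "two_hour_test": 85,
    "project": 95,
}
print(round(weighted_score(example), 2))
```

For these example scores the components contribute 36, 12, 21.25, and 19 points respectively, which shows how heavily homework counts relative to either test.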
(If you find additional references that you recommend, please share them on Piazza and we can add them here.)
Homework Submission: Homework should be submitted through NYU Classes.
Late Policy: Homework is due at 6pm on the date specified. Homework will still be accepted for 48 hours after this time but will incur a 20% penalty.
Collaboration Policy: You may discuss problems with your classmates. However, you must write up the homework solutions and the code from scratch, without referring to notes from your joint session. In your solution to each problem, you must write down the names of everyone with whom you discussed the problem; this will not affect your grade.
The project is your opportunity for in-depth engagement with a data science problem. In job interviews, it's often your course projects that you end up discussing, so it has some importance even beyond this class. That said, it's better to pick a project that you will be able to go deep with (in terms of trying different methods, feature engineering, error analysis, etc.) than to choose a very ambitious project that requires so much setup that you will only have time to try one or two approaches.
A good project for this class is one that's a real "problem", in the sense that you have something you want to accomplish, and the best approach is not necessarily clear from the beginning. The techniques used should be relevant to our class, so most likely you will be building a prediction system. A probabilistic model would also be acceptable, though we will not be covering these topics until later in the semester.
To be clear, the following approaches would be less than ideal:
The project proposal should be roughly 2 pages, though it can be longer if you want to include figures or sample data that will be helpful to your presentation. Your proposal should do the following:
David is a data scientist in the office of the CTO at Bloomberg L.P. Formerly he was Chief Scientist of YP Mobile Labs at YP.
Levent is a PhD student at the Courant Institute of Mathematical Sciences.
Peter Li (Head Grader)
Peter is a second-year student in the Data Science program at NYU.
Lucy is a Master's student in Data Science at NYU. She is also working as an investor and in-house data scientist at Greycroft Partners, a venture capital firm making investments in early-stage tech companies.
Jackie is a second-year student in the Center for Data Science. She currently works as a researcher on educational measurement issues in computer-supported collaborative learning and has experience in statistical consulting.
Daniel is at the Institute for Advanced Studies in Toulouse and Toulouse School of Economics. He is a former Chair of Law and Economics at ETH Zurich (2012-2015), Duke Assistant Professor of Law, Economics, and Public Policy (2010-2012), and Kauffman Fellow at the University of Chicago Law School (2009-2010).
Brian is Director of Data Science at Zocdoc, and he was formerly the VP of Data Science at Dstillery. He is also an Adjunct Professor of Data Science at NYU Stern School of Business.
Kurt is a researcher at the quantitative hedge fund PDT Partners.
Bonnie is VP Data Science at Pegged Software. Prior to Pegged, she was Director, Cognitive Algorithms, at IBM Research and has also served on the faculty at the New Jersey Institute of Technology.
Kush is a research staff member at IBM Research and a data ambassador with DataKind.