Welcome to INFO 2950

Lecture 1

Dr. Benjamin Soltoff

Cornell University
INFO 2950 - Spring 2024

January 23, 2024

Agenda

  • Intros
  • What is data science?
  • Software
  • Application exercise
  • Course overview
  • This week’s tasks

Students on the waitlist

  • INFO 2950 enrollment is restricted to IS/ISST majors
  • If you are not an IS/ISST major (or are still in the process of affiliating), join the waitlist through Student Center
  • PINs distributed on a rolling basis
  • We currently have over 50 seats available

Staff intros

Meet the instructor

Dr. Benjamin Soltoff

Lecturer in Information Science

Gates Hall 216

Headshot of Dr. Benjamin Soltoff

Meet the course team

Grad TAs

  • Arunabh S
  • Boris H
  • Chenyu Y
  • Jingruo C
  • Breanna G
  • Eun-Jeong K
  • Pin-Sung K

Undergrad TAs

  • Alexia A
  • Andrew M
  • Arthur S
  • Bella H
  • Kevin C
  • Claire Y
  • Elliot K
  • Gaby M
  • Gabby F
  • Hung P
  • Israel D
  • Jocelyn P
  • Joyce C
  • Karla W
  • Max B
  • Mateo C
  • Ming D
  • Philan T
  • Richie S
  • Sam G
  • Shuqian L
  • Shara S
  • Vanessa S

Meet each other!

Physically interact with at least 2 people sitting around you. Introduce yourselves to each other and share:

  • Your name - Prof/Dr. Soltoff
  • Your major - Political science
  • The last movie you saw - To All The Boys I’ve Loved Before
  • What you hope to get out of this class - A paycheck

What is data science?

  • Data science is an exciting discipline that allows you to turn raw data into understanding, insight, and knowledge.

    [A]n interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured and unstructured data, and apply knowledge from data across a broad range of application domains. [1]

  • We’re going to learn to do this in a tidy way – more on that later!

  • This is a course on computing applications for data science workflows

Software

Excel - not…

An Excel window with data about countries

R

An R shell

RStudio

An RStudio window

Data science life cycle

Import

Data science life cycle, with import highlighted

Tidy + transform

Data science life cycle, with tidy and transform highlighted

Visualize

Data science life cycle, with visualize highlighted

Model

Data science life cycle, with model highlighted

Understand

Data science life cycle, with understand highlighted

Communicate

Data science life cycle, with communicate highlighted

Understand + communicate

Data science life cycle, with understand and communicate highlighted

Program

Data science life cycle, with program highlighted

Let’s dive in!

Application exercise

Or more like a demo for today…

📋 github.coecis.cornell.edu/info2950-sp24/ae-00-unvotes

🛠️ Rendered report

What do we learn from this chart?

Respond at PollEv.com/soltoff

Course overview

Homepage

https://info2950.infosci.cornell.edu/

  • All course materials
  • Links to Canvas, GitHub, RStudio Workbench, etc.
  • Let’s take a tour!

Course toolkit

All linked from the course website:

Important

Make sure you can access RStudio (Posit) Workbench before lab on Friday.

Activities: Prepare, Participate, Practice, Perform

  • Prepare: Introduce new content and prepare for lectures by completing the readings

  • Participate: Attend and actively participate in lectures and labs, office hours, team meetings

  • Practice: Practice applying statistical concepts and computing with application exercises during lecture, graded for completion

  • Perform: Put together what you’ve learned to analyze real-world data

    • Lab assignments x 7(-ish) (team-based)
    • Homework assignments x 7(-ish) (individual)
    • Exams (mid-semester take-home, final in-person)
    • Team project

Cadence

  • Application exercises: Complete by the end of the following day
  • Labs: Start and make substantial progress on Friday in lab section, finish up by Monday 11:59pm the following week
  • HWs: Posted Friday morning, due the following Wednesday 11:59pm
  • Exam: More details later this semester
  • Project: Deadlines throughout the semester, with some lab time dedicated to working on them, and most work done in teams outside of class

Grading

Category                 Percentage
Exams                    25%
Homework                 25%
Project                  25%
Labs                     15%
Application Exercises    10%

See course syllabus for how the final letter grade will be determined.
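
As a rough illustration of how the category weights in the grading table combine, here is a minimal sketch in Python (used only for illustration; the course itself uses R). The function name `weighted_score` and the example scores are hypothetical. The sketch computes only a weighted percentage and assumes nothing about how letter grades are assigned, which the syllabus covers.

```python
# Category weights taken from the grading table (as fractions of the total).
WEIGHTS = {
    "Exams": 0.25,
    "Homework": 0.25,
    "Project": 0.25,
    "Labs": 0.15,
    "Application Exercises": 0.10,
}


def weighted_score(category_scores):
    """Combine per-category percentages (0-100) into one overall percentage."""
    missing = set(WEIGHTS) - set(category_scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * category_scores[name] for name in WEIGHTS)


# Hypothetical scores, invented purely for illustration.
example_scores = {
    "Exams": 80.0,
    "Homework": 90.0,
    "Project": 85.0,
    "Labs": 100.0,
    "Application Exercises": 100.0,
}

overall = weighted_score(example_scores)
print(round(overall, 2))
```

Because the weights sum to 100%, a score of 100 in every category yields an overall 100, and each category moves the total in proportion to its weight.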

15-minute rule

Support

  • Attend office hours
  • Ask and answer questions on the discussion forum
  • Reserve email for questions on personal matters and/or grades
  • Read the course support page

Announcements

  • Posted on Canvas (Announcements tool), be sure to check regularly (or forward announcements to your email)
  • I’ll assume that you’ve read an announcement by the next “business” day

Diversity + inclusion

  • I want you to feel like you belong in this class and are respected
  • We are committed to full inclusion in education for all persons*
  • If you feel that we have failed to meet these goals, please either let us know or report it, and we will address the issue

Accessibility

I want this course to be accessible to students of all abilities. Please feel free to let me know if there are circumstances affecting your ability to participate in class.

Course policies

 

As long as you meet the prereqs

Prerequisites

  1. CS 1110 or CS 1112, AND
  2. MATH 1710 or equivalent
  • No prior experience with R is expected
  • You must have general-purpose programming experience AND be comfortable with basic probability and statistical inference

Late work, waivers, regrades policy

  • We have policies!
  • Read about them on the course syllabus and refer back to them when you need them

Collaboration policy

  • Only work that is clearly assigned as team work should be completed collaboratively.

  • Homework must be completed individually. You may not directly share answers or code with others; however, you are welcome to discuss the problems in general and ask for advice.

  • Exams must be completed individually. You may not discuss any aspect of the exam with peers.

Sharing / reusing code policy

  • We are aware that a huge volume of code is available on the web, and many tasks may have solutions posted

  • Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism, regardless of source

  • All code must be written by you, the human being

Generative AI

Academic integrity

  1. A student shall in no way misrepresent his or her work.
  2. A student shall in no way fraudulently or unfairly advance his or her academic position.
  3. A student shall refuse to be a party to another student’s failure to maintain academic integrity.
  4. A student shall not in any other manner violate the principle of academic integrity.

Most importantly!

Ask if you’re not sure if something violates a policy!

This week’s tasks

Movie trailer