CSC 495 and CSC 791: Natural Language Processing

Fall 2023

3 credits

Munindar P. Singh


Email address: mpsingh at the-ncsu-domain
Office hours: MW 3:15–4:15PM plus via email
Office: EB II 2252

Catalog Description

Natural Language Processing. This course is self-contained and provides the essential foundation in natural language processing. It identifies the key concepts underlying NLP applications as well as the main NLP paradigms and techniques. This course combines the core ideas developed in linguistics and in artificial intelligence to show how to understand language. Key topics include regular expressions, unigrams, and n-grams; word embeddings; syntactic (phrase-structure) and dependency parsing; semantic role labeling; language modeling; sentiment and affect analysis; question answering; text-based dialogue; discourse processing; and applications of machine learning to language processing. The course provides the necessary background in linguistics and artificial intelligence. This course is suitable for high-performing undergraduates who are willing and able to learn abstract concepts, complete programming assignments, and develop a student-selected project.

Additional Course Description

This course is being offered in two editions, as CSC 495 and CSC 791. CSC 791 students must complete all the requirements for CSC 495 and, in addition, produce a term paper describing a research topic based on their project.

The research topic could be an elaboration of the main project you work on, a review of the literature on a specific aspect of natural language processing, or your original contributions. There will be opportunities to discuss the term paper (including a brief proposal and draft versions) with me.

My hope is that this project or paper will eventually be of use to you in some additional way, e.g., in your thesis or dissertation, or your PhD written prelim. Please discuss any potential overlap between your paper and your other work with me and with the other faculty members concerned. Also, report any such overlap in your project proposal and final report.

Prerequisite

The course is self-contained. The main informal prerequisite is maturity in thinking about subtle concepts, such as might be gained through experience with conceptual modeling in databases or software.

Prior encounters with AI (knowledge representation and machine learning) or data science will help but aren't necessary.

From long experience, I have discovered that the material in CSC 226 is essential for my courses. Here is a (partial) list of topics that will be assumed: elementary set theory, relations, partial orders, functions, concept of a theorem, propositional logic, and predicate logic.

I recommend you brush up on these topics if you aren't comfortable with them. These topics are covered in CSC 226: Applied Discrete Mathematics. You may review Chapters 1 to 6 of the following book, which is sometimes used as the CSC 226 textbook:

Kenneth H. Rosen, Discrete Mathematics and Its Applications, McGraw-Hill, 7th edition, 2012. ISBN 0-07-289905-0.

Student Learning Outcomes

Upon completion of this course, students will be able to do the following.

Course Structure

Grade Determination

I will assign +/- grades.

Component (points) | CSC 791 (campus) | CSC 791 (EOL) | CSC 495
Exams | 25 | 30 | 25
Programming | 60 | 60 | 60
Homework | 5 | 5 | 5
Class participation | 5 | 0 | 5
Message-board participation | 5 | 5 | 5
Term paper for CSC 791 | 10 | 10 | NA
Total | 110 | 110 | 100

The following programming assignments and project reports make up the programming component of the course grade in the table above. Their weights, which sum to 100, are based on their expected complexity. I may change these weights as the semester progresses.

Assignment | Weight
TBD 1 | 20
TBD 2 | 20
Project report (R0) | 10
Project report (R1) | 10
Project report (R2) | 0
Project report (R3) and demo | 40
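Because the weights sum to 100, one plausible reading is that each assignment contributes its weight, as a percentage, to the 60-point programming component. The Python sketch below illustrates that arithmetic under stated assumptions: each assignment score is a fraction between 0 and 1, and a missing submission counts as zero. The names `WEIGHTS`, `PROGRAMMING_POINTS`, and `programming_component` are hypothetical, and the actual grading procedure may differ.

```python
# Assignment weights from the table above (they sum to 100).
WEIGHTS = {
    "TBD 1": 20,
    "TBD 2": 20,
    "Project report (R0)": 10,
    "Project report (R1)": 10,
    "Project report (R2)": 0,  # optional; carries no weight
    "Project report (R3) and demo": 40,
}

# Points for the programming component in the grade-determination table.
PROGRAMMING_POINTS = 60


def programming_component(scores: dict[str, float]) -> float:
    """Combine per-assignment scores into points out of PROGRAMMING_POINTS.

    Assumption (hypothetical): each score is a fraction in [0, 1], and an
    assignment missing from `scores` counts as zero.
    """
    weighted = sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)
    return PROGRAMMING_POINTS * weighted / sum(WEIGHTS.values())


# Full marks on everything except the optional R2, which is skipped.
full = {name: 1.0 for name in WEIGHTS if name != "Project report (R2)"}
print(programming_component(full))  # 60.0
```

Skipping R2 costs nothing under this reading, since its weight is zero.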

Report R2 is optional and carries zero weight. Submit it if you would like early feedback on your work.

CSC 791 students must submit a term paper worth 10 of their 110 total points, or approximately 9% of their grade. A general-purpose rubric for term papers is here. However, if you base your term paper on the same topic as your semester project, you can satisfy this requirement simply by extending your final project report by two pages. In general, the project-based option is far less work.

Range | Grade
98 ≤ Total | A+
92 ≤ Total < 98 | A
90 ≤ Total < 92 | A-
88 ≤ Total < 90 | B+
82 ≤ Total < 88 | B
80 ≤ Total < 82 | B-
78 ≤ Total < 80 | C+
72 ≤ Total < 78 | C
70 ≤ Total < 72 | C-
68 ≤ Total < 70 | D+
62 ≤ Total < 68 | D
60 ≤ Total < 62 | D-
Total < 60 | F
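The grade ranges above include their lower bound, so a total exactly on a boundary earns the higher grade (for example, 92 is an A, not an A-). A minimal Python sketch of the lookup follows. It assumes the total is already on the 100-point scale used in the table, because the syllabus does not state how CSC 791 totals out of 110 are converted; the names `CUTOFFS` and `letter_grade` are illustrative only.

```python
# Grade cutoffs from the table above: (minimum total, letter), highest first.
# Assumption: `total` is on the 100-point scale; conversion of CSC 791
# totals (out of 110) is not specified and is not handled here.
CUTOFFS = [
    (98, "A+"), (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(total: float) -> str:
    """Return the letter grade for a total, treating each lower bound as inclusive."""
    for minimum, letter in CUTOFFS:
        if total >= minimum:
            return letter
    return "F"  # Total < 60


print(letter_grade(92))    # A
print(letter_grade(91.9))  # A-
print(letter_grade(59.5))  # F
```

Scanning from the highest cutoff down means the first match is the correct grade, so the upper bounds in the table never need to be checked explicitly.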

Grade Point Average

The University's regulation on grades and grade point average is available here: NCSU REG 02.50.03.

Satisfactory Outcome

The University sets requirements for obtaining a grade of Satisfactory when a course is taken for S/U (credit only, or pass/fail). Information about credit-only courses is available here: NCSU REG 02.50.03.

An S (satisfactory grade) requires performance equivalent to C- or better. Otherwise, the grade will be U (no credit grade).

Course Schedule

The tentative course schedule is here. It gives the projected dates of readings, assignments, quizzes, and tests.

Please note that one homework assignment is due during the last week of the semester.

Please note that we will hold final project demonstrations during the last week of the semester. The tentative schedule shows a larger time range than we will use; we will narrow it down later. Each project team will get a 10- to 15-minute slot. Demos may be held in person or over Zoom, depending on logistical constraints and to accommodate online students.

The demo schedule will be finalized later. Once we determine the available slots, we will collect student preferences and assign slots on a first-come, first-served basis, with adjustments for some conflicts within the same week.

The tentative schedule indicates the main topics along with the estimated number of class days for each topic.

Student Expenses

None.

The following materials will be used in this course.

Readings and software

Late Assignments

Attendance Policy

Attendance is mandatory. You don't need to give an explanation for skipping up to three classes. You could lose points for participation if you skip too many classes or attend but do not participate adequately.

Academic Integrity

Academic integrity is not negotiable. Every piece of work except the project must be completed individually.

Review the Code of Student Conduct: NCSU POL 11.35.01 and the Pack Pledge.

Violations of academic integrity will be handled in accordance with the Student Discipline Procedures (NCSU REG 11.35.02).

Students with Disabilities

Reasonable accommodations will be made for students with verifiable disabilities. To take advantage of available accommodations, students must register with the Disability Resource Office at Holmes Hall, Suite 304, 2751 Cates Avenue, Campus Box 7509, 919-515-7653. For more information on NC State's policy on working with students with disabilities, please see the Academic Accommodations for Students with Disabilities Regulation (NCSU REG 02.20.01).

Digital Course Components

Online course components include Moodle, Zoom, Google Suite, Google Groups, WordPress, Panopto, Gradescope, and possibly Piazza. Lectures will be recorded and made available on Panopto to current students and future viewers. If you wish to avoid being recorded, please let me know and we can discuss options with the recording team.

We may use plagiarism-detection software tools to review your deliverables. These tools may include iThenticate, MOSS, and others.

Online Exchanges among Students and the Instructor and TA

Students may be required to disclose personally identifiable information to other students in the course, via digital tools, such as email or web-postings, where relevant to the course. Examples include online discussions of class topics, and posting of student coursework. All students are expected to respect the privacy of each other by not sharing or using such information outside the course.

Additional NC State Rules and Regulations

Students are responsible for reviewing the NC State University Policies, Rules, and Regulations (PRRs) which pertain to their course rights and responsibilities, including those referenced both below and above in this syllabus: