Fall 2024
3 credits
Natural Language Processing. This course is self-contained and provides the essential foundation in natural language processing. It identifies the key concepts underlying NLP applications as well as the main NLP paradigms and techniques. This course combines the core ideas developed in linguistics and in artificial intelligence to show how to understand language. Key topics include regular expressions, unigrams, and n-grams; word embeddings; syntactic (phrase-structure) and dependency parsing; semantic role labeling; language modeling; sentiment and affect analysis; question answering; text-based dialogue; discourse processing; and applications of machine learning to language processing. The course provides the necessary background in linguistics and artificial intelligence. This course is suitable for high-performing undergraduates who are willing and able to learn abstract concepts, complete programming assignments, and develop a student-selected project.
This course is being offered in two editions, as CSC 495 and CSC 791. CSC 791 students must complete all the requirements for CSC 495 and in addition produce a term paper describing a research topic based on their project.
The research topic could be an elaboration of the main project you work on, a review of the literature on a specific aspect of social computing, or your original contributions. There will be opportunities to discuss the term paper (including a brief proposal and draft versions) with me.
My hope is that this project or paper will eventually be of use to you in some additional way, e.g., in your thesis or dissertation, or your PhD written prelim. Please discuss any potential overlap of your paper with your other work with me and the other faculty members concerned. Also, report the overlap within your project proposal and final report.
The course is self-contained. The main informal prerequisite is maturity in thinking about subtle concepts, such as might be gained through experience with conceptual modeling in databases or software.
Prior encounters with AI (knowledge representation and machine learning) or data science will help but aren't necessary.
From long experience, I have discovered that the material in CSC 226 is essential for my courses. Here is a (partial) list of topics that will be assumed: elementary set theory, relations, partial orders, functions, concept of a theorem, propositional logic, and predicate logic.
I recommend you brush up on these topics if you aren't comfortable with them. These topics are covered in CSC 226: Applied Discrete Mathematics. You may review Chapters 1 to 6 from the following book, which is sometimes used as the CSC 226 textbook:
Upon completion of this course, students will be able to do the following.
I will assign +/- grades.
Component | 791 campus | 791 EOL | 495 |
---|---|---|---|
Exams | 30 | 35 | 30 |
Programming | 55 | 55 | 55 |
Homework | 5 | 5 | 5 |
Class participation | 5 | 0 | 5 |
Message-board participation | 5 | 5 | 5 |
Term paper for CSC 791 | 10 | 10 | NA |
Total | 110 | 110 | 100 |
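As a quick check on the table above, the following sketch adds up each column and reports what share of the total one component carries. The names `WEIGHTS`, `total_weight`, and `share` are made up for this illustration, and treating the "NA" entry for CSC 495 as zero is an assumption; this is not the official grade computation.

```python
# Component weights copied from the grading table above.
# Assumption: the "NA" term-paper entry for CSC 495 is treated as a
# missing key, which counts as zero weight.
WEIGHTS = {
    "791 campus": {
        "Exams": 30, "Programming": 55, "Homework": 5,
        "Class participation": 5, "Message-board participation": 5,
        "Term paper": 10,
    },
    "791 EOL": {
        "Exams": 35, "Programming": 55, "Homework": 5,
        "Class participation": 0, "Message-board participation": 5,
        "Term paper": 10,
    },
    "495": {
        "Exams": 30, "Programming": 55, "Homework": 5,
        "Class participation": 5, "Message-board participation": 5,
    },
}


def total_weight(edition):
    """Sum of all component weights for one edition of the course."""
    return sum(WEIGHTS[edition].values())


def share(edition, component):
    """Fraction of the edition's total weight carried by one component."""
    return WEIGHTS[edition].get(component, 0) / total_weight(edition)


if __name__ == "__main__":
    for edition in WEIGHTS:
        paper = share(edition, "Term paper")
        print(f"{edition}: total {total_weight(edition)}, term paper {paper:.1%}")
```

For either CSC 791 column, the term paper is 10 of 110 points, or roughly 9% of the total.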
The following programming assignments add up to the programming component of the course grade in the above table. The weights of the assignments are based on their expected complexity. I may change these weights as the semester progresses.
The assignments listed as P1 and P2 in the table below may be combined, and each may have more than one deliverable and deadline.
Assignment | Weight |
---|---|
P1 | 10 |
P2 | 15 |
Project report (R0) | 5 |
Project report (R1) | 10 |
Project report (R2) | 0 |
Project report (R3) and demo | 60 |
Note that report R2 is optional and has zero points. You can submit it at your discretion if you want to receive early comments on your work.
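To see what each assignment is worth in course points, here is a minimal sketch. It assumes the assignment weights scale proportionally onto the 55 points of the programming component; the syllabus says only that they "add up to" that component, so the proportional scaling and the names `ASSIGNMENT_WEIGHTS` and `course_points` are assumptions of this illustration. The weights may also change during the semester.

```python
# Points allotted to the programming component in the grading table.
PROGRAMMING_POINTS = 55

# Assignment weights copied from the table above (they sum to 100).
ASSIGNMENT_WEIGHTS = {
    "P1": 10,
    "P2": 15,
    "R0": 5,
    "R1": 10,
    "R2": 0,   # optional report, zero points
    "R3 and demo": 60,
}


def course_points(weights, component_points):
    """Scale assignment weights proportionally onto the component's points.

    Assumption: each assignment contributes its weight divided by the sum
    of all weights, times the points for the whole component.
    """
    total = sum(weights.values())
    return {name: w * component_points / total for name, w in weights.items()}


if __name__ == "__main__":
    for name, pts in course_points(ASSIGNMENT_WEIGHTS, PROGRAMMING_POINTS).items():
        print(f"{name}: {pts:.2f} course points")
```

Under this assumption, the final report and demo (R3) account for 33 of the 55 programming points, and P1 for 5.5.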
CSC 791 students must submit a term paper worth approximately 9% of their total grade (10 of 110 points). A general-purpose rubric for term papers is here. However, if you base your term paper on the same topic as your semester project, you can submit it merely by extending your final project report by two pages. The project-based option will turn out to be far less work for you, in general.
Range | Grade |
---|---|
98 ≤ Total | A+ |
92 ≤ Total < 98 | A |
90 ≤ Total < 92 | A- |
88 ≤ Total < 90 | B+ |
82 ≤ Total < 88 | B |
80 ≤ Total < 82 | B- |
78 ≤ Total < 80 | C+ |
72 ≤ Total < 78 | C |
70 ≤ Total < 72 | C- |
68 ≤ Total < 70 | D+ |
62 ≤ Total < 68 | D |
60 ≤ Total < 62 | D- |
Total < 60 | F |
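The ranges above can be read as a simple lookup from a total to a letter grade, sketched below. The function name `letter_grade` is made up for this illustration. The sketch applies no rounding, and it does not address how a CSC 791 total out of 110 is placed on this scale, since the table does not say; both points are assumptions to confirm with me.

```python
# Lower bound of each range in the table above, highest first.
GRADE_FLOORS = [
    (98, "A+"), (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (62, "D"), (60, "D-"),
]


def letter_grade(total):
    """Return the letter grade whose range contains the given total.

    Assumption: the total is compared as-is, with no rounding.
    """
    for floor, grade in GRADE_FLOORS:
        if total >= floor:
            return grade
    return "F"  # anything below 60


if __name__ == "__main__":
    for total in (98, 91.5, 70, 59.9):
        print(total, letter_grade(total))
```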
The University's regulation on grades and grade point average is available here: NCSU REG 02.50.03.
The University sets the requirements for obtaining a grade of Satisfactory when the course is taken for S/U (Credit Only or Pass/Fail). Information about credit-only courses is available here: NCSU REG 02.50.03.
An S (satisfactory grade) requires performance equivalent to C- or better. Otherwise, the grade will be U (no credit grade).
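Combined with the grade ranges listed earlier, where C- begins at a total of 70, the S/U rule reduces to a single threshold. The sketch below assumes that threshold and uses a made-up function name, `su_grade`; it is an illustration, not the University's procedure.

```python
# Lower bound of the C- range in the grade table (70 <= Total < 72).
C_MINUS_FLOOR = 70


def su_grade(total):
    """Return "S" for performance equivalent to C- or better, else "U"."""
    return "S" if total >= C_MINUS_FLOOR else "U"


if __name__ == "__main__":
    for total in (85, 70, 69.9):
        print(total, su_grade(total))
```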
The tentative course schedule is here. It lists the projected readings, assignments, quizzes, and tests.
Campus students: Please plan to be available in person for all scheduled class times and for the final exam time (set according to the official calendar).
Please note that the final project report and one homework assignment are due during the last week of the semester. In addition, we will hold final project demos during the last week of the semester. For students who are expected to submit a term paper: the term paper is due during the last week of the semester.
I hope to use the last two lecture periods for in-class demos. (There is a possibility that these lecture periods go toward catching up if there are problems during the semester, but I hope not.) We (instructor and TA) will invite a few selected project teams to give an in-class demo (think of it as an honor) but those teams may opt out if they prefer not to present in class.
Each project team will get a 10- to 15-minute slot to give a demo of their project (including a presentation). Demos may be held over Zoom or in person, based on logistical constraints (and for online students).
The remaining demos will be held during the blocked times on the course calendar.
The demo schedule will be finalized later. Once we determine the available slots, we will collect student preferences and assign slots on a first-come, first-served basis, though with adjustments for some conflicts within the same week and a scheduling preference for EOL students. You won't be expected to attend demos by other teams, except those in class, but you should arrive early and listen in to any demo preceding yours.
The tentative schedule indicates the main topics along with the estimated number of class days for each topic.
None.
The following materials will be used in this course.
Attendance is mandatory. You don't need to give an explanation for skipping up to three classes. You could lose points for participation if you skip too many classes or attend but do not participate adequately.
Academic integrity is not negotiable. Every piece of work except the project is required to be completed individually.
Review the Code of Student Conduct: NCSU POL11.35.01 and the Pack Pledge.
Violations of academic integrity will be handled in accordance with the Student Discipline Procedures (NCSU REG 11.35.02).
Reasonable accommodations will be made for students with verifiable disabilities. In order to take advantage of available accommodations, students must register with the Disability Resource Office at Holmes Hall, Suite 304, 2751 Cates Avenue, Campus Box 7509, 919-515-7653. For more information on NC State's policy on working with students with disabilities, please see the Academic Accommodations for Students with Disabilities Regulation (NCSU REG 02.20.01).
Online course components include Moodle, Zoom, Google Suite, Google Groups, WordPress, Panopto, Gradescope, and possibly Piazza. Lectures will be recorded and made available on Panopto to current students and future viewers. If you wish to avoid being recorded, please let me know and we can discuss options with the recording team.
We may use plagiarism-detection software tools to review your deliverables. These tools may include iThenticate, MOSS, and others.
Students may be required to disclose personally identifiable information to other students in the course, via digital tools, such as email or web-postings, where relevant to the course. Examples include online discussions of class topics, and posting of student coursework. All students are expected to respect the privacy of each other by not sharing or using such information outside the course.
Students are responsible for reviewing the NC State University Policies, Rules, and Regulations (PRRs) which pertain to their course rights and responsibilities, including those referenced both below and above in this syllabus: