CSC/ECE 506: Architecture of Parallel Computers

Spring 2025

Lecture Schedule

Week 1[ 01/05 - 01/11 ]

[Tue 01/07] Lecture 1: Overview of parallel computation
In-class exercises

Survey and registration for AI chatbot
Number of transistors on a chip
Multicore/manycore processor info
Top 500 observation

[Thu 01/09] Lecture 2: Three parallel-programming models
In-class exercises

Advantages and disadvantages of SMP organization
Overheads of message-passing
Shared-memory vs. message-passing programming
Reflection

Week 2[ 01/12 - 01/18 ]

[Tue 01/14] Lecture 3: GPU architecture
In-class exercises

Best definition of speedup
Amdahl's law example
Upload your answers to practice questions

[Thu 01/16] Lecture 4: Shared-memory parallel programming
In-class exercises

Video on team formation in Expertiza
The three levels of parallelism
Dependences example
Dependences in truncated 4-point iteration example
LDG for Loop Nest 2
Second dependences example
Reflection

Week 3[ 01/19 - 01/25 ]

[Tue 01/21] Lecture 5: Dependences, DOACROSS, DOPIPE
In-class exercises

Dependences in function-parallelism example
Dependences in DOPIPE-parallelism example
Variable scopes - Example 1
Exercise 2: for i tasks
Reflection

[Thu 01/23] Lecture 6: Variable scope
In-class exercises

Why is each variable privatizable?
Example 1: Which variables should be declared as shared/private?
Example 2: Which variables should be declared as shared/private?
Scopes in matrix multiplication - for k ||ization
Scopes in matrix multiplication - for i ||ization
Reflection

Week 4[ 01/26 - 02/01 ]

[Tue 01/28] Lecture 7: Parallelizing the Ocean application
In-class exercises

Questions about the serial solver
Order of updating points
Concurrency along antidiagonals
Bad ways of exploiting parallelism in Ocean application
Red/black ordering
Does it matter that execution is no longer deterministic?

[Thu 01/30] Lecture 8: Parallelization in three models
In-class exercises

Advantages and disadvantages of assignment options
Block assignment and communication
Block partitioning
Synchronization in the shared-memory program
Barrier synchronization in shared-memory version
Questions about the message-passing program
Typos in message-passing if statements
Reflection

Week 5[ 02/02 - 02/08 ]

[Tue 02/04] Lecture 9: Data-parallel algorithms
Online videos

9a. Control parallelism vs. data parallelism [4:53]
9b. Building blocks for data parallelism [13:16]
9c. Pointer doubling [10:14]
9d. Multiplying matrices [5:00]
9e. Labeling regions in an image [8:01]

[Thu 02/06] Lecture 10: Parallelizing linked data structures
In-class exercises

Parallelizing operations on linked data structures
Conflict between an insertion and a deletion
Fine-grain locking approach
Questions about insertion with fine-grain locks
Reflection

Week 6[ 02/09 - 02/15 ]

[Thu 02/13] Test 1 - 7:30-9:30 PM

Week 7[ 02/16 - 02/22 ]

[Tue 02/18] Lecture 11: Caches
In-class exercises

Direct-mapped cache: field sizes
Fully associative cache: field sizes
Set-associative cache: field sizes
Write policy in two-level caches
Reflection

[Thu 02/20] Lecture 12: The cache-coherence problem
Online videos

12a. Bus-based multiprocessors [6:08]
12b. The cache-coherence problem [2:54]
12c. Peterson's algorithm [6:53]
12d. Coherence vs. consistency [7:56]

Week 8[ 02/23 - 03/01 ]

[Tue 02/25] Lecture 13: Coherence and consistency
In-class exercises

How does write-through guarantee coherence?
How many processors on a write-through bus?
What happens when a block is ejected?
Invalidation vs. update protocols
Ordering of operations in two threads
Why might A not print as 1?
Reflection

[Thu 02/27] Lecture 14: Invalidation and update protocols
Online videos

14a. The MSI protocol [14:20]
14b. The MESI protocol [10:35]
14c. The Dragon protocol [10:37]
14d. The Firefly protocol [6:52]

Week 9[ 03/02 - 03/08 ]

[Tue 03/04] Lecture 15: Physical and logical cache organization
In-class exercises

Steps in cache access
Parallelism in cache access
Alternatives for cache indexing and tagging
Multilevel cache design
Characteristics of inclusion properties
Reflection

[Thu 03/06] Lecture 16: Multicore caches: organization & performance
In-class exercises

Hits and misses in set-associative cache
Hits and misses in direct-mapped cache
Coherence misses
Cache changes to reduce miss rate
Effects of increasing line size
Context-switch misses
Logical cache organization
Partitioned shared cache organization

Week 10[ 03/16 - 03/22 ]

[Tue 03/18] Lecture 17: Hardware support for locking
In-class exercises

Performance of test-and-set
TSL vs. TTSL
LL/SC vs. TTSL
Ticket locks vs. array-based queueing locks
Reflection

[Thu 03/20] Lecture 18: Barrier implementations
In-class exercises

Permission form for study on dual-submission homework
Interest in independent study/thesis topics
Ticket lock with MSI
Scalability at the barrier
Performance of combining-tree barrier
Reflection

Week 11[ 03/23 - 03/29 ]

[Tue 03/25] Lecture 19: Memory consistency
In-class exercises

Example: Why is a memory consistency model needed?
Sequentially consistent vs. non-seq. consistent outcomes
Which outcomes are possible under SC?
Prefetching early and late
Reflection

[Thu 03/27] Lecture 20: Relaxed memory-consistency models
In-class exercises

Need for relaxed consistency models
Causal-consistency example
Strongest consistency model
How can both processes be killed?
Weak ordering

Week 12[ 03/30 - 04/05 ]

[Tue 04/01] Test 2 - 7:30-9:30 PM

[Thu 04/03] Lecture 21: Caching in DSM machines
In-class exercises

Why doesn't a bus-based design scale?
Why aren't invalidations too slow?
Page placement without interleaving
Directory messages for read and write misses
Merging the directory with the LLC tag array
Reflection

Week 13[ 04/06 - 04/12 ]

[Tue 04/08] Lecture 22: Coherence in DSM machines
In-class exercises

Pseudocode for full bit-vector approach
Block states in main memory
Optimizing a full bit-vector scheme
Reflection

[Thu 04/10] Lecture 23: The Silicon Graphics S2MP architecture
Online videos

23a. Today's MP architectures [7:20]
23b. Directory-based coherence [8:41]
23c. Scaling the SMP model [7:05]
23d. SGI's Origin [5:55]
23e. Design issues [9:36]
23f. Directory organization [5:42]
23g. Coherence protocol and summary [10:33]

Week 14[ 04/13 - 04/19 ]

[Tue 04/15] Lecture 24: DSM implementation correctness & performance
In-class exercises

An invalidation to a node that no longer has a block
Transition from state U on a read request
Transition from state S on a readX request
Home-centric vs. requester-assisted approach
Reflection

[Thu 04/17] Lecture 25: Caching in multicore architectures
In-class exercises

ReadX in state S or U with non-atomic message
ReadX to EM block with non-atomic message
What's wrong with imprecise directory info?
Increased power consumption and latency
Other problems with stale directory info
Accelerating thread migration
Reflection

Week 15[ 04/20 - 04/26 ]

[Tue 04/22] Lecture 26: Review
In-class exercises

Three orchestrations of Ocean
Coherence and consistency
Physical and logical cache organization
Four "C"s of cache misses
Summing a vector with copy-scan
Miscellaneous questions
Kahoot questions

Week 16[ 04/27 - 05/03 ]

[Tue 04/29] Final Exam - 12:00-2:30 PM

©2007-2022 NC State University