course schedule: lectures
Lecture Schedule
Week 1 [01/05 - 01/11]
[Tue 01/07] Lecture 1: Overview of parallel computation
In-class exercises
Survey and registration for AI chatbot
Number of transistors on a chip
Multicore/manycore processor info
Top 500 observation
[Thu 01/09] Lecture 2: Three parallel-programming models
In-class exercises
Advantages and disadvantages of SMP organization
Overheads of message-passing
Shared-memory vs. message-passing programming
Reflection
Week 2 [01/12 - 01/18]
[Tue 01/14] Lecture 3: GPU architecture
In-class exercises
Best definition of speedup
Amdahl's law example
Upload your answers to practice questions
[Thu 01/16] Lecture 4: Shared-memory parallel programming
In-class exercises
Video on team formation in Expertiza
The three levels of parallelism
Dependences example
Dependences in truncated 4-point iteration example
LDG for Loop Nest 2
Second dependences example
Reflection
Week 3 [01/19 - 01/25]
[Tue 01/21] Lecture 5: Dependences, DOACROSS, DOPIPE
In-class exercises
Dependences in function-parallelism example
Dependences in DOPIPE-parallelism example
Variable scopes - Example 1
Exercise 2: for i tasks
Reflection
[Thu 01/23] Lecture 6: Variable scope
In-class exercises
Why is each variable privatizable?
Example 1: Which variables should be declared as shared/private?
Example 2: Which variables should be declared as shared/private?
Scopes in matrix multiplication - for k parallelization
Scopes in matrix multiplication - for i parallelization
Reflection
Week 4 [01/26 - 02/01]
[Tue 01/28] Lecture 7: Parallelizing the Ocean application
In-class exercises
Questions about the serial solver
Order of updating points
Concurrency along antidiagonals
Bad ways of exploiting parallelism in Ocean application
Red/black ordering
Does it matter that execution is no longer deterministic?
[Thu 01/30] Lecture 8: Parallelization in three models
In-class exercises
Advantages and disadvantages of assignment options
Block assignment and communication
Block partitioning
Synchronization in the shared-memory program
Barrier synchronization in shared-memory version
Questions about the message-passing program
Typos in message-passing if statements
Reflection
Week 5 [02/02 - 02/08]
[Tue 02/04] Lecture 9: Data-parallel algorithms
Online videos
9a. Control parallelism vs. data parallelism [4:53]
9b. Building blocks for data parallelism [13:16]
9c. Pointer doubling [10:14]
9d. Multiplying matrices [5:00]
9e. Labeling regions in an image [8:01]
[Thu 02/06] Lecture 10: Parallelizing linked data structures
In-class exercises
Parallelizing operations on linked data structures
Conflict between an insertion and a deletion
Fine-grain locking approach
Questions about insertion with fine-grain locks
Reflection
Week 6 [02/09 - 02/15]
[Thu 02/13] Test 1 - 7:30-9:30 PM
Week 7 [02/16 - 02/22]
[Tue 02/18] Lecture 11: Caches
In-class exercises
Direct-mapped cache: field sizes
Fully associative cache: field sizes
Set-associative cache: field sizes
Write policy in two-level caches
Reflection
[Thu 02/20] Lecture 12: The cache-coherence problem
Online videos
12a. Bus-based multiprocessors [6:08]
12b. The cache-coherence problem [2:54]
12c. Peterson's algorithm [6:53]
12d. Coherence vs. consistency [7:56]
Week 8 [02/23 - 03/01]
[Tue 02/25] Lecture 13: Coherence and consistency
In-class exercises
How does write-through guarantee coherence?
How many processors on a write-through bus?
What happens when a block is ejected?
Invalidation vs. update protocols
Ordering of operations in two threads
Why might A not print as 1?
Reflection
[Thu 02/27] Lecture 14: Invalidation and update protocols
Online videos
14a. The MSI protocol [14:20]
14b. The MESI protocol [10:35]
14c. The Dragon protocol [10:37]
14d. The Firefly protocol [6:52]
Week 9 [03/02 - 03/08]
[Tue 03/04] Lecture 15: Physical and logical cache organization
In-class exercises
Steps in cache access
Parallelism in cache access
Alternatives for cache indexing and tagging
Multilevel cache design
Characteristics of inclusion properties
Reflection
[Thu 03/06] Lecture 16: Multicore caches: organization & performance
In-class exercises
Hits and misses in set-associative cache
Hits and misses in direct-mapped cache
Coherence misses
Cache changes to reduce miss rate
Effects of increasing line size
Context-switch misses
Logical cache organization
Partitioned shared cache organization
Week 10 [03/16 - 03/22]
[Tue 03/18] Lecture 17: Hardware support for locking
In-class exercises
Performance of test-and-set
TSL vs. TTSL
LL/SC vs. TTSL
Ticket locks vs. array-based queueing locks
Reflection
[Thu 03/20] Lecture 18: Barrier implementations
In-class exercises
Permission form for study on dual-submission homework
Interest in independent study/thesis topics
Ticket lock with MSI
Scalability at the barrier
Performance of combining-tree barrier
Reflection
Week 11 [03/23 - 03/29]
[Tue 03/25] Lecture 19: Memory consistency
In-class exercises
Example: Why is a memory consistency model needed?
Sequentially consistent vs. non-seq. consistent outcomes
Which outcomes are possible under SC?
Prefetching early and late
Reflection
[Thu 03/27] Lecture 20: Relaxed memory-consistency models
In-class exercises
Need for relaxed consistency models
Causal-consistency example
Strongest consistency model
How can both processes be killed?
Weak ordering
Week 12 [03/30 - 04/05]
[Tue 04/01] Test 2 - 7:30-9:30 PM
[Thu 04/03] Lecture 21: Caching in DSM machines
In-class exercises
Why doesn't a bus-based design scale?
Why aren't invalidations too slow?
Page placement without interleaving
Directory messages for read and write misses
Merging the directory with the LLC tag array
Reflection
Week 13 [04/06 - 04/12]
[Tue 04/08] Lecture 22: Coherence in DSM machines
In-class exercises
Pseudocode for full bit-vector approach
Block states in main memory
Optimizing a full bit-vector scheme
Reflection
[Thu 04/10] Lecture 23: The Silicon Graphics S2MP architecture
Online videos
23a. Today's MP architectures [7:20]
23b. Directory-based coherence [8:41]
23c. Scaling the SMP model [7:05]
23d. SGI's Origin [5:55]
23e. Design issues [9:36]
23f. Directory organization [5:42]
23g. Coherence protocol and summary [10:33]
Week 14 [04/13 - 04/19]
[Tue 04/15] Lecture 24: DSM implementation correctness & performance
In-class exercises
An invalidation to a node that no longer has a block
Transition from state U on a read request
Transition from state S on a readX request
Home-centric vs. requester-assisted approach
Reflection
[Thu 04/17] Lecture 25: Caching in multicore architectures
In-class exercises
ReadX in state S or U with non-atomic message
ReadX to EM block with non-atomic message
What's wrong with imprecise directory info?
Increased power consumption and latency
Other problems with stale directory info
Accelerating thread migration
Reflection
Week 15 [04/20 - 04/26]
[Tue 04/22] Lecture 26: Review
Lecture notes, etc.
In-class exercises
Three orchestrations of Ocean
Coherence and consistency
Physical and logical cache organization
Four "C"s of cache misses
Summing a vector with copy-scan
Miscellaneous questions
Kahoot questions