Block Scheduling
graph TD;
A[SM]-->B[Block];
B-->D[Thread];
- For a given SM, only a limited number of blocks can be assigned.
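- A device has a limited number of SMs and each SM holds a limited number of blocks, so only a limited number of blocks can execute on the device at the same time. The remaining blocks wait until a resident block finishes.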
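The two limits above can be read from the CUDA runtime. The following is a minimal sketch, not a definitive tool: `max_resident_blocks`, `report_block_limits`, and the `scale` kernel are hypothetical names made up for illustration, while `cudaGetDeviceProperties` and `cudaOccupancyMaxActiveBlocksPerMultiprocessor` are real runtime calls. The per-SM figure depends on the kernel and block size, because registers and shared memory also limit how many blocks fit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Upper bound on blocks resident on the device at once:
// (number of SMs) x (blocks one SM can hold).
long long max_resident_blocks(int sm_count, int blocks_per_sm) {
    return static_cast<long long>(sm_count) * blocks_per_sm;
}

// Hypothetical kernel, present only so the occupancy query has something to inspect.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Prints the block limits of one device for the `scale` kernel.
// Returns false if any CUDA call fails (for example, no device present).
bool report_block_limits(int device, int threads_per_block) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;

    // Blocks of `scale` that fit on one SM at this block size,
    // with 0 bytes of dynamic shared memory.
    int blocks_per_sm = 0;
    if (cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocks_per_sm, scale, threads_per_block, 0) != cudaSuccess) {
        return false;
    }

    printf("%s: %d SMs, %d blocks per SM, %lld blocks resident at once\n",
           prop.name, prop.multiProcessorCount, blocks_per_sm,
           max_resident_blocks(prop.multiProcessorCount, blocks_per_sm));
    return true;
}
```

Calling `report_block_limits(0, 256)` from `main` prints the figures for device 0 with 256 threads per block. A grid larger than that bound still runs; the extra blocks are scheduled as earlier ones finish.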
Barrier Synchronization
__syncthreads() - a point that all threads in a block must reach before any of them moves on to the next phase
Simply put, wait at the barrier until everyone reaches it - No one is left behind!!
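A small sketch of the two-phase idea: every thread first writes one element into shared memory, waits at the barrier, and only then reads an element written by a different thread. The kernel `reverse_in_block`, the helper `reverse_on_device`, and the constant `BLOCK_SIZE` are hypothetical names chosen for this example, and it assumes a single block of exactly `BLOCK_SIZE` threads.

```cuda
#include <cuda_runtime.h>

#define BLOCK_SIZE 64

// Reverses BLOCK_SIZE ints in place using one block.
__global__ void reverse_in_block(int *data) {
    __shared__ int tile[BLOCK_SIZE];
    int t = threadIdx.x;

    // Phase 1: each thread stores its own element in shared memory.
    tile[t] = data[t];

    // Barrier: nobody starts phase 2 until all of phase 1 is done.
    __syncthreads();

    // Phase 2: each thread reads the element written by its mirror thread.
    data[t] = tile[BLOCK_SIZE - 1 - t];
}

// Host helper: copies BLOCK_SIZE ints to the device, reverses them, copies back.
// Returns false if any CUDA call fails.
bool reverse_on_device(int *host_data) {
    int *dev = nullptr;
    size_t bytes = BLOCK_SIZE * sizeof(int);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, host_data, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        reverse_in_block<<<1, BLOCK_SIZE>>>(dev);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host_data, dev, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

Without the barrier, a thread could read `tile[BLOCK_SIZE - 1 - t]` before its mirror thread has written it.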
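Deadlocks can happen when __syncthreads() sits inside a condition that only some threads of the block satisfy - the threads at the barrier wait forever for the ones that never arrive. Keep __syncthreads() out of thread-dependent conditions.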
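One way to follow this rule is to make only the work conditional and leave the barrier where every thread reaches it. The sketch below is an illustration under stated assumptions, not a tuned reduction: `block_sum`, `sum_on_device`, and `THREADS` are hypothetical names, and it assumes one block of `THREADS` threads summing at most `THREADS` values.

```cuda
#include <cuda_runtime.h>

#define THREADS 128

// Sums the first n (n <= THREADS) elements of `in` into *out using one block.
__global__ void block_sum(const int *in, int *out, int n) {
    __shared__ int partial[THREADS];
    int t = threadIdx.x;

    // Risky pattern (deliberately NOT used here):
    //   if (t < n) { partial[t] = in[t]; __syncthreads(); }
    // Threads with t >= n would never reach that barrier.

    // Safe pattern: the load is conditional, the barrier is not.
    partial[t] = (t < n) ? in[t] : 0;
    __syncthreads();

    // Tree reduction: the loop bounds are the same for all threads,
    // so every thread executes the same number of barriers.
    for (int stride = THREADS / 2; stride > 0; stride /= 2) {
        if (t < stride) partial[t] += partial[t + stride];
        __syncthreads();  // outside the if
    }

    if (t == 0) *out = partial[0];
}

// Host helper: runs block_sum on n ints; returns false on bad n or CUDA error.
bool sum_on_device(const int *host_in, int n, int *result) {
    if (n < 0 || n > THREADS) return false;
    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, THREADS * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) { cudaFree(d_in); return false; }

    bool ok = cudaMemcpy(d_in, host_in, n * sizeof(int), cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        block_sum<<<1, THREADS>>>(d_in, d_out, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(result, d_out, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```

Threads with no input element contribute 0 instead of skipping ahead, so all `THREADS` threads arrive at each barrier together.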