Directory-Based Cache Coherence Protocol Simulator
Follow these step-by-step instructions to understand and explore Directory-Based Cache Coherence using the interactive simulator.
Step 1: Understanding the Interface
Observe the Initial State
- Notice that all processor caches start in the Invalid (I) state
- The directory shows all memory blocks in Uncached (U) state
- All sharer vectors are empty (0000)
- The message log and performance counters are initialized
Familiarize Yourself with the Components
- Processor Caches: Four processor cache states displayed with color coding
- Directory Table: Central directory showing state, sharer vector, and owner for each memory block
- Network Visualization: Message flow between processors and directory
- Control Panel: Interface to select processor, operation, and memory address
- Performance Metrics: Counters for cache hits, misses, directory lookups, and network messages
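The state described above can be modeled with a few small data structures. The sketch below is a minimal, hypothetical rendering of the simulator's initial state (the class and field names are illustrative, not the simulator's actual internals): four caches that start Invalid, and one directory entry per block that starts Uncached with an empty sharer vector.

```python
# Minimal sketch of the simulator's core state (hypothetical names).
# Four caches, each holding a per-block state, plus a directory entry
# per block with state, a 4-bit sharer vector, and an owner field.

CACHE_STATES = ("M", "S", "I")          # Modified, Shared, Invalid
DIR_STATES = ("U", "S", "E")            # Uncached, Shared, Exclusive

class DirectoryEntry:
    def __init__(self):
        self.state = "U"                # Uncached initially
        self.sharers = [0, 0, 0, 0]     # bit i set => Pi holds a copy
        self.owner = None               # set only in Exclusive state

class Simulator:
    def __init__(self, n_procs=4, blocks=("A", "B", "C", "D")):
        # every cache line starts Invalid, every block Uncached
        self.caches = [{b: "I" for b in blocks} for _ in range(n_procs)]
        self.directory = {b: DirectoryEntry() for b in blocks}
        self.stats = {"hits": 0, "misses": 0, "lookups": 0, "messages": 0}

sim = Simulator()
```

This mirrors what you should see on screen before executing any operation: all caches Invalid, all directory entries Uncached with sharer vector 0000.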
Step 2: Single Processor Read Operation
First Read Access
- Select Processor 0 from the dropdown
- Choose Read operation
- Select memory address Block A
- Click "Execute Operation"
Observe the Protocol Flow
- Step 1: P0 cache miss → Read-Request sent to Directory
- Step 2: Directory (Uncached state) → Data-Reply sent to P0 with data from memory
- Step 3: Directory updates: State = Shared, Sharer Vector = [1000], P0 cache = Shared
- Performance: Note cache miss and directory lookup recorded
Verify Final State
- P0 cache: Block A = Shared
- Directory: Block A = [Shared, 1000, None]
- Message count: 2 (Read-Request + Data-Reply)
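The three protocol steps above can be sketched as a single directory handler for a read miss to an Uncached block. This is an illustrative model, not the simulator's API; state is kept in plain dicts so the message accounting is easy to follow.

```python
# Hedged sketch: directory handling of a read miss to an Uncached block.

def read_uncached(directory, caches, pid, block, stats):
    """P<pid> read-misses on `block` while the directory holds it Uncached."""
    stats["misses"] += 1
    stats["messages"] += 1            # Read-Request: P -> Directory
    stats["lookups"] += 1
    entry = directory[block]
    assert entry["state"] == "U"
    entry["state"] = "S"              # block becomes Shared
    entry["sharers"][pid] = 1         # record P<pid> in the sharer vector
    stats["messages"] += 1            # Data-Reply: Directory -> P (from memory)
    caches[pid][block] = "S"

directory = {"A": {"state": "U", "sharers": [0, 0, 0, 0], "owner": None}}
caches = [{"A": "I"} for _ in range(4)]
stats = {"misses": 0, "lookups": 0, "messages": 0}
read_uncached(directory, caches, 0, "A", stats)
# After the call: P0 holds Block A Shared, sharer vector [1,0,0,0],
# and exactly 2 messages were exchanged.
```

Note that the entire transaction costs two point-to-point messages, matching the count the simulator reports.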
Step 3: Multiple Reader Sharing
Second Reader Access
- Select Processor 1
- Choose Read operation
- Select memory address Block A (same as Step 2)
- Click "Execute Operation"
Analyze Sharing Behavior
- Step 1: P1 cache miss → Read-Request sent to Directory
- Step 2: Directory (Shared state) → Data-Reply sent to P1 with data from memory
- Step 3: Directory updates: Sharer Vector = [1100] (both P0 and P1)
- Result: Both processors now share the same data block
Add More Readers
- Repeat with Processor 2 reading Block A
- Observe sharer vector becomes [1110]
- Notice that memory provides the data for these reads: while a block is Uncached or Shared, memory holds an up-to-date copy, so no cache needs to be consulted
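The sharer vector shown as [1000], [1100], [1110] can be modeled as a plain bitmask, one bit per processor with P0 in the most significant position to match the notation above. The helper names here are illustrative:

```python
# Sharer vector as a bitmask: P0 is the leftmost (most significant) bit,
# matching the [1000] / [1100] / [1110] notation used by the simulator.

N_PROCS = 4

def add_sharer(vector, pid):
    """Set P<pid>'s bit in the sharer vector."""
    return vector | (1 << (N_PROCS - 1 - pid))

def as_string(vector):
    return format(vector, f"0{N_PROCS}b")

v = 0                       # empty: 0000
v = add_sharer(v, 0)        # P0 reads -> 1000
v = add_sharer(v, 1)        # P1 reads -> 1100
v = add_sharer(v, 2)        # P2 reads -> 1110
```

A bitmask scales to one bit per processor per block, which is exactly the storage overhead the later scalability discussion is concerned with.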
Step 4: Write Operations and Invalidations
Write to Shared Data
- Select Processor 1
- Choose Write operation
- Select Block A (currently shared by P0, P1, P2)
- Enter data value "0xFF"
- Click "Execute Operation"
Observe Invalidation Protocol
- Step 1: P1 cache miss → Write-Request sent to Directory
- Step 2: Directory sends Invalidate messages to P0 and P2
- Step 3: P0 and P2 acknowledge invalidations, cache lines become Invalid
- Step 4: Directory sends Data-Reply to P1
- Step 5: P1 cache = Modified, Directory = [Exclusive, 0100, P1]
Verify Exclusive Access
- P1 cache: Block A = Modified (exclusive access)
- P0, P2 caches: Block A = Invalid
- Directory: Block A = [Exclusive, 0100, P1]
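The five-step invalidation flow can be sketched as one handler: the directory invalidates every other sharer, collects the acknowledgements, then grants exclusive ownership. The names and message accounting below are illustrative assumptions, not the simulator's internals.

```python
# Sketch: directory handling of a write miss to a Shared block.

def write_shared(directory, caches, pid, block, stats):
    entry = directory[block]
    stats["messages"] += 1                      # Write-Request: P -> Directory
    stats["lookups"] += 1
    others = [i for i, bit in enumerate(entry["sharers"]) if bit and i != pid]
    for other in others:
        stats["messages"] += 1                  # Invalidate: Directory -> P_other
        caches[other][block] = "I"
        stats["messages"] += 1                  # Inv-Ack: P_other -> Directory
    stats["messages"] += 1                      # Data-Reply: Directory -> P
    entry["state"] = "E"
    entry["sharers"] = [1 if i == pid else 0 for i in range(len(entry["sharers"]))]
    entry["owner"] = pid
    caches[pid][block] = "M"

directory = {"A": {"state": "S", "sharers": [1, 1, 1, 0], "owner": None}}
caches = [{"A": "S"}, {"A": "S"}, {"A": "S"}, {"A": "I"}]
stats = {"lookups": 0, "messages": 0}
write_shared(directory, caches, 1, "A", stats)
# P1 ends Modified; P0 and P2 are invalidated; 6 messages total
# (request + 2 invalidates + 2 acks + data reply).
```

Notice how the message count grows with the number of sharers: invalidating k other copies costs 2k extra messages, which is why writes to widely shared data are expensive.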
Step 5: Owner-Based Data Forwarding
Read from Modified Data
- Select Processor 3
- Choose Read operation
- Select Block A (currently owned by P1)
- Click "Execute Operation"
Observe Cache-to-Cache Transfer
- Step 1: P3 cache miss → Read-Request sent to Directory
- Step 2: Directory identifies P1 as owner → Forward-Request sent to P1
- Step 3: P1 sends Data-Forward directly to P3 AND Writeback to Directory
- Step 4: P1 downgrades to Shared, P3 gets Shared copy
- Step 5: Directory updates: [Shared, 0101, None] (P1 and P3 are the sharers; P2 was invalidated back in Step 4)
Performance Benefits
- Notice direct cache-to-cache transfer (P1 → P3)
- Lower latency compared to memory access
- Automatic writeback ensures memory consistency
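The forwarding sequence can be sketched the same way as the earlier handlers. This is an illustrative model under the assumption that the owner's writeback and the cache-to-cache forward each count as one message:

```python
# Sketch: read miss to a block held Modified by another cache.

def read_exclusive(directory, caches, pid, block, stats):
    entry = directory[block]
    owner = entry["owner"]
    stats["messages"] += 1          # Read-Request: P -> Directory
    stats["lookups"] += 1
    stats["messages"] += 1          # Forward-Request: Directory -> owner
    stats["messages"] += 1          # Data-Forward: owner -> P (cache to cache)
    stats["messages"] += 1          # Writeback: owner -> Directory/memory
    caches[owner][block] = "S"      # owner downgrades
    caches[pid][block] = "S"        # requester gets a Shared copy
    entry["state"] = "S"
    entry["sharers"] = [0] * len(entry["sharers"])
    entry["sharers"][owner] = 1
    entry["sharers"][pid] = 1
    entry["owner"] = None

directory = {"A": {"state": "E", "sharers": [0, 1, 0, 0], "owner": 1}}
caches = [{"A": "I"}, {"A": "M"}, {"A": "I"}, {"A": "I"}]
stats = {"lookups": 0, "messages": 0}
read_exclusive(directory, caches, 3, "A", stats)
# Final sharer vector [0,1,0,1]: P1 and P3 both Shared, 4 messages.
```

The requester gets its data on the third message (the Data-Forward), before the writeback completes, which is the latency win over a round trip through memory.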
Step 6: Performance Analysis
Compare Access Patterns
- Try different sequences: all reads vs. mixed read/write
- Observe message counts for different scenarios
- Note cache hit rates after establishing sharing
Network Traffic Analysis
- Read-only sharing: Minimal ongoing traffic
- Producer-consumer: Frequent ownership transfers
- Hot data: High directory lookup activity
Scalability Observations
- Count messages for N readers: each directory read miss costs two point-to-point messages, while on a bus every miss is a broadcast that all N caches must snoop
- Notice point-to-point communication vs. broadcast storms
- Observe directory as potential bottleneck for popular blocks
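A rough back-of-envelope model makes the scaling difference concrete. The assumptions are stated in the comments (two point-to-point messages per directory read miss; one broadcast per bus miss, observed by all N caches) and are simplifications, not measurements from the simulator:

```python
# Back-of-envelope scaling comparison (simplified model, see comments).

def directory_messages(n_readers):
    # Assumption: each read miss costs Read-Request + Data-Reply.
    return 2 * n_readers

def bus_snoop_deliveries(n_readers, n_procs):
    # Assumption: every miss is broadcast and snooped by all caches,
    # so total snoop work grows with readers * processors.
    return n_readers * n_procs

# 16 processors each reading one block once:
#   directory: 32 point-to-point messages
#   bus:       256 snoop deliveries
```

Directory traffic grows linearly in the number of requests, while total bus snoop work grows with requests times processors, which is the core scalability argument for directories.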
Step 7: Advanced Scenarios
Ownership Transfer Chain
- P0 writes to Block B → P0 becomes owner
- P1 writes to Block B → Ownership transfers P0→P1
- P2 writes to Block B → Ownership transfers P1→P2
- Observe the forwarding chain and performance impact
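The chain above amounts to the directory's owner field being rewritten on each write. A minimal sketch (illustrative names) that records the handoffs:

```python
# Sketch: each write to an exclusively held block moves the owner
# pointer in the directory entry; the log records the handoff chain.

def transfer_ownership(entry, new_owner, log):
    log.append((entry["owner"], new_owner))
    entry["state"] = "E"
    entry["owner"] = new_owner

entry = {"state": "U", "owner": None}
log = []
for writer in (0, 1, 2):            # P0, P1, P2 write Block B in turn
    transfer_ownership(entry, writer, log)
# log records the chain: None -> P0, P0 -> P1, P1 -> P2
```

Each real handoff also carries invalidation and forwarding traffic (as in Steps 4 and 5), so long ownership chains are among the most expensive patterns to sustain.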
Multiple Block Sharing
- Access different memory blocks (A, B, C, D) from different processors
- Observe independent directory entries
- Compare with shared blocks for traffic patterns
Cache Replacement Simulation
- Use the "Invalidate Cache Line" feature to simulate cache evictions
- Observe directory updates when shared copies are dropped
- Note the difference between voluntary and involuntary invalidations
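A voluntary eviction of a Shared copy can be sketched as clearing the evicting processor's sharer bit; if the last copy disappears, the block returns to Uncached. This assumes the protocol sends a replacement hint to the directory on eviction (some protocols let sharer bits go stale instead), and the names are illustrative:

```python
# Sketch: voluntary eviction of a Shared copy, assuming the cache
# notifies the directory so the sharer bit can be cleared.

def evict_shared(directory, caches, pid, block):
    caches[pid][block] = "I"
    entry = directory[block]
    entry["sharers"][pid] = 0
    if not any(entry["sharers"]):
        entry["state"] = "U"        # last copy gone: back to Uncached

directory = {"A": {"state": "S", "sharers": [1, 0, 0, 1], "owner": None}}
caches = [{"A": "S"}, {"A": "I"}, {"A": "I"}, {"A": "S"}]
evict_shared(directory, caches, 0, "A")
evict_shared(directory, caches, 3, "A")
# Both copies dropped: sharer vector empty, block back to Uncached.
```

Contrast this with the involuntary invalidations of Step 4, where the directory forces the eviction and must wait for acknowledgements.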
Step 8: Protocol Comparison
Directory vs. Bus-Based Analysis
- Simulate the same access pattern in both protocols (if available)
- Compare total message counts
- Analyze scalability implications
Bottleneck Identification
- Identify scenarios where directory becomes a hotspot
- Observe network link utilization patterns
- Consider the impact of directory placement strategies
Expected Learning Outcomes
After completing this procedure, you should understand:
- Protocol Mechanics: How directory-based coherence maintains consistency without broadcasts
- Message Flow: The specific message types and their purposes in maintaining coherence
- Performance Trade-offs: When directory-based protocols excel vs. their limitations
- Scalability Benefits: Why this approach enables large-scale multiprocessor systems
- Implementation Challenges: Directory storage overhead and potential bottlenecks
Troubleshooting Tips
- Unexpected State: Reset the simulation and carefully follow the step sequence
- Message Count Confusion: Use the detailed log to trace each protocol step
- Performance Anomalies: Consider cache locality and sharing patterns in your analysis
- Directory Bottleneck: Try accessing different memory blocks to distribute directory load
Extension Activities
- Design Challenge: Propose optimizations for common sharing patterns you observe
- Scalability Analysis: Calculate directory storage requirements for different system sizes
- Network Design: Consider how directory placement affects message latency and throughput
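For the scalability-analysis activity, a starting point is the storage cost of a full bit-vector directory: one presence bit per processor plus a few state bits per memory block. The figures below assume 64-byte blocks and 2 state bits, both illustrative choices:

```python
# Back-of-envelope directory storage for a full bit-vector scheme
# (assumptions: one presence bit per processor, 2 state bits,
# 64-byte memory blocks).

def directory_overhead_bits(n_procs, state_bits=2):
    """Directory bits needed per memory block."""
    return n_procs + state_bits

def overhead_ratio(n_procs, block_bytes=64, state_bits=2):
    """Directory storage as a fraction of data storage."""
    return directory_overhead_bits(n_procs, state_bits) / (block_bytes * 8)

# 4 processors:    6 bits per 512 data bits  (~1.2% overhead)
# 1024 processors: 1026 bits per 512 data bits (~200% overhead)
```

The linear growth in presence bits is why flat bit vectors stop scaling and why limited-pointer and coarse-vector directory schemes exist; proposing one of those is a natural answer to the design challenge above.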