🧮 MPI Matrix Multiplication Lab

Parallel Computing Simulation


🧮 MPI Matrix Multiplication Guide

🔧 System Configuration

  • Number of Processes: Choose 2, 4, 6, or 8 MPI processes. More processes can improve parallelization for larger matrices, though communication overhead grows with them.
  • Matrix Size: Select 4×4, 6×6, or 8×8 matrices. Larger matrices demonstrate parallel efficiency better.
  • Matrix Values:
    • Random Values: System generates random numbers (0-9) for matrices A and B
    • Manual Edit: Click on matrix cells to edit values manually for testing
  • Animation Speed: Control visualization speed (0.25x to 3x) to match your learning pace.
  • Execution Mode:
    • Automatic: Watch the complete parallel algorithm execution
    • Step-by-Step: Manually progress through each phase of the algorithm

📊 MPI Matrix Multiplication Algorithm

  1. Phase 1 - Data Distribution (Scatter):
    • Master process (rank 0) divides matrix A into row blocks
    • Each worker process receives assigned rows + complete matrix B
    • Row distribution shown by color coding on matrices
  2. Phase 2 - Parallel Computation:
    • Each process computes its assigned rows of result matrix C
    • Processes work independently and simultaneously
    • Matrix cells light up as calculations complete
  3. Phase 3 - Result Collection (Gather):
    • Master process collects computed row blocks from all workers
    • Final result matrix C is assembled and displayed
    • Performance comparison shows parallel vs sequential timing
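
The three phases above map onto a handful of MPI collectives. Here is a minimal C sketch, assuming the dimension N divides evenly by the process count; it is illustrative only, not the downloadable implementation (which would also need error handling and, for uneven splits, MPI_Scatterv/MPI_Gatherv):

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 8   /* matrix dimension; assumed divisible by the process count */

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;                      /* row block per process */
    static double A[N][N], B[N][N], C[N][N];  /* full matrices (filled on rank 0) */
    double localA[rows][N], localC[rows][N];  /* per-process row blocks */

    if (rank == 0)                            /* master fills A and B */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                A[i][j] = rand() % 10;
                B[i][j] = rand() % 10;
            }

    /* Phase 1 (scatter): row blocks of A to everyone, all of B to everyone */
    MPI_Scatter(A, rows * N, MPI_DOUBLE,
                localA, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast(B, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Phase 2 (compute): each process works on its own rows, independently */
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < N; j++) {
            localC[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                localC[i][j] += localA[i][k] * B[k][j];
        }

    /* Phase 3 (gather): master reassembles C from the row blocks */
    MPI_Gather(localC, rows * N, MPI_DOUBLE,
               C, rows * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("C[0][0] = %.0f\n", C[0][0]);

    MPI_Finalize();
    return 0;
}
```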

🎮 Using the Simulation

  • Basic Workflow:
    1. Configure processes and matrix size
    2. Choose random or manual matrix values
    3. Select automatic or step-by-step mode
    4. Click "Start Simulation" to begin
  • Manual Matrix Editing:
    • Switch to "Manual Edit" mode
    • Click any cell in matrices A or B to edit values
    • Use "Randomize Matrices" to generate new random values
  • Step-by-Step Mode:
    • Click "Next Step" to progress through each phase
    • Read detailed logs for each operation
    • Ideal for studying how each phase of the algorithm unfolds

📈 Understanding the Visualization

  • Process Grid: Shows all MPI processes with their current status and assigned work
  • Process Colors: Each process has a unique color that matches its assigned matrix rows
  • Matrix Color Coding: Rows in matrices are colored to show which process handles them
  • Process States:
    • Idle: Process waiting for work assignment
    • Computing: Process actively calculating matrix operations
    • Communicating: Process sending/receiving data
    • Completed: Process finished its assigned work
  • Matrix Cell Animation: Cells light up as calculations complete, showing real-time progress

🚀 Parallel Computing Concepts

  • Data Parallelism: Same operation (matrix multiplication) applied to different data chunks
  • Load Balancing: Work distributed evenly among processes (rows per process ≈ matrix_size / num_processes; see the sketch after this list)
  • Communication Overhead: Time spent sending/receiving data between processes
  • Scalability: Performance improvement with more processes (ideally linear speedup)
  • Master-Worker Pattern: Rank 0 coordinates work distribution and result collection
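
To make the load-balancing rule concrete, here is a hypothetical helper (not taken from the simulator or the downloadable code) that gives the leftover rows to the lowest-ranked processes when the division is uneven:

```c
#include <stdio.h>

/* Hypothetical helper: rows assigned to rank r when n rows are split
 * across p processes; the first (n mod p) ranks each take one extra. */
int rows_for_rank(int n, int p, int r) {
    return n / p + (r < n % p ? 1 : 0);
}

int main(void) {
    int n = 8, p = 6;                  /* an 8x8 matrix on 6 processes */
    for (int r = 0; r < p; r++)
        printf("rank %d -> %d rows\n", r, rows_for_rank(n, p, r));
    return 0;                          /* prints 2, 2, 1, 1, 1, 1 */
}
```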

📊 Performance Analysis

  • Speedup: How much faster the parallel run is than the sequential one (S = T_seq / T_par)
  • Efficiency: How well the parallel algorithm uses the available processes (E = S / p); see the worked example after this list
  • Communication vs Computation: Ratio of time spent communicating vs calculating
  • Optimal Process Count: Point where adding more processes doesn't improve performance
  • Matrix Size Impact: Larger matrices typically show better parallel efficiency
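
The first two metrics are plain ratios. A worked example with made-up timings (the simulator's numbers will differ):

```c
#include <stdio.h>

int main(void) {
    double t_seq = 4.00;   /* hypothetical sequential time, seconds  */
    double t_par = 1.25;   /* hypothetical parallel time on 4 procs  */
    int    p     = 4;

    double speedup    = t_seq / t_par;   /* S = T_seq / T_par = 3.2x */
    double efficiency = speedup / p;     /* E = S / p = 0.8 (80%)    */

    printf("speedup = %.2fx, efficiency = %.0f%%\n",
           speedup, 100.0 * efficiency);
    return 0;
}
```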

🎯 Recommended Experiments

  1. Basic Parallel Execution:
    • Start with 4 processes and 6×6 matrix
    • Use automatic mode to see full algorithm flow
    • Observe how work is distributed and collected
  2. Scalability Testing:
    • Run same matrix size with 2, 4, 6, and 8 processes
    • Compare execution times and efficiency
    • Find optimal process count for different matrix sizes
  3. Matrix Size Impact:
    • Use fixed process count (4) with different matrix sizes
    • Observe how parallel efficiency changes with problem size
    • Larger matrices should show better speedup
  4. Algorithm Understanding:
    • Use step-by-step mode with manual matrix values
    • Create simple test cases (like identity matrix)
    • Verify results and understand each phase (a reference-check sketch follows this list)
  5. Communication Analysis:
    • Use slow animation speed to observe communication patterns
    • Compare communication overhead with different process counts
    • Understand when communication becomes a bottleneck
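
For Experiment 4, recomputing the product sequentially is the standard way to verify a parallel result. The helper below is an assumed sketch, not part of the downloadable code:

```c
#include <math.h>

/* Returns 1 if C equals A*B recomputed sequentially, 0 otherwise.
 * Matrices are n x n, stored row-major in flat arrays. */
int verify(int n, const double *A, const double *B, const double *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double ref = 0.0;
            for (int k = 0; k < n; k++)
                ref += A[i * n + k] * B[k * n + j];
            if (fabs(ref - C[i * n + j]) > 1e-9)
                return 0;    /* mismatch found */
        }
    return 1;                /* parallel result matches sequential */
}
```

With B set to the identity matrix the check reduces to C = A, which makes mistakes easy to spot by eye.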

💾 MPI Code Download

  • Complete Implementation: Download working MPI C code for matrix multiplication
  • Ready to Compile: Includes all necessary MPI functions and proper error handling
  • Educational Comments: Detailed explanations of each code section
  • Compilation Instructions: How to compile and run with mpicc and mpirun (example below)
  • Performance Measurements: Built-in timing functions to measure speedup
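
A typical build-and-run sequence looks like the following; the file name matmul_mpi.c is a placeholder for whatever the downloaded file is actually called:

```sh
mpicc -O2 matmul_mpi.c -o matmul_mpi   # compile with the MPI wrapper compiler
mpirun -np 4 ./matmul_mpi              # launch with 4 MPI processes
```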

🔍 Key Observations

  • Load Distribution: Notice how rows are divided among processes (shown by colors)
  • Parallel Efficiency: More processes don't always mean faster execution (overhead matters)
  • Communication Patterns: Master-worker communication in scatter and gather phases
  • Synchronization: All processes must complete before final result assembly
  • Matrix Properties: Each cell of C is the dot product of a row of A and a column of B, so (rows of A) × (columns of B) dot products are required
  • Memory Distribution: Each process needs its assigned A rows + complete matrix B

❓ Common Questions

  • Q: Why use more processes than CPU cores? A: For educational purposes: it makes communication overhead effects visible
  • Q: Why do cells light up at different times? A: Visualizes parallel computation happening simultaneously
  • Q: What if processes > matrix rows? A: Some processes remain idle (realistic scenario)
  • Q: Why is sequential sometimes faster? A: Small matrices have high communication-to-computation ratio
  • Q: How does this relate to real HPC? A: Same principles apply to supercomputers with thousands of cores