Reorder Buffer and Out-of-Order Execution
Books and Textbooks
Hennessy, J. L., & Patterson, D. A. (2019). Computer Architecture: A Quantitative Approach (6th ed.). Morgan Kaufmann.
- Chapter 3: Instruction-Level Parallelism and Its Exploitation
- Section 3.5: Dynamic Scheduling: Examples and the Algorithm
- Section 3.6: Hardware-Based Speculation
Shen, J. P., & Lipasti, M. H. (2013). Modern Processor Design: Fundamentals of Superscalar Processors. Waveland Press.
- Chapter 5: Superscalar Techniques
- Section 5.1: Instruction Flow Techniques
- Section 5.2: Register Data Flow Techniques
- Section 5.3: Memory Data Flow Techniques
Stallings, W. (2018). Computer Organization and Architecture: Designing for Performance (11th ed.). Pearson.
- Chapter 14: Instruction-Level Parallelism and Superscalar Processors
- Section 14.3: Out-of-Order Execution
Sohi, G. S. (1990). Instruction issue logic for high-performance, interruptible, multiple functional unit, pipelined computers. IEEE Transactions on Computers, 39(3), 349-359.
Foundational Research Papers
Smith, J. E., & Pleszkun, A. R. (1988). Implementing precise interrupts in pipelined processors. IEEE Transactions on Computers, 37(5), 562-573.
- Seminal paper on precise interrupt implementation
Hwu, W. M. W., & Patt, Y. N. (1987). Checkpoint repair for high-performance out-of-order execution machines. IEEE Transactions on Computers, C-36(12), 1496-1514.
- Early work on speculative execution and recovery
Sohi, G. S., & Vajapeyam, S. (1987). Instruction issue logic for high-performance, interruptible pipelined processors. Proceedings of the 14th Annual International Symposium on Computer Architecture, 27-34.
Johnson, M. (1991). Superscalar Microprocessor Design. Prentice Hall.
- Comprehensive treatment of superscalar design principles
Advanced Research
Kessler, R. E. (1999). The Alpha 21264 microprocessor. IEEE Micro, 19(2), 24-36.
- Detailed implementation of ROB in Alpha 21264
Yeager, K. C. (1996). The MIPS R10000 superscalar microprocessor. IEEE Micro, 16(2), 28-40.
- R10000 implementation with active list (ROB variant)
Papworth, D. B. (1996). Tuning the Pentium Pro microarchitecture. IEEE Micro, 16(2), 8-15.
- First mainstream x86 processor with ROB
Gwennap, L. (1995). Digital 21164 sets new standard. Microprocessor Report, 9(3), 11-16.
- Analysis of the Alpha 21164, an in-order superscalar design that predates the out-of-order 21264
Modern Processor Implementations
Intel Corporation. (2019). Intel 64 and IA-32 Architectures Optimization Reference Manual.
- Chapter 2: Intel Microarchitecture
- Section on Out-of-Order Execution Engine
Advanced Micro Devices, Inc. (2020). Software Optimization Guide for AMD Family 17h Processors.
- Chapter 2: Processor Architecture and Optimization
- Section on Reorder Buffer and Retirement
Boggs, D., Baktha, A., Hawkins, J., Marr, D. T., Miller, J. A., Roussel, P., ... & Nallapati, G. (2004). The microarchitecture of the Intel Pentium 4 processor on 90nm technology. Intel Technology Journal, 8(1), 1-17.
Theoretical Analysis
Lam, M. S., & Wilson, R. P. (1992). Limits of control flow on parallelism. Proceedings of the 19th Annual International Symposium on Computer Architecture, 46-57.
Wall, D. W. (1991). Limits of instruction-level parallelism. ACM SIGARCH Computer Architecture News, 19(2), 176-188.
- Fundamental analysis of ILP limits
Rau, B. R. (1994). Iterative modulo scheduling: An algorithm for software pipelining loops. Proceedings of the 27th Annual International Symposium on Microarchitecture, 63-74.
Fisher, J. A. (1983). Very long instruction word architectures and the ELI-512. ACM SIGARCH Computer Architecture News, 11(3), 140-150.
Implementation Studies
Palacharla, S., Jouppi, N. P., & Smith, J. E. (1997). Complexity-effective superscalar processors. Proceedings of the 24th Annual International Symposium on Computer Architecture, 206-218.
- Analysis of complexity vs. performance trade-offs
Farkas, K. I., Chow, P., Jouppi, N. P., & Vranesic, Z. (1997). The multicluster architecture: Reducing cycle time through partitioning. Proceedings of the 30th Annual ACM/IEEE International Symposium on Microarchitecture, 149-159.
Patt, Y. N., Hwu, W. M. W., & Shebanow, M. (1985). HPS, a new microarchitecture: Rationale and introduction. Proceedings of the 18th Annual Workshop on Microprogramming, 103-108.
Butler, M., Yeh, T.-Y., Patt, Y., Alsup, M., Scales, H., & Shebanow, M. (1991). Single instruction stream parallelism is greater than two. ACM SIGARCH Computer Architecture News, 19(3), 276-286.
Performance Analysis
Riseman, E. M., & Foster, C. C. (1972). The inhibition of potential parallelism by conditional jumps. IEEE Transactions on Computers, C-21(12), 1405-1411.
Tjaden, G. S., & Flynn, M. J. (1970). Detection and parallel execution of independent instructions. IEEE Transactions on Computers, C-19(10), 889-895.
Thornton, J. E. (1964). Parallel operation in the Control Data 6600. Proceedings of the October 27-29, 1964, Fall Joint Computer Conference, Part II: Very High Speed Computer Systems, 33-40.
Modern Research Directions
Sanchez, D., & Kozyrakis, C. (2013). ZSim: Fast and accurate microarchitectural simulation of thousand-core systems. ACM SIGARCH Computer Architecture News, 41(3), 475-486.
Carlson, T. E., Heirman, W., & Eeckhout, L. (2011). Sniper: Exploring the level of abstraction for scalable and accurate parallel multi-core simulation. Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, 1-12.
Miller, J. E., Kasture, H., Kurian, G., Gruenwald, C., Beckmann, N., Celio, C., ... & Agarwal, A. (2010). Graphite: A distributed parallel simulator for multicores. Proceedings of the 16th International Symposium on High-Performance Computer Architecture (HPCA), 1-12.
Online Resources and Standards
Intel Corporation. (2021). Intel Architecture Instruction Set Extensions and Future Features Programming Reference.
- Latest developments in x86 microarchitecture
ARM Limited. (2020). ARM Cortex-A Series Programmer's Guide for ARMv8-A.
- Chapter 4: Out-of-Order Execution in ARM Processors
RISC-V International. (2019). The RISC-V Instruction Set Manual, Volume I: Unprivileged ISA.
- Considerations for RISC-V implementations with out-of-order execution
IEEE Computer Society. (2019). IEEE Standard for Information Technology - Portable Operating System Interface (POSIX). IEEE Std 1003.1-2017.
Survey Papers and Books
Mittal, S. (2016). A survey of techniques for improving energy efficiency in embedded computing systems. Renewable and Sustainable Energy Reviews, 54, 629-660.
Esmaeilzadeh, H., Blem, E., St. Amant, R., Sankaralingam, K., & Burger, D. (2011). Dark silicon and the end of multicore scaling. ACM SIGARCH Computer Architecture News, 39(3), 365-376.
Koomey, J., Berard, S., Sanchez, M., & Wong, H. (2011). Implications of historical trends in the electrical efficiency of computing. IEEE Annals of the History of Computing, 33(3), 46-54.
Academic Course Materials
University of California, Berkeley. CS152 Computer Architecture Course Materials.
- Lecture notes on Out-of-Order Execution and ROB
- Available at: https://inst.eecs.berkeley.edu/~cs152/
Carnegie Mellon University. 18-447 Introduction to Computer Architecture.
- Course materials on dynamic scheduling and speculation
- Available at: https://www.ece.cmu.edu/~ece447/
MIT OpenCourseWare. 6.823 Computer System Architecture.
- Advanced topics in superscalar processor design
- Available at: https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/
Technical Reports
Conte, T. M., Banerjia, S., Loh, S. Y., Menezes, K. N., & Sathaye, S. S. (1995). Instruction fetch mechanisms for superscalar microprocessors. Technical Report, Department of Electrical and Computer Engineering, North Carolina State University.
Austin, T. M., Larson, E., & Ernst, D. (2002). SimpleScalar: An infrastructure for computer system modeling. Computer, 35(2), 59-67.
Magnusson, P. S., Christensson, M., Eskilson, J., Forsgren, D., Hållberg, G., Högberg, J., ... & Werner, B. (2002). Simics: A full system simulation platform. Computer, 35(2), 50-58.
Conference Proceedings
International Symposium on Computer Architecture (ISCA) - Various years
- Premier venue for computer architecture research including out-of-order execution
International Symposium on Microarchitecture (MICRO) - Various years
- Key conference for microarchitectural innovations and ROB research
International Symposium on High-Performance Computer Architecture (HPCA) - Various years
- Important venue for performance-oriented architectural research
Industrial Whitepapers
IBM Corporation. (1995). IBM POWER2 Architecture. IBM Corporation.
- Early implementation of sophisticated out-of-order execution
Sun Microsystems. (1995). UltraSPARC-I User's Manual. Sun Microsystems, Inc.
- Four-issue superscalar SPARC implementation with in-order execution
Digital Equipment Corporation. (1996). Alpha 21264 Microprocessor Hardware Reference Manual. Digital Equipment Corporation.
- Detailed hardware documentation of Alpha 21264 ROB implementation