Sr Staff Performance Modeling Architect (Load/Store Unit)

About SiFive

As the pioneers who introduced RISC-V to the world, SiFive is transforming the future of compute by bringing the limitless potential of RISC-V to the highest performance and most data-intensive applications in the world. SiFive’s unrivaled compute platforms are continuing to enable leading technology companies around the world to innovate, optimize and deliver the most advanced solutions of tomorrow across every market segment of chip design, including artificial intelligence, machine learning, automotive, data center, mobile, and consumer. With SiFive, the future of RISC-V has no limits.

At SiFive, we are always excited to connect with talented individuals who are just as passionate about driving innovation and changing the world as we are.

Our constant innovation and ongoing success are down to our amazing teams of incredibly talented people, who collaborate and support each other to come up with truly groundbreaking ideas and solutions: solutions that will have a huge impact on people's lives, making the world a better place, one processor at a time.

Are you ready?  

To learn more about SiFive’s phenomenal success and to see why we have won the GSA’s prestigious Most Respected Private Company Award (for the fourth time!), check out our website and Glassdoor pages.

Job Description:


We are seeking a Load/Store Unit (LSU) performance architect with knowledge of, and experience with, out-of-order loads and stores. The role requires an understanding of data hazards in order to maximize bandwidth and minimize latency.


  • 5+ years of direct industry experience in cycle-level modeling of modern core micro-architectures, preferably using a discrete event simulation (DES) framework


  • Strong foundation in computer architecture of high performance out-of-order CPU designs. Awareness of known industry micro-architectures is a plus.

  • Ability to independently analyze performance bottlenecks in micro-architecture and software stack. Awareness of potential security holes is a plus.

  • Fluency in C++17, particularly with regard to standard idioms, templates, template specializations for code optimization, STL containers and basic STL algorithms

  • Conversant in Python, sufficient to be comfortable writing object-oriented Python scripts on the order of a few hundred lines of code

  • Strong object-oriented programming (OOP) skills, including encapsulation, class coherency, inheritance and polymorphism

  • Strong Discrete Event Simulation (DES) competency, particularly with regard to DES modeling techniques and best practices

  • Strong software optimization and design skills for code efficiency, algorithms and data structures

  • Familiarity with the Gamma et al. Design Patterns (e.g. Observer, Composite, Adapter, Proxy, Facade) and basic UML syntax

  • Competency in software engineering best practices needed to maintain and refactor very large object-oriented code bases

  • Ability to research known techniques in branch prediction and data prefetching and to synthesize new, implementable approaches from those findings, weighing both the performance uplift and the implementation considerations of each improvement

  • Ability to work independently while providing clear progress readouts throughout

  • Ability to use and adapt existing models and modeling infrastructure; ability to create new models if needed

  • Branch prediction:

    • Familiar with both TAGE-based and Perceptron-based branch prediction approaches, including the latest CBP and other academic research, for both direction prediction and indirect-address prediction

    • Understanding of gaps and implementation concerns with current best-of-breed branch prediction approaches

    • Able to analyze and develop new ideas for both performance-at-all-cost designs, where timing and accuracy are the dominant concerns, and balanced designs, where accuracy must be weighed against area cost

    • Knowledgeable about branch prediction throughput and latency concerns in modern high-performance processors

  • Data prefetching:

    • Familiar with single-stride, multi-stride, temporal, spatial, bitmap, Best-Offset, LLC-prefetch, and other data prefetching approaches

    • Conversant in dead-block prediction and L2/L3 cache replacement policies such as LIP, BIP, DIP, and RRIP, and their interplay with prefetching schemes

  • Vector:

    • Good working knowledge of SIMD ISAs

    • Familiarity with the RISC-V Vector extension (latest spec, 1.0) preferred

    • Ability to program in assembly to write custom tests for vector analysis

    • Strong knowledge of distributed SIMD hardware designs


  • Familiar with git and branching/forking methodologies

  • Strong background with Linux-based development environments, including Python/shell programming

Additional Information:

This position requires successful background and reference checks and satisfactory proof of your right to work in:

United States of America

Any offer of employment for this position is also contingent on the Company verifying that you are authorized for access to export-controlled technology under applicable export control laws or, if you are not already authorized, our ability to successfully obtain any necessary export license(s) or other approvals.

SiFive is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.