SiFive - February 20, 2018

All Aboard, Part 10: How to Contribute to the RISC-V Software Ecosystem

We recently announced the HiFive Unleashed, a development board for the Freedom U540-C000, the world's first Linux-capable RISC-V ASIC. The announcement of this board roughly lined up with the first upstream releases of Linux and glibc that contain RISC-V support. As a result, our news has driven a lot of interest from the open source software community -- that was really the whole point of announcing the board in the first place, so in that sense it's working out very well.

This new wave of interest has demonstrated that we don't have nearly enough information on how RISC-V software development is done. We're essentially back in the situation that prompted this blog in the first place, just with a new set of tools: I started the blog when binutils and GCC landed upstream, to describe how those ports work for developers who became interested after they were released. We now have a whole new set of interested developers who found our Linux and glibc ports and want to know how to get involved.

While there are some technical descriptions of how the RISC-V binutils, GCC, and Linux ports are structured, there is no description of how our development flow works. We maintained out-of-tree RISC-V ports for a long time, and many of the odd practices we picked up while doing so, which we're slowly phasing out, are confusing for developers just starting with RISC-V. After getting a handful of emails from people asking how they should contribute (including some from SiFive employees :)), I thought it'd be best to describe our development flows.

Now that we're upstream, the development flow is actually very simple: you can contribute to the RISC-V port of a project in exactly the same way you'd contribute to any other port of that project. Thus, if you're already familiar with how a project's development flow works then just keep doing what you're doing -- we read all the relevant mailing lists and bug trackers. It's generally best to indicate your patches are RISC-V related with a subject that looks something like "[PATCH] RISC-V: Fix a bug in..." to make sure your email doesn't get lost in the flood.
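
As a small illustration of the subject convention above, here is a minimal Python sketch that builds and checks a RISC-V-tagged subject line. The helper names are hypothetical, and the check assumes the tag appears right after the bracketed "[PATCH ...]" prefix, as it does in the example subject; individual projects may have their own conventions.

```python
import re

# Assumption: a RISC-V patch subject looks like "[PATCH ...] RISC-V: summary",
# following the example in the text. The bracketed prefix may carry extra
# fields such as a version or series position (e.g. "[PATCH v2 1/3]").
_RISCV_SUBJECT = re.compile(r"^\[PATCH[^\]]*\]\s+RISC-V:\s+\S")


def riscv_subject(summary: str) -> str:
    """Build a subject line tagged as RISC-V related (hypothetical helper)."""
    return f"[PATCH] RISC-V: {summary.strip()}"


def is_riscv_subject(subject: str) -> bool:
    """Return True if the subject carries the RISC-V tag after the prefix."""
    return bool(_RISCV_SUBJECT.match(subject))


if __name__ == "__main__":
    print(riscv_subject("Fix a bug in the linker relaxation pass"))
    print(is_riscv_subject("[PATCH] Fix a bug"))
```

A check like this could sit in a pre-send script, but it is only a convenience; what matters is that reviewers skimming a busy list can spot the tag.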

If you're not familiar with upstream development for a project, are looking for a more concrete list of things that need to get done, or are interested in distributing RISC-V software then feel free to keep reading :).

Backports, Repositories, Branch Names, and Tagged Releases

We've been trying to follow the same branching and release scheme in all the projects that I maintain, which currently include binutils, GCC, glibc and Linux. We have the following branches (using binutils for reference, as it's the first on the list):

  • master (or trunk, for SVN-based projects like GCC): The main development branch.
  • riscv-all: The RISC-V integration branch, which is based on master and contains additional patches of any quality level. This branch might not be stable or high quality, but if you've found a bug then you should look through this branch to see if there's an in-progress patch that fixes it, to avoid duplicating work. I don't recommend working directly off this branch as it is automatically generated and churns a lot.
  • binutils-2_30-branch: The upstream release branch. We backport RISC-V patches to this branch when appropriate. Note that the backport criteria tend to be fairly stringent here: the patch must have gone through upstream's code review, and must be a bug fix. That means feature additions and performance improvements generally aren't appropriate for backport to the upstream release branch.
  • riscv-binutils-2.30: The RISC-V release branch. This is based on the actual release from upstream (as opposed to the release branch, which moves) and contains as many RISC-V specific backports as possible. This includes backports that aren't suitable for upstream, such as new features and performance improvements, as long as they're straightforward to backport to older versions of the package. We maintain this branch for one release cycle. If you're looking to distribute RISC-V software then this is the best place to look: it's where SiFive's binary toolchain releases come from and what ends up tagged as stable.
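
The naming scheme above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the patterns are generalized from the binutils 2.30 example, so other projects may differ (GCC, for instance, uses trunk rather than master).

```python
from typing import Dict


def release_branches(project: str, version: str) -> Dict[str, str]:
    """Derive the four branch names from a project name and release version.

    Assumption: the patterns are inferred from the binutils example in the
    text ("binutils-2_30-branch" and "riscv-binutils-2.30").
    """
    return {
        "development": "master",
        "integration": "riscv-all",
        # The upstream release branch replaces dots with underscores.
        "upstream_release": f"{project}-{version.replace('.', '_')}-branch",
        # The RISC-V release branch keeps the dotted version number.
        "riscv_release": f"riscv-{project}-{version}",
    }


if __name__ == "__main__":
    for role, name in release_branches("binutils", "2.30").items():
        print(f"{role}: {name}")
```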

With all these branches it's important to ensure that patches end up properly tracked. Thus we have a bit of process for tracking the lifecycle of a patch as it moves from a work-in-progress to a binary release. The lifecycle of a patch is:

  • Create a new wip-feature_name branch in your personal repository while working on your feature. If you're a regular contributor you can submit a pull request to add your repository to the list of places that riscv-all is generated from in riscv-linux-infra.
  • When you think your patch is in good enough shape to be merged upstream, submit it. You can either submit a pull request on GitHub against master, or send a patch via email.
  • The patch will be collected and merged upstream.
  • If your patch should be backported to the upstream release branch, submit another pull request (against binutils-2_30-branch) or send another email (indicating it's a backport) asking for the patch to be backported.
  • If your patch shouldn't be backported to the upstream release branch but should be backported to the RISC-V specific release branch (in this case riscv-binutils-2.30), submit another pull request against that branch with your patch.
  • Eventually we will tag a RISC-V specific release with your patch in it, at which point it will be picked up by distributions (including the SiFive binary toolchain releases).
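
The backport decision in this lifecycle can be summarized as a short Python sketch. The function and parameter names are hypothetical, the branch names are the binutils examples from above, and the sketch assumes a patch is only considered for either release branch once it has been reviewed and merged upstream, which is how the steps are ordered here.

```python
from typing import Optional


def backport_branch(merged_upstream: bool,
                    is_bug_fix: bool,
                    easy_to_backport: bool) -> Optional[str]:
    """Pick the release branch a patch should be backported to, if any.

    Assumption: backports are only considered after the upstream merge.
    """
    if not merged_upstream:
        return None
    if is_bug_fix:
        # Upstream release branches only take reviewed bug fixes.
        return "binutils-2_30-branch"
    if easy_to_backport:
        # New features and performance work can still go to the
        # RISC-V specific release branch if the backport is simple.
        return "riscv-binutils-2.30"
    return None


if __name__ == "__main__":
    # A merged performance improvement that is easy to backport.
    print(backport_branch(True, False, True))
```

In practice the call is a judgment made during review rather than a mechanical rule, so treat this as a summary of the criteria and not a policy engine.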

While I'm trying to impose this flow on all the projects I'm heavily involved with, there are a few differences between the projects in order to adapt them to the relevant upstream development flows.

GNU Toolchain (binutils, GCC, and glibc)

The vast majority of the code for our toolchains, both for embedded systems (based on newlib) and Linux (based on glibc), has been committed upstream and lives at the relevant FSF repositories. Specifically, that means:

  • Our binutils port lives in the sourceware.org git repository. In addition to binutils this repository contains the GDB and SIM ports, but we haven't quite gotten those merged upstream yet so these still exist as backports in the RISC-V GitHub Repository. In addition to the GDB and SIM ports, our GitHub repository also contains the RISC-V specific backport branches.
  • Our GCC port lives in the gcc.gnu.org SVN repository. Aside from day-to-day work, our whole port can be found in this upstream repository. We additionally maintain a git mirror in the RISC-V GitHub, which also contains our backport branches. Note that the main upstream development branch for GCC is called trunk instead of master because upstream uses subversion.
  • Our glibc port lives in the sourceware.org git repository.

The development flow for these projects is pretty straightforward: our ports are upstream and largely complete, and we do the vast majority of our development directly on the upstream master branches. As such, you should be able to use the main development branches (either master or trunk) of all these repositories together and end up with a working system -- or at least as working as you can expect when mixing together a bunch of development branches.

Despite the vast majority of the RISC-V ports of the various toolchain repositories already having been merged upstream there's still a lot of work that remains to be done. The work falls into three categories:

  • Support for new RISC-V ISAs. This includes the E base ISA everywhere, the RV32I base ISA in glibc, and future ISAs like the V and J ISAs. This is very extensive toolchain work so it's probably not a good place to get started.
  • Support for new features and performance optimizations. Examples here would be GCC tunings, better linker relaxation and optimized glibc string routines. Some projects in here are good places to get started, but we generally need to have good benchmarks in order to demonstrate that an optimization actually helps.
  • Fixes for bugs in our toolchain. Our test suite results are pretty good, so most of the bugs we find here will probably come from bringing up the RISC-V ports of various distributions. This is the best place to get started if you're interested in contributing to the RISC-V toolchain effort.

There is more information on how to get involved with the distribution porting effort below.

GDB

The RISC-V GDB port languished for a little while, but Andrew Burgess from Embecosm recently took over the port and progress has been rapid. Andrew submitted a patch to add RISC-V support to GDB last week, so hopefully it will soon be merged upstream, at which point we can treat our GDB port like a first-class member of the RISC-V software ecosystem!

If you're interested in contributing to the RISC-V GDB port then it's probably best to hop on the code review process and help out directly upstream.

QEMU

Michael Clark has taken over as the primary maintainer of the RISC-V QEMU port since coming on board at SiFive. Over the past few months he's managed to get QEMU up to the latest ISA specifications, fix a whole bunch of bugs, add new device models, and submit the code upstream for review multiple times. We're getting close to having something that's suitable for merging upstream, so hopefully we'll make the next release.

Much like GDB, if you're interested in helping out with QEMU then it's probably best to help out with the code review process and work directly upstream.

Linux

Linux 4.15, released at the end of January, was the first upstream release that supported RISC-V. While this was a major milestone for RISC-V, there's still a long way to go when it comes to getting a full Linux-based system up and running -- essentially it boils down to missing device drivers, but we're missing drivers for every device, so you really can't do much at all. Before we get into how to help bring up our devices, it's important to have some base knowledge about how the Linux development process works -- it's a little more complicated in Linux-land than in GNU-land because Linux releases more frequently and, as a result, has a more distributed development process.

Linux development is much less centralized than the development of the GNU toolchain, and is probably much less centralized than any project you're used to. In most projects, there is a canonical repository (maybe git or subversion) that contains the source code, and a set of developers that have write (or commit) access to that repository. Linux works a bit differently: there's one public git repo that contains the canonical Linux source code, but only Linus has write access to that repository. Instead of allowing contributors to write directly to that repository, they're expected to work in their own public git repository and submit pull requests to Linus. While this distributed model was really how git was designed to work, it's an uncommon development flow, so if you're not familiar with it then you might want to read up a bit on the Linux development flow. There are four repositories that are relevant for the RISC-V development flow:

  • kernel.org/torvalds/linux.git: The canonical Linux tree. The master branch on this repository is where releases come from, so it really defines Linux.
  • kernel.org/palmer/riscv-linux.git: The RISC-V development tree. This tree owns everything under arch/riscv and is where RISC-V specific pull requests come from. There are four branches in this tree, listed here in order of stability:
    • master: a copy of Linus' master, up to the latest RC.
    • for-linus: the RISC-V branch that's pulled into Linus' master branch. Code in here is meant to be of the highest quality, as it generally goes into master a few days after it goes into for-linus.
    • for-next: the RISC-V branch that's pulled into linux-next. This follows the standard linux-next rules, so it only contains code that's been reviewed and is essentially ready to go. All commits that go into for-linus must go through for-next first.
    • riscv-all: the RISC-V integration branch, which contains all work-in-progress patches. Since this tree contains work-in-progress patches, it might not even compile, but as we start to get more of our code upstream I hope to make this a fairly stable branch.
  • kernel.org/palmer/linux.git: My personal Linux development tree. This contains the work-in-progress patches that I've picked up (and will eventually clean up and submit upstream) at various stages of development. Branches from this repository are automatically merged together to form for-next and riscv-all in the canonical RISC-V repository, and are eventually submitted to Linus.
  • github.com/riscv/riscv-linux.git: The old RISC-V development tree. This is largely defunct, but I still monitor pull requests in here and move them through the Linux kernel development flow. If you're most comfortable using GitHub for development then feel free to use this, though you're also welcome to use the standard Linux development flows instead.
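
To make the relationship between the branches of the RISC-V development tree concrete, here is a minimal Python sketch. The names are hypothetical; the ordering and the for-next, for-linus, master path come from the branch descriptions above. riscv-all is left off the merge path because it holds work-in-progress patches rather than being a formal stage.

```python
from typing import List

# Branches of the RISC-V development tree, most stable first.
STABILITY_ORDER = ["master", "for-linus", "for-next", "riscv-all"]

# The path a reviewed commit follows on its way into Linus' master.
MERGE_PATH = ["for-next", "for-linus", "master"]


def more_stable(a: str, b: str) -> bool:
    """Return True if branch a is listed as more stable than branch b."""
    return STABILITY_ORDER.index(a) < STABILITY_ORDER.index(b)


def remaining_stops(branch: str) -> List[str]:
    """List the branches a commit on `branch` still has to pass through."""
    if branch not in MERGE_PATH:
        raise ValueError(f"{branch} is not on the merge path")
    return MERGE_PATH[MERGE_PATH.index(branch) + 1:]


if __name__ == "__main__":
    print(remaining_stops("for-next"))
    print(more_stable("for-linus", "riscv-all"))
```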

In addition to these repositories we have a mailing list, linux-riscv@lists.infradead.org, and an IRC channel, #linux-riscv on freenode.

Other Projects

In addition to the projects that I'm directly involved with, there are a handful of other projects that have RISC-V support upstream. While I'm not an expert in how their development flows work, I thought it would be good to provide at least a quick overview of where development happens for other RISC-V projects:

  • FreeBSD has a RISC-V port, and the wiki page appears to be high quality so it's probably the best place to start.
  • LLVM has a RISC-V port, which is led by Alex Bradbury from lowRISC. They have a status page, but I'm not sure it's up to date, as the last update appears to be from September 2017. One specific item to note is that the RISC-V GitHub organization has an LLVM repo, but that's unrelated to the real RISC-V LLVM port.
  • Go has a RISC-V port that is being maintained out-of-tree on a RISC-V GitHub repository. I'm not sure what the status is.
  • The beginnings of an OpenJDK port exist, but it's currently limited to mailing list posts. An out-of-tree implementation from Berkeley exists as well, at least as a slide set :).
  • Newlib has a RISC-V port that has been upstream for a while and has been released as a tarball. Kito Cheng maintains this port, along with some help from Jim Wilson at SiFive. Development for newlib happens upstream, and there's a RISC-V GitHub repository that contains various backport branches and work-in-progress features.
  • Coreboot has a RISC-V port, but there doesn't seem to be much new content on their blog about the RISC-V port. Jonathan Neuschäfer is in charge of the port.
  • OpenEmbedded has the beginnings of a port for RISC-V, which is being led by Khem Raj. As far as I know, the most up-to-date RISC-V support lives on Khem's GitHub Repository. There is also some amount of support merged into upstream OpenEmbedded, see the relevant pull request.
  • OpenWRT has an ongoing RISC-V port, which is being led by Zoltan Herapi. I'm not sure where to find more information about this port.
  • Debian has a RISC-V port in progress, and there's a wiki page that describes the port and contains a progress log.
  • Fedora has an ongoing RISC-V port, which is currently in the bootstrap phase. While I haven't used it personally, Jim has used it and it works for him. They have extensive information on their wiki page.

If your favorite project isn't listed here, then I've either managed to forget about it or there isn't a port going on right now. The best way to reach all RISC-V software developers is to post on the software development mailing list, or to hop into #riscv on freenode.