On (Not) Working With Open Source Software Packages

An aside on working with open source software packages, which I benefit from on a daily basis. (The following is not intended as a particular criticism; it’s me reflecting on things I think I’ve spotted and which may help me contribute back more effectively.)

There are probably lots of ways of slicing and dicing how folk engage with open source projects, but I’m going to cut it this way:

  • maintainer;
  • contributor;
  • interested user.

The maintainer owns the repo and has the ultimate say; a contributor is someone who provides pull requests (PRs) and, as such, tries to contribute code; an interested user is someone who uses the package and knows the repo exists…

The maintainer is ultimately responsible for whether PRs are accepted.

I generally class myself as an interested user; if I find a problem, I try to raise a sensible issue; I also probably abuse issues by chipping in feature requests or asking support questions that may be better asked on Stack Overflow or within a project’s chat community or forums if it has them. (The problem with the latter is that sometimes they can be hard to find, sometimes they require sign on / auth; if I submit an issue to them, it’s also yet another place I need to keep track of to look for replies.)

On occasion, I do come up with code fragments that I share back into issues; on rare occasions, I make PRs.

The reasons I don’t step up more to “contributor” level are severalfold:

  • my code sucks;
  • I have a style problem…
    • I don’t use linters, though this is something I need to address;
    • I don’t really know how to run a linter properly over a codebase;
  • I don’t know how to:
    a) write tests;
    b) write tests properly;
    c) run tests over a codebase.
  • I don’t read documentation as thoroughly as perhaps I should…

Essentially, my software engineering skills suck. And yes, I know this is something I could / should work on, but I am really habituated to my own bad practice, stream-of-consciousness coding style…

One of the things I have noticed about stepping up is that it can be hard to step up all the way, particularly in projects where the software engineering standards of the maintainer are enforced by the maintainer, and the contributors’ contributions (for whatever reason: lack of time; lack of knowledge; lack of skills) don’t meet those standards.

What this means is that a PR may work for the contributor but not meet the standards of the maintainer, and so the PR just sits, unaccepted, for months or years.

For the interested user, if they want the functionality of the PR, they may then be forced into using the fork created by the contributor.

However, a downside of this is that the PR may have been created by the contributor to fix an immediate need: it does the job they need at the time, they use it and move on, but as a goodwill gesture they chip the PR in.

In such a case, the contributor may not have a long-term commitment to the package (they may just have needed it for a one off), so building in tests that integrate well with the current test suite may be an additional overhead. (You could argue that they should have written tests anyway, but if it was a one off they may have been coding fast and using a “does it work?” metric as an implicit test on just the situation they needed the code to work in. Which raises another issue: a contributor may need code to work in a special case, but the maintainer needs it to work in the general case.)

For the contributor who just wanted to get something working, ensuring that the code style meets the maintainer’s standards is another overhead.

The commitment of the contributor to the project (and by that, I also mean their commitment in the sense of using the package regularly rather than as a one off, or perhaps more subtly, their commitment to using both the package and the feature in their PR regularly) perhaps has an impact on whether they value the PR actually making it into master. If they are likely to use the feature regularly, it’s in their interest to see it get into the main codebase. If they use it as a one off, or only irregularly, their original PR may suffice. A downside of this is that over time, the code in the PR may well start to lag behind the code in master. Which can cause a problem for a user who wants to use the latest master features and the niche feature in the PR (implemented off a now outdated master).

The contributor may also not want to have to continue to maintain their contribution, and the maintainer may well have the same feeling: they’re happy to include the code but don’t necessarily want to have to maintain it, or even build on it. (One good reason for writing packages that support plugin mechanisms, maybe? Extensions are maintained outside the core project and plugged in as required.)
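As a rough illustration of the plugin idea, here is a minimal sketch of a registry that lets code living outside a core package hook itself in. Everything named here (`PluginRegistry`, `renderers`, the two render functions) is made up for the example and is not taken from any of the packages mentioned in this post.

```python
class PluginRegistry:
    """A tiny, hypothetical plugin registry keyed by plugin name."""

    def __init__(self):
        self._plugins = {}

    def register(self, name):
        """Return a decorator that records a function under `name`."""
        def decorator(func):
            if name in self._plugins:
                raise ValueError(f"Plugin {name!r} is already registered")
            self._plugins[name] = func
            return func
        return decorator

    def available(self):
        """List the registered plugin names in alphabetical order."""
        return sorted(self._plugins)

    def run(self, name, *args, **kwargs):
        """Call the plugin registered under `name`."""
        if name not in self._plugins:
            raise KeyError(f"No plugin registered as {name!r}")
        return self._plugins[name](*args, **kwargs)


# The core project (hypothetically) owns the registry and one renderer...
renderers = PluginRegistry()


@renderers.register("plain")
def render_plain(score):
    return f"score: {score}"


# ...whilst a third party could maintain this one in their own repo.
@renderers.register("shout")
def render_shout(score):
    return f"SCORE: {score.upper()}!"


print(renderers.available())
print(renderers.run("shout", "c major"))
```

The point of the design is that the maintainer only has to maintain the registry; the niche renderer stays the responsibility of whoever needs it. Real Python packages often do the discovery step via entry points (see `importlib.metadata`) rather than a decorator, but the division of labour is much the same.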

By the by, here are a couple of examples that illustrate this, in case I return to this idea and try to pick it apart a bit further and test it against actual projects (I’m not intending to be critical about either the packages or the project participants; I use both these packages and value them highly; they just flag up issues I notice as a user):

  • integrating OpenSheetMusicDisplay (a JavaScript music score viewer that is ideal for rendering sheet music in Jupyter notebooks) into music21; an issue resulted in code that made it as far as a PR that was rejected, iterated on, but still fails a couple of minor checks…
  • hiding the display of a code cell in documentation generated by nbsphinx. There are several related issues (for example, this one, which refers to a couple of others) and two PRs, one of which has been sitting there for three years…

Now it may be that in the above cases, the issues are both niche and relate to enabling or opening up ways of using the original packages that go beyond the original project’s mission, and the PRs are perhaps ways for the contributor to co-opt the package to do something it wasn’t originally intended to do.

For example, the OpenSheetMusicDisplay PR is really powerful for users wanting to use music21 in a Jupyter notebook, but this may be an environment that the current package community doesn’t use. Whilst the PR may make the package more likely to be used by notebook users and grow the community, it’s not core to the current community. (TBH, I haven’t really looked at how the music21 package has been used: a) at all, b) in the notebook community, for the last year or so. The lack of OpenSheetMusicDisplay support has been one reason why I drifted away from looking at music packages…)

In the case of nbsphinx, which was perhaps developed as a documentation production tool, and as such benefits from code always being displayed, the ability to hide input cells makes it really useful as a tool for publishing pages where the code is used to generate assets that are displayed in the page, but the means of production of those assets does not need to be shown. For example, a page that embeds a map generated from code: the intention is to publish the map, not show the code that demonstrates how to produce the map. (Note: hiding input can work in three ways: a) the input is completely removed from the published doc; b) the input is in the doc, but commented out, so it is not displayed in the rendered form; c) the code is hidden in the rendered form but can also be revealed.)
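To make the distinction between those ways of hiding input a bit more concrete, here is a minimal sketch that works directly on a notebook loaded as a Python dict (a notebook file is just JSON with a list of `cells`). This is not how nbsphinx does anything; the function name, the `mode` argument and the `hide-input` tag name are all assumptions made for the sake of the example. The "remove" mode corresponds to option a); the "tag" mode just marks the cell, leaving it to whatever renders the page to decide between b) and c).

```python
import copy


def hide_code_inputs(notebook, mode="remove"):
    """Return a copy of a notebook dict with code cell inputs hidden.

    mode="remove": blank the source of each code cell, keeping its outputs.
    mode="tag": keep the source but add a "hide-input" tag (an assumed,
                illustrative tag name) that a renderer could act on.
    """
    nb = copy.deepcopy(notebook)  # leave the caller's notebook untouched
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells are left as they are
        if mode == "remove":
            cell["source"] = ""
        elif mode == "tag":
            tags = cell.setdefault("metadata", {}).setdefault("tags", [])
            if "hide-input" not in tags:
                tags.append("hide-input")
        else:
            raise ValueError(f"Unknown mode: {mode!r}")
    return nb


# A toy notebook: one markdown cell and one code cell that "draws a map".
example = {
    "cells": [
        {"cell_type": "markdown", "source": "# My map", "metadata": {}},
        {
            "cell_type": "code",
            "source": "draw_map()",
            "metadata": {},
            "outputs": [{"output_type": "stream", "text": "<a map>"}],
        },
    ]
}

published = hide_code_inputs(example, mode="remove")
print(published["cells"][1]["source"] == "")  # the code is gone...
print(published["cells"][1]["outputs"])       # ...but the output remains
```

In practice you would load and save the `.ipynb` file with the `json` module (or `nbformat`) either side of this, and run it as a pre-processing step before the docs are built.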

In both the above cases, I wonder whether the PR going outside the current community’s needs provides one of the reasons why the PRs don’t get integrated? For example, the PR might open the package to a community that doesn’t currently use the package, by enabling a necessary feature required by that new community. The original community may see the new use as “out-of-scope”, but under this lens we might ask: is there a question of territoriality in play? (“This package is not for that…”)

Author: Tony Hirst

I'm a Senior Lecturer at The Open University, with an interest in #opendata policy and practice, as well as general web tinkering...
