Programming with Plasma

A Better Development Experience with Jupyter Notebooks in VS Code

2024-03-04T00:00:00+00:00

In one of my work projects, we have taken to using Jupyter Notebooks quite extensively as a useful mode for presenting tutorials of our code base and even example use cases. Notebooks allow for users or developers to incrementally step through workflows and see how each stage works in detail. What’s more, they can easily inspect intermediate objects in the workflows, and this can be an important debugging tool. I personally do this all the time.

However, there are a few issues with Jupyter Notebooks in a development setting. First, the default notebook .ipynb file format is terrible for storing in version control. It is basically a JSON file, and the real problem is it tracks far too much metadata that changes depending on the environment in which the notebook is run. The cleanest solution I have encountered for this is a tool called jupytext.

jupytext’s approach is to pair the ipynb Classic Notebook format with a simpler Python source file, called a Text Notebook, which can be more cleanly tracked in version control. Only the text notebook is tracked because the the ipynb notebook can be reproducible regenerated from the Python source file by jupytext. jupytext is able to achieve this by using some special markup in the text notebook. The markup can be conveyed through a variety of formats, but the most popular and robust seems to be the percent format or what jupytext calls py:percent.

Very quickly, notebook cells are delimited by # %%. Cell types can be specified with square brackets after this delimiter, e.g. a Markdown formatted cell is indicated by # %% [markdown]. All together, you get something that looks like this (taken from the jupytext documentation):

# %% [markdown]
# This is a multiline
# Markdown cell

# %% [markdown]
# Another Markdown cell


# %%
# This is a code cell
class A():
    def one():
        return 1

    def two():
        return 2

Note: the text notebook is always a valid Python script regardless of the format of the text notebook. Therefore, you can always run python text_notebook_filename.py, and it will work.

If you use the web interface for Jupyter (Lab or Notebook) and have installed the jupytext package in your environment, then you should be able to directly open any Python source file in percent format and have it rendered exactly as a normal .ipynb notebook. This is what the above file looks like when rendered in the usual Jupyter Notebook (actually JupyterLab) interface:

The jupytext plugin will automatically handle ensuring that the .py source is updated as you make changes in this notebook editing mode. You can also directly “pair” the text notebook .py file with a traditional Jupyter notebook .ipynb file by going to File > Jupytext > Pair Notebook with ipynb document. Once again, jupytext will handle making sure that the two files are synchronised.

If you are happy editing notebooks in the Jupyter{Lab,Notebook} interface, then this is case closed. As long as you only make changes here, jupytext handles the file syncing and saving, and then it is only up to you to commit changes to the text notebook .py file in your version control.

On the other hand, if you are a VS Code user like me and prefer editing notebooks from its interface, the situation is slightly less convenient because VS Code does not interface with the jupytext plugin. You can edit the .py files directly as you would any Python source file without a problem, but if you want to execute this file like it is a notebook, you will need to use VS Code’s Interactive Window feature of its Jupyter integration.

The Interactive Window interface in VS Code… kinda rubbish

So what’s the problem? Well, this Interactive Window interface is demonstrably worse than the Notebook rendering that VS Code also offers. Autocomplete frequently fails in the Interactive Window (right side pane in the figure above), and even when it does work, it tends to not detect any of the non-standard library Python packages you have installed. There have been more than a few people on the internet complaining about these severe deficits, but hey, it might work better for you, so certainly have try before taking my word for it.

The Notebook rendering interface in VS Code… much better

In order to instead use the Notebook interface, you need to have an .ipynb format file, but this won’t be present by default in a repository using jupytext because the whole point is to get rid of tracking this difficult file format. To generate a .ipynb file from a .py text notebook, run

jupytext --sync .py

That command might not work if the script has never been “paired” with a notebook file, so instead run

jupytext --set-formats "py:percent,ipynb" .py

Open the .ipynb file in VS Code, and you will automatically have the Notebook interface.

The last tricky bit is that any changes to the .ipynb will not be automatically synced with the paired .py text notebook that actually gets tracked by version control. There are two complementary options that help with this.

First, you can create a VS Code task that syncs the current notebook and .py script file you are editing. Then, if you set this as the default build task, it will be available to run manually with the shortcut Ctrl+Shift+B. You create this task with a task.json file in the .vscode/ folder of your workspace. It should look like:

{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Jupytext sync",
      "type": "shell",
      "command": "${command:python.interpreterPath}",
      "args": ["-m", "jupytext", "--sync", "${file}"],
      "problemMatcher": [],
      "group": {
        "kind": "build",
        "isDefault": true
      }
    }
  ]
}

What is nice about this task is that it works both ways. If you want to sync changes that you make in the .py text notebook to the .ipynb notebook, then execute the task when you are in the editor for the .py file and jupytext will handle putting those changes into the .ipynb file.

Of course, it is likely you will forget to manually trigger this after making changes you want to commit. To catch this, you can use the jupytext pre-commit hook that will automatically run the synchronisation of all scripts and notebooks and prevent a commit if these files are in an unsynchronised state. Assuming you use the pre-commit framework for your git hooks, the entry in your .pre-commit-config.yml should be:

  - repo: https://github.com/mwouts/jupytext
    rev: v1.15.1 # CURRENT_TAG/COMMIT_HASH
    hooks:
      - id: jupytext
        entry: jupytext
        args: ["--sync", "--pipe", "black", "studies/**/*.py"]
        always_run: true
        pass_filenames: false
        additional_dependencies:
          - black==23.3.0 # Matches your black hook if you use that

There are a some important notes about this that took me quite a while to figure out. The first is the always_run flag. This is needed because I intentionally don’t track .ipynb files and if the changes have been made in a .ipynb file, then by default this hook will not run because none of the relevant .py files under version control have changed. The other related caveat is that you therefore need to specify the location of all your text notebook files in the args of your hook (i.e the last entry in the list above).

Taking all of these little configuration odds and ends together, my experience developing with Jupyter Notebooks in VS Code is now much better, and perhaps yours will be to. Even if you don’t use VS Code, I imagine the pre-commit hook is something that would add value to your development workflow.

Bonus: Jupytext Configuration

Although not strictly necessary for the workflows above, it is worth mentioning that jupytext itself should be configured so that it doesn’t retain too much noisy metadata. By default, jupytext retains extraneous bits like the Python kernel version in the header metadata of a Text Notebook. As with Classic Notebooks, this will vary depending on who opens the notebook. This is the global configuration for jupytext that I currently use and is contained in pyproject.toml:

[tool.jupytext]
notebook_metadata_filter = "-kernelspec,-jupytext.text_representation.jupytext_version"
cell_metadata_filter = "-all"

Bonus 2: Automatically Triggered Sync in VS Code

Ideally, I would like to not have to remember to synchronise Classic and Text Notebooks when working in VS Code. As soon as I save one, the other should update to match. There default answer from an internet search seems to be the Run on Save extension, but it hasn’t been updated in 5 years, so I am somewhat hesitant to install it. I’m keen to hear if anyone else has solved this in a more elegant manner!

Retrospective on Code Review in Research

2023-07-21T00:00:00+00:00

Code review plays a crucial role in ensuring quality in the software development lifecycle, and it is a great practice for knowledge transfer within teams. However, like with most standard software engineering practices, it is much less prevalent in the world of research software development. It was this observation that led to the creation of the Research Code Review Community (RCRC) by Hollydawn Murray (Health Data Research UK). I would like to take some time in this post to reflect on the activities of the RCRC and some of the results for me personally that came out of my involvement.

The Research Code Review Community

Hollydawn led a rallying call at the beginning of 2021 to everyone in the research software community to build consensus and awareness around good practice in code review. I came across it via a lightning talk at SORSE. Acknowledging that fostering this type of culture practice would require change at many levels, the RCRC (then CRC) set out five working groups:

Diversity, equity, and inclusion
Code review during development
Code review at the time of publication
Recommendations for stakeholders
Training and education

I was involved with the “Code Review During Development” group, or “dev-review” for short. We made steady progress towards defining guidelines to implement code review in the research software development workflow. The final outcome was a website with flowcharts to guide anyone developing research software towards the practice of code review.

Sadly, like many volunteer-driven projects, the entire RCRC started to fizzle out after about a year. It is an interesting topic why this happened in our particular case as with lots of similar projects more broadly, but that is outside the scope of this current post. Personally, it was a hugely valuable experience to collaborate with people from diverse domains, roles, and countries, and it was one of my first true introductions into the power of the “open source” model. However, I was still left with the feeling that our resource needed some further advertisement.

Our dev-review group gave one last kick at the can at CW22, running a workshop to get feedback on the website and raise awareness about the project. But alas, we weren’t able to garner the interest we needed.

RSECon22: Making Connections

The story of the RCRC doesn’t quite end there. We all know how conferences can be great places to connect with peers, and RSECon22 is no different. I was lucky enough to bump into another SSI Fellow, Hannah Williams. In the course of our conversation, Hannah mentioned she had heard about my work with the RCRC and said she would be interested to have the guidelines about code review presented at her institution, the UK Health Security Agency (UKHSA).

Thankfully, Hannah followed up on this, and I was invited to give a talk at the UKHSA Code Review Workshop in December 2022.

UKHSA Code Review Workshop

I delivered these slides (embedded below) at the workshop, and they seemed to be generally well received. It was interesting to see some of the other code review practices happening at UKHSA, which contributed to my own awareness of the state of code review in UK public sector bodies. Being an employee of a public sector body myself, it is useful to know about the practices other organisations in similar positions are using.

Conclusion

My journey in the RCRC showcases the transformative power of collaboration but also some of its limitations when exclusively run by volunteers. I like to think that code review in research has improved because of the RCRC efforts, but admittedly it is only a small step forward. More positively, a chance encounter at RSECon22 facilitated by the SSI Fellowship shows how conferences and fellowships can bridge gaps and extend the impact of community efforts.

Cross-post Notification: BSSw Blog on The Anatomy of a Central RSE Team

2023-02-23T00:00:00+00:00

This is a very delayed notification that I contributed a blog post to the BSSw blog titled The Anatomy of a Central RSE Team. It builds on my summary of the session I gave a talk in at PASC22

Using the Functional Programming Language Elm for Advent of Code 2022

2023-02-18T00:00:00+00:00

It has been over a month now since the joyful time of Advent of Code 2022 has ended.¹ In my third year of participation, I decided that I wanted to try completing the challenges in a functional programming paradigm. Functional programming is not something that has taken much foothold yet in scientific computing², but I have been hearing about it more and more through some of the podcasts and blogs that I consume. Some of its selling points include verifiably correct software (through formal verification), enhanced testability, and robustness.

Elm in particular is a purely functional programming language that boasts “no runtime exceptions in practice” and has come up a few times as quite approachable for those new to functional programming. Sure, it is primarily focussed on web applications, which is far outside my domain of scientific computing, but it is nice to see what other areas of software engineering look like every now and then.

What I will try to convey is some general reflections of my experience doing Advent of Code in a functional language in the hopes it might be informative to both fellow beginners stumbling along and the more experienced experts who shape learning resources available. Invariably, many of my comments are probably specific to the Elm language itself, and I would welcome discussion how much some of the obstacles I approached are endemic to functional programming more broadly.

If you don’t really care about that stuff and just want to see a nice little website with my solutions to the problems (okay, just up to day 7), then you can get that at my deployed web app

Parsing Input

Something I truly didn’t expect to struggle with was parsing input. The Elm community and functional programming seems pretty sold on something called parser combinators. They live in the esoteric world of language parsing, with things like context-free grammars, LR parsers, and a whole load of other technical terms I don’t know being someone without a formal computer science training. I’m sure they are interesting and cool, but my-oh-my they were not easy to use in Elm. Perhaps this is just because of my inexperience with functional programming and parsing in general, but it is certainly not something I have run into with other languages when trying to do what are fairly simple parsing tasks. Take this Parser I wrote for day 5:

stackParser : Parser (List String)
stackParser =
    loop []
        (\crates ->
            oneOf
                [ succeed (\char -> Loop (char :: crates))
                    |. symbol "["
                    |= getChompedString (chompIf Char.isAlpha)
                    |. symbol "]"
                , succeed (\_ -> Loop ("" :: crates))
                    |= symbol (String.repeat 4 " ")
                , succeed (\_ -> Loop (crates))
                    |= chompIf (\c -> c == ' ')
                , succeed (Done <| List.reverse crates)
                ]
        )

The task was pretty simple: take a string like

"[A] [B]    [C]"

and extract the letters into the correct indices of an array. So 'A' -> 0, 'B' -> 1, 'C' -> 3. It took absolutely ages to figure out with Parsers in Elm and is anything but readable. The idea of parser pipelines seems nice in principle. Something like this from the documentation looks great and understandable:

type alias Point =
  { x : Float
  , y : Float
  }

point : Parser Point
point =
  succeed Point
    |. symbol "("
    |. spaces
    |= float
    |. spaces
    |. symbol ","
    |. spaces
    |= float
    |. spaces
    |. symbol ")"

But my experience was that as soon as you need to do anything slightly more involved, they get very complicated very quick. I’ll acknowledge regular expressions also aren’t the correct tool for this task, but I’m not convinced this is the best alternative.

Array Handling

In scientific computing, arrays and operating on them is core to the domain. Arrays are usually treated in a mutable manner (i.e. you can change the values of an array in place), so I expected to take some time habituating to immutable arrays, since immutable data is a tenet of functional programming. And yes, it did take time to get used to, and I was frustrated many a time when I just wanted to loop through some values changing them as I went along. But once I got the ethos and hang of what is effectively “map-filter-reduce”, I could see both the clarity and efficiency of treating array data in this manner. This is a pretty good example from Day 7 which involves a function looking to find the smallest directory that can be deleted to free up the required amount of space:

findSmallestDirToFreeSpace : Int -> Int -> T.Tree Int -> Int
findSmallestDirToFreeSpace total_size required_size tree =
    let
        minimum_size =
            required_size - (total_size - T.label tree)
    in
    T.flatten tree
        |> List.filter (\a -> a >= minimum_size)
        |> List.sort
        |> List.head
        |> Maybe.withDefault 0

I find this really clean and readable. The array flows through each step of the pipeline, and each of those steps is quite simple. However, the last line isn’t great because instead of indicating that a directory wasn’t found through a Maybe or Result type, I am giving up and just returning 0. In general, this was something I struggled a bit with: getting the required pipelines for handling Maybe and Result types. Coming from the C-based world of scientific computing, I’m more used to simple return values from functions indicating failure or success (i.e. 0 or 1) or raising exceptions or errors.

Recurse, Don’t Loop

Related to the immediately preceding point about handling arrays in immutable pipelines, one could still reasonably observe that it isn’t immediately obvious how this replaces our good friend the loop from imperative programming. Fair point! In my brief experience of using a purely functional language, loops tend to be replaced by recursive functions. Perhaps that is obvious to most, but it was an interesting realisation on my part.

Debugging

It goes without saying that debugging is an essential technique for any software developer, and so I was quite surprised to find that Elm does not have any debugger available for it! One is relegated to the technique of print statements³ with the built-in Debug package. But don’t expect these to become visible from the command line. No, you must run the program in the browser, enable developer mode, and find the debug console. I imagine this is standard stuff for web developers, but it was a pain point on my first debugging expedition given I am used to everything happening in an IDE or from the command line.

Deployment

Another area where onboarding to Elm falls short is how you actually release your creations into the wild. Sure, there is the “Elm architecture” guidance on how to make a web application, but I am talking about how one actually deploys that application and makes it accessible on the web. For a language that is focussed on web apps, there is paltry and diffuse information on actually disseminating an honest-to-goodness web application. Perhaps these steps are obvious to everyone in the Javascript and front-end world, but for a lowly scientific software developer like myself, it took far longer to figure this out than it should have.

An implicit and nearly unspoken assumption is that most Elm applications are “SPA”s (Single Page Application). These are quite different from the hierarchy of HTML pages that I am used to when dealing with things like static site generators. A SPA is basically a single HTML page with some embedded Javascript. All requests to the web site are routed through this single page which dynamically serves the correct content based on the request. I don’t claim to understand the internals of how this works, so you will have to settle for this hand-waving explanation or go try to decipher the Wikipedia page.

For the practicalities of deploying a SPA, it is similar to most web sites: get a HTTP server running somewhere and put the appropriate mix of HTML and Javascript in the places it expects. Again, this isn’t particularly difficult, but for someone new to web development it was far from obvious how to get this off the ground. I had to cobble together the steps below from a variety of different sources and scratch my head more than a few times along the way. It shouldn’t have been so difficult to get this information!

Build an optimised Javascript output from your main Elm program: elm make src/Main.elm --optimize --output=dist/elm.js. This instruction is available from a few different places, and some go further with things like uglify, presumably to get further optimisation. In the simple case, my feeling is this isn’t necessary and overcomplicates the deployment process.
Create a simple index.html file that will use the Javascript generated above. I scavenged something suitable from the Elm SPA tool:
```
 lang="en">

   charset="UTF-8">
   name="viewport" content="width=device-width, initial-scale=1.0">


  
  
```
Elm Spa along with Elm Land look appealing because they handle most of the behind the scenes practicalities that I am explaining now; however, if you haven’t started your project using these tools, then they provide no information how you can convert a project to use their frameworks. Therefore, these were not options for me because I copied my project from someone else who made a nice layout for their Elm advent of code solutions. But my main question remains: why isn’t this simple bit of HTML available more readily?!?!
Combine the above with an appropriate web deployment solution. Netlify seems to be the go-to solution at the moment, so that is what I opted for. Complete instructions are available on the Elm Land guide, and the only thing I had to change was the build command in the Netlify configuration file, netlify.toml:
```
# 1️⃣ Tells Netlify how to build your app, and where the files are
[build]
  command = "npm install --global elm && elm make src/Main.elm --optimize --output=dist/elm.js"
  publish = "dist"
   
# 2️⃣ Handles SPA redirects so all your pages work
[[redirects]]
  from = "/*"
  to = "/index.html"
  status = 200
```
The built application goes into the dist/ subdirectory, so that is also where the index.html will need to live from step 2.

Conclusions

Perhaps you can already infer that I was not swept off my feet by Elm. Whilst I really enjoyed some of the functional elements of the language, it was an absence of some essential bits like a debugger and deployment guide that resulted in pain points and will probably deter me from starting any serious projects with the language. I have also read a few stories on Hacker News lately how the language is effectively dead and many are not happy with how the core team has handled some updates, particularly the move from 0.18 to 0.19.

Footnotes

If you aren’t familiar with “Advent of Code (AoC)”, it is an annual recreational programming challenge where a single problem is released each day of advent (December 1 to 25). It’s like an advent calendar for computer geeks. There has been lots of activity and engagement around the problems that thread a whimsical narrative. Find out more at the official website. ↩
This Computational Science StackExchange post nicely summarises some of the reasons why functional programming has not found the adoption and attention in scientific computing like it has in software engineering communities. It also gives some of the relative pros of functional programming as a paradigm. ↩
There is nothing wrong with using print statements for debugging from time to time, but there are many cases where they simply won’t cut it and a full-featured debugger is indispensible. ↩

Software Sustainability in Computational Science and Engineering at PASC22

2022-10-14T00:00:00+00:00

Back at the end of June this year (⏲ where does the time go ⏲), I attended quite a unique and interesting conference called The Platform for Advanced Scientific Computing (PASC) 2022. As the name suggests, it is a highly interdisciplinary conference that pulls together scientific domains with a high reliance on computing—from plasma physics and molecular simulations to economics and data science. In a future post, I am hoping to give a broader picture of some of the content I came across at the conference, but in this post I want to focus on a minisymposium session in which I was a presenter.

The title of the minisymposium was Software and Data Sustainability in Computational Science and Engineering ¹. You can follow the link for the full description of what that means, but my personal distillation of the objective of this session was as follows:

In the context of the explosion of software in research and the complexity of the hardware and software environment upon which this relies, how do we ensure the sustainability of the software produced during research? In particular, what are the human factors that are a challenge for RSEs and RSEng?

This is a multi-faceted topic, and the strength of the symposium was that we each addressed this topic from quite a different level and scope. At the pan-institutional and international scale, there was Dr. Anna-Lena Lamprecht’s talk about policy and community in research software. One tier down from this was my talk looking at how a central RSE team can effectively promote software sustainability at an institution. And finally at the embedded RSE and individual level, there was Hannah Williams’ talk giving an insightful analysis of the trade-offs necessary between expediency and sustainability when operating under extreme time constraints. You can find and cite the presentations using our Figshare collection for the minisymposium.

I will now dive a bit deeper into each presentation and give my thoughts and impressions.

Good-Enough Practice – Reflections on a Pandemic Response

Hannah made a great statement at the start of her presentation: “no one sets out to do a ‘bad’ job”. If that’s so, then why does unsustainable research software exist? One reason is that as individuals and teams, we are constrained by situations and external forces that require trade-offs between good practice and delivery time. This probably better know in the software engineering world as the ETTO (efficiency-thoroughness-trade-off) principle.

We are all constantly making decisions about where our code will sit on the ETTO scale, and in Hannah’s case these decisions had to be made under incredible pressure. She is part of the UK Health Security Agency (UKHSA) and was involved in that department’s COVID-19 response. Government agencies and the public in general were dependent on the vitally important and timely information her team were producing with their analyses and codes. The main factors influencing Hannah’s team’s ETTO decisions were:

time
constantly evolving situations and requirements
staffing and onboarding
data access and quality
inability to release source code because of privacy concerns and lack of priority compared to other objectives.

Those sound pretty familiar to me! Even so, one might think that because this case is a bit extreme, it might not offer any lessons for one’s own software development occurring in far less high-stakes settings. However, my view is the opposite. I think Hannah’s experience provides an insightful lens with which to focus on the absolutely essential good practices that simply cannot be abandoned in a project. According to Hannah, these are:

Sense-checks and informal peer review when automated testing isn’t possible
Openness about limitations of results and including disclaimers
Modular code
Automate processes where possible (i.e. CI/CD)
Some documentation is better than none (even if it is just an email with some usage instructions)
Software community and coding clubs.

I am in near total agreement with all of these “good-enough” practices as they are usually some of the first aspects I try to implement in my own projects (viz. modularity, CI, documentation, code review, testing). In my opinion, automated testing should almost never be abandoned, with an obvious exception being a case like Hannah’s where the re-use of certain codes can be quite low and program verification is a lower priority than model validation. Using model validation and “sanity checks” as a quality assurance technique has been the status quo across science for many years, and it has served us well, but it is important to recognise that it can be inefficient when it comes to teasing out errors in computer code. I have written about the importance of software testing previously, so I won’t go into any further detail here.

Overall, Hannah’s presentation outlined some essential practices that all of us as individual RSEs should reflect on and put into practice. There is also the hint of the need for collective action through communities that I will expand on in the next section.

From the Trenches of a Central RSE Team

As an individual RSE, there is only so much one can do to change the tide of software sustainability. In large institutions like universities or national laboratories, dedicated teams of RSEs that exist independent of any particular research group have started forming to provide the resources and skills needed to improve software sustainability. My talk addressed the strategies and operational model that my particular central RSE team at UKAEA (UK Atomic Energy Authority) uses to improve the development of research software.

First, some context for my team is necessary. Fusion science is a broad domain both in terms of areas of science and spatial and temporal time scales, leading to what I describe as a “heterogenous computing environment”. For instance, creating a high fidelity “digital twin” of a fusion reactor would require approximately 10 TB and 5 million CPUs, putting it firmly in the exascale of computing, and it requires simulations based on plasma physics, materials science, mechanical engineering, etc.

In addition, we face the problems of:

legacy code
lack of awareness and buy-in to good software engineering practice
absence of unified software development policy

These are not unique to our institution, but it is worth sharing our strategy and activities for tackling them, which is summarised in the image below.

Bluteau, Matthew (2022): From the Trenches of a Central RSE Team: Successes and Challenges of Promoting Software Sustainability in a Multi-Scale Computational Setting. figshare. Presentation. https://doi.org/10.6084/m9.figshare.20473515.v1

I suspect like most central RSE teams, our time is predominantly spent on work funded by the projects of other research groups. It is our bread-and-butter, and it is truly important work to do. One of the best ways of spreading software sustainability is practicing what you preach. There is an important subtopic of this work that I call “consultancy”, which deserves mention because it is one way that we can directly influence the structure and planning of research projects so that they consider software sustainability from the beginning rather than as an afterthought, at which point it is much more difficult to correct course. We have advised on project funding proposals in the past and sat on interview panels, both of which plant seeds for future software sustainability. We have also done one-off code reviews, usually on quite large portions of existing code bases with the purpose of getting them fit for release. Whilst we aim to encourage regular code review as part of merge requests in all substantial software projects, this doesn’t always happen, and it is still important to support the projects that haven’t adopted this practice yet. This is a subject close to my heart because it was one of the first “side” projects I got assigned when I initially started as an RSE and because I have been an active participant in the Research Code Review Community.

Although project-funded work takes up the bulk of our time, I place an equal importance on the smaller portion of our work that is enabled by some core funding (the right side of the figure above). Why? Because not being tied to any particular research project means that we can do work that we believe benefits all research groups on site, allowing us to start building a research software infrastructure and baseline that will hopefully make our project work easier over time. The activities under this core funding umbrella are quite varied, and I have been lucky enough to participate in most of them. For example, under the “Community” heading our team runs something called the Coding Discussion Group, which is a monthly meeting dedicated to research software topics. At the moment, this takes the format of a short presentation to inspire subsequent discussion, but in the future, I am hoping to make it more interactive with actual coding activities to facilitate learning through practice, similar to the “coding club” Hannah described in her talk. And as Hannah explained, fostering this software community is a foundational part of good software development because it enables knowledge and skill exchange.

I would be remiss not to also mention the “Training” subtopic. For over five years, our team has been delivering the regular Python-based Software Carpentry Workshop along with some of our own material on automated testing and best practices. Early this year, I lead a successful pilot of the new Carpentries course “Intermediate Research Software Development in Python” as the main thrust of my SSI fellowship, and it has now become part of our regular offering.

It is my hope that this brief summary of my groups activities will be helpful to other groups in their effort to promote software sustainability.

Improving Support and Recognition for Research Software Personnel - the International Landscape

Expanding further the scope of consideration, it is necessary to acknowledge that even central and embedded RSE teams cannot solve all of the problems of software sustainability because they operate in a global research culture that still does not adequately value research software as an output. This means these teams are constantly fighting an up-hill battle and unable to operate at their full potential. The international action and culture change needed to improve this situation is what Anna-Lena’s talk addressed.

She succinctly gave the roadmap for culture change in the slide below:

Lamprecht, Anna-Lena; Barker, Michelle (2022): Improving Support and Recognition for Research Software Personnel - the International Landscape. figshare. Presentation. https://doi.org/10.6084/m9.figshare.20492739.v1

What I particularly liked was her comment that this is meant to be simultaneously a bottom-up and top-down approach, something I have long believed in. At the bottom of the pyramid are the categories of infrastructure and skills (substitute for “User Interface / Experience” in the figure), which are slightly different from the infrastructure and training that were mentioned in my talk above, which was focussed on RSEs providing the services for researchers. Rather, the infrastructure and skills Anna-Lena is talking about are specifically to support RSEs. An example of infrastructure she gave was something like the Citation File Format (CFF) that easily facilitates the citation of a code repository, helping RSEs get cited and therefore closer to proper recognition. This is also something that was produced directly by a few RSEs. Bottom up approach FTW!

On the skills side, Anna-Lena identified a new group called INTERSECT that is looking at what skills RSEs specifically need to develop their career. Again, this is different from more general research software skills that both researchers and RSEs require, like the foundational common ground the Carpentries has built. I have not had much contact with this group yet, but I am very interested in getting involved. It strikes me as a nice mix of both bottom up and top down approaches. It is supported by the NSF (top-down) but was presumably first submitted by a group of researchers (bottom-up).

Next in the pyramid is communities, a theme that arose in all three presentations. This highlights that communities are needed at many different levels in order to support software sustainability. As RSEs, we are quite lucky that an extensive international network of national RSE societies has developed with the movement. Wherever you are in the world, there will be a society you can join as an RSE to get support and network with other RSEs. These national and international communities are essential because not every institutions or research area will have an appropriate community for an RSE to join.

Moving one more level up the pyramid, we come to incentives. I think incentives are closely linked with recognition, and this is confirmed by the fact that Anna-Lena mentioned things like the Hidden REF and progress in the Netherlands towards considering all outputs (including software) when making decisions about hiring and promotion. Ultimately, software sustainability depends in large part on RSEs being widespread in research, and that can only happen when they are properly recognised and rewarded for their contributions to research.

Finally, at the top of the pyramid for creating culture change is policy, which encapsulates most of the “top-down” aspects of this framework. It is the policies of funding bodies, universities, and publishers that create the environment in which research culture forms, meaning there is only so much that research culture can deviate from the limits enforced by this policy environment. Concerted advocacy and lobbying over many years will be needed, and Anna-Lena pointed out organisations doing this work like the Research Software Alliance (ReSA), Software Sustainability Institute (SSI), and the FORCE11 Software Citation Working Group. I would add that many national RSE societies are also engaged in policy advocacy, a specific example being the discussions that SocRSE have had with funding bodies in the UK.

Those paying attention will notice that data sustainability is in the title but not really mentioned elsewhere in the post. This is because the session itself really focussed on the software side of this problem. Historically, there has been much more done on the data side (e.g. FAIR principles) in research than the software side, so it feels natural there has been a shift. Obviously, the two are intertwined, and both require the success of the other to fully achieve their aims. ↩

Review of Piloting a New Course: Intermediate Research Software Development in Python

2022-04-25T00:00:00+00:00

At the end of January and beginning of February this year (2022), I piloted a new course at my institution which sought to teach intermediate-level software development skills to researchers. In the immediate aftermath, I posted a short thread of tweets on Twitter to share some of the experience of running the course.

Last Wednesday, I finished delivering a pilot of a new course at @UKAEAOfficial with the help of two colleagues from @RSECulham (Kristian Zarebski and Sam Mason). Some initial impressions and further details... 🧵
— Matthew (@mattasdata) February 7, 2022

Understandably, this thread conveyed my initial impressions of how the course went, but a further and more in-depth analysis was always planned. So, here I am to make good on that intention and pick apart the course delivery, successes, and challenges in more detail. I’ll be quoting the content from the thread below because it introduces much of what I want to discuss. The target audience for this post is other people who want run this course at their institution.

The course is titled “Intermediate Research Software Development in Python” and you can find all of the material freely online. It is important to clarify that this is a course about software engineering practices and not more advanced features of Python. Python is merely the sandbox in which to demonstrate and learn the intermediate-level skills relevant to most forms of research software development in all languages.

Context and Motivation

❓Why? This was hatched as part of my SSI fellowship proposal. I identified a gap between what is taught by the essential, introductory @SoftwareCarpentry courses and the skills that researchers at my lab require day-to-day. Things like IDE use, automated testing, virtual environments, project structure and design, etc. The material developed by the SSI overlapped almost perfectly with these needs.

There isn’t much to add to this, except my own personal journey of learning the necessary software development practices for being a researcher. A Software Carpentry workshop at the beginning of my PhD gave me a solid foundation to work from, but I quickly found that the code for my research would require more than a novice understanding of version control, the Unix shell, and Python. Although I enjoyed independently learning about software engineering during my PhD, I often felt uncertain about where to find authoritative content on how researchers should be developing their code at an intermediate-to-advanced level. Sure, StackOverflow is great, but it doesn’t provide development path for someone follow. What is important to learn, and what should be the order of those topics? The shortcomings were obvious: I completed my PhD having never done, let alone heard, about automated software testing, and my approach to design and architecture was ad hoc and completely shaped by the code within my research group. Given how important I think testing is for software and the fact I was developing library codes, that was simply not acceptable—but certainly not all my fault!!!

A direct consequence of this personal shortcoming was my desire to rectify it for other present and future researchers, and hence, when I moved into an RSE role, I finally had the time and resources to direct towards achieving that through the delivery of training courses. This culminated in my SSI Fellowship 2021 and the formalisation of a plan to delivery an intermediate-level course at UKAEA.

Now you know why, so let’s get down to the practicalities of the course.

Scheduling

📅 Scheduling? The course was run over four separate, half-day sessions in the afternoon, split equally across two weeks (Tuesdays and Wednesdays). This was by far the preferred format from the post-workshop survey. It gave learners the time blocks they needed to focus on the material, while allowing flexible time between sessions to digest material and catch-up if needed.

These comments still hold true, but it is nice to see the other options that learners were choosing from, and how they ranked them following the course:

The second choice is “4 afternoon sessions, 1 session per week”. I can see this being suited to advanced learners who simply want a quick introduction at the beginning and then just sit in breakout rooms with some helpers as they read through the material and do the exercises. These are also the learners who tend to have busier schedules, likely being higher up the seniority ladder, and therefore a single session per week is much less of a time commitment. Although I see the potential value of this scheduling, it is unlikely we will shift to it unless it is clear more learners want it.

The third option is “4 afternoon sessions, all in the same week”. My main criticism of this is that it would make for quite an intense week, and the likelihood of scheduling conflicts increases greatly: someone is going to have a weekly meeting that clashes with one of the slots. While it isn’t fatal to step out of one of the sessions for a meeting, the process of catching up puts a strain on both learners and helpers/instructors.

It is quite interesting that both of the “full day” options came in at the bottom. Combined with the long form comments in the feedback, it is obvious that learners appreciated having the morning half of the day to use as they pleased, whether that be for usual work, or catching up on course content from the previous day. The same can be said about having the course split across two weeks: the interim time could be used for catch-up.

However, it is important to acknowledge the potential bias in the answers to these questions. Participants have only directly experienced one of the schedule formats, so they will undoubtedly tend to prefer that scheduling. Regardless, it is fair to conclude that they did not dislike the format nor was it inconvenient.

Teaching Format

📑 Format? A previous pilot of the course I helped with used longer breakout rooms for entire sections of the course with learners reading through the material. While this worked well, I wanted to see if a bit more instructor-led content might improve things further. So, I created a set of Jupyter-notebook slides to accompany the course website content: https://github.com/ukaea-rse-training/python-intermediate-development/tree/ukaea-instructor-led/slides Free for anyone to use. These give some guidance for introducing individual episodes, when to send learners into breakout rooms, and what material they should cover.

What I critically failed to mention was that in both cases the courses were run completely remotely via Zoom, and also in both cases breakout rooms played an essential role in the delivery of the course. It is important to clarify some terminology as well: a section is the top level division of the course, and each section is composed of episodes.

Therefore, the two key difference in the format of delivery was first the use of slides to introduce each episode and in some cases eliminate the need for learners to read content, and second the length and number of breakout room sessions. I like to think of the main difference being in terms of the frequency of “synchronisation” points. In the case of the UKAEA course that I instructed, the synchronisation between breakout rooms tended to happen at the beginning and end of each episode, whereas the SSI delivered courses are synchronised at the beginning and end of each section. Again, these are both equally valid approaches that I think have their respective merits and drawbacks.

What I like about the more frequent and shorter breakout rooms is that it contributes to a cohesive and collective feel to the course. If the breakout rooms are long, then it is effectively like each breakout room has their own unique experience of the course that isn’t shared with the other groups. By returning to the main room more frequently to go over some high-level concepts and share discussions from the breakout rooms, there is more collective experience of the material.

On the other hand, more frequent synchronisation means that advanced learners will more often be kept waiting in breakout rooms with dead time, even though the total amount of time they might wait is the same in either case. In the single breakout session per section format, advanced learners have a single block of time during which they might wait, and they can more effectively put that time to good use rather than the salami slicing that happens in the other format.

Moreover, whilst there was a lot of valuable discussion from the “report out” at the end of breakout sessions, it did throw off the timing of the course, so there will need to be some further reflection about whether it will be possible to continue to do this in future iterations or how it might be fit in better.

Overall, the feedback supports that the teaching style with slides was effective. The graph below shows the level of agreement with the statement “The mix of instructor-led tuition and independent study and exercises was effective”, with 1 = completely disagree and 5 = completely agree.

Asking a slightly different question showed similar results: “If the current delivery of the course was said to represent a “5” on the scale below, where do you think the balance of instructor-led to independent study should be for the course? (0 - more instructor led, 10 - more independent study)”

This figures suggests there might be a slight preference for their being more instructor-led components for the course, but likely the sample size is not large enough to firmly conclude this. Based on written feedback, it seems the main reason for why the instructor-led components were valuable was because they eliminated some of the reading required from the main course pages. Reading fatigue is one common point of feedback from the few iterations of this course I have helped with.

Another important component of the course was a shared Markdown document. This is fairly standard practice for Carpentries courses, but it is worth remembering that even more advanced learners can still find this tool useful. In fact, because there is even more detail and breadth of topics at an intermediate level, it could be argued that a document like this is even more important because it allows for references and short explainers to be added for which there were simply not enough time for in the course itself. Also, the shared document was used to outline the the exercises that would be done during the course, giving learners an overview through which they could browse and consult if they every lost the thread of the course. I am hoping to include a markdown template in the course materials at some point in the future.

Finally, a comment about what is to come. Moving to in-person learning will require some adaptation. Zoom breakout rooms are actually somewhat difficult to recreate in real life. Having seating arrangements that allow participants to be in small groups but then also access the slides delivered by the instructor is non-trivial. So, there needs to be thought about which venue is being booked for the training.

Feedback

💬 Feedback? We collected daily feedback from learners, and from initial inspection it is overwhelmingly positive. The breakout rooms with helpers was a consistent feature that learners found helpful. Some preferred a verbal delivery of the contents (i.e. using the slides) while others were quite impressed with the website content and happy to read through themselves whilst asking questions in breakout rooms. It is likely we will alternate between both formats in the future.

Having reviewed the full feedback, it is likely we will stick solely with the slide-based delivery of this course. To accommodate different learner pace, we will instead try to get some “additional exercises” into the course content that advanced learners can do while they wait. This feature has come up in other sessions of the course I have helped with, so it is likely to happen.

Universally, the content was seen to be useful and appropriately targeted. However, Section 3 (Software Architecture and Design) of the course does deserve some special mention. This is undoubtedly one of the more content dense sections, with modules on programming paradigms like Object-Oriented (OO) and Functional Programming. The figures show that this was perceived to be too much detail, slightly more difficult, and probably not quite what learners where expecting.

The lesson developers are aware of this from other runs of the session, so I am certain this will get ironed out in the long run. The main way that I will seek to mitigate this in the short term is to give more time for this section, whilst perhaps starting to make an attempt at unifying the exercises from the functional and OO episodes with the main “inflammation” project that is used in the rest of the content.

Improvements

📈 Improvements? VSCode seems to be a more popular editor at our institution, so we want to adapt the material to that. Keeping to time was difficult. It is likely the course needs one or two extra half-day sessions when delivered like this.

There is certainly still the desire to get the lesson material compatible with VSCode users. I am hoping to address this prior to the next iteration of the course in May.

For this next run, I have added an additional half-day session (5 half-day sessions in total), to ease the time constraints slightly. This is also more in line with how the course authors are now running the course.

Acknowledgements

It imperative that I thank some important people who made this possible. First, the original authors of the course material, Aleks Nenadic, Steve Crouch, and James Graham from the Software Sustainability Institute (SSI) deserve huge credit for developing this incredible resource and making it freely available for all to use. Second, my two colleagues from the UKAEA RSE Team, Kristian Zarebski and Sam Mason, have my gratitude for being helpers of the course and ensuring the breakout rooms were such a valuable learning environment.

Notes from the SSI Collaborations Workshop 2022

2022-04-11T00:00:00+00:00

Below are my lightly touched up notes from the SSI (Software Sustainability Institute) Collaborations Workshop 2022.

Figshare portal with all of the presentation materials from lightning talks, keynotes, etc

Day 1

Both keynotes available on YouTube

Collaborative notes with lots of useful links to other content from the sessions

Keynote on Code Review

Slightly misnamed because it was actually about code verification for publications
“Code execution during peer review”
Daniel Nust (Researcher, Institute for Geoinformatics, Uni of Munster)
Talking about CODECHECK and then Reproducible Agile (which is confusingly something for geoinformatics and not the project management methodology)
Slides: https://codecheck.org.uk/slides/cw22-keynote-daniel-nuest.html
The PDF produced for publication do not facilitate reproducibility and are poor representations of what happens leading up to publication and the scientific method (in computational science)
To rectify, there have been a number of efforts to check code output that have popped up, basically to check that the code associated with a paper can actually be run and not necessarily whether it produces accurate scientific results
- so this is more about reproducibility
ReproHack is something associated to this
Some interesting questions about how this initiative also relies on the free labour of academics
- the answer to this was that at least the free labour wouldn’t be supporting a parasitic for-profit publishing company

Keynote on Ethics

Pamela Ugwudike
IEEE Global Initiative’s mission: ensure every stakeholder involved in the design and development of autonomous systems is trained in and able to prioritise ethics and benefit to humanity
“new and emergent software should do no harm” – the Hippocratic Oath for SE and AI/ML
- but who decides harm?
- there needs to be an approach rooted in the context of where the software will be applied to determine the ethical norms applied in the software
Concept of “Digital Capital”
- another (newish) form of power in our society
- those with digital literacy and software development skills increasingly dictate many social processes

Lightning Talks

Rebecca Grant (F1000) - Making an impact: Software Tools Articles at F1000 Research
- The software tools article is a new journal article type that focuses on the software tool and not the scientific result
How green is an event?
- CO2 calculator for individual attending an event: https://cutt.ly/rg-co2-calculator
- not the quickest calculator when I tried…
- github.com/RemotelyGreenOrg
Eli Chadwick
- FAIR data analysis in muon science
- using a tool called Galaxy

Discussion Session

Notes document: https://docs.google.com/document/d/1Y7oEgNskVznSJDu7PDLaEeHi-cPvTb8C_DzogPYVcWA/edit#
The theme for our group was: “What are the practicalities of introducing researchers to code review?”
Some interesting perspectives from a PhD student in a group that is very hostile to anything that focuses on software sustainability
- it is seen as wasting time because it doesn’t result in publications
- as a result, we talked a lot about the current academic culture and it near complete reliance upon publications as a metric for reward and assessment
- it was acknowledged that this student simply has to take the personal hit of focussing on the software, and the impact upon relationship with PI and career advancement
There is then the time constraints that researchers operate under, and adding code review is yet another item on the list
- consequently, there needs to be a strong sell on the benefits, and long term this is something that needs to be written into job descriptions
- one group member mentioned that they did have a central RSE team available, but it could take about 6 months to get a code review done!!!
  - so, have flexible RSE teams that can take these sort of short term projects is important
We eventually got around to writing a blog post looking at some of the barriers for introducing code reivew
1. Time commitment (above)
2. Willingness and culture (above)
3. Finding reviewers: it isn’t always possible to get someone from your group, and venturing outside your group will run into a host of issues
4. Clarifying the goals of the review
  - there can’t be an expectation of doing a complete verification of the code base (this is more for testing)
Sidebar great idea from someone in the room: do a project Euler problem each month and then share the solutions to that
- a way to diminish code shyness
- could be good for CDG???

Day 2

Collaborative notes

Panel on Ethics

Case study about where ethics was not applied or considered
- in Nairobi, there was a live “experiment” of trying to discover how to get landlords in Nairobi to pay their water bill
- it ended up with people living in these buildings having their water cut for 9 months or so
- the researchers wanted to see if the tenants would have the bargaining power to force their landlord to pay the water bill, without considering the fact that these are already marginalized people with little power
ethics is not just a box-ticking exercise, not ‘one and done’
it is important that ethics has the power to stop a study!
researchers are not passive participants in the research and need to consider ethics from the beginning, not just as some requirement put on them
mention of how Turing Way has contracts that include contribution to open source
“reflexive” (reflective?) documentation exercises
- ways to identify stakeholders and thinking about them
- what they didn’t get to was how to engage with stakeholders (in particular those who the software and research will impact)
RSEs in particular might be brought into a project where the ethics have already been cleared
- it is essential to raise any concerns even at this “late” juncture
MTurk workers are a form a of Data Enhancement Workers, and there are ethical considerations around the fact that they are not given much credit in the results of ML (nor compensated well by companies that benefit from the ML systems)

Collaborative Ideas Session

Our group was in the “Interdisciplinary” category but we ended up being a bunch of Physicists and HPC people 😅️ (with one Biologist!)
After a meandering discussion, we eventually landed on the idea of “Code Review Cupid”
Problem: Finding code reviewers is difficult. Code reviewing when there are more researchers than RSE is not sustainable, so there needs to be leveraging of researchers reviewing the code of other researchers
Solution: create a matchmaking service for researchers looking to review code and have their code reviewed
- researchers/coders would create a profile with some basic relevant skill sets collected into a small database
- use a matching algorithm to make suggestions to both reviewers and coders
- supporting resources on how to perform code review

Lightning Talks 2

Stephan Druskat
- metadata for software publication
- http://software-metadata.pub/
M-x Research
- a support group for emacs users in research software
- https://m-x-research.github.io/
Datalad: basically like better Git-LFS???
- sort of, but built on git-annex and is meant to solve the higher-level problem of tracking a data object throughout a workflow
- https://datalad.org
Building bridges in Matrix
- I should really join this: https://joinmatrix.org/guide/

Miniworkshop/Demo Session 1

I helped run a feedback session on the website produced by the Code Review Community about doing code review during research (of which I was a main contributor)
Website: https://researchcodereviewcommunity.github.io/dev-review/
Notes document: https://docs.google.com/document/d/1bmb-qfRJAPFB4y5d1DTvVYwk3mcVCE1PI971oikMaEw/edit#
Intro to the session and the website: https://researchcodereviewcommunity.github.io/CW2022-miniworkshop/slides.html#/title-slide
Main portion was going into breakout rooms
- first 10 minutes was reading a portion of the website
  - Finding a Reviewer
  - Meet and Agree on Objectives
  - Code Author Communicates Code and Context
  - Reviewer Reviews Code
  - Author and Reviewer Meet
- then, we had some targeted questions to spark discussion (and gather feedback)
Result: some really great ideas for improving the website and our presentation of code review for researchers
- e.g. creating a markup of a piece of code with reviewer comments, creating some videos of good and bad tone from a reviewer during a meeting

Day 3

Miniworkshop/Demo Session 2

Technical Debt Talk

notes doc: https://docs.google.com/document/d/1dry3-jF_SD4TANIxu-RYv4CBA74q6leCodyBWfsgEqI/edit
big take home was opening up to refactoring and being more agile
something interesting to check out: https://hyperpolyglot.org

Common Workflow Language Novice Tutorial

notes: https://docs.google.com/document/d/11YeLNc40MI1U-wdZCe2uqlFGfinkZgiWUzMioq0cyRY/edit#
basically, a yaml based language for creating data analysis workflows
interesting, but there were some execution and timing problems with the demo
the input language itself is a bit repetitive
are workflow constructions more suited to GUI?

Testing in Research Software: Review of ICCS 2021 Part 2 and SeptembRSE

2021-11-29T00:00:00+00:00

On 16-18 June 2021, I attended the International Conference on Computational Science (ICCS) 2021 and subsequently wrote a post summarizing the first part of the Software Engineering for Computational Science (SE4Science) track. True to my word, I am back to complete the review of that track. The remaining part consisted of a speed blogging session, and my group focussed on software testing. Before ploughing ahead, it is worth mentioning that a lot of time has passed since this session, and notably the annual RSE conference returned in online form as SeptembRSE. Unsurprisingly, there was also an event at SeptembRSE that touched on software testing, so it seems natural to hit two birds with one stone and briefly review that event as well.

Speed Blogging at SE4S Track of ICCS

The notes from the speed blogging session are openly available in a Google Doc. Like many such sessions, we took quite a winding road across a variety of subjects, but I think predominantly we grappled with the question of how to make software testing, and particularly automated testing, accessible and relatable to researchers who already spend time on scientific verification and validation of their codes. Evidently, this is a large subject to unpack, but ultimately we decided to narrow down to a single factor that could help accessibility: expectations.

There are myriad different forms of software testing out there, and if a researcher or RSE doesn’t know which type of testing is appropriate for the software they are writing, then they will become overwhelmed with the options available and simply not do it. As a result, we came up with the idea of having a chart/table that provides a guide for what types of testing are expected at different maturity levels of research software. Breaking down maturity levels is somewhat arbitrary but an important task when communicating expectations. We ended up using something quite close to the DLR Application Classes:

Level 0: Personal use
Level 1: Research within a team
Level 2: Supported library for research community
Level 3: Product formally released to broad audience
Level 4: Critical application for operation

After agreeing on these levels, we then started to assign different testing types to the levels and to the transitions between levels. What eventually resulted after some tidying up by Neil Chue Hong is this table.

Table II from Chue Hong, Neil Philippe, Bluteau, Matthew, Lamprecht, Anna-Lena, & Peng, Zedong. (2021). A Framework for Understanding Research Software Sustainability. Collegeville Workshop 2021, Online. Zenodo. https://doi.org/10.5281/zenodo.4988277 .

For me, it is the leftmost column “Testing Approaches” that delivers the practical value. It tells a miniature story about how the testing should evolve as your software project matures. It starts in a familiar place for personal software projects at level 0: do some manual and interactive checking of the results from your code against some sort of oracle or reference. This is what all researchers already do; however, as soon as the use of the software expands beyond the individual, there needs to be work towards formalising tests in the software. For example, you want to share an analysis routine you just wrote with someone in your research group. Before you do that, this chart suggests that you write something that tests your routine. It doesn’t need to be a test using a formal testing framework, but there should be something codified to test that the routine works correctly. It could be a script that provides some well studied input and output to your routine and against which the user can verify that the numbers match what they get.

This might sound like a lofty ideal that no researcher is ever going to have time for, but think about it more. When you share your code, you are going to need to provide some instructions about use and expected output. It is not that far of a leap to provide a script that does this, at which point you basically have a software test. Moreover, there is whole lot that can go wrong between you sharing a piece of software and someone else using it on their own machine. Having some form of test gives you the piece of mind that the software goblins aren’t plotting to ruin your day sometime in the future.

Naturally, if your software progresses further and continues to mature, then the software tests should concurrently increase in formality. There will obviously be differences of opinion about where the thresholds for different types of testing lie, but generally I quite like where we arrived at as a group in our discussion. I would like to see testing frameworks being used in software that was written for a publication, but in this era of publish or perish, I understand that this is an unreasonable expectation. However, once the software escapes the confines of a single research group, the expectation for automated testing then becomes reasonable.

Discussion on Testing at SeptembRSE

Everything above is immediately relatable to the discussion session at SeptembRSE titled “Is testing overkill for most research software? How can we make it easier to test scripts?”. You can watch the full session on YouTube:

To tersely summarise the response to the first question in the title, participants quite uniformly said, “No. Testing is not overkill”. Of course, the response to that question depends on what type of testing one is talking about. A full suite of automated unit, integration, and system tests in a testing framework that is run in CI is probably overkill for a lot of research software, especially the projects at level 0 and 1 above. Indeed, the participants in the discussion did acknowledge there is a varying degree of software testing dependent on the maturity of the project, which largely agrees with the table we created above.

On the question of how to make it easier to test research software, there was one answer that stood out for me, and I will paraphrase it as “write tests early and often”. The earlier you start writing tests, the easier it will be compared to further down the line. This is because testing has a direct positive influence upon the design of your software. Unit tests in particular force you to write small reusable pieces of code, and if you find it difficult to test something you have written, that is probably because it exhibits some pathology of poor design.

Whilst I do really agree with this answer, I think it also needs to be tempered with the reality of how research is conducted, and in that respect I once again point to the table created above for what should ultimately guide researchers and RSEs when writing tests for their research software.

Review of the SE4Science Workshop at ICCS 2021: Part 1

2021-07-28T00:00:00+00:00

On 16-18 June 2021, I attended the International Conference on Computational Science (ICCS) 2021, and I want to relay some of the important pieces of information that I picked up. By its nature, it is a conference that encompasses a broad spectrum of scientific domains, from computational health and bioinformatics to quantum computing to flow and transport physics. Whilst this was an appealing feature initially, I was quickly reminded how specialised all domains of science have become and the enormous background of knowledge required to intelligently converse in each domain. So, I will admit that many of the sessions went over my head. That being said, there was still value in getting a peak into these domains and the types of discussions happening therein.

However, my main reason for attending the conference was one particular thematic track: Software Engineering for Computational Science (SE4Science). In this post, I am going to review this workshop, and I will look at the other sessions of the conference in a future post.

Presentations

The workshop had the following ambitious objectives:

Provide a venue for members of the SE and research software communities to discuss issues relevant of common interest.

Identify aspects of SE that should be considered for research software education programs.

Identify key areas of study where participants agree there is a lack of existing data or studies.

Support the building of a common research agenda to address the complex software development issues typical of research software.

Provide a venue for sharing early work and work-in-progress to obtain feedback from the wider community.

The initial session of the workshop consisted of three presentations, all of which are available at this website. Zedong Peng delivered the first presentation on “I/O Associations in Scientific Software: A Study of SWMM (Storm Water Management Model)”, which seeks to address the test oracle problem in scientific software. The test oracle problem, simply put, is the problem of determining the correct output for a given input to your code. This sounds easy, but if you have ever sat down and started to attempt writing automated tests for a piece of research code that has complex and perhaps non-deterministic outputs, then you know how difficult it can be. One approach is to specify behaviours or properties of your system rather than exact specification of the output. Known as “property-based testing”, the presenter dealt with a subclass called metamorphic testing. To generate metamorphic relations used to form tests, one needs to understand the input and output (I/O) relations of the code, and the presenter’s novel approach was to use the user manual and user forums to derive these relationships through statistical methods. It is an interesting approach that mimics what many users do when consulting documentation to figure out the characteristics and behaviours of the program.

The second presentation was titled “Understanding Equity, Diversity and Inclusivity Challenges Within the Research Software Community” and was delivered by Neil Chue Hong. Perhaps unsurprisingly, this was one of the most eye-opening. For me, the stand-out headline was the result from a survey on the diversity of different protected characteristics within the RSE profession (pictured below). In terms of gender and ethnicity, UK RSEs have a significantly smaller representation than either UK Academics or the UK Workforce overall. One can get quite desensitised to the often abysmal diversity statistics reported in the news, so perhaps I shouldn’t have been surprised by these numbers, but I will admit that I hadn’t expected such a large disparity. A partial explanation for this blind spot is my own perception of the RSE community as an incredibly open, friendly, and welcoming one. However, as the presenter responded in the Q&A, there is a difference between a welcoming community and an inclusive one. The analogy was used of a pool party. A welcoming host might tell you to jump on in, but an inclusive one would get out of the pool, lay out the various options to get engaged with the party, and perhaps modify the set up of the party to accommodate your needs. It sounds like quite a high bar, but it is likely part of the solution to rectify the severe disparities in our community.

Chue Hong, Neil; Cohen, Jeremy; Jay, Caroline (2021): Understanding Equity, Diversity and Inclusivity Challenges Within the Research Software Community. figshare. Presentation. https://doi.org/10.6084/m9.figshare.14787858.v2 .

I still suspect there are pipeline issues and raised this in the talk. The recruitment pipeline showed the biggest chunk of RSEs coming from a “Physics” background, but this is likely too coarse of a category. Most RSEs are (probably, need data!) coming from subdomains that have a longer history of computation with computers. What is the breakdown of protected characteristics in these areas? We simply don’t have the data on this at the moment. This is not to shift responsibility from the RSE community, but merely to point out that there is probably a concurrent question to answer, which is why there isn’t better representation in computational sciences.

Finally, the third presentation was “How has the COVID-19 Pandemic affected working conditions for Research Software Engineers?” delivered by Caroline Jay. Evidently, this was a pertinent topic given the international pandemic we have all collectively experienced. The basis for the presentation and associated paper was a “diary study” in which participants identifying as RSEs were asked to answer some long form questions about their working situation over the course of a few months immediately after the outbreak of the pandemic. Whilst a comprehensive survey of many RSEs was not possible in this format (sample size was 17), the breadth of responses did seem to capture a representative cross section of how RSEs have weathered the pandemic. Many experiences hit home with me: a porous boundary between work and life, more context switching needed between the two, and a deceptive increase in productivity because of longer hours being worked, leading to a potential for burnout. Overall, the participants reported that the technical side of moving to remote working was not much of a hurdle, which makes sense given the remote nature of much of an RSE’s work and jives with my own experience. Of course, that was not true for everyone, and some did struggle establishing a home office.

On the other hand, I was surprised to see that caring and remote learning duties did not figure as prominently as I would have expected. Parents in the study appeared to have found a good routine. Rather, it was mental health that came to the fore. The loss of casual interactions in workplaces combined with feelings of isolation radically altered the productivity of participants and the ways in which their teams operated. I would summarise the results as such: although RSE work is suited to remote environments, a majority of RSEs still require some level of co-location to operate effectively. Like the authors of this study, I hope that pandemic will bring much needed attention to the diverse needs that RSEs (and all workers) have, and there are some preliminary signs that this is partially underway.

Speed Blogging

The second session was a speed blogging one, and I will be covering that in a second post given the length of this one.

SORSE Panel Discussion on the Modelling of COVID-19

2021-02-22T00:00:00+00:00

Back in January, I had my first experience moderating a panel discussion. It was one of the weekly SORSE sessions, which overall have provided the RSE community with an opportunity to present and network in the absence of any in-person conferences. Fittingly, the subject of the panel discussion was on the modelling of the COVID-19 pandemic in the UK and how RSEs have contributed—and still are! It was a fascinating discussion, and you can see the full recording of the event below:

If you would like to cite the discussion, follow the handy badge: