Every year, we have to produce a total of the hours used on our compute cluster. The tools for interacting with Slurm’s accounting database are complex and the documentation is not very good, so this simple task is incredibly annoying. If you ever have to do this, hopefully you will have an easier time of it. This post is only an example of one way to do the job, and there may be better ways.
We start by having sacct dump statistics for all jobs from about a year ago to a text file (sacct’s --starttime option controls how far back it looks).
Copy this file to any machine where you have PowerShell installed.
This file isn’t incredibly helpful to begin with, as the sacct command is primarily for human-readable reports. There are also JSON and YAML output options that would be a good start if you wanted to do this with less manual effort.
The first thing to do is edit the first two lines of the file: trim all the spaces from the column names, and delete the row that contains nothing but - characters. You don’t need to mess with any of the other values, even though they are also padded with spaces. You’ll end up with a header row of clean column names followed directly by the data rows.
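If you’d rather not do that edit by hand, it can be scripted. This is only a rough sketch: the function name Repair-SacctDump and its parameters are made up for illustration, and it assumes the raw dump has the padded header on its first line and the row of - characters on its second.

```powershell
# Sketch only: the function name and its parameters are hypothetical.
function Repair-SacctDump {
    param(
        [Parameter(Mandatory)] [string] $InPath,
        [Parameter(Mandatory)] [string] $OutPath
    )

    # Force an array even if the file happens to contain a single line.
    $lines = @(Get-Content -Path $InPath)

    # Line 0 is the padded header: collapse the padding to single spaces
    # and trim both ends.
    $header = ($lines[0] -replace '\s+', ' ').Trim()

    # Emit the cleaned header, skip line 1 (the row of '-' characters),
    # and pass every data row through untouched.
    $cleaned = @($header; $lines | Select-Object -Skip 2)
    Set-Content -Path $OutPath -Value $cleaned
}
```

Calling it as Repair-SacctDump -InPath ./sacct_raw.txt -OutPath ./sacct.txt (both paths are placeholders) writes a copy of the dump with a space-separated header and no separator row.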
Now all we need to do is write a little PowerShell to parse this into something useful. This script uses Unix-style paths because it was written on a Mac; swap the slashes out if you’re running on Windows.
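As a rough sketch of what such a parser can look like, here is one way to total a column. It rests on several assumptions that may not match your dump: the function name Get-SacctCpuSeconds is made up, the file is assumed to have a CPUTimeRAW column (sacct’s CPU time in seconds), there is assumed to be one row per job rather than extra rows for job steps, and no column is assumed to be blank or to contain embedded spaces.

```powershell
# Sketch only: the function name and the default column are assumptions.
function Get-SacctCpuSeconds {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [string] $Column = 'CPUTimeRAW'
    )

    $total = [double]0
    $index = -1   # position of $Column, unknown until the header is read

    foreach ($line in Get-Content -Path $Path) {
        if ([string]::IsNullOrWhiteSpace($line)) { continue }

        # Unary -split breaks on runs of whitespace and ignores the padding,
        # which is why the data rows never needed trimming.
        $fields = -split $line

        if ($index -lt 0) {
            # First non-blank line is the header: locate the column by name.
            $index = [array]::IndexOf($fields, $Column)
            if ($index -lt 0) { throw "Column '$Column' not found in header." }
            continue
        }

        $total += [double]($fields[$index])
    }

    return $total
}

# Example (the path is a placeholder):
#   $seconds = Get-SacctCpuSeconds -Path './sacct.txt'
#   'Total: {0:N1} CPU-hours' -f ($seconds / 3600)
```

Looking the column up by name instead of by position means the sketch keeps working if you ask sacct for the fields in a different order.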
This took a while on a cluster that ran over 300K jobs in a year, and there’s probably a more efficient way to do it, but I do this once a year so I don’t care. You’ll end up with a total amount of CPU time used.
That’s it - quick, dirty, and satisfying.
Aaron Glenn Admin/Mentor
Aaron is a Sr. Systems Administrator from Tulsa, specializing in Microsoft technologies, with an emphasis on powershell.