Optimizing GutterMarks.nvim

2025-11-30

A few weeks ago, I shared GutterMarks.nvim (blog, GitHub), a Neovim plugin that displays Neovim marks in the gutter of a buffer. It got positive feedback, and since then I have added a couple of convenient actions and fixed a few minor bugs that the community found.

One thing I spent more time on recently is optimizing the plugin's performance. In the README I mention that the plugin is faster than the other implementations I found, as I avoid using any timers to compute the marks. Still, there is always headroom to go faster, so I set myself the exercise of seeing how much more it could be optimized.

The first step for any optimization is: measure. I did not find any suitable benchmark suite for Neovim plugins, so I wrote a little benchmark helper. It is really the most basic implementation, leveraging Mini.test: create a reproducible environment, run a test N times, print the results, and run the tests regularly to catch regressions.

As this is a hobby project, I did not spend much time removing bias from the benchmark suite (warm-up, validating variance, removing constant factors...). If I ever do, I may publish it as a proper benchmark suite.
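
The core of the helper is not much more than a timing loop. Here is a minimal sketch of what bench_helpers.benchmark could look like (an illustration of the approach, not the exact code; vim.uv.hrtime() is vim.loop.hrtime() on older Neovim):

local bench_helpers = {}

-- Run `fn` `n` times (default 100) and collect the elapsed time of each
-- run in milliseconds, using Neovim's monotonic nanosecond clock.
function bench_helpers.benchmark(name, fn, n)
  n = n or 100
  local samples = {}
  for i = 1, n do
    local start = vim.uv.hrtime()
    fn()
    samples[i] = (vim.uv.hrtime() - start) / 1e6
  end
  return { name = name, iterations = n, samples = samples }
end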

So now we have this common pattern:

T["refresh with 1 local mark"] = function()
  -- Setup a buffer with 100 lines
  bench_helpers.setup_local_marks(child, 1, 100)

  -- Run the benchmark, by default 100 times:
  local results = bench_helpers.benchmark("refresh with 1 local mark", function()
    child.lua([[require("guttermarks").refresh()]])
  end)

  -- Leverage MiniTest notes to display the test result:
  MiniTest.add_note(bench_helpers.format_note(results))
end

Which prints:

NOTE in test/bench/bench_refresh.lua | refresh with 1 local mark: it=0100
mean=0.036ms median=0.034ms min=0.028ms max=0.067ms stddev=0.007ms

While not perfect, having the mean, median, min, max and stddev gives us a good idea of the distribution, which helps us validate our priors and decide whether an optimization is worthwhile.
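
For reference, those statistics are just basic arithmetic over the collected samples; something along these lines (again a sketch of what format_note computes, not the exact code):

-- Summarize a list of samples (in ms): mean, median, min, max, stddev.
local function stats(samples)
  table.sort(samples)
  local n, sum = #samples, 0
  for _, s in ipairs(samples) do
    sum = sum + s
  end
  local mean = sum / n
  local var = 0
  for _, s in ipairs(samples) do
    var = var + (s - mean) ^ 2
  end
  return {
    mean = mean,
    median = samples[math.ceil(n / 2)],
    min = samples[1],
    max = samples[n],
    stddev = math.sqrt(var / n),
  }
end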

Note that I'm only optimizing for latency (as opposed to memory or throughput), since for this project I'm working under the assumption that memory usage is low.

I decided to optimize the critical loop of the plugin: the refresh() function. Its goal is to list all the configured marks and refresh the gutter with the latest values. In particular, I created two scenarios (the setup helper is sketched right after the list):

  • Light scenario: 3 marks in a 500-line file, call refresh() 1000 times
  • Heavy scenario: 52 marks in a 5000-line file, call refresh() 1000 times
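
Setting up a scenario is just filling a buffer in the child Neovim with N lines and placing marks on some of them. A rough sketch of the kind of helper behind setup_local_marks (the exact implementation differs):

-- Continuing the bench_helpers sketch: fill the child's current buffer
-- with `num_lines` lines and place `num_marks` lowercase marks on evenly
-- spaced lines. (The heavy scenario's 52 marks also use the uppercase,
-- global half of the alphabet, set the same way.)
function bench_helpers.setup_local_marks(child, num_marks, num_lines)
  local lines = {}
  for i = 1, num_lines do
    lines[i] = ("line %d"):format(i)
  end
  child.api.nvim_buf_set_lines(0, 0, -1, false, lines)

  for i = 1, num_marks do
    local mark = string.char(string.byte("a") + i - 1)
    local row = math.floor((i - 1) * num_lines / num_marks) + 1
    child.api.nvim_buf_set_mark(0, mark, row, 0, {})
  end
end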

Here is the baseline that I found:

NOTE in test/bench/bench_cache.lua | refresh (x1000) - light: it=0010
mean=94.417ms median=94.636ms min=91.794ms max=95.743ms stddev=1.089ms

NOTE in test/bench/bench_cache.lua | refresh (x1000) - heavy: it=0010
mean=95.032ms median=95.293ms min=92.587ms max=97.148ms stddev=1.431ms

This reads as: it takes around 94ms for Neovim to call refresh() 1000 times, measured over 10 attempts (or about 94 microseconds per call). We also note that light versus heavy has little impact on the result: the heavy test is a bit slower, but the difference is negligible compared to the stddev.

After a bit of exploration, I noticed that when refreshing the marks in the gutter, two functions were by far the most expensive (a simplified sketch of how they are used follows the list):

  1. Function vim.fn.getmarklist to get the list of marks (called 3 times for local, global and special marks)
  2. Function vim.api.nvim_buf_set_extmark used to populate the gutter (called once per mark)
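
To make that concrete, the hot path looks roughly like the sketch below. The collect_marks name, the namespace, and the GutterMarksSign highlight group are simplifications of mine; the two API calls are the ones listed above, and the real plugin makes separate passes for local, global and special marks and filters global marks to the current buffer.

local ns = vim.api.nvim_create_namespace("guttermarks")

-- Gather the marks to display: buffer-local marks plus global marks.
local function collect_marks(bufnr)
  local marks = {}
  for _, source in ipairs({ vim.fn.getmarklist(bufnr), vim.fn.getmarklist() }) do
    for _, m in ipairs(source) do
      -- m.mark is e.g. "'a"; m.pos is {bufnum, lnum, col, off}.
      table.insert(marks, { mark = m.mark:sub(2), lnum = m.pos[2] })
    end
  end
  return marks
end

-- Redraw the gutter: clear the namespace, then place one extmark per mark.
local function update_extmark(bufnr, marks)
  vim.api.nvim_buf_clear_namespace(bufnr, ns, 0, -1)
  for _, m in ipairs(marks) do
    vim.api.nvim_buf_set_extmark(bufnr, ns, m.lnum - 1, 0, {
      sign_text = m.mark,
      sign_hl_group = "GutterMarksSign",
    })
  end
end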

One common optimization pattern is to do nothing when nothing is needed. So I decided to skip the calls to nvim_buf_set_extmark when they are unnecessary.

This is done by caching the last list of marks and comparing it with the new one before refreshing the gutter. As simple as:

function M.refresh()
  -- Gather the current buffer and its marks (simplified; see the sketch above)
  local bufnr = vim.api.nvim_get_current_buf()
  local marks = collect_marks(bufnr)

  -- Skip the expensive extmark update when nothing changed since last time
  local cached_marks = M._marks_cache[bufnr]
  if cached_marks and utils.marks_equal(marks, cached_marks) then
    return
  end

  update_extmark(bufnr, marks)

  M._marks_cache[bufnr] = marks
end
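
The comparison itself is cheap: a shallow walk over the two lists. A minimal version of utils.marks_equal could look like this (the real implementation may compare more fields):

-- In the plugin's utils module: two mark lists are considered equal when
-- they contain the same marks on the same lines, in the same order.
-- Much cheaper than redrawing extmarks.
function utils.marks_equal(a, b)
  if #a ~= #b then
    return false
  end
  for i = 1, #a do
    if a[i].mark ~= b[i].mark or a[i].lnum ~= b[i].lnum then
      return false
    end
  end
  return true
end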

With this cache implementation we now have the following result:

NOTE in test/bench/bench_cache.lua | refresh (x1000) with cache - light: it=0010
mean=47.051ms median=47.522ms min=45.241ms max=47.871ms stddev=0.828ms

NOTE in test/bench/bench_cache.lua | refresh (x1000) with cache - heavy: it=0010
mean=47.703ms median=47.990ms min=46.070ms max=48.572ms stddev=0.830ms

We immediately get a roughly 2x speed-up on this micro-benchmark (the mean drops from about 94ms to about 47ms).

A few concluding notes

This is only a micro-benchmark where marks are never updated. While not realistic, it does match the common behavior: marks are rarely updated, and refresh() may be called many times between two mark updates. A few more benchmark scenarios should be created with more mixed usage.

A few other areas to optimize that I may work on in the future:

  1. When we actually have to update the marks, we could update only the marks that changed, instead of clearing the whole namespace (a rough sketch follows this list).
  2. Could we configure the autocmds that call refresh() to fire in more meaningful situations? I wonder if we could update Neovim to publish a MarkUpdate autocmd only when a mark actually gets updated.
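
For the first point, here is a rough, untested sketch of what an incremental update could look like, reusing the mark/lnum shape and namespace from the sketches above (none of this is in the plugin today):

-- Keep the extmark id per mark so we only touch marks whose line changed,
-- instead of clearing the whole namespace on every update.
local extmark_ids = {} -- mark name -> { id = ..., lnum = ... }

local function update_extmark_incremental(bufnr, marks)
  local seen = {}
  for _, m in ipairs(marks) do
    seen[m.mark] = true
    local prev = extmark_ids[m.mark]
    if not prev or prev.lnum ~= m.lnum then
      -- Passing an existing id moves the extmark instead of adding a new one.
      local id = vim.api.nvim_buf_set_extmark(bufnr, ns, m.lnum - 1, 0, {
        id = prev and prev.id or nil,
        sign_text = m.mark,
        sign_hl_group = "GutterMarksSign",
      })
      extmark_ids[m.mark] = { id = id, lnum = m.lnum }
    end
  end
  -- Drop extmarks for marks that no longer exist.
  for mark, info in pairs(extmark_ids) do
    if not seen[mark] then
      vim.api.nvim_buf_del_extmark(bufnr, ns, info.id)
      extmark_ids[mark] = nil
    end
  end
end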

Now, one could ask: is it worth the effort to optimize a function that takes less than a tenth of a millisecond to execute in a text editor? I believe that, all else being equal, if every plugin author spent time halving the latency of their hot loop with little complexity overhead, using computers would be more enjoyable day to day.