Generating a bunch of decent mock data on https://landfill.bugzilla.ninja/

Hello!

Do you enjoy creative writing?
Would you like to help Bugzilla?

Right now, only myself and dkl are allowed access to sanitized bugzilla (bmo) database dumps. The vagrant and docker dev environments come with very rudimentry data sets (one product, a handful of versions) and it means the experience of our many contributors is sub-optimal. Frequently they’re not able to effectively test their ideas.

With this in mind, I have set up landfill.bugzilla.ninja. This is running the mozillabteam/bmo:latest docker container on a VM that I own. It contains no data that isn’t in the git repo.

I want to give people accounts on this – literally anyone – and have them:

1) Create products, components, versions, milestones, flags, tracking flags, keywords, groups, and other misc. metadata
2) Invent a bunch of fake bugs and comments. The more the better.

After we have a sufficient number of these created, I will take the data, sanitize it, and publish the data for all to use.

To get started, DM me on twitter (@dylan_hardison), send me an email dylan [at] hardison.net or hop in irc: #bugzilla on irc.mozilla.org

P.S.

Thematically, we can pretend this is a bug tracker for a fictional operating system written in Brainf*ck, or perhaps the bug tracker for the fictionary “sword art online” anime.
Or perhaps you can throw some machine learning at this – I don’t care as long as I get a diverse dataset for testing.

Looking back at Bugzilla and BMO in 2017

Looking Back

Recently in the Bugzilla Project meeting, Gerv informed us that he would be resigning, and it was pretty clear that my lack of technical leadership was the cause. While I am sad to see Gerv go, it did make me realize I need to write more about the things I do.

As is evident in this post, all of the things I’ve accomplished have been related to the BMO codebase and not upstream Bugzilla – which is why upstream must be rebased on BMO.
See Bug 1427884 for one of the blockers to this.

Accessibility Changes

In 2017, we made over a dozen a11y changes, and I’ve heard from a well-known developer that using BMO with a screen reader is far superior to other bugzillas. 🙂

Infrastructure Changes

BMO is quite happy to use carton to manage its perl dependencies, and Docker handle its system-level dependencies.

We’re quite close to being able to run on Kubernetes.

While the code is currently turned off in production, we also feature a very advanced query translator that allows the use of ElasticSearch to index all bugs and comments.

Performance Changes

I sort of wanted to turn each of these into a separate blog post, but I never got time for that – and I’m even more excited about writing about future work. But rather than just let them hide away in bugs, I thought I’d at least list them and give a short summary.

February

  • Bug 1336958 – HTML::Tree requires manual memory management or it leaks memory. I discovered this while looking at some unrelated code.
  • Bug 1335233 – I noticed that the job queue runner wasn’t calling the end-of-request cleanup code, and a result it was also leaking memory.

March

  • Bug 1345181 – make html_quote() about five times faster.
  • Bug 1347570 – make it so apache in the dev/test environments didn’t need to restart after every request (by enforcing a minimum memory limit)
  • Bug 1350466 – switched JSON serialization to JSON::XS, which is nearly 1000 times faster.
  • Bug 1350467 – caused more modules (those provided by optional features) to be preloaded at apache startup.
  • Bug 1351695 – Pre-load “.htaccess” files and allow apache to ignore them

April

  • Bug 1355127 – rewrote template code that is in a tight loop to Perl, saving a few hundred thousand method calls (no exageration!)
  • Bug 1355134 – fetch all groups at once, rather than row-at-a-time.
  • Bug 1355137 – Cache objects that represent bug fields.
  • Bug 1355142 – Instead of using a regular expression to “trick” Perl’s string tainting system, use a module to directly flip the “taint” bit. This was hundreds of times faster.
  • Bug 1352264 – Compile all templates and store them in memory. This actually saved both CPU time and RAM, because the memory used by templates is shared by all workers on a given node.

May

  • Bug 1362151 – Cache bzapi configuration API, making ‘bz export’ commands (on developer machines) faster by 2-5 seconds.
  • Bug 1352907 – Rewrite the Bugzilla extension loading system.
    The previous one was incredibly inefficient.

June

  • Bug 1355169 – Mentored intern to implement token-bucket based rate limiting. Not strictly a performance thing but it reduced API abuse.

December

  • Bug 1426963 – Use a hash lookup to determine group membership, rather than searching an unsorted list.
  • Bug 1427230 – Templates were slowed down because they use exceptions for control flow and we’re doing lots of work for each (caught!) exception. One should never use CGI::Carp, never overload CORE::GLOBAL::die, and never set a DIE handler

Developer Experience Changes

My favorite communities optimize for fun. Frequently fun means being able to get things done. So in 2017 I did the following:

  • Made a vagrant development environment setup that closely mapped to BMO production.
    • I tested installing it on various machines – Linux, OSX, Windows
    • I wrote a README explaining how to use it.
    • This dev environment has been tested by people with little or no experience with Bugzilla development.
  • I changed to a pull-request based workflow. We use Bugzilla to track bugs and tasks, but not do code review.
  • I made it so the entire test suite could run against pull requests. This isn’t trivial, you have to work a bit harder to build docker images and run them without having any dockerhub credentials. (Pull requests don’t get any dockerhub credentials, I have to say to make sure my friend ulfr doesn’t have a heart attack)
  • I made sure that I understood how to use Atom and Visual Studio Code. I actually rather like the later now – and more importantly it is easy to help out new-comers with these editors.
  • I adopted Perl::Critic for code linting and Perl::Tidy for code-formatting, using the PBP ruleset for the later. I also made it a point to not make code style a part of code review – let the machine do that.

Numbers

In the last year, we had almost 500 commits to the BMO repo,
from 20 different people. Some people were new, and some were returning contributors (such as Sebastin Santy).

BMO ❤️ Carton

Back when I started working on BMO
we couldn’t add new dependencies without having someone build an RPM. For no particularly good reason, this made it so in general we didn’t add new dependencies often.

However, about a year ago I started poking at carton and came up with a process to run carton in a docker container that mirrors production, and tar up the resulting local/ directory.

For the last 6 months or so we have been able to add dependencies whenever we want. We can also track changes to the
full dependency tree.

The code for this is on github as mozilla-bteam/carton-bundles and it is a little ugly, but packaging code is rarely elegant.

Optimizing BMO: Part 1, An Easy Fix

One way of squeezing performance out of apache, as noted in this blog post by Hayden James is to disable htaccess files – which are not needed when you have control over the httpd’s config files. Doing this allows the web server to spend less time calling stat() for .htaccess files that don’t even exist – for instance a request to https://something.something/foo/bar/baz is at least 4 calls to stat() .htaccess (once at /, then /foo, then /foo/bar, and finally /foo/bar/baz assuming that baz is also a directory.

As it turns out for BMO this is even easier: bugzilla already sends configuration to the apache process.

Because of this, we can search for .htaccess files at apache startup time, load them using the server->add_config() method and tell apache to not bother looking for them during subsequent requests.

This change was quite small and the performance gain in production should be noticeable (but not large). As it turns out, some of those stat() calls also hit NFS, which will be a discussion for Part 2.

Bigger and better search box on BMO

There is a lot of power hidden in quick search and my data suggests it is under-utilized.

For instance, searching for all review flags is the literal search tag review?
Similarly you can do needinfo?dylan@mozilla.com to find all bugs with needinfos directed at me (actually, this query performs a slightly broader search but I have code to fix that).

There are dozens more examples. The quick search help is a long read, and most people won’t bother.

A long time ago glob suggested stealing the UI from DXR, where you get a little quick-help on the operator
syntax for DXR searches. It’s a pretty simple change, right?

Well, our search box is small. So the first thing it needs is to be bigger.

bigger quicksearch box

More room to work with. This required replacing the table-based layout with some flex boxes. The top-line is nearly pixel perfect
to its previous table-based implementation.

We can also hide some things and begin making the UI responsive

portrait view of bmo quicksearch.

I hope to post a followup showing the quicksearch syntax helper,
but this is at the moment just a side task.

(Although it ties well into the goal of implementing elasticsearch on BMO).

Sorry, I meant I changed it from 226s to 184ms

About twenty days ago it came to my attention via my colleague Ed Morley that BMO’s bzapi was very slow.
It turns out he had reported the same issue the prior year as well!

Performance problems are very enjoyable to work on, I find. Especially when they are reproducible.

I lost most of the day on this, but in the end I was able to take the slowest function from executing at a leisurely 226 seconds to a very fast 184 milliseconds