The Binding of libcmark-gfm: Segfaults and Debugging

So, for various reasons (including “I want bugzilla.mozilla.org to support Markdown”), I have been working on a binding to GitHub’s fork of cmark.

For the first part of this, I got some help in #native
writing a Perl module in the Alien:: namespace,
namely Alien::libcmark_gfm.

Armed with this module, I’ve been seeking to make CommonMark
work against the GitHub fork of libcmark.

So far things have been going well, and I decided to just be dumb about it: the API for libcmark-gfm is a bit different, so I would rename the packages from CommonMark to CommonMarkGFM.

Of course, this led to the first problem: I was getting
errors about a package not existing, a package named CommonMarkGFM::N. What the hell does that mean? I hadn’t changed much yet!

The problem was this bit of C code in the newly-renamed CommonMarkGFM.xs:

stash = gv_stashpvn("CommonMarkGFM::Node", 16, GV_ADD);

Okay, I don’t know perlguts very well.
I don’t know offhand what gv_stashpvn does, but I can find the docs for gv_stashpvn,
and the name is a hint at what it does, in the terse nomenclature of Perl’s internal APIs: it fetches (or, with GV_ADD, creates) the stash for a package whose name is given as a pointer plus an explicit length in bytes.

The old package name, CommonMark::Node, was 16 bytes long; the new one, CommonMarkGFM::Node, is 19. With the length still hard-coded at 16, only the first 16 bytes of the new name were used,
and that perfectly explains why I saw CommonMarkGFM::N. The fix is to change the 16 to 19.

So I get past that, and now the test suite segfaults.

1..10
ok 1 - use CommonMarkGFM;
ok 2 - markdown_to_html
ok 3 - 'parse_document' isa 'CommonMarkGFM::Node'
Segmentation fault

Hey, maybe this is the same as the first problem I fixed?

So I go looking for the same kind of problem, and I find it!
We have some lengths hard-coded in the typemap file
(no, aside from the fact that it maps types, I don’t really know what the typemap file does; I’m not usually hacking in perlapi).

T_NODE
    $var = (cmark_node*)S_sv2c(aTHX_ $arg, \"CommonMarkGFM::Node\", 19, cv,
                               \"$var\");
/* more omitted */

So I fix those lengths too, but they were not my problem:
I’m still getting a segfault…

I’m really quite excited at this moment! I have a problem I can attack with things I learned from this wonderful blog by Julia Evans.

I’ve already been using a Dockerfile to try to compile and test this code, so I just need to install Valgrind (and maybe gdb too) and see what happens.

So I run valgrind:

==16==  Access not within mapped region at address 0x88
==16==    at 0xEF4685C: cmark_render_html_with_mem (in /usr/local/lib64/perl5/auto/share/dist/Alien-libcmark_gfm/lib/libcmark-gfm.so.0.28.3.gfm.12)
==16==    by 0xED0A11D: XS_CommonMarkGFM__Node_interface_render (CommonMarkGFM.c:898)
==16==    by 0x4ED6814: Perl_pp_entersub (pp_hot.c:2888)
==16==    by 0x4ED4B05: Perl_runops_standard (run.c:40)
==16==    by 0x4E7D0D7: perl_run (perl.c:2435)
==16==    by 0x400E73: main (perlmain.c:117)

Huh, interesting. Okay, maybe I can use gdb to set a breakpoint there.

(gdb) b cmark_render_html_with_mem
Function "cmark_render_html_with_mem" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (cmark_render_html_with_mem) pending.

Our function doesn’t exist yet, as it lives in a shared object that will be loaded later. This is fine, except it isn’t: my breakpoint is never hit.

Huh! I guess (wrongly, as it turns out) that maybe I need to change my compilation options. I also assume the segfault is caused by something in the Perl extension code.

So maybe the problem is that we compile with -O2. My gcc is too old to support -Og, so let’s try -O0.

At this point, I’m just copying the compile line from make’s output and changing it. I just want to get some details in gdb, damn it!

So I run the following:

perl Makefile.PL
make
gcc -c  -I/usr/local/lib64/perl5/auto/share/dist/Alien-libcmark_gfm/include -D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O0 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic   -DVERSION=\"0.280301\" -DXS_VERSION=\"0.280301\" -fPIC "-I/usr/lib64/perl5/CORE"   CommonMarkGFM.c
make install

Now I can run perl t/03_render.t again, under gdb, and see if I can get more details.

1..10
ok 1 - use CommonMarkGFM;
ok 2 - markdown_to_html
ok 3 - 'parse_document' isa 'CommonMarkGFM::Node'
ok 4 - parse_document works
ok 5 - render_xml
ok 6 - render_man
ok 7 - render_latex
ok 8 - render_commonmark
ok 9 - render functions return encoded utf8
ok 10 - render functions expect decoded utf8

My attitude thus far is clear in the tweet that followed.

Now I proceed to have fun.

I spent some time trying to figure out what -O1 does compared to -O0, and I wrote a script to repeatedly re-compile that one file
with different options. Along the way, I learned how to make gcc report which options it is compiling with (gcc -Q -v ...).
I had some false positives, and then I went to sleep.

After a period of sleep, I figured out that what I wanted was the list of flags that differ between -O0 and -O1. I cleaned up my compile.pl script
and ran it.

The answer is: all of them are fine. -O0 plus every individual optimization flag that -O1 enables still produces no segfault, but adding -O1 itself back brings the segfault back. After some more searching of the gcc docs, it seems that some optimizations are tied directly to the -O level and have no individual flag.

My fun is now over, and I’ll do the more boring task of figuring out why my code is broken.

Staring at me from gdb’s output is this:

warning: Error disabling address space randomization: Operation not permitted

After a bit of searching, I find the fix: run the docker container
with --security-opt seccomp=unconfined.

And suddenly, breakpoints work.

Now I can inspect the root variable that is passed to cmark_render_html_with_mem… and nothing is wrong there.
I figure I probably need to re-compile libcmark-gfm with more debugging information. Then, suddenly, I realize that cmark_render_html_with_mem takes three arguments, and the Perl XS code is only passing it two.
How does this even work? Well, the XS code casts a pointer to a function pointer and calls it. Calling a function through a pointer with fewer arguments than it is declared to take is undefined behavior, and I guess the rest of the behavior I observed was nasal demons.

(As an FYI, this argument-count difference is an API change between upstream libcmark and libcmark-gfm.)

Finally, that third argument is a linked list of syntax extensions,
and it’s not yet clear how I will need to pass that back and forth between Perl and C. This also indicates that CommonMarkGFM will need to be a fork of CommonMark, not just a rename.

Generating a bunch of decent mock data on https://landfill.bugzilla.ninja/

Hello!

Do you enjoy creative writing?
Would you like to help Bugzilla?

Right now, only dkl and I are allowed access to sanitized Bugzilla (BMO) database dumps. The vagrant and docker dev environments come with very rudimentary data sets (one product, a handful of versions), which means the experience of our many contributors is sub-optimal. Frequently they’re not able to effectively test their ideas.

With this in mind, I have set up landfill.bugzilla.ninja. This is running the mozillabteam/bmo:latest docker container on a VM that I own. It contains no data that isn’t in the git repo.

I want to give people accounts on this – literally anyone – and have them:

1) Create products, components, versions, milestones, flags, tracking flags, keywords, groups, and other misc. metadata
2) Invent a bunch of fake bugs and comments. The more the better.

After we have a sufficient number of these created, I will take the data, sanitize it, and publish the data for all to use.

To get started, DM me on twitter (@dylan_hardison), send me an email at dylan [at] hardison.net, or hop into IRC: #bugzilla on irc.mozilla.org.

P.S.

Thematically, we can pretend this is the bug tracker for a fictional operating system written in Brainf*ck, or perhaps the bug tracker from the “Sword Art Online” anime.
Or perhaps you can throw some machine learning at this – I don’t care as long as I get a diverse dataset for testing.

Looking back at Bugzilla and BMO in 2017

Looking Back

Recently in the Bugzilla Project meeting, Gerv informed us that he would be resigning, and it was pretty clear that my lack of technical leadership was the cause. While I am sad to see Gerv go, it did make me realize I need to write more about the things I do.

As is evident in this post, all of the things I’ve accomplished have been related to the BMO codebase and not upstream Bugzilla – which is why upstream must be rebased on BMO.
See Bug 1427884 for one of the blockers to this.

Accessibility Changes

In 2017, we made over a dozen a11y changes, and I’ve heard from a well-known developer that using BMO with a screen reader is far superior to using other Bugzilla installations. 🙂

Infrastructure Changes

BMO is quite happy to use carton to manage its Perl dependencies and Docker to handle its system-level dependencies.

We’re quite close to being able to run on Kubernetes.

While the code is currently turned off in production, we also feature a very advanced query translator that allows the use of ElasticSearch to index all bugs and comments.

Performance Changes

I sort of wanted to turn each of these into a separate blog post, but I never got time for that – and I’m even more excited about writing about future work. But rather than just let them hide away in bugs, I thought I’d at least list them and give a short summary.

February

  • Bug 1336958 – HTML::Tree requires manual memory management or it leaks memory. I discovered this while looking at some unrelated code.
  • Bug 1335233 – I noticed that the job queue runner wasn’t calling the end-of-request cleanup code, and as a result it was also leaking memory.

March

  • Bug 1345181 – make html_quote() about five times faster.
  • Bug 1347570 – make it so apache in the dev/test environments didn’t need to restart after every request (by enforcing a minimum memory limit)
  • Bug 1350466 – switched JSON serialization to JSON::XS, which is nearly 1000 times faster.
  • Bug 1350467 – caused more modules (those provided by optional features) to be preloaded at apache startup.
  • Bug 1351695 – Pre-load “.htaccess” files and allow apache to ignore them

April

  • Bug 1355127 – rewrote template code that ran in a tight loop as plain Perl, saving a few hundred thousand method calls (no exaggeration!)
  • Bug 1355134 – fetch all groups at once, rather than row-at-a-time.
  • Bug 1355137 – Cache objects that represent bug fields.
  • Bug 1355142 – Instead of using a regular expression to “trick” Perl’s string tainting system, use a module to directly flip the “taint” bit. This was hundreds of times faster (a sketch follows this list).
  • Bug 1352264 – Compile all templates and store them in memory. This actually saved both CPU time and RAM, because the memory used by templates is shared by all workers on a given node.
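
To make the taint change concrete, here is a sketch in the spirit of that fix (not the actual BMO patch; Taint::Util is one module that can flip the flag directly, and the function names below are mine):

use Taint::Util qw(untaint);

# Old approach: launder the value through a regex capture, which copies
# the string and runs the regex engine just to clear the taint flag.
sub untaint_via_regex {
    my ($value) = @_;
    ($value) = $value =~ /^(.*)$/s;
    return $value;
}

# New approach: clear the taint flag in place, with no copy and no match.
sub untaint_via_flag {
    my ($value) = @_;
    untaint($value);
    return $value;
}

Both return an untainted copy of the input; the second just skips all the work the regex engine would otherwise do.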

May

  • Bug 1362151 – Cache bzapi configuration API, making ‘bz export’ commands (on developer machines) faster by 2-5 seconds.
  • Bug 1352907 – Rewrite the Bugzilla extension loading system.
    The previous one was incredibly inefficient.

June

  • Bug 1355169 – Mentored an intern to implement token-bucket based rate limiting. Not strictly a performance thing, but it reduced API abuse.

December

  • Bug 1426963 – Use a hash lookup to determine group membership, rather than searching an unsorted list (illustrated in the sketch after this list).
  • Bug 1427230 – Templates were slow because they use exceptions for control flow, and we were doing a lot of work for each (caught!) exception. One should never use CGI::Carp, never override CORE::GLOBAL::die, and never set a $SIG{__DIE__} handler.
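
To illustrate the group-membership change, here is a rough Perl sketch (the methods on $user are assumptions for the example, not BMO’s real API):

# Before: scan the user's unsorted group list for every membership check.
sub is_in_group_slow {
    my ($user, $group_id) = @_;
    return grep { $_->id == $group_id } @{ $user->groups };
}

# After: build a hash of group ids once, then each check is a single lookup.
sub is_in_group_fast {
    my ($user, $group_id) = @_;
    $user->{group_id_set} //= { map { $_->id => 1 } @{ $user->groups } };
    return exists $user->{group_id_set}{$group_id};
}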

Developer Experience Changes

My favorite communities optimize for fun. Frequently fun means being able to get things done. So in 2017 I did the following:

  • Made a vagrant development environment setup that closely mapped to BMO production.
    • I tested installing it on various machines – Linux, OSX, Windows
    • I wrote a README explaining how to use it.
    • This dev environment has been tested by people with little or no experience with Bugzilla development.
  • I changed to a pull-request based workflow. We use Bugzilla to track bugs and tasks, but not to do code review.
  • I made it so the entire test suite could run against pull requests. This isn’t trivial: you have to work a bit harder to build docker images and run them without having any dockerhub credentials. (Pull requests don’t get any dockerhub credentials; I have to say that to make sure my friend ulfr doesn’t have a heart attack.)
  • I made sure that I understood how to use Atom and Visual Studio Code. I actually rather like the latter now – and more importantly, it is easy to help out newcomers with these editors.
  • I adopted Perl::Critic for code linting and Perl::Tidy for code formatting, using the PBP ruleset for the latter. I also made it a point to not make code style a part of code review – let the machine do that.

Numbers

In the last year, we had almost 500 commits to the BMO repo,
from 20 different people. Some people were new, and some were returning contributors (such as Sebastin Santy).

BMO ❤️ Carton

Back when I started working on BMO,
we couldn’t add new dependencies without having someone build an RPM. For no particularly good reason, this meant that in general we didn’t add new dependencies very often.

However, about a year ago I started poking at carton and came up with a process to run carton in a docker container that mirrors production, and tar up the resulting local/ directory.

For the last 6 months or so we have been able to add dependencies whenever we want. We can also track changes to the
full dependency tree.
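
For context, with carton the application’s dependencies are declared in a cpanfile, and carton install resolves them into a cpanfile.snapshot that pins the entire tree. The entries below are only illustrative, not BMO’s real list:

# cpanfile (illustrative entries)
requires 'JSON::XS', '>= 3.01';
requires 'Email::MIME';

on 'test' => sub {
    requires 'Test::More', '>= 0.96';
};

Running carton install inside the production-like container populates local/ and updates cpanfile.snapshot, and diffing that snapshot is what lets us track changes to the full dependency tree.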

The code for this is on github as mozilla-bteam/carton-bundles and it is a little ugly, but packaging code is rarely elegant.

Optimizing BMO: Part 1, An Easy Fix

One way of squeezing performance out of apache, as noted in this blog post by Hayden James, is to disable .htaccess files – they are not needed when you have control over httpd’s config files. Doing this lets the web server spend less time calling stat() for .htaccess files that don’t even exist – for instance, a request to https://something.something/foo/bar/baz costs at least 4 calls to stat() for .htaccess (once at /, then /foo, then /foo/bar, and finally /foo/bar/baz, assuming that baz is also a directory).

As it turns out, for BMO this is even easier: Bugzilla already sends configuration to the apache process.

Because of this, we can search for .htaccess files at apache startup time, load them using the server->add_config() method and tell apache to not bother looking for them during subsequent requests.
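
A minimal sketch of that idea in a mod_perl startup file follows. The add_config() method is real mod_perl API (Apache2::ServerUtil), but the paths, the file-reading helper, and the overall shape are assumptions for illustration rather than the actual BMO code:

use strict;
use warnings;
use Apache2::ServerUtil ();
use File::Find ();
use File::Slurper qw(read_text);

my $server  = Apache2::ServerUtil->server;
my $docroot = '/var/www/bmo';    # illustrative path

# At startup, replay each .htaccess file as if its directives were in
# httpd.conf, scoped to the directory the file came from.
File::Find::find(
    sub {
        return unless $_ eq '.htaccess';
        my $dir   = $File::Find::dir;
        my @lines = split /\n/, read_text($File::Find::name);
        $server->add_config([ "<Directory $dir>", @lines, "</Directory>" ]);
    },
    $docroot
);

# With the rules loaded once, apache no longer needs to stat() for
# .htaccess files on every request.
$server->add_config([ "<Directory $docroot>", 'AllowOverride None', '</Directory>' ]);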

This change was quite small and the performance gain in production should be noticeable (but not large). As it turns out, some of those stat() calls also hit NFS, which will be a discussion for Part 2.

Bigger and better search box on BMO

There is a lot of power hidden in quick search and my data suggests it is under-utilized.

For instance, the search to find all bugs with review flags set is literally just review?
Similarly, you can do needinfo?dylan@mozilla.com to find all bugs with needinfo requests directed at me (actually, this query performs a slightly broader search, but I have code to fix that).

There are dozens more examples. The quick search help is a long read, and most people won’t bother.

A long time ago glob suggested stealing the UI from DXR, where you get a little quick-help on the operator
syntax for DXR searches. It’s a pretty simple change, right?

Well, our search box is small. So the first thing it needs is to be bigger.

bigger quicksearch box

More room to work with. This required replacing the table-based layout with flexbox. The top line is nearly pixel-perfect
compared to its previous table-based implementation.

We can also hide some things and begin making the UI responsive.

portrait view of bmo quicksearch.

I hope to post a followup showing the quicksearch syntax helper,
but this is at the moment just a side task.

(Although it ties well into the goal of implementing elasticsearch on BMO).