The Binding of libcmark-gfm: Segfaults and Debugging

So for various reasons, including “I want bugzilla.mozilla.org to support markdown”, I have been working to get a binding to GitHub’s fork of cmark.

For the first part of this, I got some help in #native
writing a Perl module in the Alien:: namespace,
namely Alien::libcmark_gfm.

Armed with this module, I’ve been seeking to make CommonMark
work against the GitHub fork of libcmark.

So far things have been going well, and I decide to just be dumb. The API for libcmark-gfm is a bit different, so I’ll rename the packages from CommonMark to CommonMarkGFM.

Of course, this was the first problem: I was getting
errors about a package not existing, a package named CommonMarkGFM::N. What the hell does that mean? I haven’t changed much yet!

The problem was this bit of C code in the newly-renamed CommonMarkGFM.xs:

stash = gv_stashpvn("CommonMarkGFM::Node", 16, GV_ADD);

Okay, now I don’t know perlguts very well.
I don’t know what gv_stashpvn does (but I can find the docs for gv_stashpvn,
and the name is a hint at what it does, in the terse nomenclature of Perl’s internal APIs).

The old string was 16 bytes long. Now it should be 19,
and that perfectly explains why I saw CommonMarkGFM::N.
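
The truncation is easy to reproduce outside of Perl. Here is a minimal sketch in plain C, assuming a hypothetical copy_package_name helper that stands in for any API taking a name plus an explicit byte length (it is not part of perlapi), and a hypothetical LITERAL_LEN macro that lets the compiler work out the length of a literal:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of a string literal, computed by the compiler. */
#define LITERAL_LEN(s) (sizeof(s) - 1)

/* Hypothetical stand-in for an API that takes a name plus an explicit byte
 * length. Copies exactly `len` bytes of `name` into `out` and
 * NUL-terminates the result. */
void copy_package_name(char *out, size_t out_size,
                       const char *name, size_t len)
{
    assert(len < out_size);   /* leave room for the terminator */
    memcpy(out, name, len);
    out[len] = '\0';
}

/* What the renamed code did: new name, stale length of 16. */
void stale_length(char *out, size_t out_size)
{
    copy_package_name(out, out_size, "CommonMarkGFM::Node", 16);
}

/* Deriving the length from the literal keeps the two in sync. */
void derived_length(char *out, size_t out_size)
{
    copy_package_name(out, out_size, "CommonMarkGFM::Node",
                      LITERAL_LEN("CommonMarkGFM::Node"));
}
```

With a 32-byte buffer, stale_length leaves "CommonMarkGFM::N" in it and derived_length leaves the full name. perlapi also documents gv_stashpvs, a variant that takes a string literal and computes the length itself, which would likely avoid this class of bug in the XS file.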

So I get past that, and now the test suite segfaults.

1..10
ok 1 - use CommonMarkGFM;
ok 2 - markdown_to_html
ok 3 - 'parse_document' isa 'CommonMarkGFM::Node'
Segmentation fault

Hey, maybe this is the same as the first problem I fixed?

So I go looking for that problem, and I find it!
We have some lengths hard-coded in the typemap file
(no, aside from the fact it maps types, I don’t know what the typemap file does. I’m not usually hacking in perlapi).

T_NODE
    $var = (cmark_node*)S_sv2c(aTHX_ $arg, \"CommonMarkGFM::Node\", 19, cv,
                               \"$var\");
/* more omitted */

So I fix those problems, but they were not my problem.
I’m still getting a segfault…

I’m really quite excited at this moment! I have a problem to which I can apply things I learned from this wonderful blog by Julia Evans.

I’ve already been using a Dockerfile to try to compile and test this code so I just need to install Valgrind (and maybe gdb too) and see what happens.

So I run valgrind:

==16==  Access not within mapped region at address 0x88
==16==    at 0xEF4685C: cmark_render_html_with_mem (in /usr/local/lib64/perl5/auto/share/dist/Alien-libcmark_gfm/lib/libcmark-gfm.so.0.28.3.gfm.12)
==16==    by 0xED0A11D: XS_CommonMarkGFM__Node_interface_render (CommonMarkGFM.c:898)
==16==    by 0x4ED6814: Perl_pp_entersub (pp_hot.c:2888)
==16==    by 0x4ED4B05: Perl_runops_standard (run.c:40)
==16==    by 0x4E7D0D7: perl_run (perl.c:2435)
==16==    by 0x400E73: main (perlmain.c:117)

Huh, interesting. Okay, maybe I can use gdb to set a breakpoint there.

(gdb) b cmark_render_html_with_mem
Function "cmark_render_html_with_mem" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (cmark_render_html_with_mem) pending.

Our function doesn’t exist yet as it’s in a shared object that will get loaded later. This is fine — except it isn’t. My breakpoint never happens.

Huh! I guess (as it turns out, wrongly) that maybe I need to change my compilation options. And I also assume the segfaulting is because of something in the Perl extension code.

So maybe it’s that we compile with -O2. My gcc is too old to support -Og, so let’s try -O0.

At this point, I’m just copying the line from make’s output and changing it. I just want to get some details in gdb damn it!

So I run the following:

perl Makefile.PL
make
gcc -c  -I/usr/local/lib64/perl5/auto/share/dist/Alien-libcmark_gfm/include -D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O0 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic   -DVERSION=\"0.280301\" -DXS_VERSION=\"0.280301\" -fPIC "-I/usr/lib64/perl5/CORE"   CommonMarkGFM.c
make install

Now I can run perl t/03_render.t again, under gdb, and see if I can get more details.

1..10
ok 1 - use CommonMarkGFM;
ok 2 - markdown_to_html
ok 3 - 'parse_document' isa 'CommonMarkGFM::Node'
ok 4 - parse_document works
ok 5 - render_xml
ok 6 - render_man
ok 7 - render_latex
ok 8 - render_commonmark
ok 9 - render functions return encoded utf8
ok 10 - render functions expect decoded utf8

My attitude thus far is clear in the tweet that followed:

Now I proceed to have fun.

I spent the time trying to figure out what -O1 vs. -O0 did, and I wrote a script to repeatedly re-compile that one file
with different options. Along the way, I learned how to make gcc spit out what options it is compiling with (gcc -Q -v ...).
I had some false positives, and then I went to sleep.

After a period of sleep, I figured out that what I wanted was the list of flags that differ between -O0 and -O1. I cleaned up my compile.pl script
and ran it.

The answer is: all of them are fine. Compiling with -O0 plus every individual feature flag that -O1 turns on still results in no segfault. Switching back to -O1 itself brings back the segfault. After some more searching of the gcc docs, it seems that some optimizations are tied directly to the -O level and have no flag of their own.

My fun is now over, and I’ll do the more boring task of figuring out why my code is broken.

Staring at me from gdb’s output is this:

warning: Error disabling address space randomization: Operation not permitted

After a bit of searching, I find a fix for this to run the docker image
with --security-opt seccomp=unconfined.

And suddenly, breakpoints work.

And I can debug the root variable that is passed to cmark_render_html_with_mem… and nothing is wrong there.
Probably I need to re-compile libcmark-gfm with more debugging, I think. Suddenly, I realize that cmark_render_html_with_mem takes three arguments, and the Perl XS code is only passing it two.
How does this work? Well, it appears to cast a pointer to a function pointer, and call it. Calling a function through a pointer with fewer arguments than the function is declared to take is undefined behavior, and I guess the rest of the behavior I observed was nasal demons.

(As an FYI, this argument difference is an API change between upstream libcmark and libcmark-gfm.)

Finally, this third argument is a linked list of syntax extensions,
and it’s not clear yet how I will need to pass that back and forth between Perl and C. This also suggests that CommonMarkGFM will need to be a fork of CommonMark.
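
The shape of the mismatch can be sketched in plain C. Every name below (node, ext_list, render_fn, render_gfm, render_gfm_adapter, call_renderer) is a hypothetical stand-in, not the cmark or cmark-gfm API. The idea is to keep the function pointer type honest and to adapt the three-argument renderer with a small wrapper that supplies the extension list, rather than casting the pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real cmark-gfm types. */
typedef struct node { const char *text; } node;

typedef struct ext_list {
    const char      *name;
    struct ext_list *next;
} ext_list;

/* The two-argument shape the binding was originally written against. */
typedef int (*render_fn)(const node *root, int options);

/* The three-argument shape of the fork. To keep the sketch checkable it
 * just returns `options` plus the number of extensions in the list. */
int render_gfm(const node *root, int options, const ext_list *extensions)
{
    int count = 0;
    assert(root != NULL);
    for (const ext_list *e = extensions; e != NULL; e = e->next)
        count++;
    return options + count;
}

/* Adapter with the two-argument signature. It calls render_gfm through its
 * real type and passes an explicit empty extension list. Casting render_gfm
 * to render_fn and calling it would instead be undefined behavior. */
int render_gfm_adapter(const node *root, int options)
{
    return render_gfm(root, options, NULL);
}

/* Generic call site, as a table-driven binding might have. */
int call_renderer(render_fn fn, const node *root, int options)
{
    return fn(root, options);
}
```

Passing render_gfm_adapter to call_renderer compiles without a cast, and the compiler rejects passing render_gfm directly, which is the diagnostic the cast had been hiding.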

BMO ❤️ Carton

Back when I started working on BMO
we couldn’t add new dependencies without having someone build an RPM. For no particularly good reason, this made it so in general we didn’t add new dependencies often.

However, about a year ago I started poking at carton and came up with a process to run carton in a docker container that mirrors production, and tar up the resulting local/ directory.

For the last 6 months or so we have been able to add dependencies whenever we want. We can also track changes to the
full dependency tree.

The code for this is on github as mozilla-bteam/carton-bundles and it is a little ugly, but packaging code is rarely elegant.

Sorry, I meant I changed it from 226s to 184ms

About twenty days ago it came to my attention via my colleague Ed Morley that BMO’s bzapi was very slow.
It turns out he had reported the same issue the prior year as well!

Performance problems are very enjoyable to work on, I find. Especially when they are reproducible.

I lost most of the day on this, but in the end I was able to take the slowest function from executing at a leisurely 226 seconds to a very fast 184 milliseconds.

Perl + Bugzilla in Outreachy Round 13!

I am very proud to be mentoring in the 13th round of Outreachy.

We have a list of ideas
and a growing list of bugs that would make for good “small contributions”.

I’m open to emails or irc discussions and I’ll try to answer
any and all questions.

email: dylan [at] mozilla [dot] com
or join irc.mozilla.org #bugzilla, I’m in IRC from
13:00 UTC until about 22:00 UTC.

Here’s our project blurb:

Perl is a highly capable, feature-rich programming language with over 28 years of development, making it one of the longest standing FOSS projects. The Perl Foundation is funding a position working on Bugzilla, a widely used, Perl-based issue tracker. In 18 years of development, Bugzilla has grown into a complex application that is used in many different workflows by organizations including Mozilla, the GNOME Project, Red Hat, and freedesktop.org. Some of this complexity is particularly evident in the search functionality, both in implementation and in user interface. We have several proposals to simplify and improve searching, which will positively impact Bugzilla sites around the world.

Fixed some memory leaks in bugzilla.mozilla.org

So last week I fixed Bug 1282606 which has resulted in a bit of a performance improvement for bugzilla.mozilla.org:

Restarts per Hour

Apache2::SizeLimit
is configured to kill processes once they use more than 700M and this happened about every 7 minutes.

About two weeks ago, while working on some performance issues relating to BMO’s new show_bug ui, I discovered that the problem could get worse: running out of memory every 60 seconds. Should everyone switch to the new UI (which is intended to put less load on the server) a lot more load would be on the server. That’s pretty bad, since we want everyone using the new UI as soon as possible. 🙂

This memory leak isn’t new, and I had filed an investigatory bug about it last year. Memory leaks in perl are caused by having cyclic references, and the solution is to not have cycles, use weak references, or to break the cycle when you’re done with whatever data structure it is part of.
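
Perl frees a value when its reference count drops to zero, which is why a cycle never goes away on its own. Here is a minimal sketch of that mechanism in plain C, assuming a hypothetical obj type with a hand-rolled reference count (a model of the idea, not Perl's internals). A counted back-pointer keeps both objects alive until the cycle is broken by hand, while an uncounted (weak) back-pointer lets them be freed normally:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted object; a model of the idea, not Perl's internals. */
typedef struct obj {
    int         refcount;
    struct obj *strong;  /* counted reference to another object */
    struct obj *weak;    /* uncounted reference: does not keep its target alive */
} obj;

int live_objects = 0;    /* objects allocated and not yet freed */

obj *obj_new(void)
{
    obj *o = calloc(1, sizeof *o);
    assert(o != NULL);
    o->refcount = 1;     /* the creator holds the first reference */
    live_objects++;
    return o;
}

/* Drop one reference; free the object once nobody holds it. */
void obj_release(obj *o)
{
    if (o == NULL || --o->refcount > 0)
        return;
    obj_release(o->strong);
    free(o);
    live_objects--;
}

void obj_set_strong(obj *from, obj *to) { to->refcount++; from->strong = to; }
void obj_set_weak(obj *from, obj *to)   { from->weak = to; }

/* Break a cycle by hand: forget the counted reference and release it. */
void obj_clear_strong(obj *from)
{
    obj *target = from->strong;
    from->strong = NULL;
    obj_release(target);
}

/* Counted back-pointer. Stores in *leaked how many objects survive after the
 * creator lets go, then breaks the cycle and returns how many remain. */
int cycle_demo(int *leaked)
{
    int before = live_objects;
    obj *parent = obj_new();
    obj *child = obj_new();
    obj_set_strong(parent, child);
    obj_set_strong(child, parent);    /* counted back-pointer: a cycle */
    obj_release(parent);              /* the creator drops both references */
    obj_release(child);
    *leaked = live_objects - before;  /* neither count reached zero */
    obj_clear_strong(child);          /* breaking the cycle frees both */
    return live_objects - before;
}

/* Weak back-pointer. Returns how many objects survive after release. */
int weak_demo(void)
{
    int before = live_objects;
    obj *parent = obj_new();
    obj *child = obj_new();
    obj_set_strong(parent, child);
    obj_set_weak(child, parent);      /* back-pointer is not counted */
    obj_release(child);
    obj_release(parent);              /* frees parent, which frees child */
    return live_objects - before;
}
```

In this model cycle_demo reports two leaked objects until the cycle is cleared, and weak_demo leaves none behind. On the Perl side, the counterpart of the uncounted pointer is Scalar::Util's weaken.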

I understand the problem, and I know how to fix it… but maybe I don’t know where the problem is?

Thankfully there is a tool for this on CPAN: Devel::MAT.

Using Devel::MAT, it is possible to dump the address space of a perl program and explore it in great detail in a GUI.

I didn’t set out to remove all the memory leaks this time, just the ones that were the biggest or grew the fastest. This meant the TrackingFlags extension Flag objects, the Bug object, and the Comment object.

The changes are on github for the curious,
and the resulting charts below speak for themselves.

Average Request Time

Requests Before Restart

Age of Process Before Restart

Bugzilla on Heroku: Part 2

I’ve gotten bugzilla to run acceptably well on Heroku. Still no memcached support (I need to work on my patch for that a little bit more and then add support to bugzilla for it) but I’m pretty happy with this.

Until I take it down, here is a working Bugzilla 5.1.1+ install on Heroku, fronted by fast.ly. Email works, so it is possible to create new accounts.
https://calm-sands-84076-herokuapp-com.global.ssl.fastly.net/

Bugzilla on heroku? And hacking on Memcached::libmemcached

I thought I’d take this long weekend to do something fun: get Bugzilla running under Heroku.

Steps:

  1. fork the perl/psgi buildpack to run bugzilla.
    bugzilla buildpack
  2. swear at Gerv for adding login name support and swear at
    checksetup.pl for spinning in an infinite loop when it can’t prompt for a missing config Bug 1284021.
  3. start writing support for storing the “params” data in the db instead of the filesystem, as the filesystem in Heroku is ephemeral.
  4. realize that you’ll want to memcache this, so might as well add memcache to Heroku
  5. realize MemCachier (one of the Heroku memcached providers) requires a username and password and that the Perl bindings don’t support this

After some research… I realize this feature requires support for the binary protocol and is based on SASL. Fine. I’ll learn XS (Perl’s FFI) and contribute code to Memcached::libmemcached. How hard can it be?

It turned out to be not very hard
but it’s a work in progress (and definitely leaks memory right now).

Oh yeah, and bugzilla does run on heroku, but it won’t be useful until the params stuff can be stored in the DB.