The Binding of libcmark-gfm: Segfaults and Debugging

So for various reasons — including “I want bugzilla.mozilla.org to support markdown” I have been working to get a binding to github’s fork of cmark.

For the first part of this, I got some help in #native
writing a Perl module in the Alien:: namespace,
namely Alien::libcmark_gfm.

Armed with this module, I’ve been seeking to make CommonMark
work against the GitHub fork of libcmark.

So far things have been going well, and I decide to just be dumb. The API for libcmark-gfm is a bit different, so I’ll rename the packages from CommonMark to CommonMarkGFM.

Of course, this was the first problem: I was getting
errors about a package not existing, a package named CommonMarkGFM::N. What the hell does that mean? I haven’t changed much yet!

The problem was this bit of C code in the newly-renamed CommonMarkGFM.xs:

stash = gv_stashpvn("CommonMarkGFM::Node", 16, GV_ADD);

Okay, now I don’t know perlguts very well.
I don’t know what gv_stashpvn does (but I can find the docs for gv_stashpvn
and the name is a hint at what it does, in the terse nomenclature of Perl’s internal APIs)

The old string was 16 bytes long. Now it should be 19,
and that perfectly explains why I saw CommonMarkGFM::N.

So I get past that. and now the test suite segfaults.

1..10
ok 1 - use CommonMarkGFM;
ok 2 - markdown_to_html
ok 3 - 'parse_document' isa 'CommonMarkGFM::Node'
Segmentation fault

Hey, maybe this is the same as the first problem I fixed?

So I go looking for that problem, and I find it!
We have some lengths hard-coded in the typemap file
(no, aside from the fact it maps types, I don’t know what the typemap file does. I’m not usually hacking in perlapi).

T_NODE
    $var = (cmark_node*)S_sv2c(aTHX_ $arg, \"CommonMarkGFM::Node\", 19, cv,
                               \"$var\");
/* more omitted */

So I fix those problems, but they were not my problem.
I’m still getting a segfault…

I’m really quite excited at this moment! I have a problem that I can apply things I learned about from this wonderful blog by Julia Evans.

I’ve already been using a Dockerfile to try to compile and test this code so I just need to install Valgrind (and maybe gdb too) and see what happens.

So I run valgrind:

==16==  Access not within mapped region at address 0x88
==16==    at 0xEF4685C: cmark_render_html_with_mem (in /usr/local/lib64/perl5/auto/share/dist/Alien-libcmark_gfm/lib/libcmark-gfm.so.0.28.3.gfm.12)
==16==    by 0xED0A11D: XS_CommonMarkGFM__Node_interface_render (CommonMarkGFM.c:898)
==16==    by 0x4ED6814: Perl_pp_entersub (pp_hot.c:2888)
==16==    by 0x4ED4B05: Perl_runops_standard (run.c:40)
==16==    by 0x4E7D0D7: perl_run (perl.c:2435)
==16==    by 0x400E73: main (perlmain.c:117)

Huh, interesting. Okay, maybe I can use gdb to set a breakpoint there.

(gdb) b cmark_render_html_with_mem
Function "cmark_render_html_with_mem" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (cmark_render_html_with_mem) pending.

Our function doesn’t exist yet as it’s in a shared object that will get loaded later. This is fine — except it isn’t. My breakpoint never happens.

Huh! I guess (as it turns out, wrongly) that maybe I need to change my compilation options. And I also assume the segfaulting is because of something in the Perl extension code.

So maybe it’s that we compile with -02. My gcc is too old to support -Og, so let’s try -O0.

At this point, I’m just copying the line from make’s output and changing it. I just want to get some details in gdb damn it!

So I run the following:

perl Makefile.PL
make
gcc -c  -I/usr/local/lib64/perl5/auto/share/dist/Alien-libcmark_gfm/include -D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O0 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic   -DVERSION=\"0.280301\" -DXS_VERSION=\"0.280301\" -fPIC "-I/usr/lib64/perl5/CORE"   CommonMarkGFM.c
make install

Now I can run perl t/03_render.t again, under gdb, and see if I can get more details.

1..10
ok 1 - use CommonMarkGFM;
ok 2 - markdown_to_html
ok 3 - 'parse_document' isa 'CommonMarkGFM::Node'
ok 4 - parse_document works
ok 5 - render_xml
ok 6 - render_man
ok 7 - render_latex
ok 8 - render_commonmark
ok 9 - render functions return encoded utf8
ok 10 - render functions expect decoded utf8

My attitude thus far is clear in the tweet that followed:

https://twitter.com/dylan_hardison/status/954919967887011840

Now I proceed to have fun.

I spent the time trying to figure out what -O1 vs. -O0 did, and I wrote a script to repeatedly re-compile that one file
with different options. Along the way, I learned how to make gcc spit out what options it is compiling with (gcc -Q -v ...).
I had some false positives, and then I went to sleep.

After a period of sleep, figured out I wanted the list of flags as a difference between -O0 and -O1. I cleaned up my compile.pl script
and ran it.

The answer is: all of them are fine. -O0 and all the feature flags of -O1 result in no segfault either. Adding -O1 back brings back the segfault. After some more searching of the gcc docs, it is implied some optimizations are just directly tied to the O level.

My fun is now over, and I’ll do the more boring task of figuring out why my code is broken.

Staring at my from gdb’s output is this:

warning: Error disabling address space randomization: Operation not permitted

After a bit of searching, I find a fix for this to run the docker image
with --security-opt seccomp=unconfined.

And suddenly, breakpoints work.

and I can debug the root variable that is passed to cmark_render_html_with_mem… and nothing is wrong there.
Probably I need to re-compile libcmark-gfm with more debugging, I think. Suddenly, I realize that cmark_render_html_with_mem takes three arguments, and the Perl XS code is only passing it two.
How does this work? Well, it appears to cast a pointer to a function pointer, and call it. Calling a function pointer with fewer arguments than it is declared to with is undefined behavior, and I guess the rest of the behavior I observed was nasal demons.

(as an FYI, this argument difference is an API change between upstream libcmark and libcmark-gfm).

Finally, this third argument is a linked list of syntax extensions,
and it’s not clear yet how I will need to pass that back and forth between perl and C. This is also indicative that CommonMarkGFM will need to be a fork of CommonMark

One thought on “The Binding of libcmark-gfm: Segfaults and Debugging

Comments are closed.