So for various reasons, including “I want bugzilla.mozilla.org to support markdown”, I have been working to get a binding to GitHub’s fork of cmark.
For the first part of this, I got some help in #native
writing a Perl module in the Alien:: namespace,
Armed with this module, I’ve been seeking to make CommonMark
work against the GitHub fork of libcmark.
So far things have been going well, and I decide to just be dumb. The API for libcmark-gfm is a bit different, so I’ll rename the packages from
Of course, this was the first problem: I was getting
errors about a package not existing, a package named
CommonMarkGFM::N. What the hell does that mean? I haven’t changed much yet!
The problem was this bit of C code in the newly-renamed
stash = gv_stashpvn("CommonMarkGFM::Node", 16, GV_ADD);
Okay, now I don’t know perlguts very well.
I don’t know what
gv_stashpvn does (but I can find the docs for gv_stashpvn
and the name is a hint at what it does, in the terse nomenclature of Perl’s internal APIs).
The old string was 16 bytes long. Now it should be 19,
and that perfectly explains why I saw
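To make the truncation concrete, here is a minimal sketch in plain C. It does not use Perl's headers: `copy_name` is a hypothetical stand-in for any function that takes a pointer plus an explicit length (the way `gv_stashpvn` does), and `LITERAL_WITH_LEN` is a made-up helper macro. I believe Perl's own headers offer a similar `STR_WITH_LEN` macro, but treat that as an assumption to check against perlapi.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a (pointer, length) style API such as
 * gv_stashpvn: it uses exactly `len` bytes of `name` and nothing more. */
void copy_name(char *out, size_t out_size, const char *name, size_t len)
{
    assert(len < out_size);   /* leave room for the terminating NUL */
    memcpy(out, name, len);
    out[len] = '\0';
}

/* Hypothetical helper: expands a string literal into "literal, length",
 * so the length is computed by the compiler and cannot go stale. */
#define LITERAL_WITH_LEN(s) ("" s ""), (sizeof(s) - 1)

/* The bug: the package was renamed, but 16 was the length of the old name. */
void stale_length(char *out, size_t out_size)
{
    copy_name(out, out_size, "CommonMarkGFM::Node", 16);
}

/* The fix: derive the length from the literal itself. */
void derived_length(char *out, size_t out_size)
{
    copy_name(out, out_size, LITERAL_WITH_LEN("CommonMarkGFM::Node"));
}
```

Calling `stale_length` on a 32-byte buffer leaves `CommonMarkGFM::N` in it, while `derived_length` yields the full `CommonMarkGFM::Node`. Deriving the length from the literal means the next rename cannot reintroduce the mismatch.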
So I get past that, and now the test suite segfaults.
1..10
ok 1 - use CommonMarkGFM;
ok 2 - markdown_to_html
ok 3 - 'parse_document' isa 'CommonMarkGFM::Node'
Segmentation fault
Hey, maybe this is the same as the first problem I fixed?
So I go looking for that problem, and I find it!
We have some lengths hard-coded in the typemap file
(no, aside from the fact it maps types, I don’t know what the typemap file does. I’m not usually hacking in perlapi).
T_NODE
    $var = (cmark_node*)S_sv2c(aTHX_ $arg, \"CommonMarkGFM::Node\", 19, cv, \"$var\");
/* more omitted */
So I fix those problems, but they were not my problem.
I’m still getting a segfault…
I’m really quite excited at this moment! I have a problem that I can apply things I learned about from this wonderful blog by Julia Evans.
I’ve already been using a Dockerfile to try to compile and test this code so I just need to install Valgrind (and maybe gdb too) and see what happens.
So I run valgrind:
==16== Access not within mapped region at address 0x88
==16==    at 0xEF4685C: cmark_render_html_with_mem (in /usr/local/lib64/perl5/auto/share/dist/Alien-libcmark_gfm/lib/libcmark-gfm.so.0.28.3.gfm.12)
==16==    by 0xED0A11D: XS_CommonMarkGFM__Node_interface_render (CommonMarkGFM.c:898)
==16==    by 0x4ED6814: Perl_pp_entersub (pp_hot.c:2888)
==16==    by 0x4ED4B05: Perl_runops_standard (run.c:40)
==16==    by 0x4E7D0D7: perl_run (perl.c:2435)
==16==    by 0x400E73: main (perlmain.c:117)
Huh, interesting. Okay, maybe I can use gdb to set a breakpoint there.
(gdb) b cmark_render_html_with_mem
Function "cmark_render_html_with_mem" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (cmark_render_html_with_mem) pending.
Our function doesn’t exist yet as it’s in a shared object that will get loaded later. This is fine — except it isn’t. My breakpoint never happens.
Huh! I guess (as it turns out, wrongly) that maybe I need to change my compilation options. And I also assume the segfaulting is because of something in the Perl extension code.
So maybe it’s that we compile with
-O2. My gcc is too old to support
-Og, so let’s try
At this point, I’m just copying the line from make’s output and changing it. I just want to get some details in gdb damn it!
So I run the following:
perl Makefile.PL
make
gcc -c -I/usr/local/lib64/perl5/auto/share/dist/Alien-libcmark_gfm/include -D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O0 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -DVERSION=\"0.280301\" -DXS_VERSION=\"0.280301\" -fPIC "-I/usr/lib64/perl5/CORE" CommonMarkGFM.c
make install
Now I can run
perl t/03_render.t again, under gdb, and see if I can get more details.
1..10
ok 1 - use CommonMarkGFM;
ok 2 - markdown_to_html
ok 3 - 'parse_document' isa 'CommonMarkGFM::Node'
ok 4 - parse_document works
ok 5 - render_xml
ok 6 - render_man
ok 7 - render_latex
ok 8 - render_commonmark
ok 9 - render functions return encoded utf8
ok 10 - render functions expect decoded utf8
My attitude thus far is clear in the tweet that followed:
Incredibly fun news! when I compile CommonMarkGFM with -O2: Segmentation fault But with -O0, it doesn't happen.—
Dylan Hardison (@dylan_hardison) January 21, 2018
Now I proceed to have fun.
I spent some time trying to figure out what
-O0 did, and I wrote a script to repeatedly re-compile that one file
with different options. Along the way, I learned how to make
gcc spit out what options it is compiling with (
gcc -Q -v ...).
I had some false positives, and then I went to sleep.
After a period of sleep, I figured out that what I wanted was the list of flags that differ between -O0 and -O1. I cleaned up my compile.pl script
and ran it.
The answer is: all of them are fine.
-O0 and all the feature flags of
-O1 result in no segfault either. Adding
-O1 back brings back the segfault. After some more searching, the gcc docs imply that some optimizations are tied directly to the
My fun is now over, and I’ll do the more boring task of figuring out why my code is broken.
Staring at me from gdb’s output is this:
warning: Error disabling address space randomization: Operation not permitted
After a bit of searching, I find a fix for this to run the docker image
And suddenly, breakpoints work.
Now I can debug the
root variable that is passed to
cmark_render_html_with_mem… and nothing is wrong there.
Probably I need to re-compile libcmark-gfm with more debugging, I think. Suddenly, I realize that
cmark_render_html_with_mem takes three arguments, and the Perl XS code is only passing it two.
How does this work? Well, it appears to cast a pointer to a function pointer, and call it. Calling a function through a pointer with fewer arguments than the function is defined to take is undefined behavior, and I guess the rest of the behavior I observed was nasal demons.
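One way to keep this class of bug from compiling at all is to call through a fully prototyped function pointer type instead of a cast. The sketch below is only an illustration: `node`, `ext_list`, `render_v1`, `render_v2`, and the other names are hypothetical stand-ins, not the real cmark types or signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real cmark types. */
typedef struct node { int id; } node;
typedef struct ext_list { int count; } ext_list;

/* "Upstream" style renderer: two arguments. */
int render_v1(node *root, int options)
{
    return root->id + options;
}

/* "Fork" style renderer: a third argument was added to the API. */
int render_v2(node *root, int options, ext_list *extensions)
{
    int extra = extensions ? extensions->count : 0;
    return root->id + options + extra;
}

/* A prototyped pointer type. Assigning render_v1 to it without a cast is
 * a constraint violation, so the compiler reports the mismatch. */
typedef int (*render_v2_fn)(node *, int, ext_list *);

/* Every call goes through the prototype, so all three arguments are
 * always supplied. */
int call_renderer(render_v2_fn fn, node *root, int options, ext_list *exts)
{
    assert(fn != NULL);
    return fn(root, options, exts);
}

/* If a two-argument renderer must be reused, wrap it explicitly rather
 * than casting its address to the three-argument type. */
int render_v1_adapter(node *root, int options, ext_list *extensions)
{
    (void)extensions;   /* deliberately ignored */
    return render_v1(root, options);
}
```

With this shape, `call_renderer(render_v2, &n, 2, &e)` passes all three arguments, and passing `render_v1` directly is rejected at compile time instead of crashing only at some optimization levels.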
(as an FYI, this argument difference is an API change between upstream
Finally, this third argument is a linked list of syntax extensions,
and it’s not clear yet how I will need to pass that back and forth between perl and C. This is also indicative that
CommonMarkGFM will need to be a fork of CommonMark