<?xml version='1.0' encoding='utf-8'?>
<?xml-stylesheet type="text/xsl" href="/sheet.xsl"?><rss version="2.0"><channel><title>Compilade's Blog</title><item><title>How to pack ternary numbers in 8-bit bytes</title><link>https://compilade.net/blog/ternary-packing</link><description>Packing ternary numbers with efficient SIMD-friendly unpacking on binary computers</description><ns0:encoded xmlns:ns0="http://purl.org/rss/1.0/modules/content/">&lt;main morss_own_score="2.8520345252774355" morss_score="69.7178313565915"&gt;

&lt;h1&gt;How to pack ternary numbers in 8-bit bytes&lt;/h1&gt;
&lt;p&gt;with efficient SIMD-friendly unpacking&lt;/p&gt;
&lt;hr&gt;

&lt;span&gt;Published: &lt;time&gt;2024-06-26&lt;/time&gt;
&lt;/span&gt;


&lt;p&gt;A digit of a ternary number has 3 possible values.
Which 3 values those are is only a convention: they could actually be anything.&lt;/p&gt;




&lt;p&gt;[Figure: the three values of a trit, shown both as -1, 0, 1 and as 0, 1, 2.]&lt;/p&gt;



&lt;p&gt;I was recently nerd-sniped&lt;sup&gt;&lt;a href="https://compilade.net/blog/ternary-packing#fn:1"&gt;1&lt;/a&gt;&lt;/sup&gt; into trying to pack the ternary weights of &lt;a href="https://arxiv.org/abs/2402.17764"&gt;BitNet b1.58&lt;/a&gt; into something close to the theoretical ideal of &lt;code&gt;log(3) / log(2)&lt;/code&gt; bits&lt;sup&gt;&lt;a href="https://compilade.net/blog/ternary-packing#fn:2"&gt;2&lt;/a&gt;&lt;/sup&gt; per ternary digit.&lt;/p&gt;
&lt;p&gt;I'll be calling a "ternary digit" a "trit", like a "binary digit" is called a "bit".&lt;/p&gt;
&lt;h2&gt;Block size&lt;/h2&gt;
&lt;p&gt;Since the goal of this is to allow fast &lt;strong&gt;&lt;em&gt;parallel&lt;/em&gt;&lt;/strong&gt; unpacking, blocks of trits can't be arbitrarily big.
A small "block" size needs to be found, ideally one which is both efficient in information density and convenient on current hardware.&lt;/p&gt;
&lt;p&gt;To find a good block size, we'll need to find a power of 3 for which the next power of 2 is very close.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;trits&lt;/th&gt;
&lt;th&gt;3&lt;sup&gt;trits&lt;/sup&gt;&lt;/th&gt;
&lt;th&gt;bits&lt;/th&gt;
&lt;th&gt;2&lt;sup&gt;bits&lt;/sup&gt;&lt;/th&gt;
&lt;th&gt;bits per trit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;1.666...&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;81&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;128&lt;/td&gt;
&lt;td&gt;1.75&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;243&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;1.6&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;It's very fortunate that 5 trits fit quite tightly into 8 bits, at &lt;code&gt;1.6 bits&lt;/code&gt; per trit.
When compared to perfect packing, this is &lt;code&gt;99.06%&lt;/code&gt; efficient.&lt;/p&gt;
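&lt;p&gt;The table above can be reproduced with a few lines of Python. This is only a sketch: the helper name &lt;code&gt;block_stats&lt;/code&gt; is made up for this example, and the efficiency is computed as the ideal &lt;code&gt;log(3) / log(2)&lt;/code&gt; bits per trit divided by the bits per trit actually used.&lt;/p&gt;

```python
import math

# Ideal information content of one trit, in bits: log(3) / log(2).
IDEAL_BITS_PER_TRIT = math.log2(3)


def block_stats(trits: int) -> tuple[int, float, float]:
    """Return (bits needed, bits per trit, efficiency) for a block of trits."""
    # Smallest number of bits able to hold every value from 0 to 3**trits - 1.
    # bit_length() keeps this in exact integer arithmetic.
    bits = (3**trits - 1).bit_length()
    bits_per_trit = bits / trits
    efficiency = IDEAL_BITS_PER_TRIT / bits_per_trit
    return bits, bits_per_trit, efficiency


for trits in range(1, 6):
    bits, bpt, eff = block_stats(trits)
    print(f"{trits} trits: {3**trits} values, {bits} bits, "
          f"{bpt:.3f} bits per trit, {eff:.2%} efficient")
```

&lt;p&gt;Extending the range shows how rarely a power of 3 lands this close below a power of 2 for small block sizes.&lt;/p&gt;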
&lt;h2&gt;1.6 bits per trit&lt;/h2&gt;
&lt;p&gt;The basic idea with this packing scheme is simply to make a number out of the ternary digits.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;span&gt;def&lt;/span&gt; &lt;span&gt;pack_number&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;digits&lt;/span&gt;&lt;span&gt;:&lt;/span&gt; &lt;span&gt;list&lt;/span&gt;&lt;span&gt;[&lt;/span&gt;&lt;span&gt;int&lt;/span&gt;&lt;span&gt;],&lt;/span&gt; &lt;span&gt;base&lt;/span&gt;&lt;span&gt;:&lt;/span&gt; &lt;span&gt;int&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;-&amp;gt;&lt;/span&gt; &lt;span&gt;int&lt;/span&gt;&lt;span&gt;:&lt;/span&gt;
    &lt;span&gt;number&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;0&lt;/span&gt;

    &lt;span&gt;for&lt;/span&gt; &lt;span&gt;digit&lt;/span&gt; &lt;span&gt;in&lt;/span&gt; &lt;span&gt;digits&lt;/span&gt;&lt;span&gt;:&lt;/span&gt;
        &lt;span&gt;assert&lt;/span&gt; &lt;span&gt;0&lt;/span&gt; &lt;span&gt;&amp;lt;=&lt;/span&gt; &lt;span&gt;digit&lt;/span&gt; &lt;span&gt;&amp;lt;&lt;/span&gt; &lt;span&gt;base&lt;/span&gt;

        &lt;span&gt;number&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;number&lt;/span&gt; &lt;span&gt;*&lt;/span&gt; &lt;span&gt;base&lt;/span&gt;
        &lt;span&gt;number&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;number&lt;/span&gt; &lt;span&gt;+&lt;/span&gt; &lt;span&gt;digit&lt;/span&gt;

    &lt;span&gt;return&lt;/span&gt; &lt;span&gt;number&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Packing trits into bytes should be similar enough.&lt;/p&gt;
&lt;h3&gt;Fast multiplication unpacking&lt;/h3&gt;
&lt;p&gt;While repeated divisions and remainders can be used to extract the digits of a number, integer division and modulo are usually not available as &lt;abbr title="Single Instruction Multiple Data"&gt;SIMD&lt;/abbr&gt; instructions.&lt;/p&gt;
&lt;p&gt;A way around this is &lt;em&gt;obviously&lt;/em&gt; to view numbers differently.&lt;/p&gt;
&lt;p&gt;Wouldn't it be nice if, instead of extracting the least significant digit with modulo, we could extract the most significant digit with a multiplication?&lt;/p&gt;
&lt;p&gt;Fixed point numbers to the rescue!&lt;/p&gt;




&lt;p&gt;[Figure: &lt;code&gt;0x7F.&lt;/code&gt; is the same number as &lt;code&gt;11201.&lt;/code&gt; in base 3; dividing by 243 gives &lt;code&gt;.11201&lt;/code&gt;, which is the same number as &lt;code&gt;0x0.86&lt;/code&gt; when rounded up; multiplying by 256 gives &lt;code&gt;0x86.&lt;/code&gt;]&lt;/p&gt;



&lt;p&gt;Tada!&lt;/p&gt;
&lt;p&gt;Now digits can be easily extracted from the top two bits of the resulting 10-bit number when multiplying this 8-bit byte by 3.&lt;/p&gt;
&lt;p&gt;This is much more convenient than modulo when unpacking with &lt;abbr title="Single Instruction Multiple Data"&gt;SIMD&lt;/abbr&gt;.&lt;/p&gt;
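&lt;p&gt;As a worked example of these steps, here is a small Python sketch using the trits 1, 1, 2, 0, 1. The variable names are chosen for this example only, and keeping the low 8 bits is written as &lt;code&gt;% 256&lt;/code&gt;, which for a power of two is equivalent to masking with &lt;code&gt;0xFF&lt;/code&gt;.&lt;/p&gt;

```python
# Worked example for the trits 1, 1, 2, 0, 1 (the number 11201 in base 3).
trits = [1, 1, 2, 0, 1]

# Same number as a plain integer: 1*81 + 1*27 + 2*9 + 0*3 + 1 = 127 = 0x7F.
n = 0
for t in trits:
    n = n * 3 + t

# Divide by 243 and multiply by 256, with the multiplication done first
# and the division rounded up. The result is the fixed-point fraction
# 0x0.86, stored as the byte 0x86.
q = (n * 256 + 242) // 243

# Multiply by 3 repeatedly: bits 8 and 9 of the product are the next trit,
# and the low 8 bits are the fraction still holding the remaining trits.
extracted = []
frac = q
for _ in range(5):
    product = frac * 3          # at most 10 bits wide
    extracted.append(product >> 8)
    frac = product % 256        # keep the low 8 bits

print(hex(n), hex(q), extracted)  # 0x7f 0x86 [1, 1, 2, 0, 1]
```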
&lt;p&gt;The only place where there are divisions in this scheme is when packing trits into bytes.
This is acceptable as long as packing is done less often than unpacking, which is very much the case for &lt;abbr title="Large Language Model"&gt;LLM&lt;/abbr&gt; weights.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;span&gt;# Take a list of values in -1, 0, 1 and pack them in bytes&lt;/span&gt;
&lt;span&gt;def&lt;/span&gt; &lt;span&gt;pack_trits&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;digits&lt;/span&gt;&lt;span&gt;:&lt;/span&gt; &lt;span&gt;list&lt;/span&gt;&lt;span&gt;[&lt;/span&gt;&lt;span&gt;int&lt;/span&gt;&lt;span&gt;])&lt;/span&gt; &lt;span&gt;-&amp;gt;&lt;/span&gt; &lt;span&gt;bytearray&lt;/span&gt;&lt;span&gt;:&lt;/span&gt;
    &lt;span&gt;assert&lt;/span&gt; &lt;span&gt;len&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;digits&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;%&lt;/span&gt; &lt;span&gt;5&lt;/span&gt; &lt;span&gt;==&lt;/span&gt; &lt;span&gt;0&lt;/span&gt;  &lt;span&gt;# padding isn't handled here&lt;/span&gt;

    &lt;span&gt;n_bytes&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;len&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;digits&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;//&lt;/span&gt; &lt;span&gt;5&lt;/span&gt;
    &lt;span&gt;packed&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;bytearray&lt;/span&gt;&lt;span&gt;()&lt;/span&gt;

    &lt;span&gt;for&lt;/span&gt; &lt;span&gt;i&lt;/span&gt; &lt;span&gt;in&lt;/span&gt; &lt;span&gt;range&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;n_bytes&lt;/span&gt;&lt;span&gt;):&lt;/span&gt;
        &lt;span&gt;b&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;0&lt;/span&gt;
        &lt;span&gt;for&lt;/span&gt; &lt;span&gt;j&lt;/span&gt; &lt;span&gt;in&lt;/span&gt; &lt;span&gt;range&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;5&lt;/span&gt;&lt;span&gt;):&lt;/span&gt;
            &lt;span&gt;digit&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;digits&lt;/span&gt;&lt;span&gt;[&lt;/span&gt;&lt;span&gt;5&lt;/span&gt;&lt;span&gt;*&lt;/span&gt;&lt;span&gt;i&lt;/span&gt; &lt;span&gt;+&lt;/span&gt; &lt;span&gt;j&lt;/span&gt;&lt;span&gt;]&lt;/span&gt;
            &lt;span&gt;digit&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;max&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;-&lt;/span&gt;&lt;span&gt;1&lt;/span&gt;&lt;span&gt;,&lt;/span&gt; &lt;span&gt;min&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;digit&lt;/span&gt;&lt;span&gt;,&lt;/span&gt; &lt;span&gt;1&lt;/span&gt;&lt;span&gt;))&lt;/span&gt;  &lt;span&gt;# clamp between -1 and 1&lt;/span&gt;
            &lt;span&gt;digit&lt;/span&gt; &lt;span&gt;+=&lt;/span&gt; &lt;span&gt;1&lt;/span&gt;  &lt;span&gt;# from -1, 0, 1 to 0, 1, 2&lt;/span&gt;
            &lt;span&gt;b&lt;/span&gt; &lt;span&gt;*=&lt;/span&gt; &lt;span&gt;3&lt;/span&gt;
            &lt;span&gt;b&lt;/span&gt; &lt;span&gt;+=&lt;/span&gt; &lt;span&gt;digit&lt;/span&gt;

        &lt;span&gt;b&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;((&lt;/span&gt;&lt;span&gt;b&lt;/span&gt; &lt;span&gt;*&lt;/span&gt; &lt;span&gt;256&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;+&lt;/span&gt; &lt;span&gt;(&lt;/span&gt;&lt;span&gt;243&lt;/span&gt; &lt;span&gt;-&lt;/span&gt; &lt;span&gt;1&lt;/span&gt;&lt;span&gt;))&lt;/span&gt; &lt;span&gt;//&lt;/span&gt; &lt;span&gt;243&lt;/span&gt;

        &lt;span&gt;packed&lt;/span&gt;&lt;span&gt;.&lt;/span&gt;&lt;span&gt;append&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;b&lt;/span&gt;&lt;span&gt;)&lt;/span&gt;

    &lt;span&gt;return&lt;/span&gt; &lt;span&gt;packed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The interesting line is this one:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;        &lt;span&gt;b&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;((&lt;/span&gt;&lt;span&gt;b&lt;/span&gt; &lt;span&gt;*&lt;/span&gt; &lt;span&gt;256&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;+&lt;/span&gt; &lt;span&gt;(&lt;/span&gt;&lt;span&gt;243&lt;/span&gt; &lt;span&gt;-&lt;/span&gt; &lt;span&gt;1&lt;/span&gt;&lt;span&gt;))&lt;/span&gt; &lt;span&gt;//&lt;/span&gt; &lt;span&gt;243&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;It does what is depicted in the diagram above, but multiplication is done first because these are integer operations.
Doing a ceiling division here is necessary to cancel the off-by-one error from truncating when extracting digits later.&lt;/p&gt;
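&lt;p&gt;To see why the rounding direction matters, the following sketch compares a floor division with the ceiling division for all 243 possible values. The helper names &lt;code&gt;to_digits&lt;/code&gt; and &lt;code&gt;extract&lt;/code&gt; are hypothetical and only exist for this example; &lt;code&gt;% 256&lt;/code&gt; keeps the low 8 bits, like a mask with &lt;code&gt;0xFF&lt;/code&gt; would.&lt;/p&gt;

```python
def to_digits(n: int) -> list[int]:
    """Base-3 digits of n, most significant first, using division and modulo."""
    digits = []
    for _ in range(5):
        digits.append(n % 3)
        n //= 3
    return digits[::-1]


def extract(q: int) -> list[int]:
    """Extract 5 trits from a fixed-point byte by repeated multiplication by 3."""
    digits = []
    for _ in range(5):
        product = q * 3
        digits.append(product >> 8)  # integer part: the next trit
        q = product % 256            # fractional part: the remaining trits
    return digits


floor_failures = 0
ceil_failures = 0
for n in range(243):
    floor_q = (n * 256) // 243        # rounds down
    ceil_q = (n * 256 + 242) // 243   # rounds up
    floor_failures += extract(floor_q) != to_digits(n)
    ceil_failures += extract(ceil_q) != to_digits(n)

print(f"floor: {floor_failures} mismatches, ceil: {ceil_failures} mismatches")
```

&lt;p&gt;With the floor division, the stored fraction ends up slightly too small, so the extracted number can come out one below the packed one. For example, &lt;code&gt;11201&lt;/code&gt; would be read back as &lt;code&gt;11200&lt;/code&gt;.&lt;/p&gt;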
&lt;p&gt;To unpack &lt;em&gt;without&lt;/em&gt; using the modulo operator:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;span&gt;def&lt;/span&gt; &lt;span&gt;unpack_trits&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;packed&lt;/span&gt;&lt;span&gt;:&lt;/span&gt; &lt;span&gt;bytes&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;-&amp;gt;&lt;/span&gt; &lt;span&gt;list&lt;/span&gt;&lt;span&gt;[&lt;/span&gt;&lt;span&gt;int&lt;/span&gt;&lt;span&gt;]:&lt;/span&gt;
    &lt;span&gt;trits&lt;/span&gt;&lt;span&gt;:&lt;/span&gt; &lt;span&gt;list&lt;/span&gt;&lt;span&gt;[&lt;/span&gt;&lt;span&gt;int&lt;/span&gt;&lt;span&gt;]&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;[]&lt;/span&gt;

    &lt;span&gt;for&lt;/span&gt; &lt;span&gt;byte&lt;/span&gt; &lt;span&gt;in&lt;/span&gt; &lt;span&gt;packed&lt;/span&gt;&lt;span&gt;:&lt;/span&gt;
        &lt;span&gt;b&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;byte&lt;/span&gt;
        &lt;span&gt;for&lt;/span&gt; &lt;span&gt;i&lt;/span&gt; &lt;span&gt;in&lt;/span&gt; &lt;span&gt;range&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;5&lt;/span&gt;&lt;span&gt;):&lt;/span&gt;
            &lt;span&gt;b&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;b&lt;/span&gt; &lt;span&gt;*&lt;/span&gt; &lt;span&gt;3&lt;/span&gt;
            &lt;span&gt;trit&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;b&lt;/span&gt; &lt;span&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span&gt;8&lt;/span&gt;
            &lt;span&gt;trits&lt;/span&gt;&lt;span&gt;.&lt;/span&gt;&lt;span&gt;append&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;trit&lt;/span&gt; &lt;span&gt;-&lt;/span&gt; &lt;span&gt;1&lt;/span&gt;&lt;span&gt;)&lt;/span&gt;  &lt;span&gt;# 0, 1, 2 =&amp;gt; -1, 0, 1&lt;/span&gt;
            &lt;span&gt;b&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;b&lt;/span&gt; &lt;span&gt;&amp;amp;&lt;/span&gt; &lt;span&gt;0xFF&lt;/span&gt;

    &lt;span&gt;return&lt;/span&gt; &lt;span&gt;trits&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
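&lt;p&gt;The loop above carries &lt;code&gt;b&lt;/code&gt; from one iteration to the next, but each trit can also be extracted on its own, which is closer to how independent SIMD lanes would work. The following is a minimal sketch of that idea, not necessarily how any particular implementation does it. &lt;code&gt;POW3&lt;/code&gt;, &lt;code&gt;unpack_trit_at&lt;/code&gt; and &lt;code&gt;pack_byte&lt;/code&gt; are names made up for this example, and &lt;code&gt;% 256&lt;/code&gt; stands in for the wrap-around of an 8-bit multiplication.&lt;/p&gt;

```python
POW3 = [1, 3, 9, 27, 81]


def pack_byte(trits: list[int]) -> int:
    """Pack 5 values in -1, 0, 1 into one byte (same arithmetic as pack_trits)."""
    n = 0
    for t in trits:
        n = n * 3 + (t + 1)
    return (n * 256 + 242) // 243


def unpack_trit_at(byte: int, k: int) -> int:
    """Extract only the k-th trit (0 is the most significant) of a packed byte."""
    # Multiplying by 3**k and keeping the low 8 bits discards the first k trits.
    shifted = (byte * POW3[k]) % 256
    # One more multiplication by 3 moves the wanted trit into bits 8 and 9.
    return ((shifted * 3) >> 8) - 1  # 0, 1, 2 => -1, 0, 1


example = [0, 0, 1, -1, 0]  # stored as 1, 1, 2, 0, 1, so the byte is 0x86
byte = pack_byte(example)
print(hex(byte), [unpack_trit_at(byte, k) for k in range(5)])
```

&lt;p&gt;Since no call depends on the result of another, the five extractions could in principle run side by side.&lt;/p&gt;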

&lt;p&gt;To convince myself that this works, I wrote a C program checking that this really is lossless:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;span&gt;#include&lt;/span&gt; &lt;span&gt;&amp;lt;stdint.h&amp;gt;&lt;/span&gt;
&lt;span&gt;#include&lt;/span&gt; &lt;span&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;
&lt;span&gt;#include&lt;/span&gt; &lt;span&gt;&amp;lt;string.h&amp;gt;&lt;/span&gt;

&lt;span&gt;int&lt;/span&gt; &lt;span&gt;main&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;void&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;{&lt;/span&gt;
    &lt;span&gt;char&lt;/span&gt; &lt;span&gt;s1&lt;/span&gt;&lt;span&gt;[&lt;/span&gt;&lt;span&gt;6&lt;/span&gt;&lt;span&gt;]&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;{&lt;/span&gt;&lt;span&gt;0&lt;/span&gt;&lt;span&gt;};&lt;/span&gt;
    &lt;span&gt;char&lt;/span&gt; &lt;span&gt;s2&lt;/span&gt;&lt;span&gt;[&lt;/span&gt;&lt;span&gt;6&lt;/span&gt;&lt;span&gt;]&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;{&lt;/span&gt;&lt;span&gt;0&lt;/span&gt;&lt;span&gt;};&lt;/span&gt;

    &lt;span&gt;for&lt;/span&gt; &lt;span&gt;(&lt;/span&gt;&lt;span&gt;uint8_t&lt;/span&gt; &lt;span&gt;i&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;0&lt;/span&gt;&lt;span&gt;;&lt;/span&gt; &lt;span&gt;i&lt;/span&gt; &lt;span&gt;&amp;lt;&lt;/span&gt; &lt;span&gt;243&lt;/span&gt;&lt;span&gt;;&lt;/span&gt; &lt;span&gt;++&lt;/span&gt;&lt;span&gt;i&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;{&lt;/span&gt;
        &lt;span&gt;uint8_t&lt;/span&gt; &lt;span&gt;n&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;i&lt;/span&gt;&lt;span&gt;;&lt;/span&gt;
        &lt;span&gt;// Get the number representation in base 3&lt;/span&gt;
        &lt;span&gt;// by repeatedly extracting the least significant digit with modulo&lt;/span&gt;
        &lt;span&gt;for&lt;/span&gt; &lt;span&gt;(&lt;/span&gt;&lt;span&gt;int&lt;/span&gt; &lt;span&gt;j&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;5&lt;/span&gt;&lt;span&gt;;&lt;/span&gt; &lt;span&gt;j&lt;/span&gt;&lt;span&gt;--&lt;/span&gt; &lt;span&gt;&amp;gt;&lt;/span&gt; &lt;span&gt;0&lt;/span&gt;&lt;span&gt;;)&lt;/span&gt; &lt;span&gt;{&lt;/span&gt;
            &lt;span&gt;s1&lt;/span&gt;&lt;span&gt;[&lt;/span&gt;&lt;span&gt;j&lt;/span&gt;&lt;span&gt;]&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;(&lt;/span&gt;&lt;span&gt;n&lt;/span&gt; &lt;span&gt;%&lt;/span&gt; &lt;span&gt;3&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;+&lt;/span&gt; &lt;span&gt;'0'&lt;/span&gt;&lt;span&gt;;&lt;/span&gt;
            &lt;span&gt;n&lt;/span&gt; &lt;span&gt;/=&lt;/span&gt; &lt;span&gt;3&lt;/span&gt;&lt;span&gt;;&lt;/span&gt;
        &lt;span&gt;}&lt;/span&gt;
        &lt;span&gt;// Turn that number into a fixed-point number smaller than 1&lt;/span&gt;
        &lt;span&gt;uint8_t&lt;/span&gt; &lt;span&gt;q&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;(((&lt;/span&gt;&lt;span&gt;uint16_t&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;i&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;*&lt;/span&gt; &lt;span&gt;256&lt;/span&gt; &lt;span&gt;+&lt;/span&gt; &lt;span&gt;(&lt;/span&gt;&lt;span&gt;243&lt;/span&gt; &lt;span&gt;-&lt;/span&gt; &lt;span&gt;1&lt;/span&gt;&lt;span&gt;))&lt;/span&gt; &lt;span&gt;/&lt;/span&gt; &lt;span&gt;243&lt;/span&gt;&lt;span&gt;;&lt;/span&gt;
        &lt;span&gt;// This extracts the most significant digit first&lt;/span&gt;
        &lt;span&gt;for&lt;/span&gt; &lt;span&gt;(&lt;/span&gt;&lt;span&gt;int&lt;/span&gt; &lt;span&gt;j&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;0&lt;/span&gt;&lt;span&gt;;&lt;/span&gt; &lt;span&gt;j&lt;/span&gt; &lt;span&gt;&amp;lt;&lt;/span&gt; &lt;span&gt;5&lt;/span&gt;&lt;span&gt;;&lt;/span&gt; &lt;span&gt;++&lt;/span&gt;&lt;span&gt;j&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;{&lt;/span&gt;
            &lt;span&gt;uint16_t&lt;/span&gt; &lt;span&gt;m&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;q&lt;/span&gt; &lt;span&gt;*&lt;/span&gt; &lt;span&gt;3&lt;/span&gt;&lt;span&gt;;&lt;/span&gt;
            &lt;span&gt;s2&lt;/span&gt;&lt;span&gt;[&lt;/span&gt;&lt;span&gt;j&lt;/span&gt;&lt;span&gt;]&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;(&lt;/span&gt;&lt;span&gt;m&lt;/span&gt; &lt;span&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span&gt;8&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;+&lt;/span&gt; &lt;span&gt;'0'&lt;/span&gt;&lt;span&gt;;&lt;/span&gt;
            &lt;span&gt;q&lt;/span&gt; &lt;span&gt;=&lt;/span&gt; &lt;span&gt;m&lt;/span&gt; &lt;span&gt;&amp;amp;&lt;/span&gt; &lt;span&gt;0xFF&lt;/span&gt;&lt;span&gt;;&lt;/span&gt;
        &lt;span&gt;}&lt;/span&gt;
        &lt;span&gt;printf&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;"%s, %s: %s&lt;/span&gt;&lt;span&gt;\n&lt;/span&gt;&lt;span&gt;"&lt;/span&gt;&lt;span&gt;,&lt;/span&gt; &lt;span&gt;s1&lt;/span&gt;&lt;span&gt;,&lt;/span&gt; &lt;span&gt;s2&lt;/span&gt;&lt;span&gt;,&lt;/span&gt; &lt;span&gt;strcmp&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;s1&lt;/span&gt;&lt;span&gt;,&lt;/span&gt; &lt;span&gt;s2&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;==&lt;/span&gt; &lt;span&gt;0&lt;/span&gt; &lt;span&gt;?&lt;/span&gt; &lt;span&gt;"&lt;/span&gt;&lt;span&gt;\033&lt;/span&gt;&lt;span&gt;[1;32mPASS&lt;/span&gt;&lt;span&gt;\033&lt;/span&gt;&lt;span&gt;[0m"&lt;/span&gt; &lt;span&gt;:&lt;/span&gt; &lt;span&gt;"&lt;/span&gt;&lt;span&gt;\033&lt;/span&gt;&lt;span&gt;[1;31mFAIL&lt;/span&gt;&lt;span&gt;\033&lt;/span&gt;&lt;span&gt;[0m"&lt;/span&gt;&lt;span&gt;);&lt;/span&gt;
    &lt;span&gt;}&lt;/span&gt;

    &lt;span&gt;return&lt;/span&gt; &lt;span&gt;0&lt;/span&gt;&lt;span&gt;;&lt;/span&gt;
&lt;span&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Compile and run with:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;span&gt;$ &lt;/span&gt;gcc ternary-packing.c -o ternary-packing
&lt;span&gt;$ &lt;/span&gt;./ternary-packing
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And I'm getting &lt;code&gt;PASS&lt;/code&gt; for each of the 243 possible 5-trit numbers, all of which fit in 8 bits.&lt;/p&gt;
&lt;p&gt;And this is the technique used in the ternary types in &lt;code&gt;llama.cpp&lt;/code&gt; for TriLMs and BitNet b1.58, for which the pull request is &lt;a href="https://github.com/ggml-org/llama.cpp/pull/8151"&gt;https://github.com/ggml-org/llama.cpp/pull/8151&lt;/a&gt;, with &lt;abbr title="Single Instruction Multiple Data"&gt;SIMD&lt;/abbr&gt; implementations for both AVX2 and ARM NEON.&lt;/p&gt;

&lt;hr&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;obviously referring to &lt;a href="https://xkcd.com/356/"&gt;https://xkcd.com/356/&lt;/a&gt;, but the initial motivation actually started from &lt;a href="https://github.com/ggml-org/llama.cpp/pull/7931#discussion_r1640265346"&gt;this review comment I posted&lt;/a&gt; on the initial BitNet b1.58 pull-request for &lt;code&gt;llama.cpp&lt;/code&gt; &lt;a href="https://compilade.net/blog/ternary-packing#fnref:1" title="Jump back to footnote 1 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;log(3) / log(2)&lt;/code&gt; is also known as &lt;code&gt;1.584962500721156&lt;/code&gt;. &lt;a href="https://compilade.net/blog/ternary-packing#fnref:2" title="Jump back to footnote 2 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;/main&gt;
</ns0:encoded><pubDate>Wed, 26 Jun 2024 00:00:00 UTC</pubDate></item><item><title>Why Python over Lua over shell scripts to build my blog</title><link>https://compilade.net/blog/why-python-ssg</link><description>The best way I found to completely understand how to use a static site generator was to build my own. I now understand at least one!</description><ns0:encoded xmlns:ns0="http://purl.org/rss/1.0/modules/content/">&lt;main morss_own_score="2.698645598194131" morss_score="163.375244105817"&gt;

&lt;h1&gt;Why Python over Lua over shell scripts to build my blog&lt;/h1&gt;
&lt;p&gt;The best way I found to completely understand how to use a static site generator was to build my own. I now understand at least one!&lt;/p&gt;
&lt;hr&gt;

&lt;span&gt;Published: &lt;time&gt;2023-10-22&lt;/time&gt;
&lt;/span&gt;


&lt;p&gt;It's very common for the first post in a technical blog to be about how the site was built.
I'm not making an exception here; this is an explanation of my many attempts at making a static site generator (and why I ended up using Python).&lt;/p&gt;
&lt;h2&gt;A little history&lt;/h2&gt;
&lt;p&gt;Before I started working on building my own static site generator, but after I first got this domain name, I already had some Fossil repositories, so &lt;a href="https://fossil.compilade.net/compilade/index"&gt;I used one as a home page&lt;/a&gt; for a while.&lt;/p&gt;
&lt;p&gt;&lt;a href="https://fossil-scm.org/"&gt;Fossil&lt;/a&gt; is nice and all, but since it's mainly a version control system, there are a bunch of pages in the web interface that are not necessary for a personal website (help pages, logins, etc.), and some features are missing (like &lt;a href="https://sitemaps.org/protocol.html"&gt;&lt;code&gt;sitemap.xml&lt;/code&gt;&lt;/a&gt; generation).&lt;/p&gt;
&lt;p&gt;Even if Fossil's source code is relatively easy to extend, removing parts of the web interface was a big enough change that it made me consider "easier" ways instead.&lt;/p&gt;
&lt;p&gt;I figured that my preferred way would involve building a static site from Markdown files and some kind of script.&lt;/p&gt;
&lt;h2&gt;The POSIX attempt&lt;/h2&gt;
&lt;p&gt;&lt;a href="https://git.compilade.net/fch/blog/src/commit/b055a1690a16896af7784a8dc922ce5a0f1211ab/build.sh"&gt;My initial attempt&lt;/a&gt; used a POSIX shell script that called Pandoc with some custom &lt;a href="https://pandoc.org/lua-filters.html"&gt;Lua filters&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;At first, the script was simple enough, but when I got around to generating the blog's &lt;a href="https://validator.w3.org/feed/docs/atom.html"&gt;Atom&lt;/a&gt; feed &lt;em&gt;and&lt;/em&gt; the page which lists the blog posts (which both share lots of information), I &lt;em&gt;really&lt;/em&gt; felt like I needed arrays and/or hash tables, neither of which can be straightforwardly done in POSIX shell scripts (and no, a big string of many substrings separated by rare control characters doesn't count).&lt;/p&gt;
&lt;p&gt;At that point, I probably should simply have assumed that I could use all of &lt;code&gt;bash&lt;/code&gt;'s features instead of limiting my script to POSIX compatibility.
But &lt;em&gt;no&lt;/em&gt;, I made a POSIX shell script, so that meant a lot of calls to &lt;code&gt;sed&lt;/code&gt; and &lt;code&gt;tr&lt;/code&gt; that could have been avoided with some &lt;code&gt;bash&lt;/code&gt; variable expansion magic.&lt;/p&gt;
&lt;p&gt;Anyway, I was ready to rewrite it all in another scripting language.
I still wanted to use some kind of "small" programming language to minimize the build dependencies of my site.&lt;/p&gt;
&lt;h2&gt;Lua, simpler?&lt;/h2&gt;
&lt;p&gt;I was already using Lua filters with Pandoc, so it wasn't much of a leap to also use Lua for the rest of the build script.&lt;/p&gt;
&lt;p&gt;So &lt;a href="https://git.compilade.net/fch/blog/src/branch/main/build.lua"&gt;I rewrote the script in Lua&lt;/a&gt;, including some kind of custom command-line arguments parser (because Lua doesn't have something like &lt;a href="https://manpages.debian.org/bookworm/bash/bash.1.en.html#getopts"&gt;&lt;code&gt;getopts&lt;/code&gt;&lt;/a&gt; built-in).&lt;/p&gt;
&lt;p&gt;Apart from args parsing, the Lua script was much cleaner than its shell script counterpart.
I could &lt;em&gt;finally&lt;/em&gt; use lists and tables instead of putting everything in strings!
And I could even include the build script as a module to access my helper functions in the Lua filters passed to Pandoc! (which strangely did not add measurable overhead)&lt;/p&gt;
&lt;h3&gt;Pandoc troubles&lt;/h3&gt;
&lt;p&gt;&lt;a href="https://pandoc.org/"&gt;Pandoc&lt;/a&gt; is very good software and is extremely useful.&lt;/p&gt;
&lt;p&gt;But Pandoc is also a bit slow to run (&lt;code&gt;20 ms&lt;/code&gt; on my machine&lt;sup&gt;&lt;a href="https://compilade.net/blog/why-python-ssg#fn:my-machine"&gt;1&lt;/a&gt;&lt;/sup&gt;, compared to &lt;code&gt;2 ms&lt;/code&gt; for lowdown), and this compounds with the fact that one Pandoc call can only process a single file.
This means the &lt;em&gt;minimum&lt;/em&gt; time it would take to run Pandoc on only 5 files would be &lt;code&gt;100 ms&lt;/code&gt;!
And this excludes the additional time it takes when Pandoc has to do syntax highlighting (around &lt;code&gt;300 ms&lt;/code&gt; per file)!&lt;/p&gt;
&lt;p&gt;So I made &lt;a href="https://git.compilade.net/fch/blog/src/commit/50d4163b1020d39f66a8008cf3dd8a840360b1c9/build.lua#L255"&gt;some one-liner abomination to run Pandoc in parallel&lt;/a&gt; to lessen the effect of Pandoc's relatively slow run time.&lt;/p&gt;
&lt;p&gt;I hope you like scrolling sideways, because this line is waaaaaay too long (and somewhat of an unmaintainable mess):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;span&gt;os.execute&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;"find "&lt;/span&gt;&lt;span&gt;..&lt;/span&gt;&lt;span&gt;escape&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;dir&lt;/span&gt;&lt;span&gt;)..&lt;/span&gt;&lt;span&gt;" -maxdepth 1 -iname "&lt;/span&gt;&lt;span&gt;..&lt;/span&gt;&lt;span&gt;escape&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;glob&lt;/span&gt;&lt;span&gt;)..(&lt;/span&gt;&lt;span&gt;exclude_pat&lt;/span&gt; &lt;span&gt;and&lt;/span&gt; &lt;span&gt;" '!' -iname "&lt;/span&gt;&lt;span&gt;..&lt;/span&gt;&lt;span&gt;escape&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;exclude_pat&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;or&lt;/span&gt; &lt;span&gt;""&lt;/span&gt;&lt;span&gt;)..&lt;/span&gt;&lt;span&gt;" -print0 | sed -z -E -e "&lt;/span&gt;&lt;span&gt;..&lt;/span&gt;&lt;span&gt;escape&lt;/span&gt;&lt;span&gt;([[&lt;/span&gt;&lt;span&gt;s&lt;/span&gt;&lt;span&gt;,&lt;/span&gt;&lt;span&gt;^&lt;/span&gt;&lt;span&gt;(.&lt;/span&gt;&lt;span&gt;*/&lt;/span&gt;&lt;span&gt;)&lt;/span&gt;&lt;span&gt;?&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;\&lt;/span&gt;&lt;span&gt;.&lt;/span&gt;&lt;span&gt;?&lt;/span&gt;&lt;span&gt;[&lt;/span&gt;&lt;span&gt;^&lt;/span&gt;&lt;span&gt;.&lt;/span&gt;&lt;span&gt;/&lt;/span&gt;&lt;span&gt;]&lt;/span&gt;&lt;span&gt;*&lt;/span&gt;&lt;span&gt;?&lt;/span&gt;&lt;span&gt;)&lt;/span&gt;&lt;span&gt;\&lt;/span&gt;&lt;span&gt;.[&lt;/span&gt;&lt;span&gt;^/&lt;/span&gt;&lt;span&gt;]&lt;/span&gt;&lt;span&gt;*&lt;/span&gt;&lt;span&gt;$&lt;/span&gt;&lt;span&gt;,&lt;/span&gt;&lt;span&gt;\&lt;/span&gt;&lt;span&gt;2&lt;/span&gt;&lt;span&gt;,]])..&lt;/span&gt;&lt;span&gt;" | xargs -0 -P 4 -I '{}'"&lt;/span&gt;&lt;span&gt;..&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;verbose&lt;/span&gt; &lt;span&gt;and&lt;/span&gt; &lt;span&gt;" -t"&lt;/span&gt; &lt;span&gt;or&lt;/span&gt; &lt;span&gt;""&lt;/span&gt;&lt;span&gt;)..&lt;/span&gt;&lt;span&gt;" pandoc"&lt;/span&gt;&lt;span&gt;..&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;pandoc_args&lt;/span&gt; &lt;span&gt;and&lt;/span&gt; &lt;span&gt;" "&lt;/span&gt;&lt;span&gt;..&lt;/span&gt;&lt;span&gt;table.concat&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;escaped_pandoc_args&lt;/span&gt;&lt;span&gt;,&lt;/span&gt; &lt;span&gt;" "&lt;/span&gt;&lt;span&gt;)&lt;/span&gt; &lt;span&gt;or&lt;/span&gt; &lt;span&gt;""&lt;/span&gt;&lt;span&gt;)..&lt;/span&gt;&lt;span&gt;" "&lt;/span&gt;&lt;span&gt;..&lt;/span&gt;&lt;span&gt;escape&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;"--output="&lt;/span&gt;&lt;span&gt;..&lt;/span&gt;&lt;span&gt;out&lt;/span&gt;&lt;span&gt;..&lt;/span&gt;&lt;span&gt;"/{}."&lt;/span&gt;&lt;span&gt;..&lt;/span&gt;&lt;span&gt;out_ext&lt;/span&gt;&lt;span&gt;:&lt;/span&gt;&lt;span&gt;gsub&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;"^%."&lt;/span&gt;&lt;span&gt;,&lt;/span&gt;&lt;span&gt;""&lt;/span&gt;&lt;span&gt;))..&lt;/span&gt;&lt;span&gt;" "&lt;/span&gt;&lt;span&gt;..&lt;/span&gt;&lt;span&gt;escape&lt;/span&gt;&lt;span&gt;(&lt;/span&gt;&lt;span&gt;dir&lt;/span&gt;&lt;span&gt;..&lt;/span&gt;&lt;span&gt;"/{}."&lt;/span&gt;&lt;span&gt;..&lt;/span&gt;&lt;span&gt;ext&lt;/span&gt;&lt;span&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;But it somehow felt necessary for performance.&lt;/p&gt;
&lt;h4&gt;Syntax highlighting&lt;/h4&gt;
&lt;p&gt;Pandoc's syntax highlighting is not ideal either; it's a bit slow and it does not support all the syntaxes I want to use.&lt;/p&gt;
&lt;p&gt;While it can highlight the syntax of most programming languages, (at the time of writing) it cannot highlight shell sessions out-of-the-box&lt;sup&gt;&lt;a href="https://compilade.net/blog/why-python-ssg#fn:shell-session-hl"&gt;2&lt;/a&gt;&lt;/sup&gt;!&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;span&gt;$ &lt;/span&gt;&lt;span&gt;echo&lt;/span&gt; &lt;span&gt;"Hello world"&lt;/span&gt;
&lt;span&gt;Hello world&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;See? This works because I'm using &lt;a href="https://pygments.org/"&gt;Pygments&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The standalone &lt;code&gt;pygmentize&lt;/code&gt; tool also has a slow startup speed, but at least it supports all the syntaxes I'm interested in using.&lt;/p&gt;
&lt;p&gt;I've looked at other syntax highlighters before settling on Pygments.
One of them was &lt;code&gt;chroma&lt;/code&gt;, but its command-line tool had no way to safely get a language name without also possibly interpreting it as a file name&lt;sup&gt;&lt;a href="https://compilade.net/blog/why-python-ssg#fn:chroma-lang-file"&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;At one point I even considered not using syntax highlighting &lt;em&gt;at all&lt;/em&gt; for my website to make the build go faster.&lt;/p&gt;
&lt;h4&gt;Pandoc Templates&lt;/h4&gt;
&lt;p&gt;A cool thing with Pandoc is that it has its own relatively simple syntax for templates&lt;sup&gt;&lt;a href="https://compilade.net/blog/why-python-ssg#fn:pandoc-template-syntax"&gt;4&lt;/a&gt;&lt;/sup&gt;.
It seemed ideal for a static site generator!&lt;/p&gt;
&lt;p&gt;But then I realized that the default Pandoc templates, which I was using at the time as a base for my own custom templates, are not in the public domain; they are licensed under GPLv2+ and BSD3 (see &lt;a href="https://github.com/jgm/pandoc-templates"&gt;https://github.com/jgm/pandoc-templates&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;For some reason, I wanted to be able to use the license of my choosing on the source of my website.
And I didn't want to have to include a copy of the BSD3 license of another project&lt;sup&gt;&lt;a href="https://compilade.net/blog/why-python-ssg#fn:pandoc-template-bsd3"&gt;5&lt;/a&gt;&lt;/sup&gt; only for the templates.&lt;/p&gt;
&lt;p&gt;That meant I had to start over (again) and build my own templates from scratch and avoid Pandoc and all of its niceties like Lua filters.&lt;/p&gt;
&lt;h3&gt;No Pandoc?&lt;/h3&gt;
&lt;p&gt;But I still wanted to run filters on the HTMLized Markdown of my site!&lt;/p&gt;
&lt;p&gt;Unfortunately (and predictably), Lua can't parse HTML without either using a library for XML (like &lt;a href="https://github.com/lunarmodules/luaexpat"&gt;&lt;code&gt;luaexpat&lt;/code&gt;&lt;/a&gt;) or parsing it on my own.&lt;/p&gt;
&lt;p&gt;Since I no longer wanted to use Pandoc for my site, Lua stopped having the advantage of already being a dependency, so this led me to search for a programming language with a bigger standard library.&lt;/p&gt;
&lt;h2&gt;Python!&lt;/h2&gt;
&lt;p&gt;I had never really used Python seriously like this before.&lt;/p&gt;
&lt;p&gt;It's kind of enlightening that something I thought was so awful works so well.&lt;/p&gt;
&lt;h3&gt;Slow startup&lt;/h3&gt;
&lt;p&gt;What makes &lt;code&gt;pygmentize&lt;/code&gt; start up slowly is Python's own startup speed (&lt;code&gt;23 ms&lt;/code&gt; minimum on my machine).
For comparison, both Lua and Perl start in &lt;code&gt;3 ms&lt;/code&gt; on the same machine&lt;sup&gt;&lt;a href="https://compilade.net/blog/why-python-ssg#fn:my-machine"&gt;1&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;&lt;span&gt;$ &lt;/span&gt;&lt;span&gt;time&lt;/span&gt; python3 -c &lt;span&gt;'print("Hello world")'&lt;/span&gt;
&lt;span&gt;Hello world&lt;/span&gt;

&lt;span&gt;real    0m0.023s&lt;/span&gt;
&lt;span&gt;user    0m0.018s&lt;/span&gt;
&lt;span&gt;sys     0m0.004s&lt;/span&gt;

&lt;span&gt;$ &lt;/span&gt;&lt;span&gt;time&lt;/span&gt; lua -e &lt;span&gt;'print("Hello world")'&lt;/span&gt;
&lt;span&gt;Hello world&lt;/span&gt;

&lt;span&gt;real    0m0.003s&lt;/span&gt;
&lt;span&gt;user    0m0.001s&lt;/span&gt;
&lt;span&gt;sys     0m0.002s&lt;/span&gt;

&lt;span&gt;$ &lt;/span&gt;&lt;span&gt;time&lt;/span&gt; perl -e &lt;span&gt;'print("Hello world\n")'&lt;/span&gt;
&lt;span&gt;Hello world&lt;/span&gt;

&lt;span&gt;real    0m0.003s&lt;/span&gt;
&lt;span&gt;user    0m0.001s&lt;/span&gt;
&lt;span&gt;sys     0m0.002s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And that's even after a few previous runs.&lt;/p&gt;
&lt;p&gt;A solution is to use Pygments directly from a Python script.
That way, the Python interpreter is only started once, even if thousands of snippets need highlighting.&lt;/p&gt;
&lt;p&gt;Perhaps paradoxically, Python's slow startup speed &lt;em&gt;can&lt;/em&gt; be an incentive to use Python,
since anything else calling Python libraries is slower (unless you're using C or C++ and want to &lt;a href="https://docs.python.org/3/extending/embedding.html"&gt;embed the Python interpreter&lt;/a&gt; in your program).&lt;/p&gt;
&lt;p&gt;Remember, I want to use Pygments because it seems well-maintained and is one of the best server-side syntax highlighters I found which can highlight shell sessions.&lt;/p&gt;
&lt;h3&gt;Args parsing&lt;/h3&gt;
&lt;p&gt;I was very pleasantly surprised to find out about the &lt;a href="https://docs.python.org/3.10/library/argparse.html"&gt;&lt;code&gt;argparse&lt;/code&gt;&lt;/a&gt; module in Python's standard library.&lt;/p&gt;
&lt;p&gt;It even generates the &lt;code&gt;--help&lt;/code&gt; output from the descriptions of each sub-command and flag!&lt;/p&gt;
&lt;p&gt;It's much more concise than manually writing and structuring the &lt;code&gt;--help&lt;/code&gt; output in shell scripts, and &lt;strong&gt;&lt;em&gt;much easier&lt;/em&gt;&lt;/strong&gt; than manually parsing the command-line arguments like I did in Lua.&lt;/p&gt;
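&lt;p&gt;As a rough illustration (not the actual build script), a minimal &lt;code&gt;argparse&lt;/code&gt; sketch could look like the following. The program name, the &lt;code&gt;build&lt;/code&gt; and &lt;code&gt;clean&lt;/code&gt; sub-commands and the &lt;code&gt;--out&lt;/code&gt; and &lt;code&gt;--verbose&lt;/code&gt; flags are hypothetical; only the &lt;code&gt;argparse&lt;/code&gt; calls themselves come from the standard library.&lt;/p&gt;

```python
import argparse


def make_parser() -> argparse.ArgumentParser:
    """Build a parser with one global flag and two hypothetical sub-commands."""
    parser = argparse.ArgumentParser(
        prog="build.py",
        description="Build a static site (hypothetical example).",
    )
    parser.add_argument(
        "-v", "--verbose", action="store_true",
        help="print each page as it is built",
    )
    # Each sub-command gets its own help text and its own flags.
    commands = parser.add_subparsers(dest="command", required=True)
    build = commands.add_parser("build", help="render all pages")
    build.add_argument(
        "--out", default="public",
        help="output directory (default: %(default)s)",
    )
    commands.add_parser("clean", help="remove the output directory")
    return parser


# An explicit argument list is parsed here so the sketch runs anywhere;
# a real script would call parse_args() without arguments to read sys.argv.
args = make_parser().parse_args(["--verbose", "build", "--out", "dist"])
print(args.command, args.out, args.verbose)  # prints: build dist True
```

&lt;p&gt;With a parser like this, running the script with &lt;code&gt;--help&lt;/code&gt; (or &lt;code&gt;build --help&lt;/code&gt;) prints usage text assembled from the &lt;code&gt;help&lt;/code&gt; strings, without any hand-written help output.&lt;/p&gt;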
&lt;h3&gt;Filters&lt;/h3&gt;
&lt;p&gt;Python's standard library natively supports XML&lt;sup&gt;&lt;a href="https://compilade.net/blog/why-python-ssg#fn:python-xml"&gt;6&lt;/a&gt;&lt;/sup&gt;!&lt;/p&gt;
&lt;p&gt;But HTML is not XML.&lt;/p&gt;
&lt;p&gt;The solution I'm happy with for now is to build XHTML pages (which are valid XML) and then run what I'm calling "XML filters" on those pages.
This makes it "easy" to make every internal link relative and to make each one point to the final location of its destination.
It also makes it relatively simple to clean up what is included in the Atom feed of the blog.&lt;/p&gt;
&lt;p&gt;For example, my posts include syntax highlighting for code snippets, but the spans and classes inserted by Pygments are useless in a feed since styling is mostly ignored by feed readers.
So I made filters to remove all &lt;code&gt;class&lt;/code&gt; attributes and all &lt;code&gt;span&lt;/code&gt; elements for the included XHTML in the Atom feed.&lt;/p&gt;
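&lt;p&gt;To give an idea of what such a filter can look like, here is a minimal sketch using &lt;code&gt;xml.etree.ElementTree&lt;/code&gt; from the standard library. It is not the code from the actual build script: the function names &lt;code&gt;strip_attribute&lt;/code&gt; and &lt;code&gt;unwrap&lt;/code&gt; are made up for this example, and the tree is built by hand without a namespace (in a parsed XHTML document the tags would be namespace-qualified, like &lt;code&gt;{http://www.w3.org/1999/xhtml}span&lt;/code&gt;).&lt;/p&gt;

```python
import xml.etree.ElementTree as ET
from typing import Optional


def strip_attribute(root: ET.Element, name: str) -> None:
    """Remove the attribute `name` from `root` and all of its descendants."""
    for element in root.iter():
        element.attrib.pop(name, None)


def _append_text(parent: ET.Element, index: int, text: Optional[str]) -> None:
    """Append `text` just before position `index` among the children of `parent`."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text


def unwrap(parent: ET.Element, tag: str) -> None:
    """Replace every descendant named `tag` with its own text and children."""
    # Handle the deepest elements first so that nested matches are flattened.
    for child in list(parent):
        unwrap(child, tag)
    i = 0
    while i != len(parent):
        child = parent[i]
        if child.tag != tag:
            i += 1
            continue
        inner = list(child)
        _append_text(parent, i, child.text)
        del parent[i]
        for offset, element in enumerate(inner):
            parent.insert(i + offset, element)
        i += len(inner)
        _append_text(parent, i, child.tail)


# Hand-built stand-in for a highlighted snippet (hypothetical class names).
pre = ET.Element("pre")
code = ET.SubElement(pre, "code", {"class": "language-console"})
prompt = ET.SubElement(code, "span", {"class": "gp"})
prompt.text = "$ "
builtin = ET.SubElement(code, "span", {"class": "nb"})
builtin.text = "time"
builtin.tail = " python3"

strip_attribute(pre, "class")
unwrap(pre, "span")
print(code.text)  # prints: $ time python3
```

&lt;p&gt;The fiddly part is that ElementTree stores the text following an element in that element's &lt;code&gt;tail&lt;/code&gt;, so removing an element means moving both its &lt;code&gt;text&lt;/code&gt; and its &lt;code&gt;tail&lt;/code&gt; onto a neighbour instead of simply deleting it.&lt;/p&gt;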
&lt;p&gt;I'm not sure if there are other ways to do all of this, but I'm &lt;em&gt;pretty happy&lt;/em&gt; with the versatility of XML.&lt;/p&gt;
&lt;h3&gt;Type checking&lt;/h3&gt;
&lt;p&gt;I vaguely knew that &lt;a href="https://peps.python.org/pep-0484/"&gt;type annotations&lt;/a&gt; were supported by Python, so I used them to make some of the function definitions clearer.&lt;/p&gt;
&lt;p&gt;But type annotations are ignored by the Python interpreter, so I looked for a static type checker.
I found out about &lt;a href="https://github.com/microsoft/pyright"&gt;Pyright&lt;/a&gt;, and it's impressively useful once set up correctly.&lt;/p&gt;
&lt;p&gt;Static type checking in a scripting language is &lt;strong&gt;&lt;em&gt;so nice&lt;/em&gt;&lt;/strong&gt;, and it hugely helps with refactoring. I had heard that a lot, but I first really experienced it with this script: even after 2 rewrites, I still had many other ideas of things to change.&lt;/p&gt;
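&lt;p&gt;A small sketch of why a separate checker is needed (the function &lt;code&gt;page_count&lt;/code&gt; is a hypothetical example, not taken from the build script): the interpreter runs the mismatched call below without complaint, while a static checker such as Pyright is expected to report that a tuple is not a &lt;code&gt;list[str]&lt;/code&gt;.&lt;/p&gt;

```python
def page_count(pages: list[str]) -> int:
    """Return the number of pages; the annotations are only hints at runtime."""
    return len(pages)


# The annotation says "list of str", but Python does not enforce it:
# passing a tuple still works when the script is run.
# A static type checker would flag this call before the script ever runs.
print(page_count(("index.md", "about.md")))  # prints: 2

# The annotations are still available for tools to inspect.
print(sorted(page_count.__annotations__))  # prints: ['pages', 'return']
```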
&lt;h2&gt;You're looking at the result&lt;/h2&gt;
&lt;p&gt;This is my site, and it's built from &lt;a href="https://git.compilade.net/fch/compilade-pages/src/branch/main/build.py"&gt;this Python script&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;All the programming languages I considered for my build script are (interpreted) scripting languages.
Why? Because it felt simpler to avoid making a build script for my build script.&lt;/p&gt;
&lt;p&gt;The build script is evolving along with the site, so that means it changes as much or as little as I want, and only when I want it to change (unlike with static site generators like &lt;a href="https://gohugo.io"&gt;Hugo&lt;/a&gt;).
And no need to avoid breaking changes, since I'm (or should be) the only user of my build script!&lt;/p&gt;
&lt;p&gt;Hopefully, I'll be writing more blog posts instead of shaving the yak.&lt;/p&gt;

&lt;hr&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;For the record, my main machine is not &lt;em&gt;that&lt;/em&gt; powerful compared to what's available nowadays.
It's a low-power laptop whose processor is an &lt;a href="https://www.intel.com/content/www/us/en/products/sku/185282/intel-core-m38100y-processor-4m-cache-up-to-3-40-ghz/specifications.html"&gt;Intel Core m3-8100Y&lt;/a&gt;. &lt;a href="https://compilade.net/blog/why-python-ssg#fnref:my-machine" title="Jump back to footnote 1 in the text"&gt;↩&lt;/a&gt;&lt;a href="https://compilade.net/blog/why-python-ssg#fnref2:my-machine" title="Jump back to footnote 1 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It's still possible to use custom syntax definitions to highlight shell sessions in Pandoc, see &lt;a href="https://github.com/jgm/skylighting/issues/67"&gt;https://github.com/jgm/skylighting/issues/67&lt;/a&gt; &lt;a href="https://compilade.net/blog/why-python-ssg#fnref:shell-session-hl" title="Jump back to footnote 2 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The only option of the &lt;abbr title="Command-line interface"&gt;CLI&lt;/abbr&gt; of &lt;code&gt;chroma&lt;/code&gt; that can set the language to be highlighted (as desirable for snippets on a website) is &lt;code&gt;--lexer&lt;/code&gt; (or &lt;code&gt;-l&lt;/code&gt;).
See &lt;a href="https://github.com/alecthomas/chroma/blob/7eb0305e1b3b0d6af8aac735417a58c7322c932d/cmd/chroma/main.go#L53"&gt;the description of &lt;code&gt;chroma&lt;/code&gt;'s lexer flag&lt;/a&gt;. &lt;a href="https://compilade.net/blog/why-python-ssg#fnref:chroma-lang-file" title="Jump back to footnote 3 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;And that's even when assuming the dual licensing of &lt;code&gt;pandoc-templates&lt;/code&gt; means &lt;em&gt;either&lt;/em&gt; license can be used, &lt;strong&gt;but this is probably wrong&lt;/strong&gt; since that project's README says it's under &lt;strong&gt;&lt;em&gt;both&lt;/em&gt;&lt;/strong&gt; licenses. That would mean the safest way to use variations of the Pandoc templates would be to license &lt;em&gt;everything&lt;/em&gt; (including my build script) under the GPLv2+ license, while &lt;em&gt;also&lt;/em&gt; keeping the BSD3 license of &lt;code&gt;pandoc-templates&lt;/code&gt;.
Too many licenses to think about (and ways to make mistakes) in this case, so I'm glad I started from scratch to avoid being burdened by licenses I did not choose (even though I &lt;em&gt;might&lt;/em&gt; end up using the GPL anyway). &lt;a href="https://compilade.net/blog/why-python-ssg#fnref:pandoc-template-bsd3" title="Jump back to footnote 5 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;I was very happy when I found out XML parsing and tree building are built into Python. &lt;a href="https://docs.python.org/3/library/xml.html"&gt;https://docs.python.org/3/library/xml.html&lt;/a&gt; &lt;a href="https://compilade.net/blog/why-python-ssg#fnref:python-xml" title="Jump back to footnote 6 in the text"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;/main&gt;
</ns0:encoded><pubDate>Sun, 22 Oct 2023 00:00:00 UTC</pubDate></item></channel></rss>