Random thoughts about things

Extending mailcap's capabilities

I've been messing around with lynx(1), my terminal browser of choice, and as part of this I've been learning about mailcap, which is how lynx and other programs dealing with internet stuff decide how to open files downloaded from the net. Now, to explain mailcap, I will first need to explain what MIME is. A MIME type identifies what sort of file you are downloading from the internet. For example, if you were to run this command: curl -I https://commodorian.org/blog/2023-07-05.html | grep -i "^Content-Type:" in your terminal right now, you would see that this document has a MIMEtype of text/html. Knowing this is very important for your browser to correctly display this page. If your browser, say, thought this page was text/plain, you would get all these ugly HTML tags everywhere. Gross. Luckily, because my webserver makes the MIMEtype of the files it serves known, you don't have to deal with that.
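If you would rather pick the MIMEtype out of a header in a script than with grep, here is a minimal sketch in Python. It assumes you already have the value of the Content-Type header as a string; the function name mime_type_of is made up for this example.

```python
from email.message import Message


def mime_type_of(content_type_header: str) -> str:
    """Return the bare type/subtype from a Content-Type header value.

    Parameters such as charset are dropped and the result is lowercased.
    The function name is hypothetical, chosen for this sketch.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


if __name__ == "__main__":
    print(mime_type_of("text/html; charset=utf-8"))  # prints text/html
```

Note that get_content_type() falls back to text/plain when the header value is not a valid type/subtype pair, so a garbage header will not raise an error.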

So what is this mailcap thing? It is, as said before, a way of defining what to do with files you download from the 'net. A basic mailcap file would look like this:

    # Open up plain text files with less
    text/plain; less %s
    # Open html files with lynx
    text/html; lynx %s
    # Open image files with nsxiv
    image/*; nsxiv %s

So what does this do? Any program reading this mailcap file will go from the top down, comparing the MIMEtype defined on each line with the MIMEtype of the file it is attempting to open, and use the first line that matches. The %s in each command is replaced with the name of the downloaded file. If the file is text/plain, it will be opened with less(1); likewise with text/html and lynx. You will notice the wildcard (*) in the entry for images. This makes all image/* MIMEtypes go to nsxiv(1). You can make commands span multiple lines for readability's sake by escaping the newline.
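To make the top-down matching concrete, here is a rough sketch of the lookup in Python. It is not how lynx or any real mailcap reader is implemented; it only handles the parts shown above (comments, escaped newlines, the type/* wildcard and %s), and the function names are made up for this example.

```python
import shlex

MAILCAP = """\
# Open up plain text files with less
text/plain; less %s
# Open html files with lynx
text/html; lynx %s
# Open image files with nsxiv
image/*; nsxiv %s
"""


def parse_mailcap(text):
    """Return (type pattern, command) pairs in file order."""
    entries = []
    # Join lines whose newline was escaped with a backslash.
    text = text.replace("\\\n", "")
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, _, rest = line.partition(";")
        # Simplification: keep only the first field after the type and
        # ignore escaped semicolons and any extra fields.
        command = rest.split(";")[0].strip()
        entries.append((pattern.strip().lower(), command))
    return entries


def type_matches(pattern, mime_type):
    """True if mime_type equals pattern or falls under a type/* wildcard."""
    if pattern.endswith("/*"):
        return mime_type.startswith(pattern[:-1])
    return pattern == mime_type


def find_command(entries, mime_type, path):
    """Return the command of the first matching entry, or None."""
    for pattern, command in entries:
        if type_matches(pattern, mime_type.lower()):
            return command.replace("%s", shlex.quote(path))
    return None


if __name__ == "__main__":
    entries = parse_mailcap(MAILCAP)
    print(find_command(entries, "image/png", "/tmp/cat.png"))  # prints nsxiv /tmp/cat.png
```

Because the first match wins, a more specific entry has to come before a wildcard entry that would also match it.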

So this seems pretty good: you define which types of files should go to which commands. Why would you want to extend its capabilities? Well, some websites are notoriously bad at defining MIMEtypes correctly. One such site is Phoronix, who are usually pretty good about tech stuff, but for whatever reason their image/svg+xml files are reported as text/html. Now this introduces two issues:

What do? Well, as any technical person with an eye for standardisation would agree, I should not write a script that checks whether every text/html file is A. from Phoronix and B. actually image/svg+xml, no. I should of course extend, and by extension break, a well-known standard! Introducing version 1 of:

mailcap-ng

(No, obviously I'm not suggesting this should be adopted. I'm just mulling over what I would do to remedy a problem that absolutely should be fixed by Phoronix's sysadmins)

mailcap-ng's files are similar to mailcap's files, except that extra parameters prefixing the MIMEtype are allowed. Any prefix is separated from the MIMEtype by a colon (:). An example mailcap-ng file is below:

    # Open up plain text files with less
    text/plain; less %s
    # First mailcap-ng extended entry
    .svgz!phoronix.com:text/html; magick %s PNG:%s && nsxiv %s
    # Next mailcap-ng extended entry
    4channel.net:text/html; 4ch %u
    # Open html files with lynx
    text/html; lynx %s
    # Open image files with nsxiv
    image/*; nsxiv %s

Obviously, you can send a specific domain's files of a given MIMEtype to a different command than the default for that MIMEtype. You can also match on how the URL ends, so that files the server mislabels can still be sent to the right command. I've also added another format identifier, %u, which expands to the URL of the resource. I think this would be helpful. :)
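Here is a rough sketch of how a reader for mailcap-ng might do its matching, in Python. Since mailcap-ng only exists in this post, a fair bit is assumed: the prefix is read as an optional URL ending, then !, then a domain; a domain also matches its subdomains; and the URLs and temporary paths in the usage lines are made up. The function names are hypothetical too.

```python
import re
import shlex
from urllib.parse import urlsplit

MAILCAP_NG = """\
# Open up plain text files with less
text/plain; less %s
.svgz!phoronix.com:text/html; magick %s PNG:%s && nsxiv %s
4channel.net:text/html; 4ch %u
text/html; lynx %s
image/*; nsxiv %s
"""


def parse_mailcap_ng(text):
    """Return entries in file order as dicts: suffix, domain, mime, command."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        head, _, command = line.partition(";")
        # Assumed syntax: [url-ending!]domain:type/subtype
        prefix, _, mime = head.strip().rpartition(":")
        suffix, bang, domain = prefix.partition("!")
        if not bang:
            suffix, domain = "", prefix
        entries.append({"suffix": suffix, "domain": domain.lower(),
                        "mime": mime.lower(), "command": command.strip()})
    return entries


def type_matches(pattern, mime):
    """True if mime equals pattern or falls under a type/* wildcard."""
    if pattern.endswith("/*"):
        return mime.startswith(pattern[:-1])
    return pattern == mime


def find_command(entries, mime, url, path):
    """Return the first matching command with %s and %u filled in, or None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    for e in entries:
        if not type_matches(e["mime"], mime.lower()):
            continue
        # Assumption: an entry for example.org also covers www.example.org.
        if e["domain"] and host != e["domain"] and not host.endswith("." + e["domain"]):
            continue
        if e["suffix"] and not parts.path.endswith(e["suffix"]):
            continue
        subs = {"%s": shlex.quote(path), "%u": shlex.quote(url)}
        return re.sub(r"%[su]", lambda m: subs[m.group()], e["command"])
    return None


if __name__ == "__main__":
    entries = parse_mailcap_ng(MAILCAP_NG)
    # Hypothetical URL and download path, for illustration only.
    print(find_command(entries, "text/html",
                       "https://www.phoronix.com/assets/chart.svgz", "/tmp/chart.svgz"))
```

Plain mailcap lines still parse, since an entry without a prefix simply has an empty domain and ending and so matches on MIMEtype alone.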

sfeed

I've been using sfeed(1) for quite a while now, and am really enjoying the experience. You can find it on the web here. I would recommend learning awk(1) a little before delving deep into using sfeed; it helped me quite a lot. I will write some more on sfeed in the near future, including releasing some software I've been working on for it.

awk

Speaking of awk, you should learn it. There are many resources out there, although of course the best are pirated copies of awk books from libgen. If you can, though, get the real things. It's always a lot more enjoyable reading a real book. A lot of the programs I have written for sfeed use awk heavily, and I have been really enjoying learning it.