This blog has an RSS feed
I recently added an RSS feed to my blog, prompted by a friend asking if I had one. I write these posts in HTML. I don’t use a static site generator. So I assumed I wouldn’t be able to generate a feed using existing software. (Later, I realized there exist tools that build an RSS feed from a given webpage. I looked at one and saw the feed it built didn’t have anywhere near the format I wanted.)
I wrote a small TypeScript program to generate the RSS feed. It uses htmlparser2 to parse the posts and rss to generate the RSS feed from the contents.
The right tool for the job
I soon realized htmlparser2 wasn’t the easiest tool to use. The library uses a callback interface: It lets you provide functions that it then calls when it encounters opening and closing tags, text nodes, and other document features. It’s up to the code using the library to track its location in the parse tree.
Instead, I could have used a library like parse5, which parses the HTML into a JavaScript object and makes it easy to access elements with known locations in the parse tree. It’s a tradeoff, though: If I’d written the script using that approach, small changes in the layout of the HTML would have broken it. In fact, I think it would have been easiest to write this script using browser APIs like Document#querySelector. For that reason, maybe I should have used a DOM implementation like jsdom.
Parsing the posts
In any case, solving this problem with htmlparser2 was a fun challenge. I wanted to extract three pieces of information from each post: its title, its creation date, and its body. First, I worked on extracting the title.
I added callbacks for opening tags, closing tags, and text nodes. htmlparser2 does a pre-order traversal of the parse tree: If you identify the calls to the opening and closing tag callbacks for a given element, any calls to the text node callback in between must be for text nodes inside that element. Each post’s title tag contains the title, so all I had to do was extract the text node inside that tag. Since each post only has one title tag, I did this by setting an inTitle variable to true when I saw an opening title tag, storing the contents of the text node seen when inTitle was true in a second variable, and setting inTitle to false on a closing title tag.
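A minimal sketch of this flag-based approach, not the script itself. The three handlers have the same shape as htmlparser2's onopentag, ontext, and onclosetag callbacks, but here they are driven by a hand-written event list so the example runs without dependencies. The ParseEvent type, extractTitle, and the sample events are all assumptions made for illustration.

```typescript
// Hypothetical event type standing in for calls the parser would make.
type ParseEvent =
  | { kind: "open"; name: string; attribs: Record<string, string> }
  | { kind: "text"; text: string }
  | { kind: "close"; name: string };

function extractTitle(events: ParseEvent[]): string {
  let inTitle = false;
  let title = "";

  // Same shapes as htmlparser2's onopentag / ontext / onclosetag handlers.
  const handlers = {
    onopentag(name: string, _attribs: Record<string, string>): void {
      if (name === "title") inTitle = true;
    },
    ontext(text: string): void {
      // Append rather than assign, in case the text arrives in several chunks.
      if (inTitle) title += text;
    },
    onclosetag(name: string): void {
      if (name === "title") inTitle = false;
    },
  };

  // Replay the events in document order, as a streaming parser would.
  for (const event of events) {
    if (event.kind === "open") handlers.onopentag(event.name, event.attribs);
    else if (event.kind === "text") handlers.ontext(event.text);
    else handlers.onclosetag(event.name);
  }
  return title.trim();
}

// Made-up events for a tiny document with a title and one paragraph.
const titleSample: ParseEvent[] = [
  { kind: "open", name: "title", attribs: {} },
  { kind: "text", text: "This blog has an RSS feed" },
  { kind: "close", name: "title" },
  { kind: "open", name: "p", attribs: {} },
  { kind: "text", text: "Not the title" },
  { kind: "close", name: "p" },
];
const extractedTitle = extractTitle(titleSample);
```

With a real parser, the same three handlers would be passed to the library instead of being replayed from an array.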
I did something similar to extract the creation date, which is contained in a p tag with the class timestamp. The opening tag callback takes a map of HTML attributes as well as a tag name, so it was easy to recognize this particular p tag by its class. The closing tag callback doesn’t provide the tag’s attributes, but I realized I could set inDate to false on any closing p tag, since the paragraph containing the timestamp doesn’t have any paragraphs inside it.
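The same idea for the timestamp, again as a dependency-free sketch under assumptions: the ParseEvent type and extractDate are hypothetical names, the date string is a placeholder, and the check assumes the class attribute is exactly "timestamp" with no other classes alongside it.

```typescript
// Hypothetical event type standing in for calls the parser would make.
type ParseEvent =
  | { kind: "open"; name: string; attribs: Record<string, string> }
  | { kind: "text"; text: string }
  | { kind: "close"; name: string };

function extractDate(events: ParseEvent[]): string {
  let inDate = false;
  let date = "";

  for (const event of events) {
    if (event.kind === "open") {
      // Only the opening-tag event carries attributes, so the class is checked here.
      if (event.name === "p" && event.attribs["class"] === "timestamp") {
        inDate = true;
      }
    } else if (event.kind === "text") {
      if (inDate) date += event.text;
    } else if (event.name === "p") {
      // Any closing p ends the timestamp, since it contains no nested paragraphs.
      inDate = false;
    }
  }
  return date.trim();
}

// Made-up events: an ordinary paragraph, the timestamp, then another paragraph.
const dateSample: ParseEvent[] = [
  { kind: "open", name: "p", attribs: {} },
  { kind: "text", text: "Intro" },
  { kind: "close", name: "p" },
  { kind: "open", name: "p", attribs: { class: "timestamp" } },
  { kind: "text", text: "2021-01-01" }, // placeholder date, not from the blog
  { kind: "close", name: "p" },
  { kind: "open", name: "p", attribs: {} },
  { kind: "text", text: "Body" },
  { kind: "close", name: "p" },
];
const extractedDate = extractDate(dateSample);
```

Resetting the flag on every closing p is harmless for the other paragraphs, because the flag is already false when they close.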
Getting the post contents
Extracting the contents of each post was more difficult. The body of each blog post lives in a section tag. It can contain HTML itself; so far I’ve mainly used the a (link) and code tags. However, htmlparser2 doesn’t let you read the contents of a specific HTML element. It just calls your callback functions when it sees opening tags, closing tags, and text nodes. I needed to rebuild the HTML inside the section tag using only the information from these function calls.
To keep track of whether it’s inside the section tag, the script uses an inSection Boolean variable, similar to how it keeps track of whether it’s inside the title tag. It appends any text it sees inside the section tag to a content string variable that starts off empty. When the script sees an opening tag inside the section tag, it uses the tag name and map of attributes to build a string containing an HTML opening tag, then appends it to content. This logic closes self-closing tags but leaves other tags open. On encountering a closing tag inside the section, the script appends a closing tag to content, as long as the tag isn’t self-closing. The pre-order traversal ensures the script appends tags and text to content in the same order they appear in the original post.
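The rebuilding step can be sketched as follows. This is an illustration under assumptions rather than the script's actual code: the ParseEvent type, rebuildSection, the VOID_ELEMENTS list, and the sample events are hypothetical, and the sketch assumes a closing event is reported for self-closing tags too, which is why those are skipped on close.

```typescript
// Hypothetical event type standing in for calls the parser would make.
type ParseEvent =
  | { kind: "open"; name: string; attribs: Record<string, string> }
  | { kind: "text"; text: string }
  | { kind: "close"; name: string };

// Tags that never have a separate closing tag in HTML.
const VOID_ELEMENTS = [
  "area", "base", "br", "col", "embed", "hr", "img",
  "input", "link", "meta", "source", "track", "wbr",
];

function isVoid(name: string): boolean {
  return VOID_ELEMENTS.indexOf(name) !== -1;
}

function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

function rebuildSection(events: ParseEvent[]): string {
  let inSection = false;
  let content = "";

  for (const event of events) {
    if (event.kind === "open") {
      if (event.name === "section") {
        inSection = true;
      } else if (inSection) {
        // Rebuild the opening tag from the tag name and attribute map.
        const attrs = Object.keys(event.attribs)
          .map((key) => ` ${key}="${escapeAttr(event.attribs[key])}"`)
          .join("");
        content += isVoid(event.name)
          ? `<${event.name}${attrs} />` // self-closing: closed immediately
          : `<${event.name}${attrs}>`; // left open until its closing event
      }
    } else if (event.kind === "text") {
      // Appended as-is; a real parser may hand over decoded entities
      // that would need re-escaping here.
      if (inSection) content += event.text;
    } else if (event.name === "section") {
      inSection = false;
    } else if (inSection && !isVoid(event.name)) {
      content += `</${event.name}>`;
    }
  }
  return content;
}

// Made-up events for: <section>See <a href="/feed.xml">the feed</a><br>!</section>
const sectionSample: ParseEvent[] = [
  { kind: "open", name: "section", attribs: {} },
  { kind: "text", text: "See " },
  { kind: "open", name: "a", attribs: { href: "/feed.xml" } },
  { kind: "text", text: "the feed" },
  { kind: "close", name: "a" },
  { kind: "open", name: "br", attribs: {} },
  { kind: "close", name: "br" },
  { kind: "text", text: "!" },
  { kind: "close", name: "section" },
];
const rebuilt = rebuildSection(sectionSample);
```

Because the events are processed in document order, a single string that only ever grows at the end is enough; no stack of open tags is needed as long as the input HTML is well formed.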
Generating the RSS feed
Generating the feed was simpler. Each post lives in its own folder, so the script gets a list of those folders, reads and parses each post, adds an item to the feed for each parsed post, and writes the feed to an XML file. After running the script, I commit the updated file to my website’s Git repository and push to publish.
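To show roughly what the last step produces, here is a hand-rolled sketch that turns already-parsed posts into RSS 2.0 XML. It deliberately does not use the rss package the script relies on, and it skips the folder listing and file writing. The Post and Channel shapes, buildFeed, and the example.com URLs are assumptions made for illustration.

```typescript
// Hypothetical shapes for a parsed post and the feed's channel metadata.
interface Post {
  title: string;
  date: string; // an ISO date such as "2021-01-01"
  content: string; // rebuilt HTML of the post body
  url: string;
}

interface Channel {
  title: string;
  link: string;
  description: string;
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function buildFeed(channel: Channel, posts: Post[]): string {
  // One <item> per post; the HTML body is escaped into <description>.
  const items = posts.map((post) =>
    [
      "    <item>",
      `      <title>${escapeXml(post.title)}</title>`,
      `      <link>${escapeXml(post.url)}</link>`,
      `      <pubDate>${new Date(post.date).toUTCString()}</pubDate>`,
      `      <description>${escapeXml(post.content)}</description>`,
      "    </item>",
    ].join("\n")
  );

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0">',
    "  <channel>",
    `    <title>${escapeXml(channel.title)}</title>`,
    `    <link>${escapeXml(channel.link)}</link>`,
    `    <description>${escapeXml(channel.description)}</description>`,
  ]
    .concat(items)
    .concat(["  </channel>", "</rss>"])
    .join("\n");
}

// Placeholder channel and post data, not taken from the blog.
const feedXml = buildFeed(
  { title: "Example blog", link: "https://example.com/", description: "Posts" },
  [
    {
      title: "Tips & tricks",
      date: "2021-01-01",
      content: "<p>Hello</p>",
      url: "https://example.com/tips/",
    },
  ]
);
```

A feed library handles details this sketch ignores, such as GUIDs and CDATA sections, which is a good reason to prefer one in practice.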
Why TypeScript?
I wrote this script in TypeScript but didn’t benefit much from doing so compared to writing it in JavaScript. Most of the compile errors I encountered would have been runtime errors in JS, but the cause would have been equally obvious. On the other hand, I didn’t waste much time using TS. It didn’t take long to install ts-node and type definitions for the libraries I was using, and I don’t expect ts-node was much slower than Node for such a small script. Plus, I would have benefited more from TS with a different development environment. I wrote this script using Vim, without even TS syntax highlighting. If I’d used an IDE with type-aware code completion and integrated type-checking, or installed Vim plugins for those features, TS would have been an improvement over JS.
Why RSS?
So why RSS? I’m a big consumer of RSS feeds myself and I know I’m not alone, even if they’re not as popular as they used to be. RSS is a simple way to share new content with people interested in your work. Of course, it’s not the only way. I could share my posts over Twitter or Facebook, but I started using RSS in the first place to get off social media. (A topic for another post.) Substack and Mailchimp make it easy to share content over email, but I like having full control over the distribution of my posts. I’d rather not give that control to a third party, at least for a passion project that I don’t plan to monetize, like this blog. I’d definitely consider using Substack for a paid newsletter to avoid integrating with a payment processor.
P.S.: Why not an SSG or a CMS?
It’s also worth examining why I don’t use a static site generator or a CMS for my blog. I certainly could have built it more quickly using one of those tools. Well, again, I like having full control over my website, both its appearance and its code. A tool like WordPress doesn’t give you that control without a lot of customization. I also appreciate that my website’s code is simple and human-readable. Finally, I enjoy solving problems and writing software. I don’t see the time spent working on projects like this one as wasted. I even enjoy the process of turning each post into an HTML document. It’s a good Vim exercise and the process somehow makes me feel more proud of my posts.
Edit: The day after publishing this blog post, I decided to move my posts into YAML files and generate both the HTML for them and my blog's RSS feed from those files. Each post shares a lot of code and it's painful to change something in all of them (e.g. adding a link to the header). Now that each post is based on a template HTML file, it's easy to make these kinds of changes. It's also a lot easier to parse YAML than HTML. And I still get to hand-edit each post's HTML! (Although I might look into generating them from Markdown instead.)