
Fancier RSS Feeds!

2022/02/13

... in which we start auto-generating them.

This is post no. 80 for Kev Quirk's #100DaysToOffload challenge. The point is to write many things, not to write good ones. Please adjust quality expectations accordingly :)

A short history

Once upon a time, this site didn't even have an RSS feed. I was just posting links to articles on my Fosstodon account... that was it. But then I decided to make things a little bit more reasonable and add an actual RSS feed.

Of course, just like with the rest of the site, I was writing even the feed by hand. Which was... not a lot of work, plus it came with the extra benefit of not having to publish terrible-quality articles right away. I also had a pet theory: RSS feeds shouldn't contain entire articles. It's just... weirdly redundant, having the entire contents of the site bundled up in a single XML file.

Said theory was also somewhat convenient, given how pasting entire articles into an XML file and then escaping all the HTML angle brackets... was... not happening. I would happily continue producing more and more posts, sometimes writing RSS that pointed to the wrong article, sometimes just being super late with it, and, most importantly, finding great joy in inflicting the Principle of No Redundant Article Content on everyone.

Sadly, however, my evil practices were exposed by Joelchrono12's article; thanks to his efforts, it's no longer just Miniflux users who can escape the sight of my glorious rotating WebGL cube by simply staying in their RSS readers: FreshRSS users can join them, too. It was a dark day for RSS-based villainy in our universe. Worse yet, he might have convinced me to... maybe? actually automate generating the RSS feed, and come over to the light side, by adding summaries? One day, possibly, far far in the future?

That day has now come.

The New Feed

Of course, I'm not going to give up all villainy right away. Thus, instead of just, like, reading the RSS standard on how to neatly escape HTML inside RSS feeds, I just:

Which is also the reason why I switched over to an Atom feed: Joel's site had one!
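
For the record, the escaping itself is not much code. Atom lets a text element declare type="html", in which case the markup travels as escaped text, so the job boils down to replacing the XML special characters. A minimal sketch in plain Common Lisp follows; the name escape-xml is made up for illustration and is not a function from this site's generator or from any particular XML library:

```lisp
;; Hypothetical helper: escape a string so it can be placed inside an
;; XML element or attribute value (e.g. an Atom <summary type="html">).
(defun escape-xml (text)
  "Return TEXT with the XML special characters replaced by entities."
  (with-output-to-string (out)
    (loop for ch across text
          do (case ch
               (#\& (write-string "&amp;" out))
               (#\< (write-string "&lt;" out))
               (#\> (write-string "&gt;" out))
               (#\" (write-string "&quot;" out))
               (t (write-char ch out))))))

;; Usage:
;; (escape-xml "<p>a & b</p>")  ; => "&lt;p&gt;a &amp; b&lt;/p&gt;"
```

A real XML library would normally do this for you whenever it writes text nodes, which is presumably why nobody has to think about it much.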

The Implementation

It had to be Lisp, of course.

I already had a half-written static generator thing that I wasn't actually using for anything. Yet. It's... not the most glorious example of software engineering, but it does its job.

The code actually runs on my dev server, so parsing all the "index.html" files on it is super fast. I also applied the principles of Common Lisp for shell scripting, resulting in hybrid weirdness like this:

(defun all-articles ()
  "All articles that might be kinda relevant (e.g. for an RSS feed)"
  (mapcar #'uiop:ensure-pathname
    (cl-ppcre:split "\\n"
      (uiop:run-program
         "find /shared_drive/site_dev/202* -name \"index.html\"" :output :string))))
          

... when a UNIX tool exists that does exactly what you want, you can just use it. The HTML files get parsed into a list of objects containing article titles, dates and contents. The actual "Atom feed" part is also a neat demo of what you can do with Lisp; this is the only piece of code that has anything to do with Atom, and it uses an XML library that is not especially extensive either:

(defun atom-entry (article &key (add-summary t))
  (with-slots (title date-string relpath html) article
    (let ((article-full-path (format nil "https://simonsafar.com/~A" relpath)))
      (with-tag
       ("entry")
       (with-simple-tag
        ("title" '(("type" "html")))
        (xml-out title)
        ())
       (with-simple-tag
        ("link" `(("href" ,article-full-path))))
       (emit-simple-tags :id article-full-path)
       (with-tag ("author") (with-simple-tag ("name") (xml-out "Simon Safar")))
       (with-simple-tag ("published") (xml-out (article-date article)))
       (with-simple-tag ("updated") (xml-out (article-date article)))
       (when add-summary
         (with-tag
          ("summary" `(("type" "html") ("xml:base" ,article-full-path)))
          (xml-out (elt (lquery:$ html "article" (html)) 0))))))))

(defun atom-feed (articles &key (add-summary t))
  (let ((feed-uri (format nil "https://simonsafar.com/~A.xml"
                     (if add-summary "index" "index_list"))))
   (with-tag
    ("feed" '(("xml:lang" "en")
              ("xmlns" "http://www.w3.org/2005/Atom")))
    (with-simple-tag ("link" `(("href" ,feed-uri)
                               ("rel" "self"))))
    (with-simple-tag ("link" '(("href" "https://simonsafar.com")
                               ("rel" "alternate")
                               ("type" "text/html")
                               ("hreflang" "en"))))
    (emit-simple-tags
     :title "Simon Safar"
     :subtitle "All recent entries from simonsafar.com"
     :id feed-uri
     :updated (local-time:now))

    (loop
     for article in articles
     do (atom-entry article :add-summary add-summary)))))
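
The atom-entry function above reads four slots from each article (title, date-string, relpath, html) and calls article-date, but neither the class nor that function appears in this post. Purely as an assumption about their shape, a minimal sketch could look like the following. The class layout, the reader names, the "YYYY/MM/DD" input format, the midnight-UTC timestamp and the example relpath are all hypothetical; the only thing taken from the code above is the URL format string. In the real generator the html slot holds a parsed document that lquery can query, whereas here it is just a string.

```lisp
;; Hypothetical article object; slot names match the WITH-SLOTS form above.
(defclass article ()
  ((title       :initarg :title       :reader article-title)
   (date-string :initarg :date-string :reader article-date-string)
   (relpath     :initarg :relpath     :reader article-relpath)
   (html        :initarg :html        :reader article-html)))

(defun article-date (article)
  "Turn a \"YYYY/MM/DD\" date string into an RFC 3339 timestamp,
which is the date format Atom expects. Midnight UTC is an assumption."
  (format nil "~AT00:00:00Z"
          (substitute #\- #\/ (article-date-string article))))

(defun article-url (article)
  "Absolute URL of the article, built the same way as in ATOM-ENTRY."
  (format nil "https://simonsafar.com/~A" (article-relpath article)))

;; Usage, with placeholder data:
;; (let ((a (make-instance 'article
;;                         :title "Fancier RSS Feeds!"
;;                         :date-string "2022/02/13"
;;                         :relpath "2022/example/"
;;                         :html "<article>...</article>")))
;;   (article-date a)   ; => "2022-02-13T00:00:00Z"
;;   (article-url a))   ; => "https://simonsafar.com/2022/example/"
```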
          

And it is a valid Atom feed indeed! Thanks, Joel, for the article, without which this would definitely not have happened yet :)

[Valid Atom 1.0]

... comments welcome, either in email or on the (eventual) Mastodon post on Fosstodon.