I Made A Website Builder
This will be the first article posted with it.
Summary/Objective/Background
Over the past three seasons, I built this website up from its most minimal form. My process was very manual. This website is made of only static web pages. It is a pile of documents stored on GitHub servers and downloaded in their entirety (compare that to built-up 'web apps' where every bit of content is shuffled in and out by the server). This has been very easy to make and scale. Early versions were one giant wizard scroll I would append new articles to. Since then I've added all the beloved front-end nuance.
My method didn't change though. I literally wrote each document by hand in a text editor, even as I expanded to a few dozen pages. Old pages fell out of standard, and syndication got tedious (and skipped), since my archive and RSS feed required a manual update for every post.
My objective was to create a static site generator that can preserve and populate my content (prose and hypertext writing) with clean uniform code for other features (navigation bars, metadata etc.). Additionally I would automate the process of updating feeds.
Method
Desaturater Script
I created a folder of my hand-written pages. I wrote a Python script that cuts out everything between a pair of tags, usually 'article' or 'main', and I approved each file as it was processed. Of 30+ files, my code failed twice, requiring manual cleaning: one failure was due to a lack of tag wrapping in the original file, the other due to mismatched tags (I would use regular expressions now; more on that under 'Regular Expressions'). This produced the desaturated content pages made of pure media text. In the future I will write posts directly as these desaturated files.
import os, glob

selectPath = input("path? leave blank for root. ")
outputPath = 'desat/' + selectPath
path = 'handmade/' + selectPath

#for each file in directory 'path'
for filename in glob.glob(os.path.join(path, '*.html')):
    with open(os.path.join(os.getcwd(), filename), 'r') as f:  # open in readonly mode
        #print the current file
        os.system('clear')
        print(f.read())
        f.seek(0)
        #choose a word to carve out with. Only works if there is only one set of
        #splittable tags
        splitword = input("splitword?, or 's' for skip ")
        if splitword == "s":
            continue
        #extract everything between the split word tags
        fraction = f.read().split("<" + splitword + ">\n")
        fraction = fraction + fraction[1].split("</" + splitword + ">\n")
        fraction.pop(1)
        output = fraction[1]
        os.system('clear')
        print(output)
        #optionally encase the new content in article tags
        if input("encase in article tags? ") == "y":
            output = "<article>\n" + output + "</article>\n"
            os.system('clear')
            print(output)
        #save to file (optional)
        if input("save? ") == "y":
            with open(outputPath + f.name.split("/")[-1], "w") as outfile:
                outfile.write(output)
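The splitting above is what I would replace with a regular expression today (covered under 'Regular Expressions' below). A minimal sketch of the same extraction done that way, still assuming one pair of matching tags per file:

import re

#regex version of the carve-out above: tolerates attributes and spacing in
#the opening tag; DOTALL lets '.' match across line breaks
def desaturate(html, tag):
    match = re.search(r"<" + tag + r".*?>(.*?)</" + tag + r">", html, re.DOTALL)
    return match.group(1) if match else None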
Resaturater Script
I wrote a reverse script that takes all the desaturated articles and appends a template header and footer. These include the functional stuff every file needs (head, closing tags, etc.). There isn't a reason to differentiate these bummers much between articles, so I standardized them. Visually, this process manifests as the new uniform nav bar and footer. Now I can change these standards for all files quickly. The header of a file includes the title (the text on the browser tab), so the resaturating script extracts the first (and ideally only) h1 tag of the content and copies it into the header.
In this code there is a comment about tags messing up the code and clearing them out. That’s because I didn’t know regular expressions yet (more on that later).
import os, glob, re

selectPath = input("path? leave blank for root. ")
outputPath = 'sat/' + selectPath
path = 'desat/' + selectPath
headerFile = input("headerfile? Blank for default. ")
if headerFile == '':
    headerFile = "header"

#import template files
header = open('templates/' + headerFile + '.html').read()
footer = open('templates/footer.html').read()

#for each file in directory 'path'
for filename in glob.glob(os.path.join(path, '*.html')):
    with open(os.path.join(os.getcwd(), filename), 'r') as f:  # open in readonly mode
        #extract article title
        #this currently has the problem of the deprecated 'headline' tag. Running
        #this program will clear that. Then I will rebuild the program.
        splitFile = re.split('<h1.*?>', f.read(), maxsplit=1)
        splitFile = splitFile + re.split('</h1>', splitFile[1], maxsplit=1)
        splitFile.pop(1)
        #extract title
        title = splitFile[1]
        #reconstruct
        output = splitFile[0] + "<h1>" + title + "</h1>" + splitFile[2]
        #add header and footer templates
        output = header + output + footer
        #replace title placeholder with extracted title and title template
        output = output.replace("<!--TITLE-->", title + ", Hayden Shuker")
        os.system('clear')
        print(output)
        #save to file
        with open(outputPath + f.name.split("/")[-1], "w") as outfile:
            outfile.write(output)
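The template files themselves aren't reproduced here, but a minimal header along these lines shows how the placeholder replacement works (an illustrative stand-in, not the real template):

#illustrative stand-in for templates/header.html, not the actual file
header = """<!DOCTYPE html>
<html>
<head><title><!--TITLE--></title></head>
<body>
<nav><a href='index.html'>home</a></nav>
"""
print(header.replace("<!--TITLE-->", "Some Post, Hayden Shuker"))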
Updater Script
A third script extracts the title of an article and appends it to the various feeds I need to maintain: blog roll, RSS, and archive. This was the last script I wrote, and it is visibly more elegant than the others (I made a function).
import re

##Inject some text in the middle of a text.
#newText is inserted at breakText of rawText.
#breakText is re-inserted after newText if beforeToggle
#is true, otherwise before it.
def insertText(rawText, breakText, newText, beforeToggle):
    t = rawText.split(breakText, 1)
    return (t[0] + newText * beforeToggle +
            breakText + newText * (not beforeToggle) + t[1])

directory = 'desat/'
#select a page. Use a desaturated file
fileToPush = input("Select post to publish: ") + '.html'

##LOAD FILES
#extract the title. Use a regular expression to find it buried in the h1
with open(directory + fileToPush) as f:
    title = re.search(r"<h1.*?>(?P<title>.*?)</h1>", f.read())['title']
url = 'https://www.haydenshuker.com/' + fileToPush

rss = 'desat/rss.xml'
rssAppendText = '\n<item>\n<title>' + title + '</title>\n<link>' + url + '</link>\n</item>\n'
rssText = insertText(open(rss).read(), '</channel>', rssAppendText, True)
print(rssText)
if input("save new rss text? y/N ") == "y":
    open(rss, "w").write(rssText)

blog = 'desat/blog.html'
#insertText re-adds the break text itself, so the new article lands right after it
blogAppendText = '\n\n<article>\n<h1>' + title + '</h1>\n</article>'
blogText = insertText(open(blog).read(), '</heading>', blogAppendText, False)
print(blogText)
if input("save new blog text? y/N ") == "y":
    open(blog, "w").write(blogText)

archive = 'desat/archive.html'
archiveAppendText = '\n<li><a href=\'' + fileToPush + '\'>' + title + '</a></li>'
archiveText = insertText(open(archive).read(), '<h3>2024</h3>\n<ul>', archiveAppendText, False)
print(archiveText)
if input("save new archive text? y/N ") == "y":
    open(archive, "w").write(archiveText)
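To make the beforeToggle behavior concrete, a quick toy run of insertText (made-up strings, assuming the function above is loaded):

#toy demonstration of insertText, separate from the site pipeline
body = '<ul></ul>'
print(insertText(body, '</ul>', '<li>new</li>', True))    #new text lands before '</ul>'
print(insertText(body, '<ul>', '<li>new</li>', False))    #new text lands after '<ul>'
#both print: <ul><li>new</li></ul>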
Future
Now updating the site takes two scripts, plus some manual copying for the new RSS and other non-HTML files. A completed tool could cut that down to a single combined action. The text-splicing code I made could be repurposed for other mass-editing actions, such as updating hyperlinks and fixing HTML formatting. A similar tool could process my images and other files to decrease file size.
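For instance, the hyperlink case might look something like this sketch (the file and link names are placeholders, not real pages):

import glob, re

#sketch: swap an old link for a new one across every saturated page
for filename in glob.glob('sat/*.html'):
    with open(filename) as f:
        text = f.read()
    text = re.sub(r'href="old-post\.html"', 'href="new-post.html"', text)
    with open(filename, 'w') as f:
        f.write(text)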
Furthermore, I could make the updater script syndicate to proprietary social media and an industrial printing press.
Lessons
Accumulate wisdom and experience rather than a codebase.
Permacomputing Principles
Some programming firsts for me were:
- Using regular expressions (more on that later)
- Manipulating files with code
- Running Linux commands with code (just clearing the screen for now, but this could be useful)
- Writing code offline with a downloaded reference text.
- Making an actually useful backend helper tool (I used to make games and toys).
- Thorough code documentation and commenting (for a non-school project).
With these I could do some real fundamental IT stuff. My heart is open for more projects.
Regular Expressions
Apparently text manipulation sucks entirely if you can't be fuzzy with it. Until I worked out how to use regular expressions, all my text cuts were at risk of error if I left in extra spacing or something. Now that I have them, I see uses everywhere.
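A toy example of the difference:

import re

html = "<h1  class='fancy' >My Post</h1>"   #extra spacing and an attribute
print(html.split('<h1>'))                   #exact split fails: one unchanged piece
print(re.search(r'<h1.*?>(.*?)</h1>', html).group(1))   #prints: My Post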
On Documentation
I like to write software like I would an essay. I outline the whole structure, imagine the thing I'm trying to create in its entirety, then the details just flow into place. For software this means comments, then code. I learned this method doing the MATLAB projects in engineering class. The documentation and formality isn't for anyone other than me while I construct the code. Though when I am done, I have chewed over every idea I've expressed to my greatest extent. Does documenting make me learn more? So long as it makes me make more.
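A toy example of what comments-then-code looks like on the page (invented for illustration, not from this project):

#read the raw page
raw = '<h1>A Post</h1>\n<p>Words.</p>'
#pull out the title
title = raw.split('<h1>')[1].split('</h1>')[0]
#report it
print(title)   #prints: A Post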
Learning While Writing
As this was my first Python project, I got dramatically better at writing Python code as I wrote it. This created an awkward tendency to write in circles, cleaning up my output and redoing it to a better standard. The mental game in such a project is sitting still and powering forward. The whole phase must be executed at once, and quality judged afterward. Some bad code is better than no good code.
Additionally
I like my styling optional or interchangeable. That means using the human-readable semantic HTML components to make the content thick and signposted in the HTML itself, rather than sloppy prose with intention dressed up and information buried in the ethereal CSS. Use the layered headings earnestly and wrap everything in articles, sections, and mains.
Website footers are for fine print (technical info, organizational info, contacts, etc.). I know that if I need something serious and practical, that's where convention dictates it will be.