Caching pages on cheap shared hosting

March 2022

Bolt isn’t a complex Java beast requiring gigabytes of RAM, but on cheap shared hosting (less than a euro per month) without opcache, even simple anonymous page loads can take a second, or longer under concurrent requests. This isn’t quite up to my standards, so I started looking for ways to cache as much as possible.

Sadly, most of the load time comes from just loading Symfony itself, so the framework’s built-in caching methods, which Bolt does use, only kick in after the expensive part is already done. They are useless here, or at least not enough.

I decided to bypass PHP completely and cache whole pages, serving the cached copies to users via .htaccess. I modified index.php to call my custom caching code after the response is generated. The caching function then filters out private, user-dependent and interactive requests and saves the resulting page to a directory.

Diff of public/index.php changes
@@ -3,6 +3,7 @@
 declare(strict_types=1);

 use App\Kernel;
+use App\Caching;
 use Bolt\Configuration\Config;
 use Symfony\Component\Dotenv\Dotenv;
 use Symfony\Component\ErrorHandler\Debug;
@@ -35,5 +36,6 @@ if ($trustedHosts = $_SERVER['TRUSTED_HOSTS'] ?? false) {
 $kernel = new Kernel($_SERVER['APP_ENV'], (bool) $_SERVER['APP_DEBUG']);
 $request = Request::createFromGlobals();
 $response = $kernel->handle($request);
+Caching::saveCachedCopy($request, $response);
 $response->send();
 $kernel->terminate($request, $response);

# .htaccess rules for using pagecache

# General case for path to "file"
RewriteCond %{DOCUMENT_ROOT}/var/pagecache/$1___%{QUERY_STRING}.html -f
RewriteRule ^(.*)$ /var/pagecache/$1___%{QUERY_STRING}.html [END]

# General case for path to "directory"
RewriteCond %{DOCUMENT_ROOT}/var/pagecache/$1/index___%{QUERY_STRING}.html -f
RewriteRule ^(.*)/$ /var/pagecache/$1/index___%{QUERY_STRING}.html [END]

# Special case for /
RewriteCond %{DOCUMENT_ROOT}/var/pagecache/index___%{QUERY_STRING}.html -f
RewriteRule ^$ /var/pagecache/index___%{QUERY_STRING}.html [END]

# The path wasn't cached, proceed to normal Bolt
RewriteCond $1 !^/public
RewriteRule ^(.*)$ /public/$1
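
To make the naming scheme concrete, here is a minimal sketch of how a URL maps to a cache file. The helper name cacheFileFor and the example paths are made up for illustration; the logic mirrors what saveCachedCopy in src/Caching.php does, and the rewrite rules above look for exactly these file names.

```php
<?php
// Hypothetical helper, not part of the site's code: mirrors the naming scheme
// used by Caching::saveCachedCopy() so the mapping can be inspected on its own.
function cacheFileFor(string $pathInfo, ?string $queryString): string {
    $file = $pathInfo;
    if (str_ends_with($file, "/")) {
        // "Directory" URLs are stored as .../index___<query>.html
        $file .= "index";
    }
    // A missing query string simply leaves the part after ___ empty
    return "/var/pagecache" . $file . "___" . ($queryString ?? "") . ".html";
}

echo cacheFileFor("/", null), "\n";              // /var/pagecache/index___.html
echo cacheFileFor("/entries/", "page=2"), "\n";  // /var/pagecache/entries/index___page=2.html
echo cacheFileFor("/page/about", null), "\n";    // /var/pagecache/page/about___.html
```

One thing to keep in mind: Symfony's getQueryString() returns a normalized query string (parameters sorted by key, consistent escaping), while %{QUERY_STRING} in .htaccess is the raw one. A URL whose parameters arrive in a different order would therefore miss the cache and fall through to Bolt, which is slower but still correct.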

Cache invalidation is a problem with schemes like this, but this particular site is a good candidate: most visitors are anonymous, public pages have no user-dependent content, and the site isn’t too large, so I can afford to invalidate the entire cache on every modification. That happens by way of a Symfony event subscriber reacting to Bolt content modification events.

<?php
// This is src/Caching.php

namespace App;

use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;
use function file_put_contents;

class Caching {
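    static function getPageCachePath() {
        // Built without realpath(), which returns false when the directory
        // doesn't exist yet (e.g. before the very first page is cached)
        return dirname(__DIR__) . '/var/pagecache';
    }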

    static function isCacheable(Request $request) {
        // Only GET requests are cached
        return $request->getMethod() === "GET"
            // Just a sanity check against any possible directory traversal
            // (not that it should be possible)
            && !str_contains($request->getPathInfo(), "/../")
            // Don't cache internal URLs
            && !preg_match("{^/bolt|^/preview/|^/_}", $request->getRequestUri())
            // Don't cache our custom URLs
            && !preg_match("{^/fbmirror}", $request->getRequestUri());
    }

    static function saveCachedCopy(Request $request, Response $response) {
        if(!Caching::isCacheable($request)
            // Only cache successful responses
            || !$response->isSuccessful()
            // Only cache HTML (because our caching system is kinda dumb and
            // will only work on a static set of filetypes)
            || explode(";", $response->headers->get("Content-Type") ?? "")[0]
                !== "text/html") {
            return;
        }

        $file = $request->getPathInfo();
        if(str_ends_with($file, "/")) {
            // Workaround for Apache being configured not to serve hidden files
            $file .= "index";
        }
        $file .= "___" . $request->getQueryString();

        $dir = dirname(Caching::getPageCachePath() . $file);
        if(!file_exists($dir)) {
            mkdir($dir, 0777, true);
        }

        file_put_contents(
            Caching::getPageCachePath() . $file . ".html",
            $response->getContent());
    }

    private static function clearCacheInner(string $dir) {
        $files = array_diff(scandir($dir), array('.','..'));
        foreach ($files as $file) {
            $path = $dir . "/" . $file;
            if(is_dir($path)) {
                self::clearCacheInner($path);
            } else {
                unlink($path);
            }
        }
    }

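    static function clearCache() {
        $dir = Caching::getPageCachePath();
        // Nothing to clear if the cache directory hasn't been created yet
        if($dir !== false && is_dir($dir)) {
            self::clearCacheInner($dir);
        }
    }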
}
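
A possible refinement, which I have not needed so far: because Apache serves these files directly, a request arriving while a page is still being written could in principle receive a truncated copy. A minimal sketch of one way around that, assuming the cache directory already exists and the temporary file lives on the same filesystem as the target, is to write to a temporary file and rename it into place. The helper name writeAtomically and the demo directory are hypothetical.

```php
<?php
// Hypothetical helper: writes $content to $target via a temporary file in the
// same directory, then renames it into place. Assumes dirname($target) exists;
// otherwise tempnam() falls back to the system temp directory.
function writeAtomically(string $target, string $content): bool {
    $tmp = tempnam(dirname($target), "tmp");
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $content) === false) {
        unlink($tmp);
        return false;
    }
    // tempnam() creates the file readable by its owner only;
    // widen the mode so the web server can serve it
    chmod($tmp, 0644);
    if (!rename($tmp, $target)) {
        unlink($tmp);
        return false;
    }
    return true;
}

// Demo in a throwaway directory rather than the real page cache
$demoDir = sys_get_temp_dir() . "/pagecache-write-demo";
if (!is_dir($demoDir)) {
    mkdir($demoDir, 0777, true);
}
writeAtomically($demoDir . "/index___.html", "<h1>Hello</h1>");
```

The temporary file has no .html suffix, so the rewrite rules never match it while it is being written.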

<?php
// This is src/EventListener/BoltContentSubscriber.php

namespace App\EventListener;

use App\Caching;
use Bolt\Event\ContentEvent;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;

class BoltContentSubscriber implements EventSubscriberInterface {
    public static function getSubscribedEvents() {
        return [
            ContentEvent::ON_DUPLICATE => ["onContentModification"],
            ContentEvent::ON_EDIT => ["onContentModification"],
            ContentEvent::POST_DELETE => ["onContentModification"],
            ContentEvent::POST_SAVE => ["onContentModification"],
            ContentEvent::POST_STATUS_CHANGE => ["onContentModification"],
        ];
    }

    function onContentModification($ev) {
        // Just clear the whole cache on each edit, tracking what needs what is
        // too error-prone
        Caching::clearCache();
    }
}
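
To try the clearing logic outside Bolt, here is a small standalone sketch. The function name clearDirectory and the demo paths are made up; the recursion follows the same approach as clearCacheInner, deleting files and leaving the (empty) directories in place.

```php
<?php
// Standalone version of the recursive clearing logic, for experimenting
// outside Bolt. Deletes every file below $dir but keeps the directories.
function clearDirectory(string $dir): void {
    foreach (array_diff(scandir($dir), [".", ".."]) as $entry) {
        $path = $dir . "/" . $entry;
        if (is_dir($path)) {
            clearDirectory($path);
        } else {
            unlink($path);
        }
    }
}

// Build a small throwaway cache tree (paths are made up for the demo)
$root = sys_get_temp_dir() . "/pagecache-clear-demo";
if (!is_dir($root . "/entries")) {
    mkdir($root . "/entries", 0777, true);
}
file_put_contents($root . "/index___.html", "home");
file_put_contents($root . "/entries/index___page=2.html", "list");

clearDirectory($root);

var_dump(file_exists($root . "/index___.html"));                // bool(false)
var_dump(file_exists($root . "/entries/index___page=2.html"));  // bool(false)
var_dump(is_dir($root . "/entries"));                           // bool(true)
```

Keeping the directories around is harmless here: the rewrite conditions test for files with -f, so an empty directory never counts as a cache hit.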

I think I can now say this technique has been a success. Full page load times are now around 150ms, down from ~500–1500ms previously. While it might be problematic when developing more interactive pages, it works perfectly for sites like this one, which could be generated statically, but aren’t because of user-friendliness or other concerns (any effective static site generator would have to worry about cache invalidation much more than this simple system anyway).