Serving static assets
Build a static site server
The way of kata
Kata is an exercise for muscle memory. It's not intended to fill your brain with information but train your fingers to react. The information is there to give you the why, but your fingers need to learn the how.
The material on this page is presented in a specific order — from least specific to highly technical. You will learn the most by jumping in as soon as you have some idea of what you should do. Once you're done, read the rest of the material and check your solution.
All katas are designed to be doable without using 3rd party libraries (and, in fact, the point is to also learn how to do what these libraries do).
To make the most of katas, observe the following rules.
- Don't rush.
- When stuck, take a break and do something unrelated.
- Do not copy/paste code. Always retype everything.
- Do not use AI tools to generate code.
- Experiment: try something that wasn't in the instructions.
- Repeat the kata from time to time, even if you think you've got it.
- You have mastered the kata once you are able to complete it without thinking too much.
Remember, the goal is not to get it done, but to get some practice.
Introduction
In this exercise, you will build a static site server with ETag support.
Skills you will acquire
- Understanding of absolute and relative paths
- HTTP basics
- Reading files
- Content hashing using SHA-256
- In-memory caching
- Using ETag header
- Directory traversal attack prevention
Objective
- Serve the contents of a selected directory as a static site
- Serve a 404 page when no content matches the URL
- Implement ETag-based cache management
- Implement directory traversal attack protection
Check your solution
- We can access any file within the static files folder
- For URLs matching a folder, we receive the index.html file within the folder (if any)
- For any URL not matching any file, we get a 404 page
- The status code for all matching URLs is 200
- A strong ETag header is returned for all matching URLs
- The correct Content-Type header is returned for each file
- When a file is fetched for the second time, it results in a 304 status code
- Including the .. in the URL does not allow us to fetch files that are outside of the static files folder
Keep in mind
One of the key concepts to understand is the concept of relative and absolute paths. For the purposes of discussing paths, I am going to refer to 'files' where I mean either a file on your disk, or a resource that the server returns (which is not necessarily a file).
Paths consist of segments delimited by separators. The separator is a
forward slash / on the web (in URLs) and on Unix-like operating systems (e.g.,
Linux, macOS, BSD). On Windows, the separator is a backslash \. The
segments are either folders or the special directories . and .. (more on
that later). The last segment is either a folder, a special directory, or a file,
but if the path ends with the separator, it points to a folder.
The absolute path to a file starts at the root and includes all folders leading up to the file. On Unix-like systems, a lone separator / represents the root folder. On Windows, though, there are multiple root folders, one per drive. Here's an example of an absolute path on a Linux system:
/var/www/svg-spirit/index.html
On Windows, an absolute path may look like this:
C:\dev\svg-spirit\index.html
When it comes to URLs, the paths look exactly the same as on Unix-like systems.
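To see these rules side by side, the sketch below uses NodeJS's node:path module, which exposes both the Unix-style rules (path.posix) and the Windows rules (path.win32) on any operating system. The example paths are the ones from the text; all calls are standard node:path functions.

```javascript
const path = require("node:path");

// path.posix applies Unix (and URL) rules, path.win32 applies Windows rules.
// Both are available on every OS, so this prints the same everywhere.
console.log(path.posix.isAbsolute("/var/www/svg-spirit/index.html")); // true
console.log(path.win32.isAbsolute("C:\\dev\\svg-spirit\\index.html")); // true
console.log(path.posix.isAbsolute("svg-spirit/index.html")); // false

// On Windows, the root includes the drive letter.
console.log(path.win32.parse("C:\\dev\\svg-spirit\\index.html").root); // C:\

// Splitting on the separator: the empty first segment stands for the root.
console.log("/var/www/svg-spirit/index.html".split(path.posix.sep));
// [ '', 'var', 'www', 'svg-spirit', 'index.html' ]
```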
A relative path to a file starts from a specific folder and describes a sequence
of folders leading from that folder to the file. For example, the path of the
index.html file in the previous example relative to /var/www is:
svg-spirit/index.html
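The relationship between the two kinds of paths can be checked with path.relative() and path.resolve(). This sketch uses path.posix so that the output does not depend on the operating system it runs on.

```javascript
const path = require("node:path");

const file = "/var/www/svg-spirit/index.html";

// Relative path from /var/www to the file.
console.log(path.posix.relative("/var/www", file)); // svg-spirit/index.html

// Resolving the relative path against its starting folder gives back the absolute path.
console.log(path.posix.resolve("/var/www", "svg-spirit/index.html")); // /var/www/svg-spirit/index.html
```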
Paths can include the special folders . (single dot) and .. (double dot).
The single-dot folder . means "current folder". For example, the following
two paths are exactly the same:
/var/www/./svg-spirit/index.html
/var/www/svg-spirit/index.html
In relative paths, a leading single dot is usually redundant. For example, the
relative path we've seen before can be written as ./svg-spirit/index.html and
mean exactly the same thing. In some contexts, though, the ./ prefix changes the
meaning: in NodeJS, require("./utils") loads a file relative to the current
module, while require("utils") looks for a package named utils.
The double-dot folder .. points at the parent folder (the folder above). For
example, /var/www/example/.. is the same as /var/www. Suppose we want to find the
index.html in our example, but relative to /var/www/example/static. We would use
the following path:
../../svg-spirit/index.html
The first two segments of the path each go up one level: from
/var/www/example/static to /var/www/example, and then to /var/www. The rest of
the path, svg-spirit/index.html, is then followed from /var/www.
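path.normalize() and path.resolve() apply these . and .. rules for you. The sketch below replays the examples from the text with path.posix (so the results are the same on any OS). The last line shows that .. cannot climb above the root of an absolute path.

```javascript
const path = require("node:path");

// normalize() drops "." segments and cancels each ".." against the folder before it.
console.log(path.posix.normalize("/var/www/./svg-spirit/index.html")); // /var/www/svg-spirit/index.html
console.log(path.posix.normalize("/var/www/example/..")); // /var/www

// resolve() starts at the first argument and walks the relative path from there.
console.log(path.posix.resolve("/var/www/example/static", "../../svg-spirit/index.html"));
// /var/www/svg-spirit/index.html

// ".." cannot go above the root of an absolute path.
console.log(path.posix.normalize("/../../etc/hosts")); // /etc/hosts
```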
We mentioned before that the URLs use Unix-style paths, while Windows uses different separators for paths. Regardless of what operating system you are working with, you should always try to make paths work on any system. In real life, you will work with developers that might be using a different operating system than yours.
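One way to honour this in a static server is to never paste a URL path into a filesystem path as a string, but to split it on / and let path.join() insert the local separator. The sketch below does that in a hypothetical helper, urlPathToFsPath(). Because this is also the point where a .. in the URL could climb out of the static folder, it checks the result with path.relative() and returns null for anything outside the root. Treat it as one possible approach under those assumptions, not the only safe one; the ./public folder is likewise just an example.

```javascript
const path = require("node:path");

// Hypothetical helper: map a URL pathname (always "/"-separated) to a path under
// `root` that uses this OS's separator. Returns null if the result would escape
// `root` (directory traversal), e.g. for "/../secret.txt".
function urlPathToFsPath(root, urlPathname) {
  const absRoot = path.resolve(root);
  // Decode each segment ("%20" -> " "); path.join adds the OS separator and
  // collapses "." and ".." segments, including ones that were percent-encoded.
  const segments = urlPathname.split("/").map(decodeURIComponent);
  const candidate = path.join(absRoot, ...segments);
  // Anything outside root yields a relative path that starts with "..".
  const rel = path.relative(absRoot, candidate);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    return null;
  }
  return candidate;
}

// Example, assuming the static files live in ./public:
const staticRoot = path.resolve("public");
console.log(urlPathToFsPath(staticRoot, "/svg-spirit/index.html")); // <cwd>/public/svg-spirit/index.html, with this OS's separator
console.log(urlPathToFsPath(staticRoot, "/../secret.txt")); // null
```

A server would treat a null result the same way as a missing file. Note that decodeURIComponent() throws on malformed percent-encoding, so a caller should catch that too.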
On NodeJS, all operations related to paths are implemented in the node:path
module. This module provides functions for working with strings that represent
paths, not physical files and folders on the disk. To access the physical files,
you will use the node:fs module instead.
I recommend doing this exercise in iterations.
First create a basic version that reads the files on the disk and returns the contents. Instead of reading a whole file into memory, you can open it, create a read stream from it, and pipe the stream straight into the response object. This is more memory-efficient than reading the entire file first and then writing its contents to the response.
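A minimal sketch of this first iteration follows, assuming a CommonJS script and a static folder passed in as root; createBasicServer() and notFound() are hypothetical names. fs.createReadStream() opens the file itself (in recent NodeJS versions you could instead open it with fsPromises.open() and call createReadStream() on the returned FileHandle). The 200 headers are only written once the stream reports that the file was opened, so a missing file can still become a 404. The media type is a placeholder, and there is no traversal protection yet.

```javascript
const http = require("node:http");
const fs = require("node:fs");
const path = require("node:path");
const { pipeline } = require("node:stream");

function notFound(res) {
  res.writeHead(404, { "Content-Type": "text/plain; charset=utf-8" });
  res.end("404 Not Found");
}

// First iteration: stream files from `root` straight into the response.
// Not handled yet: caching, ETags, real media types, directory traversal.
function createBasicServer(root) {
  return http.createServer((req, res) => {
    let filePath;
    try {
      const { pathname } = new URL(req.url, "http://localhost");
      filePath = path.join(root, ...pathname.split("/").map(decodeURIComponent));
    } catch {
      return notFound(res); // unparseable URL or malformed percent-encoding
    }
    // fs functions throw synchronously on paths containing null bytes ("%00").
    if (filePath.includes("\0")) return notFound(res);

    fs.stat(filePath, (err, stats) => {
      if (err) return notFound(res);
      // A URL that matches a folder serves the folder's index.html (if any).
      if (stats.isDirectory()) filePath = path.join(filePath, "index.html");

      const stream = fs.createReadStream(filePath); // opens the file, reads in chunks
      stream.on("error", () => {
        // An error before the headers went out means the file could not be opened.
        if (!res.headersSent) notFound(res);
      });
      stream.on("open", () => {
        // Placeholder media type; a real server picks it from the extension.
        res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
        // pipeline() destroys both streams if either side fails or the client leaves.
        pipeline(stream, res, () => {});
      });
    });
  });
}
```

To try it, something like createBasicServer(path.join(__dirname, "public")).listen(8080) should serve a public folder next to the script (the folder name and port are assumptions).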
When responding, do not forget to set the Content-Type header to the media
type — a.k.a. MIME type — of the file. Usually, we figure out the media type
based on the file extension.
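A lookup table keyed by extension is enough for this kata. The table below is a small, hand-picked assumption rather than a complete registry; the media types in it are the standard ones for those extensions, and unknown extensions fall back to application/octet-stream. contentTypeFor() is a hypothetical helper name.

```javascript
const path = require("node:path");

// Hand-picked extension -> media type table (extend it for your own files).
const MEDIA_TYPES = {
  ".html": "text/html; charset=utf-8",
  ".css": "text/css; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".json": "application/json",
  ".svg": "image/svg+xml",
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".txt": "text/plain; charset=utf-8",
};

function contentTypeFor(filePath) {
  // path.extname("logo.SVG") is ".SVG", so compare case-insensitively.
  const ext = path.extname(filePath).toLowerCase();
  // Unknown extensions get a generic binary type.
  return MEDIA_TYPES[ext] ?? "application/octet-stream";
}

console.log(contentTypeFor("/var/www/svg-spirit/index.html")); // text/html; charset=utf-8
console.log(contentTypeFor("logo.SVG")); // image/svg+xml
console.log(contentTypeFor("README")); // application/octet-stream
```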
In the second iteration, add caching. Caching can be done in one of two ways:
- Ad-hoc caching
- Pre-caching
The ad-hoc caching is done when the resource is accessed. The pre-caching is done before or while the server is started. The exact implementation of the caching mechanism depends a lot on what kind of API you use for the filesystem calls — whether you use the callback-based one or promise-based one.
You should try both ad-hoc and pre-caching, and both callback-based and promise-based APIs.
In any case, the caching is in-memory. This usually involves a key-value data
structure (e.g., an object or a Map) where the cached items are looked up.
The lookup is typically done using the resource path.
With ad-hoc caching, you will first check whether the requested resource is already in the cache, and read and cache it if it isn't. Then you will retrieve the cached item and return it to the client.
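Here is one way to express ad-hoc caching with the promise-based API. It is a sketch under two assumptions: the cache key is the resolved file path, and the cache stores the pending promise rather than the finished Buffer, so two requests for the same uncached file trigger only one disk read. getCached() and cache are hypothetical names.

```javascript
const fsp = require("node:fs/promises");

// Ad-hoc cache: file path -> Promise<Buffer>.
const cache = new Map();

function getCached(filePath) {
  if (!cache.has(filePath)) {
    // Cache miss: start reading and remember the promise right away, so
    // concurrent requests for the same file wait on the same read.
    const pending = fsp.readFile(filePath);
    // Don't keep failed reads (e.g. missing files) in the cache.
    pending.catch(() => cache.delete(filePath));
    cache.set(filePath, pending);
  }
  return cache.get(filePath);
}
```

With the callback-based API, the same idea needs a bit more bookkeeping: store the Buffer once it arrives, and keep a list of callbacks waiting for reads that are still in flight.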
With pre-caching, all files in the folder are first cached. The responses are always made from the cache. With this approach you have two options:
- Halt the server start until caching is finished
- Cache in parallel with the server start
Halting the server makes startup (and therefore every restart) slower, but it makes the implementation simpler overall. Parallel caching means that requests arriving before the cache is fully populated must wait for it before they can respond, so there's an element of synchronization involved. Promises make this easier, but you can also build simple abstractions to deal with synchronization. Try all options.
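The sketch below shows both start-up options with the promise-based API. preCache(), respond(), startHalting() and startParallel() are hypothetical names; it assumes NodeJS 18.17+ or 20.1+ for the recursive readdir option and, for brevity, skips percent-decoding and Content-Type headers. In the parallel variant, the synchronization is just a shared promise: requests that arrive before caching finishes wait for it, and later ones get the already-resolved value immediately.

```javascript
const http = require("node:http");
const fsp = require("node:fs/promises");
const path = require("node:path");

// Read every file under `root` into a Map keyed by URL path, e.g. "/css/site.css".
async function preCache(root) {
  const names = await fsp.readdir(root, { recursive: true });
  const cache = new Map();
  await Promise.all(names.map(async (name) => {
    const filePath = path.join(root, name);
    if (!(await fsp.stat(filePath)).isFile()) return; // skip folders
    // readdir uses the OS separator; URL keys always use "/".
    cache.set("/" + name.split(path.sep).join("/"), await fsp.readFile(filePath));
  }));
  return cache;
}

// Answer purely from the cache.
function respond(cache, req, res) {
  const pathname = req.url.split("?")[0]; // drop the query string
  // A folder URL falls back to its index.html.
  const body = cache.get(pathname) ?? cache.get(path.posix.join(pathname, "index.html"));
  if (!body) {
    res.writeHead(404);
    return res.end("404 Not Found");
  }
  res.writeHead(200);
  res.end(body);
}

// Option 1: halt start-up until the cache is full, then listen.
async function startHalting(root, port) {
  const cache = await preCache(root);
  return http.createServer((req, res) => respond(cache, req, res)).listen(port);
}

// Option 2: listen right away; every request awaits the same promise.
function startParallel(root, port) {
  const ready = preCache(root);
  return http
    .createServer(async (req, res) => respond(await ready, req, res))
    .listen(port);
}
```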
For indexing folders, you can use the fs.readdir() (or fsPromises.readdir())
function and use the recursive option.
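With the callback-based API, the listing could look like the sketch below. It assumes NodeJS 18.17+ or 20.1+ (where readdir accepts recursive: true), and listUrlPaths() is a hypothetical name. Two details are easy to miss: the returned names include folders as well as files, and they use the operating system's separator, so they are converted to /-separated URL paths. The pending counter is a small example of the kind of hand-rolled synchronization mentioned above.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// List every file under `root` as a URL path ("/css/site.css"), callback style.
function listUrlPaths(root, callback) {
  fs.readdir(root, { recursive: true }, (err, names) => {
    if (err) return callback(err);
    const files = [];
    let pending = names.length; // stat calls still running
    if (pending === 0) return callback(null, files);
    for (const name of names) {
      fs.stat(path.join(root, name), (statErr, stats) => {
        // Keep regular files only; entries that vanished meanwhile are skipped.
        if (!statErr && stats.isFile()) {
          files.push("/" + name.split(path.sep).join("/"));
        }
        // The last stat to finish reports the (sorted) result.
        if (--pending === 0) callback(null, files.sort());
      });
    }
  });
}
```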
The ETag response header is used by the client to identify the version of a
resource. It can then ask the server to only return a resource if the version it
has is not current by using the If-None-Match header.
The workflow is as follows:
- Server sends the ETag header with every response
- Client sends the If-None-Match header containing the value of the ETag header it last saw
- Server checks if the resource at the path has the same ETag
  - if the ETag matches, server responds with 304
  - otherwise server responds with the regular 200 response, including the updated ETag header
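The workflow can be sketched as a plain request handler. Everything here is illustrative: the body is a hard-coded Buffer standing in for a cached file, and matches() and handler() are hypothetical names. The matching follows the HTTP rules for If-None-Match: the header may contain * or several comma-separated ETags, and the comparison ignores a W/ (weak) prefix.

```javascript
const crypto = require("node:crypto");

// Stand-in resource; in the kata, the body would come from your cache.
const body = Buffer.from("<h1>Hello</h1>");
// ETag values are quoted strings; this strong one is the content's SHA-256.
const etag = `"${crypto.createHash("sha256").update(body).digest("hex")}"`;

// If-None-Match holds "*" or a comma-separated list of ETags, compared with
// weak comparison, so a W/ prefix on either side is ignored.
function matches(ifNoneMatch, currentEtag) {
  if (!ifNoneMatch) return false;
  if (ifNoneMatch.trim() === "*") return true;
  const opaque = (tag) => tag.trim().replace(/^W\//, "");
  return ifNoneMatch.split(",").some((tag) => opaque(tag) === opaque(currentEtag));
}

function handler(req, res) {
  if (matches(req.headers["if-none-match"], etag)) {
    // The client's copy is current: 304 with no body.
    res.writeHead(304, { ETag: etag });
    return res.end();
  }
  // Otherwise a regular 200 response, again carrying the ETag.
  res.writeHead(200, { "Content-Type": "text/html; charset=utf-8", ETag: etag });
  res.end(body);
}
```

Mounted with http.createServer(handler).listen(8080), a first request should return 200 with an ETag, and repeating it with that value in If-None-Match should return 304.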
The ETag value can be either weak or strong. For a weak ETag, you can use the
file's modification timestamp, for example. For a strong ETag, you can use the
SHA-256 hash of the file content (provided by the node:crypto module). Try
both options and note the difference in performance between the two, if any. (You
will likely not see a difference in performance unless the file is fairly
large.)