Hajime, the duck guy

Server-side scripting kata
Level: Important

Serving static assets

Build a static site server

The way of kata

Kata is an exercise for muscle memory. It's not intended to fill your brain with information but train your fingers to react. The information is there to give you the why, but your fingers need to learn the how.

The material on this page is presented in a specific order — from least specific to highly technical. You will learn the most by jumping in as soon as you have some idea of what you should do. Once you're done, read the rest of the material and check your solution.

All katas are designed to be doable without using third-party libraries (in fact, part of the point is to learn how to do what these libraries do).

To make the best of katas, observe the following rules.

  • Don't rush.
  • When stuck, take a break and do something unrelated.
  • Do not copy/paste code. Always retype everything.
  • Do not use AI tools to generate code.
  • Try to do something that wasn't in the instructions, experiment.
  • Repeat the kata from time to time, even if you think you've got it.
  • You have mastered the kata once you are able to complete it without thinking too much.

Remember, the goal is not to get it done, but to get some practice.

Introduction

In this exercise, you will build a static site server with ETag support.

Skills you will acquire

  • Understanding of absolute and relative paths
  • HTTP basics
  • Reading files
  • Content hashing using SHA-256
  • In-memory caching
  • Using ETag header
  • Directory traversal attack prevention

Objective

  • Serve the contents of a selected directory as a static site
  • Serve a 404 page when no content matches the URL
  • Implement ETag-based cache management
  • Implement directory traversal attack protection

Check your solution

  • We can access any file within the static files folder
  • For URLs matching a folder, we receive the index.html file within the folder (if any)
  • For any URL not matching any file, we get a 404 page
  • The status code for all matching URLs is 200
  • A strong ETag header is returned for all matching URLs
  • The correct Content-Type header is returned for each file
  • When an unchanged file is fetched for the second time with the If-None-Match header set, the response has a 304 status code
  • Including .. in the URL does not allow us to fetch files that are outside of the static files folder

Keep in mind

One of the key concepts to understand is the concept of relative and absolute paths. For the purposes of discussing paths, I am going to refer to 'files' where I mean either a file on your disk, or a resource that the server returns (which is not necessarily a file).

A path consists of segments delimited by separators. The separator is a forward slash / on the web (in URLs) and on Unix-like operating systems (e.g., Linux, macOS, BSD). On Windows, the separator is a backslash \. The segments are either folders or the special directories . and .. (more on that later). The last segment is either a folder, a special directory, or a file, but if the path ends with the separator, it points to a folder.

The absolute path to a file starts at the root and includes all folders leading up to the file. On Unix-like systems, a single separator on its own represents the root folder. On Windows, though, there are multiple root folders, one per drive. Here's an example of an absolute path on a Linux system:

/var/www/svg-spirit/index.html

On Windows, an absolute path may look like this:

C:\dev\svg-spirit\index.html

When it comes to URLs, the paths look exactly the same as on Unix-like systems.

A relative path to a file starts from a specific folder and describes a sequence of folders leading from that folder to the file. For example, the path of the index.html file in the previous example relative to /var/www is:

svg-spirit/index.html

Paths can include the special folders . (single dot) and .. (double dot).

The single-dot folder . means "current folder". For example, the following two paths are exactly the same:

/var/www/./svg-spirit/index.html
/var/www/svg-spirit/index.html

With relative paths, the meaning of a path that starts with the single-dot folder can be slightly different depending on the context. For example, the relative path we've seen before can be written as ./svg-spirit/index.html, and it usually means the exact same thing.

The double-dot folder .. points at the folder above. For example, /var/www/../ is the same as /var/. Suppose we want to refer to the index.html file from our example, but relative to /var/www/example/static. We would use the following path:

../../svg-spirit/index.html

The first two segments of the path are going up one level each. That means that we go from /var/www/example/static to /var/www/example and then up again to /var/www. The remainder of the path is added to that.
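To see these rules in action, here is a small sketch that uses the node:path module. It uses path.posix so that the results use forward slashes on any operating system. The variable names are mine, and the paths are the ones from the examples above.

```javascript
const path = require('node:path');

// path.posix always uses forward slashes, so the results are the same on
// every operating system.
const { posix } = path;

// The single-dot segment is dropped during normalization.
const normalized = posix.normalize('/var/www/./svg-spirit/index.html');
console.log(normalized); // /var/www/svg-spirit/index.html

// The double-dot segment cancels out the segment before it.
const parentFolder = posix.normalize('/var/www/../');
console.log(parentFolder); // /var/

// The relative path from one folder to a file elsewhere in the tree.
const relative = posix.relative(
  '/var/www/example/static',
  '/var/www/svg-spirit/index.html',
);
console.log(relative); // ../../svg-spirit/index.html

// Resolving the relative path against the starting folder leads back
// to the absolute path of the file.
const resolved = posix.resolve('/var/www/example/static', relative);
console.log(resolved); // /var/www/svg-spirit/index.html
```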

We mentioned before that the URLs use Unix-style paths, while Windows uses different separators for paths. Regardless of what operating system you are working with, you should always try to make paths work on any system. In real life, you will work with developers that might be using a different operating system than yours.
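One way to keep paths portable is to never glue segments together with a hard-coded separator. The sketch below splits a URL path on forward slashes and lets node:path join the segments. The urlPathToFsPath name and its pathImpl parameter are hypothetical; the parameter exists only so that both flavors can be demonstrated on one machine. Note that this helper does nothing to stop .. segments from escaping the root folder.

```javascript
const path = require('node:path');

// Hypothetical helper: turns a URL path into a filesystem path below `root`.
// `pathImpl` defaults to the implementation for the current operating system.
function urlPathToFsPath(root, urlPath, pathImpl = path) {
  // URLs always use forward slashes, regardless of the server's OS.
  const segments = urlPath.split('/').filter((segment) => segment !== '');
  // join() inserts the separator that belongs to the chosen implementation.
  return pathImpl.join(root, ...segments);
}

console.log(urlPathToFsPath('/var/www', '/svg-spirit/index.html', path.posix));
// /var/www/svg-spirit/index.html
console.log(urlPathToFsPath('C:\\dev', '/svg-spirit/index.html', path.win32));
// C:\dev\svg-spirit\index.html
```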

In Node.js, all operations related to paths are implemented in the node:path module. This module provides functions for working with strings that represent paths, not physical files and folders on the disk. To access the physical files, you will use the node:fs module instead.

I recommend doing this exercise in iterations.

First, create a basic version that reads files from the disk and returns their contents. Instead of reading the whole file into memory, you can create a read stream for it and pipe the stream straight into the response object. This is more memory-efficient than reading the file first and then writing its contents to the response.
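Here is a rough sketch of the streaming approach, in case you get stuck. The sendFile name and its parameters are hypothetical, and the error handling is deliberately simple. It waits for the stream's open event before sending the headers, so that a missing file can still be answered with a 404.

```javascript
const fs = require('node:fs');

// Hypothetical helper: streams the file at `filePath` into the response.
function sendFile(res, filePath, contentType) {
  const stream = fs.createReadStream(filePath);

  // 'open' fires once the file has been opened successfully.
  stream.once('open', () => {
    res.writeHead(200, { 'Content-Type': contentType });
    // pipe() ends the response when the file has been fully read.
    stream.pipe(res);
  });

  stream.once('error', (err) => {
    if (res.headersSent) {
      // Too late to change the status code, so abort the response.
      res.destroy(err);
      return;
    }
    const status = err.code === 'ENOENT' ? 404 : 500;
    res.writeHead(status, { 'Content-Type': 'text/plain; charset=utf-8' });
    res.end(status === 404 ? 'Not found' : 'Server error');
  });
}
```

Inside a request handler created with http.createServer(), this would be called with the response object, the resolved file path, and the media type of the file.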

When responding, do not forget to set the Content-Type header to the media type — a.k.a. MIME type — of the file. Usually, we figure out the media type based on the file extension.
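A lookup table keyed by file extension is usually enough for this. The sketch below covers only a handful of extensions. The MEDIA_TYPES and contentTypeFor names are hypothetical, and the fallback to application/octet-stream is my choice for unknown extensions.

```javascript
const path = require('node:path');

// A small, incomplete table of file extensions and their media types.
const MEDIA_TYPES = new Map([
  ['.html', 'text/html; charset=utf-8'],
  ['.css', 'text/css; charset=utf-8'],
  ['.js', 'text/javascript; charset=utf-8'],
  ['.json', 'application/json'],
  ['.txt', 'text/plain; charset=utf-8'],
  ['.svg', 'image/svg+xml'],
  ['.png', 'image/png'],
  ['.jpg', 'image/jpeg'],
]);

// Hypothetical helper: picks the media type based on the file extension.
function contentTypeFor(filePath) {
  const extension = path.extname(filePath).toLowerCase();
  // Unknown extensions are served as generic binary data.
  return MEDIA_TYPES.get(extension) ?? 'application/octet-stream';
}

console.log(contentTypeFor('/var/www/svg-spirit/index.html'));
// text/html; charset=utf-8
```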

In the second iteration, add caching. Caching can be done in one of two ways:

  • Ad-hoc caching
  • Pre-caching

Ad-hoc caching is done when a resource is first accessed. Pre-caching is done before or while the server is started. The exact implementation of the caching mechanism depends a lot on what kind of API you use for the filesystem calls: the callback-based one or the promise-based one.

You should try both ad-hoc and pre-caching, and both callback-based and promise-based APIs.

In any case, the caching is in-memory. This usually involves a key-value data structure (e.g., an object or a Map) where the cached items are looked up. The lookup is typically done using the resource path.

With ad-hoc caching, you will first check whether the cache already contains the requested resource, and cache the resource if it does not. Then you will retrieve the cached item and return it to the client.
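The ad-hoc flow can be sketched with a Map and the promise-based API. This is only one possible shape, and the cache and readCached names are hypothetical. It stores the promise rather than the file contents, which is a design choice of mine: requests that arrive while the file is still being read share the same read.

```javascript
const fsPromises = require('node:fs/promises');

// Maps a file path to a promise of the file contents.
const cache = new Map();

// Hypothetical helper: reads the file on the first call, and serves it
// from memory on later calls.
function readCached(filePath) {
  if (!cache.has(filePath)) {
    const pending = fsPromises.readFile(filePath);
    cache.set(filePath, pending);
    // Forget failed reads, so that a later request can try again.
    pending.catch(() => cache.delete(filePath));
  }
  return cache.get(filePath);
}
```

The caller awaits the result and still needs to handle a rejection, for example by responding with a 404 when the file does not exist.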

With pre-caching, all files in the folder are first cached. The responses are always made from the cache. With this approach you have two options:

  • Halt the server start until caching is finished
  • Cache in parallel with the server start

Halting the server delays the server startup so it takes longer to restart the server, but it makes the implementation overall simpler. Parallel caching means that all incoming requests need to wait for the cache to be populated before they can respond, so there's an element of synchronization involved. Promises pose an advantage in this regard, but you can also build simple abstractions to deal with synchronization. Try all options.
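With promises, the synchronization for parallel caching can be as small as the sketch below. The createCacheLookup name is hypothetical, and populate stands for whatever async function you write to fill the cache. It is assumed to resolve to a Map.

```javascript
// Hypothetical helper: starts populating the cache right away and returns
// a lookup function that waits until the cache is ready.
function createCacheLookup(populate) {
  // Not awaited here, so the server can start listening in the meantime.
  const ready = populate();

  return async function lookup(key) {
    // Early requests wait here. Once the promise has resolved, awaiting
    // it again continues almost immediately.
    const cache = await ready;
    return cache.get(key);
  };
}
```

To halt the server start instead, you would await the populate function first and only then start listening.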

For indexing folders, you can use the fs.readdir() (or fsPromises.readdir()) function and use the recursive option.
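As a rough sketch of the indexing step, the function below reads every file under a folder into a Map keyed by URL path. The buildCache name and the shape of the keys are my assumptions. It also assumes a Node.js version in which readdir() supports the recursive option.

```javascript
const fsPromises = require('node:fs/promises');
const path = require('node:path');

// Hypothetical helper: reads all files under `root` into a Map that is
// keyed by the URL path of each file.
async function buildCache(root) {
  const cache = new Map();
  // With the recursive option, the entries are paths relative to `root`.
  const entries = await fsPromises.readdir(root, { recursive: true });

  for (const entry of entries) {
    const filePath = path.join(root, entry);
    const stats = await fsPromises.stat(filePath);
    if (!stats.isFile()) continue; // skip folders

    // Use forward slashes in the keys, so they can be matched against URLs.
    const key = '/' + entry.split(path.sep).join('/');
    cache.set(key, await fsPromises.readFile(filePath));
  }
  return cache;
}
```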

The ETag response header is used by the client to identify the version of a resource. It can then ask the server to only return a resource if the version it has is not current by using the If-None-Match header.

The workflow is as follows:

  • Server sends the ETag header with every response
  • Client sends the If-None-Match header containing the value of the ETag header it last saw
  • Server checks if the resource at the path has the same ETag
    • if the ETag matches, server responds with 304
    • otherwise, server responds with the regular 200 response, including the updated ETag header
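The server side of this workflow could look something like the sketch below. The function names and the shape of the entry object (etag, contentType, body) are hypothetical. The comparison ignores the W/ prefix, because If-None-Match uses weak comparison, and it treats * as matching any existing resource.

```javascript
// Hypothetical helper: checks the If-None-Match header against an ETag.
function matchesIfNoneMatch(headerValue, etag) {
  if (!headerValue) return false;
  if (headerValue.trim() === '*') return true;

  const stripWeak = (tag) => tag.trim().replace(/^W\//, '');
  // Simplification: splitting on commas assumes that the ETag values
  // themselves do not contain commas.
  return headerValue
    .split(',')
    .some((candidate) => stripWeak(candidate) === stripWeak(etag));
}

// Hypothetical helper: `entry` is assumed to look like
// { etag, contentType, body }.
function respond(req, res, entry) {
  if (matchesIfNoneMatch(req.headers['if-none-match'], entry.etag)) {
    // The client's copy is current, so no body is sent.
    res.writeHead(304, { ETag: entry.etag });
    res.end();
    return;
  }
  res.writeHead(200, {
    ETag: entry.etag,
    'Content-Type': entry.contentType,
  });
  res.end(entry.body);
}
```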

An ETag value can be either weak or strong. A weak ETag is marked with the W/ prefix. For a weak ETag, you can use the file's modification timestamp, for example. For a strong ETag, you can use the SHA-256 hash of the file content (provided by the node:crypto module). Try both options and note the difference in performance between the two, if any. (You will likely not see a difference in performance unless the file is fairly large.)
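Both kinds of ETag can be generated in a few lines. In this sketch, the strongEtag and weakEtag names are hypothetical, and the exact format of the weak value (size and modification time in hexadecimal) is just one possibility. The stats argument is assumed to be an fs.Stats object, such as the one returned by fs.stat().

```javascript
const { createHash } = require('node:crypto');

// Strong ETag: changes whenever the content changes.
function strongEtag(content) {
  const hash = createHash('sha256').update(content).digest('hex');
  // ETag values are always wrapped in double quotes.
  return `"${hash}"`;
}

// Weak ETag: based on file metadata only, so the content is never read.
function weakEtag(stats) {
  const size = stats.size.toString(16);
  const modified = Math.floor(stats.mtimeMs).toString(16);
  return `W/"${size}-${modified}"`;
}

console.log(strongEtag(Buffer.from('')));
// "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
```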
