4x *the mini multi-tasking HTTP/1.1 web-server*

Architecture

Programming Language

System Interface

Network Interface

Concurrency

I/O Architecture

4x uses a deliberately low-technology architecture designed to show just how good a simple, synchronous, forking web-server can be.

Programming Language

4x is written in ANSI C89 (ANSI X3.159-1989). Because it contains no Java, C++, Objective-C, D, Go, C99, C11, Python, etc., it is very portable, reasonably efficient, and has lower resource requirements than would otherwise be the case.

This also allows the server to be relatively self-contained, requiring no dynamic runtime-support facilities beyond the host platform itself. You will not have to pre-download and pre-install dozens of other software packages: usually none, and at most one (for MS Windows).

System Interface

For all non-network facilities, 4x uses the "classic" POSIX.1 (IEEE Std 1003.1-1990) interface to the host operating-system, and (for security reasons) additionally requires the readlink() and lstat() functions as defined by the X/Open Portability Guide version 4.2, UNIX95 and POSIX 1003.1-2001.

The very wide availability of these interfaces (UNIX SVR4, BSD4.4, Linux, MS Windows NT/2K/XP/Vista/7, Cygwin, OpenVMS, NetBSD, and so on) is another reason why 4x is so portable.

Network Interface

4x uses the classic "sockets" API, and only the minimal extremely-portable subset of that:

Only plain old TCP stream sockets; no routing-sockets, no arpcache-sockets, no AF_NETLINK, etc.
synchronous sockets only; no SIGURG nor fcntl(fd,...,F_SETOWN), etc.
no fancy "modern" micro-tuning low-level socket-options such as TCP_CORK
The IPv6 socket API extensions are only used on those systems that provide IPv6 facilities.

4x avoids or works-around common socket interface bugs, including:

Non-blocking read() on a remotely-closed socket can report either EOF or EAGAIN on SVR4 UNIX and many derivatives, even though only EAGAIN is documented,
SHUT_RD, SHUT_WR and SHUT_RDWR are not defined on older platforms, but must be used on those platforms that do define them (for ABI reasons),
Bidirectional asynchronous sockets do not always work on SVR4 UNIX.

All of these choices aid portability and resource usage.
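
As an illustration of the SHUT_* point above, here is a minimal sketch of a portable half-close. The helper name finish_sending() is hypothetical and is not taken from the 4x source; the fallback values 0, 1 and 2 are the traditional numeric arguments to shutdown() and are only defined when the platform headers do not supply the symbolic names.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Use the platform's own names when they exist; supply the traditional
 * numeric values only on older systems that lack them. */
#ifndef SHUT_RD
#define SHUT_RD 0
#endif
#ifndef SHUT_WR
#define SHUT_WR 1
#endif
#ifndef SHUT_RDWR
#define SHUT_RDWR 2
#endif

/* Hypothetical helper: tell the peer that no more data will be sent,
 * while leaving the receive direction open.
 * Returns 0 on success or -1 on error, exactly like shutdown(). */
int finish_sending(int fd)
{
    return shutdown(fd, SHUT_WR);
}
```

After a successful finish_sending(), the peer sees end-of-file once it has read everything already queued, and the caller can still read whatever the peer sends back.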

Concurrency

4x handles multiple concurrent clients by providing a dedicated separate process for each (potentially persistent, pipelined) connection. Processes are created on-demand, to avoid the need to "guesstimate" pre-tuning settings and to avoid consuming resources when the server is not-so-busy.

In other words, it is an old-style "forking" server that takes advantage of HTTP/1.1 persistent connections to somewhat ameliorate the speed costs.

The forking architecture has several advantages, as well as a single drawback: the speed cost of creating a process per connection. On the plus side, the server does not need to second-guess the host operating system: all concurrency control and sequencing are handed off to the host system, which is generally very good at it, having had decades of development and tuning.

Also, as each connection is completely self-contained (shares no address-space or open file-handles with any other connection), attacks that attempt to crash or corrupt the handling process can only affect that one connection. Even if such an attack were successful, service for other clients or other connections would not be stopped or interfered with.

In summary:

single-threaded but highly concurrent - multitasking not multithreaded
no busy-waiting or polling
no mutexes or condition-wait variables
no realtime-signals
no shared address-space
no need to hand-tune thread-stack size limits
no need to tune file-descriptor limits
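
The on-demand process-per-connection model summarised above can be sketched roughly as follows. This is an illustration under stated assumptions, not the 4x source: the names conn_handler, echo_once(), serve_in_child() and accept_loop() are all hypothetical, and echo_once() merely stands in for real HTTP request handling.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*conn_handler)(int fd);

/* Placeholder handler (hypothetical): echo one buffer back to the client. */
void echo_once(int fd)
{
    char buf[512];
    ssize_t n = read(fd, buf, sizeof buf);

    if (n > 0 && write(fd, buf, (size_t)n) < 0)
        return;   /* a real handler would log the failure */
}

/* Give one connection its own process.
 * Child:  runs the handler, then exits; it never returns to the caller.
 * Parent: closes its copy of the descriptor and returns the child's pid,
 *         or -1 if fork() failed (the connection is then simply dropped). */
pid_t serve_in_child(int conn_fd, conn_handler handler)
{
    pid_t pid = fork();

    if (pid == 0) {
        handler(conn_fd);
        close(conn_fd);
        _exit(0);
    }
    close(conn_fd);
    return pid;
}

/* Blocking accept loop: no select(), no poll(), no threads.
 * Returns -1 only when accept() fails for a reason other than EINTR. */
int accept_loop(int listen_fd, conn_handler handler)
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);

        if (fd < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        (void)serve_in_child(fd, handler);
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;   /* reap any children that have already finished */
    }
}
```

In this sketch finished children are only reaped after the next accept() returns, so a quiet server may hold a few zombie entries for a while; a SIGCHLD handler is one alternative.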

I/O Architecture

4x uses a 100% dataflow-driven I/O architecture. From the perspective of the webserver software, all I/O is completely synchronous. All read-ahead, write-behind and I/O multiplexing are handled by the host operating environment. In particular:

use simple low-overhead synchronous read() and write()
no select() or poll()
no /dev/poll or /dev/epoll
no kqueue or kevent
no aio_read() and aio_write()
no readiness-signals
no I/O-completion-ports
no realtime-signals
no mutexes, no event-wait condition-variables
no fcntl(..., F_SETOWN) or SIGURG semantics

This allows 4x to be much smaller, and leverages decades of multiplexing experience embodied in most host operating systems.

There is no need to explicitly deal with potential I/O race conditions, and each service process can remain blissfully unaware of the other processes - no explicit coordination is needed. This makes reasoning about the correctness of the software very much easier.
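
To make the synchronous style concrete, here is a minimal sketch of a blocking copy loop built only on read() and write(). The function names write_all() and copy_stream() and the 4096-byte buffer size are assumptions chosen for illustration; they are not taken from the 4x source.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly len bytes, retrying after short writes and EINTR.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy everything from in_fd to out_fd using blocking calls only.
 * Returns the number of bytes copied, or -1 on error. */
long copy_stream(int in_fd, int out_fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);

        if (n == 0)
            return total;          /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
        total += (long)n;
    }
}
```

The process simply sleeps inside read() or write() whenever the other side is not ready, which is the point: scheduling between many such processes is left entirely to the host system.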

* a pure dataflow-driven architecture, thus no I/O-multiplexing APIs, thus:
	no select()!
	no poll()
	no /dev/poll or /dev/epoll
	no kqueue or kevent
	no readiness-signals
	no I/O-completion-ports
* no asynchronous I/O APIs, thus:
	no AIO
	no realtime-signals
	no fcntl(..., F_SETOWN) or SIGURG semantics
	no mutexes
	no I/O-completion-ports
* no new-fangled goodies (apart from IPv6 later, and optional PAM support):
	no high-resolution timers: use alarm() only
	no thread APIs of any kind
	no readv()/writev() or variations
	no sendfile() unless specifically requested, must work properly without
* no kernel patches
* no loadable kernel modules
* no additional device-drivers
* no explicit use of "alternative" process-scheduling facilities
	(eg: Solaris FSS or RT)


The places where the implementation currently goes "outside the garden":

* for security reasons, the lstat() and readlink() calls are allowed
	even though they are not part of POSIX 1003.1-1990. Because of the
	security implications and lack of any reasonable alternative, and
	the fact that almost all UNIX-like environments had both even in 1992,
	these two exceptions must be kept.

* to allow zero-admin HTTP Basic Authentication, the PAM API can be
	optionally configured into the build. Currently, in the absence of that,
	HTTP Basic Authentication facilities will not be available.


Other targets for the implementation are:

* small executable size: target < 50Kb on SPARC and MIPS, < 45Kb on IA32
* beat the NCSA httpd 1.3, CERN w3 and Apache 1.3 scores in the "Acme98"
	performance test (see www.acme.com/software/thttpd/benchmarks.htm)
	on that same hardware/OS system, and to get within shouting distance
	of the thttpd scores if at all possible. ie: to prove that an
	on-demand-forking server can perform well on very old, plain old,
	modern and bleeding-edge systems (small, large and huge) without being
	specifically micro-tuned or restructured for any of those variations.
* at most *gradual* decrease in performance with increasing number of concurrent
	users/sessions: no falling-off-a-cliff.

To be a useful current server and a good netizen, it provides:

* no gratuitous emission of more network packets than necessary
	(not a "selfish" implementation).
* support for HTTP/1.1 clients and facilities, expressly including:
	persistent connections with proper connection-management
	support of client pipelining
	If-Modified-Since conditional GETs
	outgoing streaming using chunked transfer-encoding
	incoming chunked transfer-encoding 
	100-Continue expectation processing
	Basic authentication (in this case, currently via PAM).
* transparent support for HTTP/1.0 clients
* CGI/1.1 with configurable timeout
* bombproof finding of files to deliver:
	no escapes from the document-root, even if the path subsequently re-enters it
	early decoding of URN-path, to avoid unexpected-character-escaping
	stripping of leading and trailing duplicate slashes from URNs
		(the implicit-Magic-Filesystem attack).
	support symbolic links but still with no escaping
	detect and disallow hard-links
	*all* file-/directory-access errors cause "Not Found" response:
		no side-band leaking of filesystem structures outside the
		document-root. Esp: no "access-denied" if non-existent file or
		directory, or for unreadable file, or even for disallowed
		files (hard-links, paths that attempt to escape with "..",
		invalid URN characters, and so on and so on).
* Relatively-Simple-Admin:
	all configurable run-time options are expressed on the command-line.
	server can be gracefully stopped by either SIGINT, SIGQUIT
		or SIGTERM signals, and automatically shuts down all
		child processes.
	no need to adjust default per-process resource limits to run the server
		(server processes are small and use only 4 file-descriptors).
	no need for system-level thread-stack-size tuning (a common problem for
		multi-threaded servers, esp. on 32-bit platforms).
	error-logging to standard-error, so can be saved/discarded as desired.
	access-logging to standard-output, so can be saved/discarded as desired.
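
The path-handling rules in the "bombproof finding of files" item above (decode early, strip duplicate slashes, refuse any escape from the document-root even if the path later re-enters it) can be sketched as follows. This is an illustration only: hex_val(), percent_decode() and safe_path() are hypothetical names, the 1024-byte working buffer is an arbitrary assumption, and the sketch does not cover the symbolic-link and hard-link checks.

```c
#include <stddef.h>
#include <string.h>

static int hex_val(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX escapes in place.
 * Returns 0, or -1 on a malformed escape or an encoded NUL byte. */
static int percent_decode(char *s)
{
    char *w = s;

    while (*s != '\0') {
        if (*s == '%') {
            int hi = hex_val((unsigned char)s[1]);
            int lo = (hi < 0) ? -1 : hex_val((unsigned char)s[2]);

            if (lo < 0 || (hi == 0 && lo == 0))
                return -1;
            *w++ = (char)(hi * 16 + lo);
            s += 3;
        } else {
            *w++ = *s++;
        }
    }
    *w = '\0';
    return 0;
}

/* Turn a request path into a path relative to the document-root
 * ("" means the root itself). Returns 0 on success, -1 if the request
 * must be refused. */
int safe_path(const char *url_path, char *out, size_t size)
{
    char tmp[1024];
    char *seg, *next;
    size_t len = 0;                 /* bytes currently used in out */

    if (size == 0 || strlen(url_path) >= sizeof tmp)
        return -1;
    strcpy(tmp, url_path);
    if (percent_decode(tmp) != 0)   /* decode BEFORE any checking */
        return -1;

    out[0] = '\0';
    for (seg = tmp; seg != NULL; seg = next) {
        size_t n;

        next = strchr(seg, '/');
        if (next != NULL)
            *next++ = '\0';
        n = strlen(seg);
        if (n == 0 || strcmp(seg, ".") == 0)
            continue;               /* duplicate slash or "/./" */
        if (strcmp(seg, "..") == 0) {
            char *slash;

            if (len == 0)
                return -1;          /* would leave the document-root */
            slash = strrchr(out, '/');
            len = (slash != NULL) ? (size_t)(slash - out) : 0;
            out[len] = '\0';
            continue;
        }
        if (len + (len > 0) + n + 1 > size)
            return -1;              /* result does not fit */
        if (len > 0)
            out[len++] = '/';
        memcpy(out + len, seg, n + 1);
        len += n;
    }
    return 0;
}
```

For example, "/a//b/../c.html" yields "a/c.html", while "/%2e%2e/x" and "/a/../../a/x" are both refused. In keeping with the list above, a caller would answer every refusal with a plain "Not Found".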

Suggested Future Features:

* Range requests and conditional range requests.
* Support for IPv6.
* transparent support for HTTP/0.9 clients.