This post sketches the history of web application paradigms and proposes a new paradigm that could offer a unique combination of benefits compared to the others. Although I haven’t yet had the opportunity to try that paradigm out, I wanted to share these notes, primarily so I can find them again in the future, but maybe others will benefit from them in the meantime too.
Server-Rendered
In the beginning there were server-rendered web applications. You hit a URL, the server ran the necessary queries to retrieve data, then you received back the rendered HTML UI. When you navigated to a new page, searched, or paginated, the server ran new queries and returned a whole new page. When you issued a command by submitting a form, the server ran queries to modify any necessary data, then (usually after a redirect for reload protection) reran queries and rerendered the UI again.
From a 2021 perspective, the user interface downsides of this approach jump out at us: any change requires a complete page rerender, which loses scroll position and insertion point state.
But there were upsides as well. The app logic (setting aside HTML and CSS) was written as a single codebase in mostly a single programming language. Because there was no persistent client-side state, you didn’t run into distributed-systems problems of data falling out-of-date: every time you made a request, you got the latest data, including anything changed by other users.
JavaScript Sprinkles
As browser JavaScript functionality improved, applications began to take advantage of it to solve the page-rerender problem. Small UI interactions were handled entirely in JavaScript, such as collapsing and expanding elements—and they could even be animated.
For interactions that did require changing server state, the introduction of XMLHttpRequest allowed changes to be saved without a page refresh. But when this required a visual change on the page in response, a problem started creeping up: duplication and incoherence. Previously the user interface was generated by the server returning HTML, but how would we update that UI in response to an Ajax response?
One option was to manipulate elements directly using the DOM API. This was straightforward enough to make simple changes such as hiding or showing elements, or adding or removing a CSS class. But what about when you needed to construct entirely new elements? At this point you would be duplicating rendering logic that already existed on the server in cumbersome DOM APIs.
Ruby on Rails advocated a different approach: return HTML fragments from the server and replace portions of the page with them. This allowed reusing the same rendering logic on the server, but there was still incoherence: you needed to know what partial template to render, how to find the container element that needed to be replaced, and to make sure you were loading up the same data and rendering it back. This was still a duplication of UI logic from the initial render.
Client-Side Apps
Over time, higher- and higher-level abstractions were built over the DOM APIs to allow JavaScript to more effectively render user interfaces in the browser. It came to the point where it became feasible to render your entire user interface on the client. You queried the server for data, returned in a format like XML or JSON. Then you rendered the UI based on that data. When data changed, you sent a request to the server and updated that data in your client store, then rerendered the UI.
For the UI, this was a step toward coherence: now your rendering logic was all in one place again, this time in the client. The declarative rendering paradigm React introduced even made rerendering coherent. All you had to do was declare how to render a given data set, then any time your data changed, rerendering happened automatically.
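To make the declarative idea concrete, here is a minimal sketch. It is not React's API; `render`, `createStore`, and the `Todo` shape are hypothetical names I made up for illustration. The point is only that the UI is described once, as a function of the data, and the same function runs again whenever the data changes.

```typescript
// Hypothetical data shape, for illustration only.
type Todo = { id: number; title: string; done: boolean };
type State = { todos: Todo[] };

// Declarative render: a pure function from data to markup.
// (No HTML escaping here; a real library would handle that.)
function render(state: State): string {
  const items = state.todos
    .map((t) => `<li class="${t.done ? "done" : "pending"}">${t.title}</li>`)
    .join("");
  return `<ul>${items}</ul>`;
}

// A tiny store that reruns render every time the data is replaced.
function createStore(initial: State, onRender: (html: string) => void) {
  let state = initial;
  onRender(render(state));
  return {
    setState(next: State): void {
      state = next;
      onRender(render(state));
    },
  };
}

// Usage: collect each rendered result instead of touching a real DOM.
const renderedFrames: string[] = [];
const todoStore = createStore(
  { todos: [{ id: 1, title: "Write post", done: false }] },
  (html) => renderedFrames.push(html)
);
todoStore.setState({ todos: [{ id: 1, title: "Write post", done: true }] });
```

Nothing in the calling code says how to get from the first markup to the second; it only supplies new data.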
On the flip side, client-side apps were a step toward duplication when it came to data handling. The data needed to be stored on the client side somehow, either in a central data store or at least directly in components using it. When you issued a request to the server to change data on the server, the client needed to know what to do to reflect those changes. Should it update the attribute of a record? Add or remove a record? Replace one? Did it need to recalculate pagination? Did a call to the server result in a change to a number of records? All these changes happened on the server, but the client also needed to know what to do in response to them. This is another kind of incoherence.
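As an illustration of that duplication, consider a made-up "archive project" command. The domain, the type names, and the server rule below are all assumptions of mine, but the shape of the problem is typical: the client has to repeat by hand whatever side effects the server applied.

```typescript
// Hypothetical client-side cache for a made-up project/task app.
type Task = { id: string; projectId: string; done: boolean };
type Project = { id: string; archived: boolean; openTaskCount: number };
type ClientCache = { projects: Project[]; tasks: Task[] };

// Runs after the server confirms the command. Every rule in here is a
// second copy of logic that already lives on the server.
function applyArchiveProject(cache: ClientCache, projectId: string): ClientCache {
  return {
    projects: cache.projects.map((p) =>
      p.id === projectId ? { ...p, archived: true, openTaskCount: 0 } : p
    ),
    // Assumed server rule: archiving a project completes all of its tasks.
    tasks: cache.tasks.map((t) =>
      t.projectId === projectId ? { ...t, done: true } : t
    ),
  };
}

const cacheBefore: ClientCache = {
  projects: [
    { id: "p1", archived: false, openTaskCount: 2 },
    { id: "p2", archived: false, openTaskCount: 1 },
  ],
  tasks: [
    { id: "t1", projectId: "p1", done: false },
    { id: "t2", projectId: "p1", done: false },
    { id: "t3", projectId: "p2", done: false },
  ],
};
const cacheAfter = applyArchiveProject(cacheBefore, "p1");
```

If the server rule ever changes (say, archiving stops completing tasks), this function silently goes out of date.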
JSON:API
The JSON:API spec is an attempt to provide conventions around RESTful APIs that use JSON. It’s modeled to fit well with relational databases, and answers questions about JSON data format, error messages, and relationships between records.
JSON:API helps to solve the problem of duplicate data logic between client and server because all operations are well-defined and not application-specific. Whether you’re creating, updating, or deleting a record, a general JSON:API client library can handle updating the local client cache of data, so you don’t need to implement those yourself.
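Here is a rough sketch of why a generic client can do this. JSON:API resource objects always carry a `type` and an `id`, so a cache keyed on those two fields can be updated without knowing anything about the app. The function names are hypothetical and not taken from any particular client library, and relationships are left out for brevity.

```typescript
// Subset of a JSON:API resource object (relationships omitted).
type Resource = {
  type: string;
  id: string;
  attributes?: Record<string, unknown>;
};

// Normalized cache: resource type -> id -> resource.
type ResourceCache = Map<string, Map<string, Resource>>;

// Used for create and update responses alike. Attributes are merged so a
// partial response does not wipe fields already in the cache.
function upsertResource(cache: ResourceCache, resource: Resource): void {
  const byId = cache.get(resource.type) ?? new Map<string, Resource>();
  const existing = byId.get(resource.id);
  byId.set(resource.id, {
    ...resource,
    attributes: { ...existing?.attributes, ...resource.attributes },
  });
  cache.set(resource.type, byId);
}

// Used after a successful delete.
function removeResource(cache: ResourceCache, type: string, id: string): void {
  cache.get(type)?.delete(id);
}

const resourceCache: ResourceCache = new Map();
upsertResource(resourceCache, { type: "articles", id: "1", attributes: { title: "Draft", body: "Hello" } });
upsertResource(resourceCache, { type: "articles", id: "1", attributes: { title: "Final" } });
upsertResource(resourceCache, { type: "articles", id: "2", attributes: { title: "Other" } });
removeResource(resourceCache, "articles", "2");
```

None of this code mentions articles except the example data at the bottom, which is what makes it reusable across apps.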
JSON:API has seen limited adoption, and I expect one of the reasons is that this approach works for CRUD systems where every operation is a simple create, update, or delete initiated by the client, but it struggles when you need more logic to happen on the server. Even when using a RESTful mental model of applying changes to resources, you may want to trigger changes to multiple database records in response to one “resource” change.
This leaves you with a catch-22: either limit yourself to CRUD operations and leave the client in control, or let the server be in control and custom-code the client to reflect complex changes that happen on the server.
Server-Reactive Apps
There is a collection of recent approaches you might call server-reactive. A more thorough analysis would include Phoenix LiveView and Rails’ Hotwire. But for now, since I’ve used Stimulus Reflex (SR) for Rails, let’s focus on that.
In SR, the server renders the initial page. Actions that will change page state, even “transient” UI state, are sent back to the server. The server updates databases and in-memory transient variables, then rerenders the page the user is on. The rerendered page is sent back to the client over a websocket, and the DOM elements of the page are updated. Optimizations can be applied for things such as limiting the parts of the page that are rerendered and the DOM elements changed, but with or without those optimizations the approach is conceptually the same.
These approaches come from a variety of motivations:
- Wanting the UI logic to live in one place
- Avoiding the overhead of building extensive client and server connection code
- Avoiding the challenges of a distributed system
- Wanting the benefits of server rendering to performance and search indexing
- Wanting to build apps in languages other than ones the browser can run
Personally, I find the first three motivations compelling. The fourth has benefits in some contexts but isn’t as big a deal for me. And while I identify with the fifth—I would prefer to be writing as much code as possible in Ruby—there is a significant downside: complexity getting the UI to behave as expected. Let me explain.
While a server-reactive approach works fine for the basic case of updating elements on the page, there are edge cases that it struggles with. One particular edge case is text inputs. After an action is run, should a focused text input stay focused, or should it not be? By default, if an input element is removed and replaced with another input element, the latter would not have focus. But, depending on your intent, you may or may not want it to. After an action runs upon a text field change event, maybe you want it to stay focused and maybe you don’t. A server-reactive framework needs to default to one way or the other, and it’s tricky to take the other approach.
By contrast, handling text input focus is straightforward in client-side JavaScript applications. This is because the experience we want to provide is a rich UI being interacted with directly. When we are letting the server direct us to swap out parts of the page, it’s harder to accomplish.
Stimulus Reflex handles this by not trying to achieve all your interactivity, but working along with a variety of other Rails approaches for handling rich UIs: submitting forms via Ajax, the Stimulus JavaScript framework for lightweight functionality, and Stimulus Reflex for heavier reactivity. But this is quite a bit of incoherence: depending on what bit of interactivity you want to implement, there are many different ways you might need to go about it. By contrast, it’s more consistent and straightforward to say that React handles your UI, period.
Now, it’s possible that SR or one of the other libraries has perfectly handled this edge case for now. But there’s always the likelihood that there are other cases that aren’t handled, or that will come up in the future. And even if not, you’re dependent on the library to handle those edge cases, and to keep investing significant effort in abstracting away the fundamental mismatch between a rich UI and a server-driven approach. For cases where you want server rendering or want to be writing in a language the browser doesn’t support, this may be worth it, but for other cases the costs are too high.
Proposal
Comparing the strengths and weaknesses of the above approaches, a list of goals emerges:
- Rich UI logic is implemented in one consistent way in the client and is not duplicated
- Business logic lives on the server only and is not duplicated
- Minimal app-specific client or server code to write, especially not client cache management
None of the existing approaches satisfies all of these:
- Server-rendered apps don’t have a rich UI
- JavaScript sprinkles duplicate rendering logic in the frontend
- Client-side apps duplicate business logic on the frontend when you have more logic than simple CRUD
- Server-reactive apps don’t implement UI logic in a consistent way, but instead require workarounds
But there is another approach we could take that could get us closer.
Think about server-reactive apps. They make two changes compared to the other approaches: (1) they circumvent the need for client state management by going back to the server, and (2) they return HTML instead of data over the wire. But these two changes are separate decisions that don’t need to change together.
What if we make change 1 but not change 2? What if we remove the need for client state management by going back to the server, but we only ask it to return the updated data to the client?
It could work like this:
- The client-side app starts up as usual
- It makes one or more requests to the server for necessary data, then updates based on data it receives back
- UI interactions happen as usual, with transient client-side state being updated
- When data the server owns needs to be changed, a command is sent to the server
- Here’s where the difference is. The client doesn’t attempt to make decisions about what changes to make to its cache of the server data, either before or after the command to the server. Instead, the client receives updated data for all of the queries it’s currently using.
In other words, after any command we “rerender” the data that is displayed on the screen.
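Since I haven't built this yet, the following is only a sketch of how the steps above might fit together. Every name in it (`createClient`, `watch`, `run`, `createFakeServer`, the task queries) is hypothetical. The transport is kept synchronous and in-memory so the sketch runs on its own; a real one would return a Promise and go over HTTP or a WebSocket.

```typescript
type Command = { name: string; payload: Record<string, unknown> };
type QueryResults = Record<string, unknown>;
type ServerRequest = { command: Command | null; queries: string[] };
// Synchronous only so the sketch is standalone (see the note above).
type Transport = (request: ServerRequest) => QueryResults;

// App-agnostic client: it never decides how a command changes its data.
function createClient(transport: Transport) {
  const active = new Set<string>();
  let data: QueryResults = {};
  const refresh = (command: Command | null): void => {
    // Replace everything on screen with whatever the server says is current.
    data = transport({ command, queries: Array.from(active) });
  };
  return {
    watch(query: string): void {
      active.add(query);
      refresh(null);
    },
    read: (query: string): unknown => data[query],
    run: (command: Command): void => refresh(command),
  };
}

// In-memory stand-in for the server, with a made-up task domain.
function createFakeServer(): Transport {
  let tasks = [
    { id: 1, title: "Write post", done: false },
    { id: 2, title: "Publish", done: false },
  ];
  const queries: Record<string, () => unknown> = {
    openTasks: () => tasks.filter((t) => !t.done),
    doneCount: () => tasks.filter((t) => t.done).length,
  };
  return ({ command, queries: wanted }) => {
    // Business logic runs here and only here.
    if (command !== null && command.name === "completeTask") {
      const id = command.payload["id"];
      tasks = tasks.map((t) => (t.id === id ? { ...t, done: true } : t));
    }
    // Then every query the client is displaying gets rerun.
    const results: QueryResults = {};
    for (const query of wanted) {
      const runQuery = queries[query];
      if (runQuery) results[query] = runQuery();
    }
    return results;
  };
}

const client = createClient(createFakeServer());
client.watch("openTasks");
client.watch("doneCount");
client.run({ name: "completeTask", payload: { id: 1 } });
const openAfter = client.read("openTasks");
const doneAfter = client.read("doneCount");
```

Notice that `createClient` contains no task-specific code. One command changed the results of two queries, and the client didn't need to know that.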
The details of this approach could be handled a few different ways:
- The queries and commands could each be done over HTTP requests for simplicity of implementation, or they could be sent over a WebSocket to reduce latency.
- The server could always send back full data for simplicity of implementation, or it could apply “diffing” by comparing the query data before and after a command is run. This would allow it to send only a subset of the data.
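For the diffing option, one possible shape (again just a sketch with hypothetical names, and assuming each record in a query result has a string `id`) is for the server to compare the rows before and after the command and send only what changed. The client then applies that patch generically.

```typescript
type Row = { id: string; [field: string]: unknown };
type Patch = { upserted: Row[]; removedIds: string[] };

// Server side: compare a query's rows before and after a command.
// Rows are compared via JSON.stringify, which assumes a stable key order.
function diffRows(before: Row[], after: Row[]): Patch {
  const previous = new Map(before.map((r) => [r.id, JSON.stringify(r)] as const));
  const upserted = after.filter((r) => previous.get(r.id) !== JSON.stringify(r));
  const afterIds = new Set(after.map((r) => r.id));
  const removedIds = before.filter((r) => !afterIds.has(r.id)).map((r) => r.id);
  return { upserted, removedIds };
}

// Client side: apply the patch without knowing what the command did.
// Limitation: new rows are appended, so server-side ordering is not preserved.
function applyPatch(rows: Row[], patch: Patch): Row[] {
  const removed = new Set(patch.removedIds);
  const changed = new Map(patch.upserted.map((r) => [r.id, r] as const));
  const kept = rows
    .filter((r) => !removed.has(r.id))
    .map((r) => changed.get(r.id) ?? r);
  const known = new Set(rows.map((r) => r.id));
  const added = patch.upserted.filter((r) => !known.has(r.id));
  return [...kept, ...added];
}

const rowsBefore: Row[] = [
  { id: "1", title: "a" },
  { id: "2", title: "b" },
  { id: "3", title: "c" },
];
const rowsAfter: Row[] = [
  { id: "1", title: "a" },
  { id: "2", title: "B" },
  { id: "4", title: "d" },
];
const rowPatch = diffRows(rowsBefore, rowsAfter);
const patchedRows = applyPatch(rowsBefore, rowPatch);
```

Here the unchanged row with id "1" is never retransmitted, while the client still ends up with the same rows the server has.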
What do we achieve by this approach?
- We get the kind of rich UI we want.
- UI logic is only in the client, where it belongs. There’s no reimplementation of server UI logic on the client. And no workarounds to reactively update the frontend from the backend.
- Backend logic is only on the server, where it belongs. The frontend doesn’t need to know anything about the impact of a backend operation on its data; it lets the backend inform it.
- Almost no app-specific connection code needed. In an HTTP-based approach, the server endpoints can be simple JSON endpoints, and the client code can be app-agnostic. A WebSockets-based approach would not be much more complex and would still be app-agnostic.
- If you choose to write the backend in Node.js then there’s a strong consistency and simplicity to the code across frontend and backend. When you want to implement a new feature, you build the UI in React and the queries/commands in an Express route. There’s very little handoff code to write.
What are some concerns with this approach?
- UI is dependent on JS, startup can be slow, and errors can crash the UI. When this is a concern, server rendering and server-reactive options are available, but the downsides of these have been discussed above.
- Inefficiency of data retransmitted. An important point is that you need to only transmit the data needed for the UI for that moment, and omit extraneous fields and records. If you’re doing this then the data retransmitted should be strictly less than the UI code retransmitted by SR.
The next hobby app I start will probably follow this approach, to test it out and learn more about the pros and cons.