Saturday, September 22, 2018

Model correctly and write less code, using Akka Streams

One of my most productive days was throwing away 1000 lines of code.

-- Ken Thompson
More than two decades of writing software to earn some humble money and keep the home hearth warm have taught me that writing simple, elegant and readable programs results in much more maintainable and extensible software. Importantly, stakeholders prefer maintainable code. One way to ensure that these properties exist in the code is to write fewer lines of code, wherever possible. Generations of greats in the world of computer science and programming have been reminding us of this small yet undeniable truth. Yet it is surprising how often we ignore it, knowingly and to our projects’ detriment.
If we have to write programs that are simple to understand, we have to understand the problem well, before anything else. If the problem is redefined in simpler and more fundamental terms, then it becomes easier to model the problem in programming terms. The solution that emerges is likely to be simple and small (but no smaller than required). Jon Bentley’s Programming Pearls offers an excellent treatise on this matter. I still refer to it from time to time. I hope others do too.
It is my considered view that a well-modeled problem (i.e., one whose crux is understood) leads to compact and resilient code that is much easier to reason about. The verbosity of the language and the accidental complexities of the technologies involved may bring in necessary planks and poles, but to an alert pair of eyes, the theme remains easily discernible. A beautiful codebase is a smaller codebase.

The case at hand

Allow me to take you through a greatly abridged version of a recent assignment I have been associated with. I have been helping a team rewrite their server stack, which is meant to allow users to play quiz-like games. Shorn of all business functionality, what happens is this:
  • A player begins a session and chooses a game (quiz) to play from the several available
  • The player answers questions one at a time
  • The Server, at the end of the game (all questions attempted), calculates and declares the total she has scored
The server also informs other important functional units of the player’s accomplishments. Any number of players can play games simultaneously. The server has to keep up with this load.
Simple enough, when one strips it of the layers of HTTP, Configuration, Database, Timers, Queues and what have you! I have dropped other business requirements from the list as well.

Focus on the interaction

For the purpose of this blog, let us consider a particular portion of the abovementioned functionality: what does the server do when a player contacts it? Assuming that the player has already been authenticated (which is a precondition of, and not a part of, the play-related interaction mentioned above), her interaction has two distinct parts:

Interaction 1 (StartARound)

Hey Server, I want to play a game. Here’s my Session Identifier (obtained earlier from AUTH component) and give me a question.

Interaction 2 (PlayARound)

Hey Server, here’s my answer to the previous question and give me the next one (let’s assume that server offers 3 questions only, in three consecutive rounds; then the game ends.)
Let’s ignore all the network, protocol, routing and JSONfication details for the time being. Then, let’s take a look at what it is that the Server does, and how we can describe that in a manner that is terse, yet conveys the right meaning.

Modeling the interaction is the key

The Server takes in a piece of data and keeps on transforming it, till another piece of data is ready to be handed over to the client. This transformation consists of one or more steps and in each step, one piece gives rise to another piece. Also, in each step, help from other components or services may be summoned and used, before emitting the resultant piece. Therefore, if we can identify each step, what goes into it and what comes out of it, we can easily model how the whole server works! Moreover, with such a model in hand, we can also verify the behaviour of the server.
An obvious first question is: how do we identify these pieces? In the world of OO and Functional Programming, where I manage to reside (and so far, have not been evicted), it is quite natural to identify these pieces by their Types! Every step takes in a Type and gives rise to the same or a different Type. Given this tenet, how can we represent the way the Server responds to the player?

The Type-borne behaviour

The diagram below elucidates the scheme of things. The rightmost box shows the transformations that are happening inside the Server.
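In terms of Types, one way to describe the flow ‘Start A Round’ (above) is:
StartARound → sessionExistenceChecker → (IncorrectSessionIDProvided or StartARound) → guessNumberPreparator → (IncorrectSessionIDProvided or RoundStarted)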

Let’s elaborate

The logic of confirming the correctness of the sessionID passed by the player (does it exist, and is it valid?) is encased in the transformer named sessionExistenceChecker. Because the server stipulates that every message reaching its shores must have a valid sessionID, every message has to pass through sessionExistenceChecker. However, the important observation is this:
sessionExistenceChecker understands SessionCarrier only. Therefore, in order to be recognized by the checker, every message must also be a SessionCarrier. In OO terms, every message entering sessionExistenceChecker must be a subtype (IS-A) of SessionCarrier.

There are three benefits of this approach:
- Type is self-documenting: the model is self-explanatory. The constraints are obvious. If I want to know what I need to gather before I can ask sessionExistenceChecker to flag me off OK, I have to look no further than the type it expects.
- Compiler helps to reduce defects: if I am unmindful and pass a message which is not session-id-checkable, the compiler will block my progress with a grim message. A defect will be forestalled well before the code is readied for testing. That’s a substantial gain.
- Code is intuitive and readable: it is quite easy, in many cases straightforward, to translate this model into code (we will see a little later).

Now, let’s look at the result of the transformation. The transformer applies its logic and emits either of these Types:
  • IncorrectSessionIDProvided: this indicates that not everything is correct with the SessionCarrier that has been passed.
  • StartARound: this indicates that the StartARound type, which IS-A SessionCarrier, has come out of the checker unscathed!
The key understanding, again, is that these are not values but Types! The actual (runtime) objects moving in and out of the checker may carry anything, but they must conform to these Types.
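
As a minimal sketch of the Types discussed so far, the hierarchy could look something like the following. The type names come from the text above; the common supertype Message, the fields, the knownSessions set and the checkSession function are assumptions made purely for illustration, and are not taken from the actual codebase.

```scala
// Hypothetical message types: the names follow the text, the fields are assumed.
sealed trait Message
sealed trait SessionCarrier extends Message { def sessionID: String }

final case class StartARound(sessionID: String) extends SessionCarrier
final case class IncorrectSessionIDProvided(sessionID: String) extends Message

object SessionTypesSketch {
  // Stand-in for a lookup against a real session store (an assumption).
  val knownSessions: Set[String] = Set("A123")

  // Accepts only a SessionCarrier and emits one of the two Types named above.
  def checkSession(m: SessionCarrier): Message =
    if (knownSessions.contains(m.sessionID)) m
    else IncorrectSessionIDProvided(m.sessionID)

  def main(args: Array[String]): Unit = {
    println(checkSession(StartARound("A123"))) // a known session passes through
    println(checkSession(StartARound("Z999"))) // an unknown session is flagged
    // The next line, if uncommented, is rejected by the compiler,
    // because IncorrectSessionIDProvided is not a SessionCarrier:
    // checkSession(IncorrectSessionIDProvided("A123"))
  }
}
```

The commented-out call is the point of the exercise: a message that is not session-id-checkable never reaches the checker, because the program does not compile.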

The next step (refer to the shaded region of the Flow diagram, above) is to choose a number for the player, associate that with a Round Identifier and then get back to her with an appropriate message. This logic is encased in the transformer named guessNumberPreparator. Because it comes right after sessionExistenceChecker, it must be capable of consuming either IncorrectSessionIDProvided or StartARound. Then, it emits either of these Types:
  • IncorrectSessionIDProvided: this indicates that not everything is correct with the SessionCarrier that has been passed.
  • RoundStarted: this carries
      • A confirmation that the Session Identifier passed with StartARound has been found to be correct (by the checker earlier)
      • A confirmation that the Server has chosen a number for the player
      • An identifier of the round given to the player to guess
That’s it. We now have the blueprint of the Server’s implementation of Interaction[1].
Translating this into code, when implemented using Akka Streams, we get this:
val serverSays =
       Source
           .single(StartARound("A123")) // A123 is a session id, obtained from AUTH service
           .via(sessionExistenceChecker)
           .via(guessNumberPreparator)
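
For readers who want to see what might sit behind the two transformers, here is a rough, runnable sketch, and not the code from the actual server. It assumes Akka 2.6 or later (where an implicit ActorSystem supplies the materializer). The common supertype Message, the fields of the messages, the knownSessions set, the fixed round identifier and the helper functions checkSession and prepareRound are all hypothetical.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

// Hypothetical message types: names from the text, fields assumed.
sealed trait Message
sealed trait SessionCarrier extends Message { def sessionID: String }
final case class StartARound(sessionID: String) extends SessionCarrier
final case class IncorrectSessionIDProvided(sessionID: String) extends Message
final case class RoundStarted(sessionID: String, roundID: Int) extends Message

object StartARoundSketch {
  // Stand-in for the real session store (an assumption).
  val knownSessions: Set[String] = Set("A123")

  // Lets a valid SessionCarrier through, flags an invalid one.
  def checkSession(m: SessionCarrier): Message =
    if (knownSessions.contains(m.sessionID)) m
    else IncorrectSessionIDProvided(m.sessionID)

  // Starts a round for a valid request; passes everything else along untouched.
  def prepareRound(m: Message): Message = m match {
    case StartARound(sid) => RoundStarted(sid, roundID = 1) // round id fixed for brevity
    case other            => other
  }

  val sessionExistenceChecker: Flow[SessionCarrier, Message, NotUsed] =
    Flow[SessionCarrier].map(checkSession)

  val guessNumberPreparator: Flow[Message, Message, NotUsed] =
    Flow[Message].map(prepareRound)

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("start-a-round-sketch")
    val serverSays =
      Source
        .single(StartARound("A123"))
        .via(sessionExistenceChecker)
        .via(guessNumberPreparator)
        .runWith(Sink.head) // materialize the stream and take its single element
    println(Await.result(serverSays, 3.seconds))
    system.terminate()
  }
}
```

Keeping the logic in plain functions and lifting them with map is only one possible design; it keeps each step testable without a running stream.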
The diagram below illustrates the Server’s implementation of Interaction[2]: the Player makes a guess, and the server gives her points for a correct guess and shows the latest score.
Recall that the Server ends the game after 3 rounds. The way error is handled is the same as that in the previous flow (StartARound). Also, the transformer that checks correctness of Session Identifier is reused here.

I am not depicting the flow of types and transformers for this flow, for space’s sake. The code segment that implements this flow is small, crisp and perhaps, quite intuitive as well:

val serverSays =
    Source
        .single(GuessSubmittedByPlayer(sessionID, roundID, guessedNumber))
        .via(sessionExistenceChecker)
        .via(roundCorrectnessChecker)
        .via(guessedNumberVerifier)
        .via(pointsAssigner)
        .via(scoreBoardUpdater)
        .via(currentScorePreparator)
        .via(gameTerminationDecider)
        .via(nextGuessGenerator)
That’s what our Server does, to implement Interaction[2]. That’s all there is to it, really!

An Akka Streams based implementation brings in many other benefits. However, the aim of this blog is not to explore and discuss the many and very beneficial aspects of Akka Streams. A number of blogs already exist which do the job very, very well (Colin Breck’s are here, my personal favourite), not to mention Akka Streams’ own site and numerous discussions on Stack Overflow. Therefore, I will rather bring your attention to other aspects of this approach to modeling:

- If we can model the pathway of processing of any message as a series of transformations, then translation of the same in code becomes decidedly easier.

- If the model is clear, the code is small, crisp and readable. This is what we began this blog with, isn’t it? The code does what the model depicts; nothing more, nothing less. No code exists that has no reason to exist. Brevity matters.

- Every transformation demands that types are provided to indicate what it takes and gives. If and when our code fails to satisfy the transformers by passing wrong types, the compiler stops us in our tracks. Because it is a compile-time prevention of potential defects, this approach saves significant time and effort. That’s an undeniable gain. Type-Driven Development, did you say?

- When it comes to testing the behaviour, it is possible to write separate test cases for every transformer, and for a series of them as specified in the flow. For example, it is quite easy to test if the points have been assigned correctly:
  val serverSays =
        Source
          .single(GuessSubmittedByPlayer(sessionID, roundID, guessedNumber))
          .via(sessionExistenceChecker)
          .via(roundCorrectnessChecker)
          .via(guessedNumberVerifier)
          .via(pointsAssigner)
          // subsequent transformations dropped because we want to test this outcome

Using the Akka Streams TestKit and ScalaTest, it is straightforward to test such partial outcomes (refer to the github codebase). So, we can test the functional behaviour of the Server without having to set up the planks and poles around it.
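
To give a flavour of such a test, here is a small sketch that exercises a single transformer in isolation. It is an illustration under assumptions, not the test from the codebase: it assumes Akka 2.6 with the akka-stream-testkit module on the classpath, it leaves out the ScalaTest wrapper for brevity, and the types GuessVerified and PointsAssigned, the function assignPoints and the 10-points rule are all made up for the example.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Flow, Source}
import akka.stream.testkit.scaladsl.TestSink

// Hypothetical input and output types of the pointsAssigner step.
final case class GuessVerified(sessionID: String, roundID: Int, isCorrect: Boolean)
final case class PointsAssigned(sessionID: String, roundID: Int, points: Int)

object PointsAssignerSketch {
  // Assumed scoring rule: 10 points for a correct guess, none otherwise.
  def assignPoints(g: GuessVerified): PointsAssigned =
    PointsAssigned(g.sessionID, g.roundID, if (g.isCorrect) 10 else 0)

  val pointsAssigner: Flow[GuessVerified, PointsAssigned, NotUsed] =
    Flow[GuessVerified].map(assignPoints)

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("points-assigner-sketch")
    try {
      Source
        .single(GuessVerified("A123", 1, isCorrect = true))
        .via(pointsAssigner)
        .runWith(TestSink.probe[PointsAssigned]) // probe that lets us assert on emitted elements
        .request(1)
        .expectNext(PointsAssigned("A123", 1, 10))
        .expectComplete()
    } finally {
      system.terminate()
    }
  }
}
```

The probe fails the run if the element does not match or the stream does not complete, so no HTTP layer, database or queue is needed to check this one step.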


- Take a good look at what your Server is supposed to do, and spend time in modeling the expected behaviour.
- Depend on types to strengthen the correctness and let the compiler help you save time to spend on testing.
- Write quick and automated test cases.
- Above all, do yourself a favour: write less code!

All accompanying code resides here, on github.

Remember the programming maxim: the best code I ever wrote, was the code that I never wrote! (This blog is also posted here: I work as a Principal at Swanspeed Consulting)







2 comments:

  1. Why not just use functions? You can compose them into a pipeline using `.andThen` if you want to have that nice point-free style.

    But what if the data flow gets more complex and step 5 needs something from step 2? Do steps 3 and 4 have to explicitly convey that data?

    I like Akka Streams for a number of use cases, but I don't understand why it's needed here.

    1. Point well-taken, Alan.

      This example and codebase are from a much larger, deployed backend stack that is built using Akka Streams. I simply tried to elucidate a concept, and used Akka Streams to exemplify it (in addition to being lazy). Of course, there are myriad ways of implementing this, many of which are beyond my admittedly little knowledge. Nowhere am I suggesting that one *has but to use* Akka Streams in order to follow this approach.

      Of course, you can use functions. In fact, at the time of materialization, something equivalent will happen in many of the transformations I have mentioned. That's my understanding.

      If Step[5] needs something from Step[2], we will have other ways to handle it. Conceptually, if I conceive the stream as *serial* transformations, then Step[5] has no choice but to hope that Step[3] and Step[4] are faithfully carrying it from Step[2]. On the other hand, if my use-case says that Step[5] should have direct access to what Step[2] produces, then I will implement that as two *parallel* streams: [2]-[3]-[4]-[6/Merge] and [2]-[5]-[6/Merge]. Of course, the merger at Step[6] is just an example again: I presume you accept that. _Sink.ignore()_ is always handy!

      The basic point of mine - that significant benefit accrues from modeling as streams of typed transformations - is not diluted at all, IMO.
