Sunday, January 23, 2022

An advisory CTO: does it help to hire one?

Having been a consulting CTO/Engineering Head for small and medium-sized software product firms for about a decade and a half, I am often asked this question by the main stakeholders of current and potential clients: why do I need your service? What can you do that can benefit us?

It is an obvious question to ask. It is a fair question too. Over the years, I have tried to answer it to the best of my ability and intent. It would be a lie to say that my explanations always drove home the point as concisely as one would expect. However, a theme has underpinned my answers every time. This write-up is an elaboration of that theme.

It helps to have someone as a sounding board for all things technical, behind your product

As your idea takes shape, or your offering gains newer features (thereby bringing in new customers), you don’t want to lose the momentum. Catering to customers’ demands, adding features to the product, positioning it for newer market segments: all these are important for you and your Product Management team. However, the software stack that lies underneath all of this has to support the offering, all along. Some of the plumbing is necessary now and cheap to build; the rest should be built as the direction of the product becomes clearer along the journey. A CTO can hear your plans, co-visualize the product alongside you, align your vision with technology trends and outline the technical underpinning that must evolve. With that, you will have a much better view of the cost of building what you want to build, the time it is going to take, and the risks and benefits that alternatives offer.

Your Product Management team must be heard well and told clearly

Your Product Managers have ideas. They are quite enthusiastic about them and want these ideas implemented and realized as early as possible. However, they look for assurances from the engineering/technical team about feasibility. The engineering team has to know really, really well what needs to be built in order for those ideas to materialize. An experienced CTO/Engineering Head can bridge this gap effectively and quickly. One cannot overemphasize the importance of this act of elaboration: the correctness and completeness of the next several milestones of the product depend overwhelmingly on how well the boundaries are drawn and the innards are sketched. Sometimes, the CTO may insist on restricting a feature temporarily so that subsequent features can be rolled out much faster. At other times, the Engineering Head may advise folding two seemingly unconnected features together because of a very beneficial reuse of components. A CTO can be the perfect interlocutor and devil’s advocate who talks straight, all for your benefit.

Future versions of the software require deep and wide consideration

The first version of the product came into being primarily because the first customers had to be acquired. Because of the value your product continued to deliver, the next few customers came in too. Now, the expectations of these customers — and of many potential future customers — weigh heavily in your plans and in your software. The second version has to be built right, because the next phase is one not only of growth but also of retention. The technical choices and decisions that your engineering team faces turn quite challenging as the product begins to cater to a variety of customers. The overall architecture may need a revisit. The non-functional requirements may crystallize and pose questions about the current technical structure. Perhaps several proof-of-concept exercises become necessary. Your engineering team finds itself in a flux. The steady and reassuring presence of a CTO can be an immense help in situations like this.

Accidental complexities of software have to be dealt with

The primary features of the product you build don’t necessarily depend on the complexity that the software begets or is infused with. You may begin with a browser-based UI but soon decide to move to native mobile apps. The code-bases grow. The areas of expertise required grow. In effect, the whole development pipeline is affected, one way or the other. The business may demand a move to a cloud provider, and operational aspects spring up. The laws of the land may demand special actions to ensure the privacy of your customers. All of these cost money, time and effort. You will need someone to think and plan through these not-so-straightforward aspects: someone who understands what the engineering team is facing and helps align their preparatory or mitigative steps in a way that gains the confidence of the Product Management team. As the accidental complexity grows, so does the need for specialized expertise in your engineering team. Your CTO will ensure that the right teams are given the right tasks and establish appropriate engineering best practices. By avoiding mistakes she has learnt from through her own experience, she can prevent the cost of rework and re-release (which can be substantial).

Your engineering team is your asset too; nurture the members and environment

Collectively, your engineering team carries a picture of the product: the various inputs and outputs, their formats, the points of integration with third-party software, the schema of the data storage, the boundaries of the primary business functions and their representation in executable form, and so on. They want the software they have built to be used extensively by the intended users, both internal and external. That is the source of their self-actualization. An enthusiastic and expert team can not only bring newer features to fruition smoothly, but also attract equally enthusiastic and expert staff from outside. Put differently, for the product to grow and meet expectations — speedily and in time — your engineering team has to be at its best. And which challenges they solve, and how, go a long way toward drawing other great software technologists to your organisation. A CTO can bring passion and commitment to your engineering team and help build a culture that stands your organisation in good stead. In a larger sense, such a team is one of the main ingredients of the world-class product that you build.

By hiring a CTO/Engineering Head — even in an advisory or consulting position — you and your stakeholders can benefit greatly. With a clear understanding of the business and its future, as well as a firm grasp of technology trends and hands-on knowledge where necessary, a CTO can anticipate the directions your business is going to take, take preparatory measures on behalf of your technology team, and help your business stay ahead of the curve, as it were. In the process, she also prevents the not-so-obvious mistakes your engineers are likely to make and saves you from unnecessary costs.

Saturday, September 22, 2018

Model correctly and write less code, using Akka Streams

One of my most productive days was throwing away 1000 lines of code.

-- Ken Thompson
More than two decades of writing software programs to earn some humble money and to keep the home hearth warm have taught me that writing simple, elegant and readable programs results in much more maintainable and extensible software. Importantly, stakeholders prefer to have maintainable code. One way to ensure that these properties exist in the code is to write fewer lines of code, wherever possible. Generations of greats in the world of computer science and programming have been reminding us of this little yet undeniable truth. Yet it is surprising how often we tend to ignore it, knowingly and to our projects’ detriment.
If we want to write programs that are simple to understand, we have to understand the problem well, before anything else. If the problem is restated in simpler and more fundamental terms, it becomes easier to model it in programming terms. The solution that emerges is likely to be simple and small (but no smaller than required). Jon Bentley’s Programming Pearls offers an excellent treatise on this matter. I still refer to it from time to time. I hope others do too.
It is my considered view that a well-modeled problem (i.e., one whose crux is understood) leads to compact and resilient code, which is much easier to reason about. The verbosity of the language and the accidental complexities of the technologies involved may bring in necessary planks and poles, but to an alert pair of eyes the theme remains easily discernible. A beautiful codebase is a smaller codebase.

The case at hand

Allow me to take you through a greatly abridged version of a recent assignment I have been associated with. I have been helping a team rewrite their server stack, which is meant to allow users to play quiz-like games. Shorn of all business functionality, what happens is this:
  • A player begins a session and chooses a game (quiz) to play, from several available
  • The player answers questions, one at a time
  • The Server, at the end of the game (all questions attempted), calculates and declares the total she has scored
The server updates/informs other important functional units of the player’s accomplishments. Any number of players can play games simultaneously, and the server has to keep up with this load.
Simple enough, when one strips away the layers of HTTP, Configuration, Database, Timers, Queues and what have you! I have dropped other business requirements from the list as well.

Focus on the interaction

For the purpose of this blog, let us consider a particular portion of the above-mentioned functionality: what does the server do when a player contacts it? Assuming that the player has already been authenticated - which is a precondition of, and not a part of, the play-related interaction mentioned above - her interaction has two distinct parts:

Interaction 1 (StartARound)

Hey Server, I want to play a game. Here’s my Session Identifier (obtained earlier from AUTH component) and give me a question.

Interaction 2 (PlayARound)

Hey Server, here’s my answer to the previous question; give me the next one (let’s assume that the server offers only 3 questions, in three consecutive rounds; then the game ends).
Let’s ignore all the network, protocol, routing and JSONification details for the time being. Then, let’s take a look at what it is that the Server does, and how we can describe that in a manner that is terse yet conveys the right meaning.

Modeling the interaction is the key

The Server takes in a piece of data and keeps on transforming it, till another piece of data is ready to be handed over to the client. This transformation consists of one or more steps and in each step, one piece gives rise to another piece. Also, in each step, help from other components or services may be summoned and used, before emitting the resultant piece. Therefore, if we can identify each step, what goes into it and what comes out of it, we can easily model how the whole server works! Moreover, with such a model in hand, we can also verify the behaviour of the server.
An obvious first question is: how do we identify these pieces? In the world of OO and Functional Programming, where I manage to reside (and so far, have not been evicted), it is quite natural to identify these pieces by their Types! Every step takes in a Type and gives rise to the same or a different Type. Given this tenet, how can we represent the way the Server responds to the player?

The Type-borne behaviour

The diagram below elucidates the scheme of things. The rightmost box shows the transformations that are happening inside the Server.
In terms of Types, one way to describe the flow ‘Start A Round’ (above) is:
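Sketched in outline - using the type and transformer names elaborated in the sections that follow - the chain looks like this:

    StartARound (IS-A SessionCarrier)
        --> sessionExistenceChecker --> IncorrectSessionIDProvided | StartARound
        --> guessNumberPreparator   --> IncorrectSessionIDProvided | RoundStarted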

Let’s elaborate

The logic of confirming the correctness of the sessionID passed by the player (does it exist and is it valid, or not) is encased in the transformer named sessionExistenceChecker. Because the server stipulates that every message reaching its shores must have a valid sessionID, every message has to pass through sessionExistenceChecker. However, the important observation is this:
sessionExistenceChecker understands SessionCarrier only. Therefore, in order to be recognized by the checker, every message must also be a SessionCarrier. In OO terms, every message entering sessionExistenceChecker must be a subtype (IS-A) of SessionCarrier.
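To make this concrete, here is a minimal sketch of how such a checker might look with Akka Streams; the type names follow the prose, but the field shapes and the session lookup are my assumptions, not the actual codebase:

    import akka.NotUsed
    import akka.stream.scaladsl.Flow

    // Sketch only: field shapes are assumed, not taken from the real codebase
    trait SessionCarrier { def sessionID: String }
    case class StartARound(sessionID: String) extends SessionCarrier
    case class IncorrectSessionIDProvided(sessionID: String) extends SessionCarrier

    // Stand-in for a real check against the AUTH component's session store
    def isKnownSession(id: String): Boolean = id.nonEmpty

    // The checker consumes SessionCarriers only; anything else is rejected at compile time
    val sessionExistenceChecker: Flow[SessionCarrier, SessionCarrier, NotUsed] =
      Flow[SessionCarrier].map {
        case m if isKnownSession(m.sessionID) => m
        case m                                => IncorrectSessionIDProvided(m.sessionID)
      }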

There are three benefits of this approach:
- The type is self-documenting: the model is self-explanatory and the constraints are obvious. If I want to know what I need to gather before I can ask sessionExistenceChecker to flag me off as OK, I have to look no further than the type it expects.
- The compiler helps to reduce defects: if I am unmindful and pass a message which is not session-id-checkable, the compiler will block my progress with a grim message. A defect is forestalled well before the code is readied for testing. That’s a substantial gain.
- The code is intuitive and readable: it is quite easy - in many cases straightforward - to translate this model into code (we will see this a little later).

Now, let’s look at the result of the transformation. The transformer applies its logic and emits either of these Types:
IncorrectSessionIDProvided
This indicates that not everything is correct with the SessionCarrier that has been passed.
StartARound
This indicates that a StartARound - which IS-A SessionCarrier - has come out of the checker, unscathed!
The key understanding, again, is that these are not values but Types! The actual (runtime) objects moving in and out of the checker may carry anything, but they must conform to these Types.

The next step (refer to the shaded region of the Flow diagram, above) is to choose a number for the player, associate that with a Round Identifier, and then get back to her with an appropriate message. This logic is encased in the transformer named guessNumberPreparator. Because it sits next to sessionExistenceChecker, it has to be capable of consuming either an IncorrectSessionIDProvided or a StartARound. It then emits either of these Types (a sketch of this transformer follows the list below):


IncorrectSessionIDProvided
This indicates that not everything is correct with the SessionCarrier that has been passed.
RoundStarted
This carries
  • A confirmation that the Session Identifier passed with StartARound has been found to be correct (by the checker earlier)
  • A confirmation that the Server has chosen a number for the player
  • An identifier of the round in which the player is to guess
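A matching sketch of the preparator, under the same assumptions (the fields of RoundStarted mirror the three confirmations listed above; the round identifier and the chosen number below are placeholders):

    // Sketch only: the field shapes are assumptions, not the real codebase
    case class RoundStarted(sessionID: String, roundID: String, chosenNumber: Int) extends SessionCarrier

    val guessNumberPreparator: Flow[SessionCarrier, SessionCarrier, NotUsed] =
      Flow[SessionCarrier].map {
        case s: StartARound =>
          // "R1" and the random number stand in for the Server's actual choices
          RoundStarted(s.sessionID, roundID = "R1", chosenNumber = scala.util.Random.nextInt(100))
        case other => other // IncorrectSessionIDProvided (and anything else) passes through untouched
      }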
That’s it. We have the blueprint of the Server’s implementation of Interaction[1] available.
Translating this into code - when implemented using Akka Streams - we get this:
val serverSays =
  Source
    .single(StartARound("A123")) // A123 is a session id, obtained from AUTH service
    .via(sessionExistenceChecker)
    .via(guessNumberPreparator)
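The serverSays value above only describes the pipeline; to actually run it, a Sink has to be attached. A minimal sketch, assuming an implicit ActorSystem (and materializer) is in scope:

    import akka.stream.scaladsl.Sink

    serverSays.runWith(Sink.foreach(println)) // prints either a RoundStarted or an IncorrectSessionIDProvided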
The diagram below illustrates the Server’s implementation of Interaction[2]: the Player makes a guess, and the server gives her points for guessing correctly and shows the latest score.
Recall that the Server ends the game after 3 rounds. Errors are handled in the same way as in the previous flow (StartARound). Also, the transformer that checks the correctness of the Session Identifier is reused here.

For the sake of space, I am not depicting the flow of types and transformers for this interaction. The code segment that implements this flow is small, crisp and, perhaps, quite intuitive as well:

val serverSays =
  Source
    .single(GuessSubmittedByPlayer(sessionID, roundID, guessedNumber))
    .via(sessionExistenceChecker)
    .via(roundCorrectnessChecker)
    .via(guessedNumberVerifier)
    .via(pointsAssigner)
    .via(scoreBoardUpdater)
    .via(currentScorePreparator)
    .via(gameTerminationDecider)
    .via(nextGuessGenerator)
That’s what our Server does, to implement Interaction[2]. That’s all there is to it, really!

An Akka Streams-based implementation brings in many other benefits. However, the aim of this blog is not to explore and discuss the many and very beneficial aspects of Akka Streams. A number of blogs already exist which do that job very, very well (Colin Breck’s are here, my personal favourite), not to mention Akka Streams’ own site and numerous discussions on StackOverflow. I will therefore draw your attention to other aspects of this approach to modeling:

- If we can model the processing pathway of any message as a series of transformations, then translating it into code becomes decidedly easier.

- If the model is clear, the code is small, crisp and readable. This is what we began this blog with, isn’t it? The code does what the model depicts; nothing more, nothing less. No code exists that has no reason to exist. Brevity matters.

- Every transformation demands types that indicate what it takes in and gives out. If and when our code fails to satisfy a transformer - by passing the wrong types - the compiler stops us in our tracks. Because this is compile-time prevention of potential defects, the approach saves time and effort in a very significant way. That’s an undeniable gain. Type-Driven Development, did you say?

- When it comes to testing the behaviour, it is possible to write testcases for every transformer separately, and for a series of them as specified in the flow. For example, it is quite easy to test whether the points have been assigned correctly:
  val serverSays =
    Source
      .single(GuessSubmittedByPlayer(sessionID, roundID, guessedNumber))
      .via(sessionExistenceChecker)
      .via(roundCorrectnessChecker)
      .via(guessedNumberVerifier)
      .via(pointsAssigner)
      // subsequent transformations dropped because we want to test this outcome

Using Akka Streams’ testkit and ScalaTest, it is straightforward to test such partial outcomes (refer to the github codebase). So, we can test the functional behaviour of the Server without having to set up the planks and poles around it.
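For illustration, here is a minimal sketch of such a testcase; it is not the codebase's actual test, the literal arguments are placeholders, and PointsAssignedToPlayer is a hypothetical name for whatever pointsAssigner emits:

    import akka.actor.ActorSystem
    import akka.stream.ActorMaterializer
    import akka.stream.scaladsl.Source
    import akka.stream.testkit.scaladsl.TestSink
    import org.scalatest.{Matchers, WordSpec}

    class PointsAssignerSpec extends WordSpec with Matchers {

      implicit val system: ActorSystem = ActorSystem("points-assigner-spec")
      implicit val materializer: ActorMaterializer = ActorMaterializer()

      "pointsAssigner" should {
        "award points when the guess is correct" in {
          val emitted =
            Source
              .single(GuessSubmittedByPlayer("A123", "R1", 42)) // placeholder values
              .via(sessionExistenceChecker)
              .via(roundCorrectnessChecker)
              .via(guessedNumberVerifier)
              .via(pointsAssigner)
              .runWith(TestSink.probe[Any])
              .request(1)
              .expectNext()

          emitted shouldBe a [PointsAssignedToPlayer] // hypothetical outcome type
        }
      }
    }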


- Take a good look at what your Server is supposed to do, and spend time in modeling the expected behaviour.
- Depend on types to strengthen correctness, and let the compiler save you time that you can spend on testing instead.
- Write quick and automated test cases.
- Above all, do yourself a favour: write less code!

All accompanying code resides here, on github.

Remember the programming maxim: the best code I ever wrote, was the code that I never wrote! (This blog is also posted here: I work as a Principal at Swanspeed Consulting)

Tuesday, July 24, 2018

Your Microservices are asynchronous! Your approach to testing has to change!

Conceptually, the idea of an application being composed of a number of Microservices is quite self-evident and, therefore, quite appealing. A complex solution, broken into a number of self-sufficient modules, each running on its own, each capable of responding to messages, each amenable to upgrade or replacement without letting any other know about it: it sounds too good to be true! The icing on the cake, of course, is that each such service can run in its own container!

No wonder, then, that senior stakeholders of software projects or initiatives love to insist that the application be modelled as a bunch of Microservices. The technology team responds favourably, of course, because everyone wants to be seen among the pack, if not ahead of it! Jumping in sounds exciting, even if the swimming may not be, but we don’t really care, do we?

The proverbial devil is in the details. In my experience so far, the term Microservice is thrown about without much thought. The benefits of a Microservice-based solution come with their own price. It is better to be aware of the aspects from which that price accrues and to fold them into the plan, rather than brushing them under the carpet. One such aspect is the inherent asynchronicity of Microservices, and the verification (testing) of the behaviour of the application as a whole, given this property.

No singleton Microservice

Let’s get one fatuous argument out of the way to save time: Microservices-based applications cannot be monolithic. In all discussions, the word Microservices is used in the plural. One accepts that there will indeed be more than one such service. They will interact among themselves as and when required by the application’s use-cases and implementation logic. Putting the whole application in a big ball and attaching an HTTP-handling layer to it doesn’t constitute a Microservice.

First, the simple synchronous cases
Let’s assume that we are dealing with two Microservices. Let’s call them A and B. A has its upstream callers, who interact solely with A. In order to respond to its callers, A needs to get something done by B. Therefore, A needs to call an API exposed by B and expect something in return. To keep things simple, let’s assume that B offers an HTTP interface. So, A’s call is effectively synchronous.

Case 1: when B is up
Because B is a service, it has to keep running: it doesn’t know when A may require something from it. At some point in time, A sends a request through an HTTP API and gets a response back when B is done with the request. Testing this is easy.
Case 2: when B is not up
A assumes that B is up, but it is not. So A will have to deal with the eventual non-availability of a response: some kind of timeout handling is obviously necessary. After all, HTTP is a session-based, single-connection protocol. If B is not available or reachable, the TCP layer surfaces a number of errors like ECONNREFUSED, ENETUNREACH, EHOSTUNREACH etc. (Do you recall the days of living with the tomes by the late Richard Stevens? I certainly do 😃). HTTP covers all of them for us, thankfully!
Case 3: B is slow
This is a little tricky! If B is slow to respond, uncertainty creeps in. Should A wait a little longer for B’s response? If it does, then for how long? A configurable timeout value? That, of course, is the most popular way and is likely to be useful in a large number of cases, but the basic point remains: A is uncertain of receiving a response. We will revisit this a little later.
In all these situations, the underlying TCP is helping us. HTTP sets up a single connection. Once the connection is set up (because B could be contacted), A is certain - well, almost certain, but let’s not nitpick - that B has received the request. Everything else, including the testing approach, follows from this.
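To make the three cases concrete, here is a minimal sketch of A's side of the synchronous call; callServiceB is a stand-in for whichever HTTP client A actually uses:

    import java.util.concurrent.TimeoutException
    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration._

    // Stand-in for the real HTTP round-trip from A to B
    def callServiceB(request: String): Future[String] =
      Future { "B's response" }

    def askB(request: String, patience: FiniteDuration = 2.seconds): Either[String, String] =
      try Right(Await.result(callServiceB(request), patience))                      // Case 1: B answers in time
      catch {
        case _: TimeoutException => Left("no response from B in time")              // Case 3: B is slow
        case e: Exception        => Left(s"call to B failed: ${e.getMessage}")      // Case 2: connection errors surface here
      }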

Microservices are meant to be asynchronous


Asynchronous means just that: an interaction between two parties is based on the understanding that one of the parties may not be in a position to participate in the interaction, at that moment. It may or may not: this is the key understanding. In essence, this is an implementation of the classical Producer-Consumer problem. It is easy to visualize but may not be that straightforward to adopt when it comes to real-life situations.
The basic premise is this: A wants to tell B something, but B may or may not be available to sit and listen (it is not unwilling, it is just unavailable). So A tells an intermediary what it has to tell B. When B is ready, it checks with the intermediary and is told what A wanted to tell it in person, but could not.
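As a toy, in-process illustration of this premise (a real deployment would use a broker - a queue or a topic - rather than a shared data structure):

    import java.util.concurrent.LinkedBlockingQueue

    val intermediary = new LinkedBlockingQueue[String]()

    // A's side: the hand-over succeeds immediately, whether or not B is around
    def aSends(message: String): Unit = intermediary.put(message)

    // B's side: whenever B is ready, it collects what A left behind
    def bReceivesWhenReady(): String = intermediary.take()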

Case 1:
A sends a message to B, but in reality the message is dropped into the intermediary’s bin. A’s send() (or post()) operation succeeds! A will not know if B ever comes to the intermediary and, if it does, when! A can surely implement timeout logic, but it is difficult for it to know why B has never responded. Is it because:
  • the intermediary is sluggish, which is not really B’s fault
  • the intermediary has become unavailable after accepting the message from A
  • B is sluggish (the same as B being slow; refer to Case 3 of the synchronous scenario)
  • B runs into problems while readying itself to process A’s message, and throws an exception
  • … and more!
Observe how the number of test-scenarios has increased (not necessarily associated with the application’s functional correctness) simply because an intermediary has been introduced between A and B.

Case 2: A needs to interact with B and C
In this case, A needs to hear from both B and C, and decide what to do with the two responses, before forming its own response. Remember A’s upstream callers; they are expecting something from A. This is rather obvious. All the points listed in the previous case are now doubled, and maybe more. For example, the intermediary and B may have been very quick and co-operative, but C may have been sluggish!

Case 3: A interacts with B and C, and C needs to tell D about this interaction
This is somewhat orthogonal to the caller -> A -> (B and C) flow. A’s callers are not interested in what information is shared with D, but the application’s overall behaviour necessitates it.
Microservice D - mentioned in the arrangement above - poses an additional problem when it comes to verifying the application’s behaviour.
  • What if A successfully responds to its upstream callers - thereby giving an impression that everything is fine - and yet, C fails to inform D? Should we still assert that the application is behaving as expected?
  • How do we confirm if D has indeed received the correct piece of information from C? Please note that the flow allows C to tell D, but doesn’t mandate D to acknowledge to C. Therefore, inquiring with C will not give us the answer. We need a mechanism to inquire of D, if it has heard from C at all.


The effect of being asynchronous on testing

If you have been following along, you will agree with me that interactions between Microservices give rise to an increasing number of test-scenarios.
The important point is that, entirely by itself, a Microservice is rather easy to test. It is supposed to be self-sufficient: it has its own data structures, components, databases and storage. We can possibly mock much of these and prove that its behaviour is in line with what is expected. Such a Microservice responds to messages from other Microservices. Even though this brings in additional test-scaffolding (and the associated complexity and effort), the situation is manageable. After all, there is predictability here: send the Microservice a message and check how it reacts.

However, how do we deal with the asynchronous aspect of the interaction while testing: the cases where a message may or may not reach the Microservice, and even if it does, we don’t know by when? If there is a delay, is that acceptable? That is not all. As I have elucidated above (C telling D), there is a sizeable jump in the number of cases when two Microservices interact to complete a particular use-case.

For example, if the number of messages from A to C exceeds 50 in a minute, then D must be notified, but D doesn’t need to acknowledge. If D is unavailable right now, C will never know. If D becomes available after 2 minutes, C will never know either.
Think about the cases that we need to handle!
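One way to cope while testing is to replace point-in-time assertions with polling assertions. Here is a minimal sketch using ScalaTest's Eventually, where hasHeardFromC stands for a hypothetical query against an inspection endpoint that D exposes for testability:

    import org.scalatest.Matchers._
    import org.scalatest.concurrent.Eventually._
    import org.scalatest.time.{Seconds, Span}

    // Hypothetical probe into D, e.g. an HTTP GET against an inspection endpoint D exposes for tests
    def hasHeardFromC: Boolean = ???

    // Poll D until it confirms, or give up after the configured patience
    eventually(timeout(Span(30, Seconds))) {
      hasHeardFromC shouldBe true
    }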

Summary

Microservices bring in a host of benefits for sure, but they require us to be ready to pay for them as well. While deciding to go the Microservices way, it is very easy to overlook the complexities that we need to handle, only to be reminded of the price at a later stage of the project, accompanied by unavoidable pain. It is obvious that we must plan for this. The usual ways of testing are going to be insufficient. Protocols must be well thought out. Designs should be done ground-up. Exposed functionality must take into account operational demands and constraints. Tolerance for failure must be arrived at following an objective analysis of the overall behaviour of the application.

I have not touched upon the distributed nature of Microservice-based applications. No, networks are not reliable! I have not touched the whole subject of pre-deployment and post-deployment testing either. No, Docker and Kubernetes don’t take away the basic fact that an interaction between two temporally decoupled processes is fraught with uncertainty. These are topics for a follow-up blog, perhaps.


A usual synchronous testing approach doesn’t work with a bunch of Microservices meant to behave asynchronously. It is important that we bear this in mind.

(This blog is also posted here: I work as a Principal at Swanspeed Consulting)