GitHub’s Scientist is a tool for creating fitness functions for critical-path refactorings in a project. It relies on the idea that in a large enough system, behavioral or data complexity makes it hard to refactor the critical paths with the help of tests alone. If we can run the new path and the old path in parallel in production without affecting the current behavior, and compare the results, then we can decide more confidently when to switch to the new path.
I created a simple example to demonstrate the application of this concept in a Java project.
Prologue
Some software companies don’t pay enough attention to the overall quality of their codebase. We might even say that this is a common pattern in the software business. Such behavior is often justified by the claim that “fast delivery to market” is far more important than the code quality aspects of the product, and that what really matters for the business is whether the functionality is there.
During the early stages of a project, this claim might have the (false) appearance of being true: your codebase has not grown that large yet, and you are delivering to your customers with “unbelievable” velocity. Since this is the case, there is no point in caring about this technical nonsense. However, as time goes by, this kind of approach causes technical debt to pile up. It slowly starts to cripple your ability to maneuver in the market, makes your code harder to change, and degrades your developers’ motivation.
Excuses don’t help your software to stay healthy
This situation resembles the self-apologia of a person who consumes junk food and does not exercise because she thinks she lacks the time. There is always the excuse of having something more important to do. But, you know, it might be too late for her when she starts to face life-paralyzing issues such as vascular occlusion, heart failure, and reduced lung capacity.
Nevertheless, some people are capable of facing the facts and confronting such troubling practices in time. No matter how difficult it is to overcome the problems caused by a sloppy lifestyle, they show a strong dedication to surmounting them.
A goal without a proper strategy is just a goldfish in a bowl!
This strong dedication can also be found in some companies. As a consequence of various kinds of technical debt and code smells, their codebases might have been going south for a while: deep coupling, outdated design, and death-star classes, to name a few. However, they show a strong dedication to changing this situation.
Dedication is only the starting point, though. Refactoring under such circumstances is a tedious, long-running job. Therefore, if you do not have a proper strategy, not long after the start you begin to feel like a goldfish in a bowl.
Implementing dependable test suites is a common first step when commencing a refactoring strategy. The question is: are tests enough to refactor a considerably large legacy codebase?
The idea behind GitHub’s Scientist
The idea is simple: For a large enough system, it’s hard to cover all cases by tests. In order to refactor safely, implement a way to run the candidate and the old code path against production data, but do not let the candidate path have any effects on the final result. Compare the results of each execution and record the differences. When you’re confident that there is no mismatch anymore, switch to the new implementation.
Java alternatives to GitHub’s Scientist
GitHub’s Scientist was originally implemented as a Ruby library. The implementations for other languages are mostly done by independent developers. For Java, there are currently two alternatives: Scientist4J and Poptchev’s Kotlin-based scientist. Since the original library is relatively small, it is also possible to implement your own version of “GitHub’s Scientist for Java”.
Example
As I mentioned in the previous section, two alternatives of Scientist currently exist for the Java platform. We are going to use Scientist4J in this example.
The scenario of the example
Although real application scenarios are far more complicated, let’s use a hello-world scenario to illustrate the idea of GitHub’s Scientist.
Our application will expose a REST endpoint to greet its consumers. The endpoint will use a path parameter to get the name of the caller and pass it to its backing business service.
The backing business service interface is called “GreeterService”. In the beginning, “GreeterService” has only the old implementation, “OldGreeterService”.
Because of ever-changing market conditions, we will have to implement a new version of the “GreeterService”: the “NewGreeterService”.
The greeting feature is a critical one for our business, so we do not want to switch over right after the implementation. We would like to check for a while whether the new service behaves exactly like the old one. Therefore, we are going to introduce an additional abstraction layer so that we can use “Scientist” in our Java environment.
You can check out the example code from here.
Implementation of the Scenario
We stated that we are going to use Scientist4J for this example. Setting up the tool is easy: add the dependency to your pom or “build.gradle” file.
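For reference, the Maven declaration might look roughly like the snippet below; the coordinates and version are quoted from memory, so please verify them against the Scientist4J project page before using them.

<!-- Coordinates quoted from memory; verify against the Scientist4J documentation. -->
<dependency>
    <groupId>com.github.rawls238</groupId>
    <artifactId>Scientist4JCore</artifactId>
    <version>1.0</version>
</dependency>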
We will expose a REST endpoint to serve our clients:
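A minimal sketch of such a controller, assuming Spring Web, could look like the following; the class name and mapping are illustrative, and the actual controller lives in the example repository.

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

// Illustrative controller: takes the caller's name as a path parameter
// and delegates the greeting to the backing GreeterService.
@RestController
public class GreeterController {

    private final GreeterService greeterService;

    public GreeterController(GreeterService greeterService) {
        this.greeterService = greeterService;
    }

    // e.g. GET /greet/comak -> "Hello comak"
    @GetMapping("/greet/{name}")
    public String greet(@PathVariable("name") String name) {
        return greeterService.greet(name);
    }
}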
The backing service of our REST controller is the “GreeterService”. It returns “Hello <name>” for the provided name.
Our old version of the service:
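As a sketch, the contract and the old implementation might look like this; the exact code is in the repository, and the shapes below are assumed from the description above.

// Hypothetical sketch of the service contract and its original implementation.
public interface GreeterService {
    String greet(String name);
}

public class OldGreeterService implements GreeterService {

    // The long-standing behavior we want to preserve: "Hello <name>".
    @Override
    public String greet(String name) {
        return "Hello " + name;
    }
}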
“GreeterService” is defined in the injection context as follows:
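Assuming a Spring @Configuration class, the wiring might initially look like this; the class name “ServiceConfigurer” comes from the article, while the bean method shown here is my assumption about its content.

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ServiceConfigurer {

    // Initially, the old implementation backs the GreeterService abstraction.
    @Bean
    public GreeterService greeterService() {
        return new OldGreeterService();
    }
}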
In order to simulate a random mismatch between the new and the old implementation, the “NewGreeterService” chooses a salutation word from an array of salutation words. This way we can demonstrate how Scientist works:
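A possible sketch of such a candidate implementation is shown below; the concrete salutation words are made up for illustration.

import java.util.Random;

public class NewGreeterService implements GreeterService {

    // Picks a salutation at random so that the candidate occasionally
    // disagrees with the control and the experiment has mismatches to report.
    private static final String[] SALUTATIONS = {"Hello", "Hi", "Hey"};
    private final Random random = new Random();

    @Override
    public String greet(String name) {
        return SALUTATIONS[random.nextInt(SALUTATIONS.length)] + " " + name;
    }
}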
Introduce the Experiment
The idea was to run both of these implementations in the production environment, right? To achieve this, we are going to use a simplified version of the “Branch by Abstraction” pattern. Our abstraction flow is shown in the figure below.
Let’s implement the abstraction layer “ExperimentingGreeterService” so that we can make use of Scientist for Java. Here is our experimenting service implementation:
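A simplified sketch of what such a service can look like with Scientist4J’s Experiment class follows. Note that the article’s actual implementation uses an extended Experiment (see the note below), and the package, constructor, and exception details may differ between Scientist4J versions.

import java.util.function.Supplier;

import com.github.rawls238.scientist4j.Experiment;

public class ExperimentingGreeterService implements GreeterService {

    private final GreeterService oldGreeterService;
    private final GreeterService newGreeterService;
    // The experiment name ("greet") shows up in the reported metrics.
    private final Experiment<String> experiment = new Experiment<>("greet");

    public ExperimentingGreeterService(GreeterService oldGreeterService,
                                       GreeterService newGreeterService) {
        this.oldGreeterService = oldGreeterService;
        this.newGreeterService = newGreeterService;
    }

    @Override
    public String greet(String name) {
        // Both paths are executed; only the control (old) result is returned to the caller.
        Supplier<String> control = () -> oldGreeterService.greet(name);
        Supplier<String> candidate = () -> newGreeterService.greet(name);
        try {
            return experiment.run(control, candidate);
        } catch (Exception e) {
            // run declares a checked exception in the Scientist4J versions I have seen.
            throw new RuntimeException(e);
        }
    }
}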
Note: An extended version of Scientist4J’s Experiment class is used in the “ExperimentingGreeterService”. This way, we were able to enable comparison of the service return values for reporting.
The critical section of this class is the implementation of the “greet” method. The actual experiment is set up via Supplier instances, which are passed into the “run” method of the “Experiment” class. The “Experiment” instance executes both paths and returns the result of the old one.
In Scientist4J, the developers used Dropwizard metrics for reporting purposes. In order to make use of these metrics, we need to configure reporting manually. The “initReporter” and “reportAndStop” methods do the trick for us.
I used the “ConsoleReporter”, which was enough for this example. For a real-life scenario, of course, it is better to redirect your comparison results to your monitoring tools.
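The method names “initReporter” and “reportAndStop” come from the example code; their bodies below are only a guess at what such helpers typically contain when using Dropwizard’s ConsoleReporter, and the registry is assumed to be the one the (extended) Experiment records its metrics into. The holder class name is hypothetical.

import java.util.concurrent.TimeUnit;

import com.codahale.metrics.ConsoleReporter;
import com.codahale.metrics.MetricRegistry;

// Hypothetical holder for the reporting plumbing.
public class ReportingSupport {

    // Assumed to be shared with the Experiment so that its counters, timers
    // and the mismatch gauge end up in this registry.
    private final MetricRegistry metricRegistry = new MetricRegistry();
    private ConsoleReporter reporter;

    public void initReporter() {
        reporter = ConsoleReporter.forRegistry(metricRegistry)
                .convertRatesTo(TimeUnit.SECONDS)
                .convertDurationsTo(TimeUnit.MILLISECONDS)
                .build();
        reporter.start(10, TimeUnit.SECONDS); // print the metrics every 10 seconds
    }

    public void reportAndStop() {
        reporter.report(); // flush one final report
        reporter.stop();
    }

    public MetricRegistry getMetricRegistry() {
        return metricRegistry;
    }
}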
Run the experiment and get the reports
We are almost ready to see Scientist for Java in action. The last remaining step is to configure our “ExperimentingGreeterService” as the “GreeterService” provider. Update the “ServiceConfigurer” class as follows:
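The updated configuration might then look roughly like this; again, the exact wiring in the repository may differ, for example in how the two delegate implementations are obtained.

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ServiceConfigurer {

    // The experimenting service now backs the GreeterService abstraction
    // and delegates to both the old and the new implementation.
    @Bean
    public GreeterService greeterService() {
        return new ExperimentingGreeterService(new OldGreeterService(), new NewGreeterService());
    }
}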
Now we can run the application and see the results:
java -jar simple-scientist-example-1.0-SNAPSHOT.jar net.entrofi.examples.refactoring.scientist.ScientistExampleApplication
curl http://localhost:8080/greet/comak
At the top of the results, we see the gauge for the greet method call mismatches. The rest shows the Counters and Timers for the candidate and control calls.