Disclaimer: The content and opinions expressed in this article are solely my own and do not express the views or opinions of my current or past employers.
For those who cannot be bothered reading the whole article and just want the short “executive summary”, here is the moral of the story:
Mixing programming languages can be beneficial in some situations. But if you can separate the components written in different languages and containerize them (e.g. with docker), life becomes a lot easier. Mixing languages often brings extra complexity that makes your application harder to test and debug. It is not impossible to manage, but you really should ask whether the benefit justifies the cost before going down that route.
If you did not know already, there will almost certainly be packages in your favourite language that can call and interface with another language, which is very convenient if you want to use some readily available functions from that other language. Some examples are PyCall.jl and RCall.jl (for calling python and R from julia), rpy2 (for calling R from python), and reticulate (for calling python from R).
These packages aim to provide a seamless bridge between languages, and most of the time they deliver. But every language differs in its own way, and languages are continuously updated and patched, so we cannot rule out situations where the interlanguage package does not behave the way we expect. So even if a function works as expected in its native language (e.g. python), when calling it via another language (e.g. julia) there is no harm (apart from taking a bit more time) in writing equivalence tests just to check that it behaves the same way.
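To make the idea concrete, an equivalence test can be as simple as calling the same underlying function through both routes and comparing outputs within a tolerance. The sketch below is purely illustrative: `native_predict` and `bridged_predict` are hypothetical stand-ins for a function called directly versus the same function reached through an interlanguage bridge (here both are stubbed with the same formula, so the test passes).

```python
import math

# Hypothetical stand-ins: in a real test, native_predict would call the
# library directly, and bridged_predict would call it through the bridge
# (e.g. julia -> PyCall -> python). Both are stubbed here for illustration.
def native_predict(xs):
    return [2.0 * x + 1.0 for x in xs]

def bridged_predict(xs):
    return [2.0 * x + 1.0 for x in xs]

def assert_equivalent(xs, rel_tol=1e-9, abs_tol=1e-12):
    """Fail loudly if the two call paths disagree on any input."""
    a, b = native_predict(xs), bridged_predict(xs)
    assert len(a) == len(b), "output lengths differ"
    for i, (u, v) in enumerate(zip(a, b)):
        assert math.isclose(u, v, rel_tol=rel_tol, abs_tol=abs_tol), \
            f"mismatch at index {i}: {u} != {v}"

assert_equivalent([0.0, 1.5, -3.2, 100.0])
print("native and bridged outputs match")
```

A tolerance-based comparison (rather than strict equality) is deliberate: even a correctly working bridge can introduce tiny floating-point differences in how values are marshalled between runtimes.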
This brings us to our case study on how Light Gradient Boosting Machine (LightGBM) is called using python or julia. LightGBM is a free and open-source machine learning framework made by Microsoft. There is a python package (maintained by Microsoft on GitHub) and a julia package (currently maintained by IQVIA-ML on GitHub) for using this framework. Before you ask: there should not be any significant difference in computing resources or time between running LightGBM in julia or python, as both packages interact directly with LightGBM's C API.
All the reference code is captured in two jupyter notebooks which can be found in this github repo. (Side note: did you know, the origin of the project name Jupyter is a reference to the three core programming languages, Julia, Python and R?).
In an ideal world, one would expect calling LightGBM via julia or python to be equivalent. So when we use julia to call python (which then calls LightGBM), if everything translates seamlessly, we should expect equivalence as well. But if we never test it, how would we know?
So, here is how we tested it (see the github repo for detailed code):
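While the detailed test code lives in the repo, the core idea can be sketched as follows: train the same model along each path, dump each booster to a comparable structure, and diff the parts we care about. The helper below is an illustrative sketch; the toy dictionaries are hypothetical miniatures of a real booster dump (a real `Booster.dump_model()` result from the python package is a much larger nested dict), and the key names are chosen for illustration.

```python
def boosters_equivalent(dump_a, dump_b, keys=("tree_info", "feature_importances")):
    """Compare the parts of two booster dumps (parsed-JSON-style dicts) we care about."""
    return all(dump_a.get(k) == dump_b.get(k) for k in keys)

# Toy stand-ins for real booster dumps, one per run path.
python_run = {"tree_info": [{"split_feature": 0, "threshold": 0.5}],
              "feature_importances": [10, 3]}
julia_run = {"tree_info": [{"split_feature": 0, "threshold": 0.5}],
             "feature_importances": [10, 3]}
julia_pycall_run = {"tree_info": [{"split_feature": 1, "threshold": 0.2}],
                    "feature_importances": [4, 9]}

print(boosters_equivalent(python_run, julia_run))         # the native paths agree
print(boosters_equivalent(python_run, julia_pycall_run))  # the bridged path does not
```

Diffing a serialized model dump, rather than just comparing predictions on a handful of inputs, is the stricter check: two boosters can happen to agree on a test set while still having different trees.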
When we compared running LightGBM in just python or just julia, the output boosters looked nearly identical, which is what we would expect. What was interesting, however, is that when we used julia to call the python lightgbm package to run the framework, the output booster was different from the previous two runs, right down to the details of each tree and the feature importances.
Let me be clear! I am not trying to criticise PyCall or LightGBM, as there can always be edge cases (like this very case… I mean, how many people would call LightGBM this way?) that are not necessarily well tested!
Now, we could try to fix what went wrong with julia's PyCall + LightGBM combination run, but that would almost certainly mean going down a rabbit hole. One might find a solution, but do we really need one? Or maybe I made a mistake when setting up this julia -> PyCall -> python-lightgbm call (hopefully not!). Either way, the extra complexity of calling across multiple languages/frameworks does not help when we want to find out what is going on. And if a native package works, why not just use it?
This might be a naive case study, but hopefully it illustrates why I always prefer to keep things simple. Of course there might be situations where we cannot avoid mixing languages when developing. In those cases, it would be much safer to at least write some equivalence tests so we know it is doing what we expect it to do!
Many thanks to Yaqub Alwan and Dinesh Vatvani for the insightful discussions and for proofreading this article.