Why I use Julia
Come for the speed. Stay for the productivity.
When I am asked why I use Julia, my immediate response is “multiple dispatch”. Julia is well known for performance, but that is only part of what keeps me using it every day. Multiple dispatch means that the method a function call runs is chosen based on the types of all of its arguments, not just the first one. Combined with the just-in-time (JIT) compiler, this lets Julia automatically compile specialized code for each combination of argument types the function is called with.
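To make that definition concrete, here is a minimal sketch using a made-up function `combine` (not from any package) with three methods. Julia picks the method from the types of both arguments together.

```julia
# `combine` is a hypothetical example function with three methods.
# Julia selects a method based on the runtime types of *both* arguments.
combine(x::Number, y::Number) = x + y                   # numeric addition
combine(x::AbstractString, y::AbstractString) = x * y   # `*` concatenates strings in Julia
combine(x::AbstractString, y::Number) = x * string(y)   # mixed: turn the number into text first

println(combine(1, 2))             # 3
println(combine("ab", "cd"))       # abcd
println(combine("n = ", 42))       # n = 42
println(length(methods(combine)))  # 3
```

Calling `combine(1, "x")` would raise a `MethodError`, because none of the three methods matches an `(Int, String)` argument pair.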
A simple example
As an example, consider the following simple function:
f(x) = x + x
I have not given Julia any hint as to what the type of the argument is, but as long as it’s a type that supports addition, Julia will compile specialized, optimized code for it. You can peek at the LLVM intermediate representation (IR) that Julia generates with the @code_llvm macro.
julia> @code_llvm f(1)
;  @ REPL[5]:1 within `f'
define i64 @julia_f_12805(i64) {
top:
; ┌ @ int.jl:53 within `+'
   %1 = shl i64 %0, 1
; └
  ret i64 %1
}

julia> @code_llvm f(1.0)
;  @ REPL[5]:1 within `f'
define double @julia_f_12813(double) {
top:
; ┌ @ float.jl:395 within `+'
   %1 = fadd double %0, %0
; └
  ret double %1
}
Ignoring some of the details, notice that Julia generated one version specialized for 64-bit integers (where x + x even became a single left shift, shl) and another for double-precision floats (a floating-point add, fadd)!
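The same one-line definition keeps working for any argument type that has a `+` method, and each new argument type gets its own specialization. A quick check (a sketch using only types from Base, chosen just as examples):

```julia
f(x) = x + x

println(f(3))          # 6        (Int)
println(f(1.5))        # 3.0      (Float64)
println(f(2//3))       # 4//3     (Rational{Int})
println(f(1 + 2im))    # 2 + 4im  (Complex{Int})
println(f([1, 2]))     # [2, 4]   (Vector{Int}: elementwise +)
```

Something like `f("a")` fails with a `MethodError`, since Julia concatenates strings with `*` and defines no `+` for them.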
What about more complicated types?
Here we will use the Distributions package to implement a naive quantile finder using Newton’s method.
Newton’s method is an iterative procedure that finds the root of a function.
For quantiles, we are trying to find the number x, for a given number q (between 0 and 1), such that:
cdf(dist, x) - q = 0
where cdf is the cumulative distribution function for distribution dist. We also need the derivative of the cdf, which is the probability density function, or pdf.
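Spelled out, Newton’s method applies the update x_{n+1} = x_n - g(x_n)/g'(x_n) to g(x) = cdf(dist, x) - q. Since g' is the pdf, writing F for the cdf and f for the pdf of dist gives:

$$
g(x) = F(x) - q, \qquad g'(x) = f(x)
$$

$$
x_{n+1} = x_n - \frac{F(x_n) - q}{f(x_n)}, \qquad n = 0, 1, 2, \dots
$$

The iteration stops once the step becomes negligibly small. As a worked case, assuming the iteration starts at the mean: for a standard normal distribution and q = 0.5, x_0 = 0 gives F(0) - q = 0.5 - 0.5 = 0, so the first update already lands on the median, 0.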
Again, we have not told Julia anything about what the function arguments are, but when you call myquantile with Distributions.Normal(0, 1) and 0.5, Julia will compile specialized code to run the algorithm and then return the median (0.5 quantile) for a standard normal distribution (which is 0).
Notice that myquantile works out of the box with other distributions! In fact, as long as mean, cdf, and pdf have methods for the type of distribution you pass in, it will just work! If you were to implement this quantile algorithm in R, you would need to rewrite it for each distribution using the matching density/distribution function pairs (dnorm/pnorm, dgamma/pgamma, and so on).
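The original myquantile code is not reproduced here, so below is a minimal sketch of what such a function could look like, assuming it starts Newton’s method at the distribution’s mean. To stay runnable without installing Distributions.jl, it defines two hypothetical stand-in types, `MyLogistic` and `MyUniform`, each with its own `mean`, `cdf`, and `pdf` methods; these names are illustrative only. With Distributions.jl loaded you would drop the stand-ins and pass that package’s own distributions (such as `Normal(0, 1)`), since it provides `mean`, `cdf`, and `pdf` methods for them.

```julia
# Hypothetical stand-ins for Distributions.jl types, so the sketch runs with Base only.
struct MyLogistic
    μ::Float64   # location (also the mean)
    s::Float64   # scale
end

struct MyUniform
    a::Float64   # lower bound
    b::Float64   # upper bound
end

# The three operations myquantile relies on, defined separately for each type.
mean(d::MyLogistic) = d.μ
cdf(d::MyLogistic, x) = 1 / (1 + exp(-(x - d.μ) / d.s))
function pdf(d::MyLogistic, x)
    F = cdf(d, x)
    return F * (1 - F) / d.s   # derivative of the logistic cdf
end

mean(d::MyUniform) = (d.a + d.b) / 2
cdf(d::MyUniform, x) = clamp((x - d.a) / (d.b - d.a), 0.0, 1.0)
pdf(d::MyUniform, x) = d.a <= x <= d.b ? 1 / (d.b - d.a) : 0.0

# Naive Newton's method on g(x) = cdf(dist, x) - q, starting from the mean.
# Works for any `dist` whose type has mean/cdf/pdf methods; no safeguards if pdf hits 0.
function myquantile(dist, q; tol = 1e-10, maxiter = 100)
    x = mean(dist)
    for _ in 1:maxiter
        step = (cdf(dist, x) - q) / pdf(dist, x)
        x -= step
        abs(step) < tol && return x
    end
    return x
end

println(myquantile(MyLogistic(0.0, 1.0), 0.5))   # 0.0
println(myquantile(MyLogistic(0.0, 1.0), 0.9))   # ≈ 2.19722 (= log(9))
println(myquantile(MyUniform(0.0, 10.0), 0.3))   # 3.0
```

The same `myquantile` body serves both types; only the `mean`, `cdf`, and `pdf` methods differ, which is the dispatch-based reuse described above.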
Takeaway
The language you use has a tremendous effect on how you approach problems (see linguistic relativity). I have a background in statistics, so naturally R was one of the first languages I learned. I don’t mean to bash R (language wars are boring); it is a fantastic tool for data analysis. But I often find myself asking “how do I solve this without a for loop?” because loops are slow in R. In Julia, I run into fewer performance obstacles, so my questions are more along the lines of “which methods do I need to accomplish this task?”. If I can reduce a task to the operations that need to be performed, it becomes easy to write abstract yet performant code that works with whatever types I throw at it.
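As a small illustration of that difference (a sketch, not code from this post), a plain for loop is idiomatic in Julia, and writing it only in terms of the operations it needs, `zero` and `+`, makes it work for any element type that supports them. The name `mysum` is hypothetical; Base already provides `sum`.

```julia
# Hypothetical `mysum`: a plain loop written only in terms of `zero` and `+`.
function mysum(xs)
    total = zero(eltype(xs))   # additive identity for the element type (0, 0.0, 0//1, ...)
    for x in xs
        total += x
    end
    return total
end

println(mysum([1, 2, 3]))      # 6
println(mysum([0.5, 0.25]))    # 0.75
println(mysum([1//2, 1//3]))   # 5//6
println(mysum(1:100))          # 5050
```

It assumes the element type has a `zero` method, so it fails on something like a `Vector{Any}`.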
Multiple dispatch has become invaluable to how I code, and with Julia you get it along with stellar performance. If you want to see how I use multiple dispatch to get a lot done with very little code, check out my package OnlineStats.jl for calculating statistics/models on data streams with single-pass algorithms.
I hope you try Julia for yourself and have the same experience I had.
Come for the speed. Stay for the productivity.
About the Author
Josh has a PhD in Statistics from NC State University and works as a Senior Research Scientist at Julia Computing. He operates the Julia For Data Science newsletter and maintains many open source packages, available on GitHub.