Big data is a booming industry with an approximate market value of $274.3 billion. This has led to a surge in the popularity of Data Science and Data Analytics. Businesses everywhere are cashing in on new insights to optimize their decision-making processes. When it comes to working with data, R is one of the most popular languages for statistical modeling and analysis. However, as a high-level interpreted language, R isn’t best suited for speed. That’s where R compilers come in.
To fully understand why compiling is such a big deal, we’ll need to look at why people use R and what they use it for. Welcome to our guide to R compilers.
As we mentioned before, R is a high-level interpreted language. Programming languages are usually either interpreted or compiled. This may sound complicated, but they’re just different ways of converting source code (the human-readable instructions) into machine-readable code.
Interpreters translate and execute code statement by statement at run time. Compilers, on the other hand, translate the entire program ahead of time, either into machine code that runs directly on the hardware or into an intermediate format such as bytecode that runs on a virtual machine. The processes are more nuanced than this, but the key trade-off is that interpreted code starts running immediately and is easy to test interactively, while compiled code pays an upfront translation cost and generally runs faster afterwards.
Ultimately, which approach is better depends on what you’re working on.
R is typically used for computationally intensive tasks like statistical modeling and mathematical operations on large data sets. In business, where time is money, every second counts.
Large operations on data are often repeated regularly, so speed can make a huge difference. For example, imagine running a complex formula on an Excel sheet with millions of rows. Even with the best tech money can buy, it’s computationally expensive and will likely take a long time. A few minutes here or there may not seem like much, but if you’re running scripts multiple times per day, every day, it adds up to hours of lost productivity per year.
Another problem is the issue of velocity in Big Data. As data grows faster than computational speed, even the most efficient scripts may lag. As many data analysis tools work on streaming data captured in real-time, speed is very important. For this reason, R compilers are often used to gain a moderate speed boost.
Since the 2.13.0 release, the byte compiler has shipped with R as the standard compiler package. This means that code run in modern versions of R can be byte-compiled. The compiler offers a few different ways of compiling code, which we will look at below.
The compiler package, written by Luke Tierney, is a bytecode compiler. It translates R source code into an intermediate format (bytecode), which R’s built-in virtual machine then executes.
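To make the idea of bytecode concrete, here is a minimal sketch that byte-compiles a small function with cmpfun() from the compiler package and checks that it behaves like the original. The function sum_squares is a hypothetical example written for this guide, not something taken from the R documentation.

```r
library(compiler)

# Hypothetical example function: sum of squares using an explicit loop.
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i^2
  }
  total
}

# Byte-compile the closure; the result is still an ordinary R function.
sum_squares_bc <- cmpfun(sum_squares)

# Both versions return the same value.
same_result <- identical(sum_squares(100), sum_squares_bc(100))
print(same_result)

# Printing a compiled function shows a "<bytecode: ...>" line after its source.
print(sum_squares_bc)
```

The compiled function is called exactly like the original, so it can be swapped in without changing the rest of a script.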
Explicit commands are those that developers need to state in their scripts to achieve the desired effect. In R, the bytecode compiler allows user-defined expressions, functions, and even whole scripts to be compiled explicitly using built-in functions.
This provides versatility through selective compiling.
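Key functions for explicitly compiling your code in R include compile() for single expressions, cmpfun() for functions, and cmpfile() for whole script files, whose compiled output can be loaded with loadcmp().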
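As a rough illustration of explicit compiling, the sketch below uses compile() on a single expression and cmpfile() with loadcmp() on a whole script. The script contents, the temporary file names, and the names scale_values and script_result are assumptions made up for this example.

```r
library(compiler)

# 1. compile(): byte-compile a single expression, then evaluate it with eval().
expr_bc <- compile(quote(sum((1:10)^2)))
expr_value <- eval(expr_bc)
print(expr_value)

# 2. cmpfile() / loadcmp(): compile a whole script and load the compiled result.
src_file <- tempfile(fileext = ".R")   # hypothetical source script
out_file <- tempfile(fileext = ".Rc")  # compiled output file
writeLines(c(
  "scale_values <- function(x) (x - mean(x)) / sd(x)",
  "script_result <- scale_values(c(2, 4, 6))"
), src_file)

cmpfile(src_file, out_file)

# Evaluate the compiled script in its own environment to keep the
# global workspace clean.
script_env <- new.env()
loadcmp(out_file, envir = script_env)
print(script_env$script_result)
```

Loading into a separate environment is a design choice for the example only; loadcmp() evaluates into the global environment by default.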
For a more technical overview of these functions, visit the R Bytecode compiler documentation.
Whereas explicit compiling requires users to state that they would like their code compiled, implicit compiling simply does it by default. This means that any code written in a file will be compiled, as long as the implicit compiler is active.
One option for implicit compiling is a just-in-time (JIT) compiler. JIT compilers compile code while the program is running, typically the first time or first few times a function is called, and offer modest speed gains. In many languages, a JIT compiler turns bytecode into machine code that executes directly on the running machine instead of on a virtual machine.
R also has a JIT compiler, which is controlled with the enableJIT() function. Unlike the JIT compilers described above, it compiles R code to bytecode rather than to machine code. Explicit functions such as compile() and cmpfile() are useful and versatile, but calling them by hand for every expression or closure in an R script would be tedious. With JIT enabled, R automatically compiles functions and loops around the time they are first used.
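The enableJIT() function takes a level argument that controls how much is compiled: 0 disables JIT; 1 compiles larger closures before their first use; 2 also compiles some small closures before their second use; and 3 additionally compiles all top-level loops before they are executed.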
Since R 3.4.0, JIT is set to level 3 by default. This means that although R is an interpreted language, much of the code it runs is byte-compiled at run time.
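The effect of the JIT level can be explored with a small experiment. The sketch below, which assumes a hypothetical loop-heavy function called cumulative_mean, times the same call with JIT disabled and with JIT at level 3, then restores the original setting. Timings depend on the machine and the R version, so treat the numbers as indicative only.

```r
library(compiler)

# Hypothetical loop-heavy function used only for illustration.
cumulative_mean <- function(x) {
  out <- numeric(length(x))
  running <- 0
  for (i in seq_along(x)) {
    running <- running + x[i]
    out[i] <- running / i
  }
  out
}

x <- as.numeric(1:200000)

# enableJIT() returns the previous level, so it can be restored afterwards.
previous_level <- enableJIT(0)  # level 0 disables JIT compilation
time_no_jit <- system.time(res_no_jit <- cumulative_mean(x))

invisible(enableJIT(3))         # level 3 compiles closures and top-level loops
invisible(cumulative_mean(x))   # warm-up call so the closure gets compiled
time_jit <- system.time(res_jit <- cumulative_mean(x))

invisible(enableJIT(previous_level))  # restore the original setting

# The results are the same either way; only the run time should differ.
print(identical(res_no_jit, res_jit))
cat("JIT off:", time_no_jit[["elapsed"]], "seconds\n")
cat("JIT on: ", time_jit[["elapsed"]], "seconds\n")
```

Calling enableJIT() with a negative argument returns the current level without changing it, which is a handy way to check the setting in an existing session.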
While the R installation typically comes with several options for compiling user-written code, there are also online compilers. These are web-hosted tools that let users compile and run code in virtual environments maintained by the service. They can be used for collaborative programming or for benchmarking when running the code locally isn’t practical.
It’s important to remember that despite its versatility and usefulness, R is not the fastest programming language around. Having said that, R compilers can help improve execution speed when used effectively. While R is still one of the more popular languages available for data analytics today, there are other options available. To learn more about the most popular programming languages in tech, read our article here.