Big data is a booming industry with an approximate market value of $274.3 billion. This has led to a surge in the popularity of Data Science and Data Analytics. Businesses everywhere are cashing in on new insights to optimize their decision-making processes. When it comes to working with data, R is one of the most popular languages for statistical modeling and analysis. However, as a high-level interpreted language, R isn’t best suited for speed. That’s where R compilers come in.
To fully understand why compiling is such a big deal, we’ll need to look at why people use R and what they use it for. Welcome to our guide to R compilers.
R, Interpreted or Compiled?
As we mentioned before, R is a high-level interpreted language. Programming languages are usually either interpreted or compiled. This may sound complicated, but they’re just different ways of converting source code (the human-readable instructions) into machine-readable code.
Interpreters translate and execute code statement by statement at run time. Compilers, on the other hand, translate the entire program ahead of time, either into machine code that runs directly on the hardware or into bytecode that runs on a virtual machine. The processes are more nuanced than this, but the practical trade-off is that interpreted code is generally easier to run and debug interactively, while compiled code generally executes faster.
So, which is better?
Ultimately, it depends on what you’re working on.
R is typically used for computationally intensive tasks like statistical modeling and mathematical operations on large data sets. In business, where time is money, every second counts.
Large operations on data are often repeated regularly, so speed can make a huge difference. For example, imagine running a complex formula on an Excel sheet with millions of rows. Even with the best tech money can buy, it's computationally expensive and will likely take a long time. A few minutes here or there may not seem like much, but if you're running scripts multiple times per day, every day, it adds up to hours of lost productivity per year.
Another problem is the issue of velocity in Big Data. As data grows faster than computational speed, even the most efficient scripts may lag. As many data analysis tools work on streaming data captured in real-time, speed is very important. For this reason, R compilers are often used to gain a moderate speed boost.
Compiling in R
Since R's 2.13.0 release, the byte compiler has shipped as a standard package (named compiler) in the R installation. This means that code written for modern versions of R can be byte-compiled. The compiler package has a few different options for compiling code, which we will look at below.
R Bytecode Compiler
The R compiler package, written by Luke Tierney, is a bytecode compiler. This means it reads R source code and converts it to an intermediate format (bytecode), which is then executed on R's virtual machine.
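To make the idea of a bytecode object more concrete, here is a minimal sketch of what one looks like from the R console. The function sum_squares and its body are hypothetical and exist only for illustration; cmpfun() and disassemble() are functions from the compiler package.

```r
library(compiler)

# Hypothetical example function; the name and body are illustrative only.
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i^2
  total
}

# cmpfun() returns a byte-compiled copy of the closure.
sum_squares_bc <- cmpfun(sum_squares)

# Printing a compiled closure shows its source followed by a
# "<bytecode: ...>" line, which marks it as byte-compiled.
print(sum_squares_bc)

# disassemble() prints the bytecode instructions and constants that
# R's virtual machine works with. The output is long and low-level.
disassemble(sum_squares_bc)
```

The compiled function behaves exactly like the original when called; only its internal representation differs.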
Explicit compiling
Explicit commands are those that developers need to state in their scripts to achieve the desired effect. In R, the bytecode compiler allows user-defined expressions, functions, and even whole scripts to be compiled explicitly using built-in functions.
This provides versatility through selective compiling.
Key functions for explicitly compiling your code in R include:
- compile(). This function compiles an R expression. By default the expression is compiled with respect to the global environment, but a different environment can be supplied. The result is a bytecode object, which can then be evaluated with the eval() function.
- cmpfun(). A closure (that is, a user-defined function) is compiled with cmpfun(). It returns a new closure whose body is bytecode, and this compiled function can be called like any regular user-defined function.
- cmpfile(). An entire file can be compiled with cmpfile(), which compiles the expressions in the file and writes the resulting bytecode to an output file. To run the compiled file, load it with the loadcmp() function.
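The three functions above can be sketched as follows. This is a minimal illustration rather than a recommended workflow: the expression, the function mean_abs_dev, the variable file_answer, and the temporary file names are all made up for the example.

```r
library(compiler)

# 1. compile(): turn a quoted expression into a bytecode object,
#    then evaluate that object with eval().
expr_bc <- compile(quote(sum((1:10)^2)))
result <- eval(expr_bc)

# 2. cmpfun(): compile a closure. The compiled version is called
#    exactly like the original function.
mean_abs_dev <- function(x) mean(abs(x - mean(x)))  # hypothetical helper
mean_abs_dev_bc <- cmpfun(mean_abs_dev)
same_answer <- identical(mean_abs_dev(c(1, 2, 3, 4)),
                         mean_abs_dev_bc(c(1, 2, 3, 4)))

# 3. cmpfile() and loadcmp(): compile a whole script to a bytecode file,
#    then load (and thereby run) the compiled file.
src <- tempfile(fileext = ".R")   # throwaway source script
out <- tempfile(fileext = ".Rc")  # compiled output file
writeLines("file_answer <- 6 * 7", src)
cmpfile(src, out)
loadcmp(out)  # evaluates the compiled expressions in the global environment

print(result)
print(same_answer)
print(file_answer)
```

Selective compiling like this is most useful for a handful of hot functions; compiling code that runs only once rarely pays back the compilation cost.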
For a more technical overview of these functions, visit the R Bytecode compiler documentation.
Implicit Compiling
Whereas explicit compiling requires users to state that they would like their code compiled, implicit compiling simply does it by default. This means that any code written in a file will be compiled, as long as the implicit compiler is active.
One option for implicit compiling is the Just-in-Time (JIT) compiler. JIT compilers offer modest speed gains by compiling code while the program is running rather than in a separate step beforehand. In many languages, a JIT compiler turns bytecode into machine code that runs directly on the hardware. R's JIT works at a different level: it compiles functions and loops into bytecode shortly before they run, and that bytecode is still executed by R's virtual machine.
Just-In-Time (JIT) Compiler
R's JIT compiler is controlled with the enableJIT() function. While explicit compile functions like compile() or cmpfile() are useful and provide versatility, it would be tedious to compile every expression or closure in an R script by hand. With enableJIT(), R compiles functions and loops automatically as it encounters them.
The enableJIT() function can also be set to compile at different levels:
- enableJIT(0): JIT compiling is disabled. User code is not compiled automatically.
- enableJIT(1): Larger closures are compiled before their first use.
- enableJIT(2): In addition, some small closures are compiled before their second use.
- enableJIT(3): In addition, all top-level loops are compiled before they are executed.
By default, JIT is set to level 3 in R (this has been the default since R 3.4.0). This means that despite being an interpreted language, most R code is in practice compiled to bytecode at run time.
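As a rough sketch of how the JIT setting can be inspected and changed, the example below switches JIT off, times an interpreted loop against an explicitly compiled copy, and then restores the previous level. The function cumulative_sum and the input size are hypothetical, and the timings will vary from machine to machine, so treat the numbers as indicative only.

```r
library(compiler)

# enableJIT() returns the level that was active before the call,
# so the previous setting can be restored afterwards.
previous_level <- enableJIT(0)  # switch JIT off for the comparison

# Hypothetical loop-heavy workload, used only for illustration.
cumulative_sum <- function(x) {
  out <- numeric(length(x))
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i]
    out[i] <- total
  }
  out
}

set.seed(1)
x <- runif(5e5)

# With JIT off, this call runs through the plain interpreter.
time_interpreted <- system.time(res_interpreted <- cumulative_sum(x))

# Explicitly byte-compile the same function and time it again.
cumulative_sum_bc <- cmpfun(cumulative_sum)
time_compiled <- system.time(res_compiled <- cumulative_sum_bc(x))

# Restore whatever JIT level was active before the experiment.
enableJIT(previous_level)

print(time_interpreted[["elapsed"]])
print(time_compiled[["elapsed"]])
print(isTRUE(all.equal(res_interpreted, res_compiled)))
```

With the default level 3 left on, the first function would normally be compiled automatically, which is why JIT is disabled here before timing the interpreted version.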
Can you run an R program online?
While the R installation typically comes with several options for compiling user-written code, there are also online compilers. These are web-hosted tools that let users compile and run code in virtual environments maintained by the service. They can be used for collaborative programming or benchmarking when it isn't practical to do so locally.
R and more
It’s important to remember that despite its versatility and usefulness, R is not the fastest programming language around. Having said that, R compilers can help improve execution speed when used effectively. While R is still one of the more popular languages available for data analytics today, there are other options available. To learn more about the most popular programming languages in tech, read our article here.
About the author
Juan Pablo González
Working as Foreworth’s Chief Technical Officer, Juan Pablo (JP) manages the company’s technical strategy. With nearly 20 years of experience in software development, he ensures the development process at Foreworth is meeting its key objectives and technical requirements.