This page describes a simple way (via Perl) to get a graphical view of the output of the R profiler.
Usually, the R profiler outputs numbers that let the user assess which functions are slow and could be improved. See here for an example of the use of the Rprof function. What is missing from that output is the notion of which function calls which, and that would help to figure out why a given function is slow.
Obviously, the sharp R programmer can have a go at crunching the raw output of the profiler (generally Rprof.out) to get that sort of information.
> Rprof(); example(glm) ; Rprof(NULL)
> cat(readLines("Rprof.out"), sep = "\n")
sample.interval=20000
"data.frame" "print" "eval.with.vis" "eval.with.vis" "source" "example"
"linkinv" "glm.fit" "glm" "eval.with.vis" "eval.with.vis" "source" "example"
"as.integer" "substr" "print.anova" "print" "source" "example"
".readRDS" "<Anonymous>" "eval.with.vis" "eval.with.vis" "source" "example"
"inherits" "is.factor" "match" "%in%" "deparse" "eval" "match.arg" "sort.int" "sort.default" "sort" "symnum" "printCoefmat" "print.summary.glm" "print" "source" "example"
"pmax" "formatC" "paste" "quantile.default" "quantile" "print.summary.glm" "print" "source" "example"
">" "switch" "residuals.glm" "residuals" "summary.glm" "summary" "eval.with.vis" "eval.with.vis" "source" "example"
So, as an example, if I want to know which functions “source” is calling, I could torture this file, but the code quickly becomes cryptic.
> rl <- strsplit( gsub("\"", "",readLines("Rprof.out")[-1]), " ")
> funs <- sapply( rl, function(x) { x[ which(x == "source") - 1 ] } )
> table( funs )
funs
eval.with.vis         print 
            4             3 
The files generated by the profiler already contain all the information about which function calls which other function. We can almost use that information directly and supply it to dot from graphviz. As an example, a simple dot file generated from the first line of the Rprof.out file would look like this:
digraph{
graph [ rankdir = "LR"];
"example" -> "source" -> "eval.with.vis" -> "eval.with.vis" -> "print" -> "data.frame"
}
and can be processed by dot into several formats:
dot test.dot -Tsvg > test.svg
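To make the mapping from one profiler line to one dot chain concrete, here is a minimal Python sketch. It is only an illustration and not part of the Perl script; the function name stack_to_dot is made up for this example. It relies on the fact, visible in the dot file above, that Rprof.out lists the innermost call first.

```python
def stack_to_dot(profile_line):
    """Turn one Rprof.out stack line into a left-to-right dot chain.

    Hypothetical helper written for illustration only.
    """
    # Strip the surrounding double quotes from each function name.
    names = [name.strip('"') for name in profile_line.split()]
    # Rprof.out lists the innermost call first, so reverse the names
    # to get the caller -> callee direction used in the dot file.
    chain = " -> ".join('"%s"' % name for name in reversed(names))
    return 'digraph{\ngraph [ rankdir = "LR"];\n%s\n}' % chain


line = '"data.frame" "print" "eval.with.vis" "eval.with.vis" "source" "example"'
print(stack_to_dot(line))
```

The printed text can be saved as test.dot and passed to the dot command shown above.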
We just need to find something to convert the Rprof.out file into a suitable dot format. We could do that in R, but to make things more fun, let’s use a language that is built to play with text: Perl. There is already a Perl script shipped with R to crunch the Rprof.out file, so I just took my inspiration from there. The current version of the script can be found here and older versions under here.
Call this Rprof2dot and store it in the bin directory of your R installation. You can now call the script with the following command to generate the dot file.
[romain@fedora tmp]$ R CMD Rprof2dot Rprof.out
digraph {
graph [ rankdir = "LR"];
"source" [shape=rect,fontsize=6,label="source\n(7)"]
"example" [shape=rect,fontsize=6,label="example\n(7)"]
"eval.with.vis" [shape=rect,fontsize=6,label="eval.with.vis\n(8)"]
"example" -> "source" [label=7,fontsize=6]
}
You can now save the output in a dot file, or directly pipe it to the dot command:
[romain@fedora tmp]$ R CMD Rprof2dot Rprof.out > test.dot
[romain@fedora tmp]$ R CMD Rprof2dot Rprof.out | dot -Tpng > test3.png
By default, the Perl script keeps only the boxes for functions that are called at least 5 times, but you can change that with the --cutoff flag. For this really simple example we want to keep everything, so we set the cutoff to 0: nothing is called fewer than 0 times, so nothing is removed.
[romain@fedora tmp]$ R CMD Rprof2dot --cutoff=0 Rprof.out | dot -Tpng > test4.png
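For readers who want to see the counting and the cutoff spelled out, the following Python sketch mimics the kind of output shown above. It is not the Perl script and may differ from it in details. The names parse_stacks and to_dot are invented for this example, and it assumes that a box or an arrow is kept when its count is at least the cutoff.

```python
from collections import Counter

# The Rprof.out content shown earlier on this page.
SAMPLE = '''sample.interval=20000
"data.frame" "print" "eval.with.vis" "eval.with.vis" "source" "example"
"linkinv" "glm.fit" "glm" "eval.with.vis" "eval.with.vis" "source" "example"
"as.integer" "substr" "print.anova" "print" "source" "example"
".readRDS" "<Anonymous>" "eval.with.vis" "eval.with.vis" "source" "example"
"inherits" "is.factor" "match" "%in%" "deparse" "eval" "match.arg" "sort.int" "sort.default" "sort" "symnum" "printCoefmat" "print.summary.glm" "print" "source" "example"
"pmax" "formatC" "paste" "quantile.default" "quantile" "print.summary.glm" "print" "source" "example"
">" "switch" "residuals.glm" "residuals" "summary.glm" "summary" "eval.with.vis" "eval.with.vis" "source" "example"
'''


def parse_stacks(lines):
    """Return the call stacks in caller-first order, skipping the header."""
    stacks = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("sample.interval"):
            continue
        names = [name.strip('"') for name in line.split()]
        stacks.append(list(reversed(names)))
    return stacks


def to_dot(stacks, cutoff=5):
    """Count boxes and arrows, keeping those seen at least `cutoff` times."""
    nodes = Counter()
    edges = Counter()
    for stack in stacks:
        nodes.update(stack)                 # one count per occurrence
        edges.update(zip(stack, stack[1:]))  # (caller, callee) pairs
    out = ["digraph {", 'graph [ rankdir = "LR"];']
    for name, n in nodes.items():
        if n >= cutoff:
            out.append('"%s" [shape=rect,fontsize=6,label="%s\\n(%d)"]'
                       % (name, name, n))
    for (caller, callee), n in edges.items():
        if n >= cutoff:
            out.append('"%s" -> "%s" [label=%d,fontsize=6]'
                       % (caller, callee, n))
    out.append("}")
    return "\n".join(out)


print(to_dot(parse_stacks(SAMPLE.splitlines()), cutoff=5))
```

Calling to_dot with cutoff=0 keeps every box and every arrow, which is what the --cutoff=0 command above asks for.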
So far I have implemented three ways to restrict which data from the profiler’s output is used to produce the dot file: a blacklist, a whitelist and a restrict file.
Sometimes, you’d like some functions not to appear in the profiling results, especially as the graph gets big. For example, when using anonymous functions such as in a sapply call, the function is reported as <Anonymous>, with no way to distinguish one anonymous function from another. In that case, it is quite useful to just say “I don’t want the <Anonymous> function to be a part of the graphic”.
The script accepts a --blacklist flag that points to a file containing a set of undesirable functions or regular expressions, one per line. For example, this file:
eval.with.vis
<Anonymous>
sort\.[^"]*
print[^"]*
will result in removing the functions eval.with.vis, <Anonymous>, all functions that match sort\.[^"]* and all functions that match print[^"]* from the file before generating the dot code. One can then use this file to clean the profiler output:
$ R CMD Rprof2dot --cutoff=0 Rprof.out --blacklist=blacklist | dot -Tpng > test5.png
Because of the way the regex is used in the script, using the dot (.) can be dangerous: it would also match the " that ends the function name. That is why I’ve used [^"] here. I agree that this is not pretty, and I will try to find something better; I’m open to suggestions. One way is to use non-greedy matching, something like sort\..*?, but it would be nice to find something else. Also, this does not allow using $ or ^ to mark the beginning or end of the function name.
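One possible answer to the quoting problem, sketched here in Python rather than Perl, is to split each profiler line into function names first and then require each pattern to match a whole name. This is a suggestion, not how the script currently works; load_patterns and apply_blacklist are names invented for this sketch. With whole-name matching, a plain dot can no longer run past the closing quote, and the ^ and $ anchors become implicit.

```python
import re


def load_patterns(text):
    """Compile one regex per non-empty line of a blacklist file's content."""
    return [re.compile(line.strip())
            for line in text.splitlines() if line.strip()]


def apply_blacklist(stack, patterns):
    """Drop every function whose whole name matches one of the patterns."""
    return [name for name in stack
            if not any(p.fullmatch(name) for p in patterns)]


# Same intent as the blacklist file above, but with a plain ".*"
# in place of the [^"]* workaround.
blacklist = load_patterns(r"""
eval.with.vis
<Anonymous>
sort\..*
print.*
""")

stack = ["example", "source", "print", "print.summary.glm", "printCoefmat",
         "symnum", "sort", "sort.default", "sort.int", "match.arg"]
print(apply_blacklist(stack, blacklist))
```

Here sort survives because sort\..* requires a literal dot after sort, while sort.default and sort.int are removed.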
On the other hand, when you are only interested in a subset of the functions that appear in the profiler’s output, the script lets you declare a whitelist. The whitelist is a file where each line gives a regex that a function must match in order to be included in the graph.
As an example, we would like to profile the examples in the xyplot help page to know how the grid package is used by lattice (why not?). In that case, we would “whitelist” all the grid functions and print.trellis (which is the function heavily calling grid).
require(lattice)
Rprof( )
example( xyplot, ask = FALSE )
Rprof( NULL )
# list of functions in grid
cut -f1 /usr/local/lib/R/library/grid/help/AnIndex | sed "s/\./\\\./g" | grep -v "-" > whitelist
# manually add print.trellis ( and escaping the dot )
echo "print\\.trellis" >> whitelist
# making the graph
R CMD Rprof2dot Rprof.out --cutoff=0 --whitelist=whitelist | dot -Tpng > xyplot.png
The third way to restrict the data is a bit more complicated, but can be useful. It keeps only a subset of functions, together with a given number of the functions that come before and after each of them.
An example restrict file is given below; it means “keep the grob function and up to 3 functions before and after it”:
grob,3,3
A restrict file can be combined with a whitelist or a blacklist. In that case, the blacklist or whitelist is applied before the restrict file.
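The effect of a line such as grob,3,3 on a single call stack can be sketched in Python as follows. This rests on several assumptions that the page does not confirm: the fields are read as name, number before, number after; “before” is taken to mean callers and “after” callees; and stacks that do not contain the function are dropped. The names parse_restrict and apply_restrict, and the f1..g4 function names, are made up for the example.

```python
def parse_restrict(line):
    """Parse 'name,before,after' (assumed field order) into a tuple."""
    name, before, after = line.strip().split(",")
    return name, int(before), int(after)


def apply_restrict(stack, rule):
    """Keep `name` plus up to `before` callers and `after` callees.

    `stack` is in caller-first order. A stack that does not contain
    `name` is dropped, i.e. an empty list is returned (an assumption).
    """
    name, before, after = rule
    if name not in stack:
        return []
    i = stack.index(name)  # first occurrence only, a simplification
    return stack[max(0, i - before): i + after + 1]


rule = parse_restrict("grob,3,3")
# Hypothetical call stack, caller first.
stack = ["f1", "f2", "f3", "f4", "f5", "grob", "g1", "g2", "g3", "g4"]
print(apply_restrict(stack, rule))
```

When fewer than 3 functions are available on one side, the window simply stops at the end of the stack, which is one way to read “up to 3”.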
— Romain François 2007/06/30