Index

This manual explains the R syntax for people familiar with Stata.
NOTE: the examples only demonstrate R's syntax without considering specifically how the data should be analysed to take into account its particular properties.

about ameans anova append
bootstrap browse bsample by bysort
cd centile clear clogit cls codebook collapse correlate count
describe dotplot drop
egen estimates_store
findit
generate graph_bar graph_box graph_matrix
help histogram
if in
kappa keep ktau
levelsof list logit logistic lowess lrtest
mean merge mhodds mvdecode mvencode
predict
qnorm
ranksum recode regress rename rvfplot replace reshape
sample save scatter signrank sort spearman summarize
tab1 table tabstat tabulate ttest twoway_lfit
xi xtile

Example dataset airquality

Data represent daily air quality measurements in New York, May to September 1973. The variables Ozone and Solar.R have several missing values.

Table 1: Example dataset airquality
No  Var      Type    Description
1   Ozone    number  Ozone (ppb)
2   Solar.R  number  Solar radiation (langleys)
3   Wind     number  Wind speed (mph)
4   Temp     number  Temperature (°F)
5   Month    number  Month (1-12)
6   Day      number  Day of month (1-31)



Return to index

Example dataset infert

Matched case-control study (ratio 1:2) on infertility after spontaneous and induced abortion. One case with two prior spontaneous abortions and two prior induced abortions is omitted. The first variable education is a categorical variable - called a factor in R.

Table 2: Example dataset infert
No  Var             Type      Description
1   education       category  Years of education (0: <6; 1: 6-11; 2: 12+)
2   age             number    Age (years)
3   parity          number    Parity
4   induced         number    Number of prior induced abortions
5   case            binary    Case (1) or control (0)
6   spontaneous     number    Number of prior spontaneous abortions
7   stratum         number    Matching ID
8   pooled.stratum  number    Pooled matching ID


Return to index

about

The closest equivalent in R is sessionInfo(), which reports the R version, the platform and the attached packages.
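As a brief illustration, the session details can be printed directly or stored for later inspection. The object name info is only used for this sketch.

```r
# STATA about
# Print R version, platform and attached packages
info <- sessionInfo()
print(info)

# Only the version string
R.version.string
```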


Return to index

ameans

The basic function for the arithmetic mean is mean. Please note that this function will return NA if any value is missing. To ignore missing values add the argument na.rm = TRUE. R has no built-in function for calculating the geometric mean. However, it is easy to define one. Keep in mind: the geometric mean is 0 if any of the values is 0. If the data contain zeros, a constant (usually 1) is often added to each value.
The mean of all variables is easily obtained with summary.

Examples:

# STATA ameans Solar.R
# arithmetic mean
mean(airquality$Solar.R)
# NOTE: mean will return NA if any value is missing
#       Argument 'na.rm = TRUE' ignores missing values
mean(airquality$Solar.R, na.rm = TRUE)
# geometric mean (no missing)
geomean <- function(x) exp(sum(log(x)) / length(x))
geomean(airquality$Temp)
# STATA ameans Temp, add(1)
geoadd1 <- function(x) exp(sum(log(x + 1)) / length(x))
geoadd1(airquality$Temp)
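Stata's ameans also reports the harmonic mean, for which base R has no dedicated function either. A minimal sketch in the same spirit, assuming all values are positive; the function name harmmean is made up for this example.

```r
# harmonic mean: number of values divided by the sum of their reciprocals
harmmean <- function(x) length(x) / sum(1 / x)
harmmean(airquality$Temp)

# with missing values, drop them first
harmmean(na.omit(airquality$Solar.R))
```

For positive values the harmonic mean is never larger than the arithmetic mean, which gives a quick plausibility check.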


Return to index

anova

R interprets ANOVA as a special case of linear regression. The function which shows the Analysis of Variance table is anova which is called on a regression object.

Examples:

# Binary predictor
# STATA anova age case
anova(lm(age ~ case, data = infert))
# Categorical predictor - categorical (factor) variable in R
# STATA xi: anova age i.education
anova(lm(age ~ education, data = infert))
# Categorical predictor - numerical variable in R
# STATA xi: anova age i.induced
anova(lm(age ~ factor(induced), data = infert))


Return to index

append

To append two datasets, R has the function rbind (for 'row bind'). Note: Both datasets must have the same number of variables with identical variable names. However, the order of variables may be different.

Examples:

# Make a copy of dataset infert
inf.copy <- infert
# Append both datasets
# STATA append using ...
new.data <- rbind(infert, inf.copy)
nrow(infert)
nrow(new.data)


Return to index

bootstrap

A bootstrap algorithm is implemented in the package boot. See centile for an example of how to estimate confidence intervals for a quantile.

Examples:

# 95% CI of the mean via bootstrap
# STATA bootstrap r(mean), reps(1000): summarize Temp
library(boot)
mean.boot <- boot(data = airquality$Temp, statistic = function(x, i) mean(x[i]), R = 1000)
mean.boot
boot.ci(mean.boot)


Return to index

browse

In R: View.


View(infert)

Return to index

bsample

For Stata's (bootstrap) sample with replacement use sample with argument replace = TRUE.

Examples:

# Sampling the numbers from 1 to 20 with replacement
sample(1:20, replace = TRUE)
# ... sorted:
sort(sample(1:20, replace = TRUE))
# STATA bsample Temp
sample(airquality$Temp, replace = TRUE)


Return to index

by bysort table

R does not require the dataset to be sorted. Therefore, bysort is not needed in R. The syntax of R's equivalent function is by(values, bygroups, function). R's by is also a useful alternative to Stata's table.

Examples:

# STATA by Month: mean Solar.R
by(airquality$Solar.R, airquality$Month, mean)

# NOTE: mean will return NA if any value is missing
#       Argument 'na.rm = TRUE' ignores missing values
by(airquality$Solar.R, airquality$Month, mean, na.rm = TRUE)

# STATA table Month, c(min Solar.R mean Solar.R median Solar.R max Solar.R)
by(airquality$Solar.R, airquality$Month, summary)


Return to index

cd

R's function to change the directory is setwd (for 'set working directory'). Note: R does not accept single backslashes in a path. If the path contains backslashes (typical for Windows), convert them to forward slashes or double each backslash.
Examples:

setwd("C:/Users/KM/Documents/R/")
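The conversion of a Windows path can also be done in R itself. The following is only a sketch; the path is the made-up one from the example above and the object names win.path and r.path are chosen for illustration.

```r
# Show the current working directory
getwd()

# Convert a Windows path with backslashes to forward slashes.
# Inside an R string each backslash must be written as "\\".
win.path <- "C:\\Users\\KM\\Documents\\R"
r.path <- gsub("\\", "/", win.path, fixed = TRUE)
r.path
# setwd(r.path)   # only works if the directory exists
```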



Return to index

centile

Percentiles of a single variable are provided by quantile. Please note that this function will return NA if any value is missing. To ignore missing values add the argument na.rm = TRUE. The median and the 1st and 3rd quartiles of all variables are easily obtained with summary. The confidence interval of the pseudo median can be obtained with wilcox.test. For confidence intervals of other percentiles, bootstrap resampling is recommended.

Examples:

# STATA centile
summary(airquality)
# obtain 95% CI of the (pseudo) median
# STATA centile Temp
wilcox.test(airquality$Temp, conf.int = TRUE)
# STATA centile Ozone, centile(10, 50, 90)
quantile(airquality$Ozone, probs = c(0.1, 0.5, 0.9), na.rm = TRUE)
# 95% CI of the 1st quartile via bootstrap
library(boot)
q1.boot <- boot(data = airquality$Temp, statistic = function(x, i) quantile(x[i], probs = 0.25), R = 1000)
q1.boot
boot.ci(q1.boot)


Return to index

clear cls

To remove a certain dataset (or any R object) from memory use rm(mydata). To remove all objects use rm(list=ls()). To clear the text in the console press [CTRL] + [L].
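A short sketch of how this looks in practice; the object names mydata and tmp are placeholders created only for this example.

```r
# Create two objects
mydata <- airquality
tmp <- 1:10

# STATA clear (here: remove a single object)
rm(mydata)
exists("mydata", inherits = FALSE)

# List what is left in the workspace
ls()

# Remove everything (use with care):
# rm(list = ls())
```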



Return to index

clogit

Package survival provides an equivalent function for conditional logistic regression. The model formula is of the form case.status ~ exposure + strata(matched.set).

Examples:

# STATA clogit case spontaneous induced, group(stratum)
library(survival)
clog <- clogit(case ~ spontaneous + induced + strata(stratum), data = infert)
summary(clog)


Return to index

codebook summarize

In R, summary provides a lot of information on all or selected variables. For percentiles see centile.

Examples:

# STATA summarize
summary(airquality)

# Number of unique values of Day
# STATA codebook Day
sort(unique(airquality$Day))
length(unique(airquality$Day))


Return to index

collapse tabstat

Package dplyr provides several functions to generate summary statistics for numeric variables.

Examples:

# STATA tabstat Temp Ozone, by(Month) stat(mean median n)
library(dplyr)
airquality %>%
  group_by(Month) %>%
  summarize(mean.Te = mean(Temp, na.rm = TRUE),
            median.Te = median(Temp, na.rm = TRUE),
            mean.Oz = mean(Ozone, na.rm = TRUE),
            median.Oz = median(Ozone, na.rm = TRUE),
            n = n())


Return to index

correlate ktau spearman

Use cor in R.

Examples:

# STATA correlate _all
cor(airquality, use = "pairwise.complete.obs")

# STATA spearman _all
cor(airquality, use = "pairwise.complete.obs", method = "spearman")

# STATA ktau _all
cor(airquality, use = "pairwise.complete.obs", method = "kendall")


Return to index

count

The number of records, or the number fulfilling a certain condition, can be determined with nrow.

Examples:

# STATA count
nrow(airquality)
# STATA count if Month < 8
nrow(airquality[airquality$Month < 8, ])
# with subset
nrow(subset(airquality, Month < 8))
# with package dplyr
library(dplyr)
filter(airquality, Month < 8) %>% nrow()


Return to index

describe

str (for 'structure') provides a basic description of a dataset.

Examples:

# STATA describe
str(airquality)


Return to index

dotplot

dotchart draws a dot plot of a numeric variable, optionally split by groups.

Examples:

# STATA dotplot Temp, over(Month)
# the grouping variable should be a factor
dotchart(airquality$Temp, groups = factor(airquality$Month))



Return to index

drop keep

The easiest way to keep some variables is to index the columns by name in square brackets. To drop variables, use the function subset with a negated select argument.

Examples:

# STATA keep Month Day
keep.data <- airquality[, c("Month", "Day")]
head(keep.data)
# STATA drop Month Day
drop.data <- subset(airquality, select = -c(Month, Day))
head(drop.data)



Return to index

egen

Stata's egen command is extremely flexible and powerful. Some typical examples are provided below.

Examples:

# STATA egen rankage = rank(age)
infert$rankage <- rank(infert$age)
# STATA egen Nabort = rowtotal(spontaneous induced)
infert$Nabort <- rowSums(infert[, c("spontaneous", "induced")], na.rm = TRUE)
# STATA egen excluded = rowmiss(_all)
airquality$excluded <- rowSums(is.na(airquality))
table(airquality$excluded)

# STATA bysort Month: egen mTemp = mean(Temp)
library(dplyr)
airquality <- airquality %>%
        group_by(Month) %>%
        mutate(mTemp = mean(Temp, na.rm = TRUE)) %>%
        ungroup()
table(airquality$mTemp)


Return to index

estimates store lrtest

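Results from statistical models can be stored by simply assigning them to a new R object: mymodel <- lm(a ~ b, data = c). To compare two nested models, use anova. Note that for linear models fitted with lm, anova reports an F test; for models fitted with glm, a likelihood ratio test can be requested with the argument test = "LRT".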

Examples:

# STATA regress Ozone Wind Temp
#       estimates store fullmod
#       regress Ozone
#       estimates store nullmod
#       lrtest fullmod nullmod
fullmod <- lm(Ozone ~ Wind + Temp, data = airquality)
nullmod <- lm(Ozone ~ 1, data = airquality)
anova(fullmod, nullmod)
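For models fitted with glm, the likelihood ratio test can be requested explicitly. The sketch below uses the infert dataset and made-up object names (full.glm, null.glm, lr); it only illustrates the syntax, not a recommended analysis of these matched data.

```r
# STATA logit case spontaneous induced
#       estimates store fullmod
#       logit case
#       estimates store nullmod
#       lrtest fullmod nullmod
full.glm <- glm(case ~ spontaneous + induced, data = infert, family = "binomial")
null.glm <- glm(case ~ 1, data = infert, family = "binomial")
lr <- anova(null.glm, full.glm, test = "LRT")
lr
```

The second row of the returned table holds the difference in deviance, its degrees of freedom and the p-value of the test.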


Return to index

generate

A new variable in R is simply generated by assigning new values to mydata$newvarname.

Examples:

# STATA generate Celsius = (Temp - 32) * 5/9
airquality$Celsius <- (airquality$Temp - 32) * 5/9
summary(airquality$Celsius)


Return to index

graph bar

R's barplot function can be used with a vector of numbers or a matrix. To generate the latter, prop.table may be useful.

Examples:

# STATA graph bar (mean) Ozone, over(Month)
Oz.mean <- by(airquality$Ozone, airquality$Month, mean, na.rm = TRUE)
barplot(Oz.mean)

# STATA's user written command catplot
barplot(prop.table(table(infert$parity, infert$education), margin = 2))


Return to index

graph box

In R: boxplot.

Examples:

# STATA graph box Ozone, over(Month)
boxplot(Ozone ~ Month, data = airquality)


Return to index

graph matrix

plot(x) is a generic function. If x is a dataset, R will produce a scatterplot matrix. pairs is an alternative function which offers more flexibility.

Examples:

plot(airquality)
pairs(airquality, upper.panel = NULL, col=airquality$Month)



Return to index

help findit

The basic way to access the help documentation is a ? followed by the name of the function or dataset. The documentation can be searched with ?? or help.search.
Examples:

?airquality
??lm
help.search("linear models")



Return to index

histogram

In R: hist.

Examples:

# STATA hist Ozone, start(0) width(10)
hist(airquality$Ozone, breaks = seq(0, 170, 10))


Return to index

if

We refer here to Stata's if expression qualifier, not to the programming command. In R, the condition is usually placed inside square brackets.

Examples:

# STATA summarize if Month == 6
summary(airquality[airquality$Month == 6, ])
# STATA summarize Wind if Month == 6
summary(airquality$Wind[airquality$Month == 6])
# STATA summarize if Ozone == .
summary(airquality[is.na(airquality$Ozone), ])
# STATA summarize if Ozone <= 30
# a comparison returns NA for missing values; %in% avoids rows of NA
summary(airquality[airquality$Ozone %in% 0:30, ])



Return to index

in

In R, a range of observations is usually selected with a row index in square brackets.

Examples:

# STATA summarize in 1/10
summary(airquality[1:10, ])



Return to index

kappa

The basic R statistics package has no function to calculate the Kappa statistic. The packages psych and irr provide functions to calculate Kappa and various other types of rater agreement. Note: the interpretation of Kappa is not as straightforward as widely assumed. See https://en.wikipedia.org/wiki/Cohen%27s_kappa#Limitations
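For two raters, Cohen's Kappa can also be computed by hand from a cross-tabulation. The following is a minimal sketch with invented ratings (rater1, rater2 and kappa.hat are hypothetical names); it returns neither standard errors nor confidence intervals.

```r
# Hypothetical ratings of 10 subjects by two raters (1 = positive, 0 = negative)
rater1 <- c(1, 1, 0, 0, 1, 0, 1, 1, 0, 0)
rater2 <- c(1, 0, 0, 0, 1, 0, 1, 1, 1, 0)

# Cross-tabulate with identical category levels for both raters
lv <- sort(unique(c(rater1, rater2)))
tab <- table(factor(rater1, levels = lv), factor(rater2, levels = lv))

p  <- prop.table(tab)                 # cell proportions
po <- sum(diag(p))                    # observed agreement
pe <- sum(rowSums(p) * colSums(p))    # agreement expected by chance
kappa.hat <- (po - pe) / (1 - pe)
kappa.hat
```

Functions such as cohen.kappa in psych or kappa2 in irr additionally cover weighted Kappa and inference, so they are usually the better choice for real analyses.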



Return to index

levelsof

In R simply: levels.

Examples:

# STATA levelsof education
levels(infert$education)
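Note that levels only works for factors. For a numeric variable the distinct values can be listed with unique, as in this short sketch:

```r
# STATA levelsof Month
levels(airquality$Month)         # NULL, because Month is not a factor
sort(unique(airquality$Month))   # distinct values in ascending order
```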


Return to index

list

To print record numbers 4 to 6 use mydata[4:6, ]. To print records which fulfill a certain condition use mydata[mycondition, ].

Examples:

# STATA list in 4/6
airquality[4:6, ]
# STATA list if Solar.R < 10
airquality[airquality$Solar.R < 10, ]
# to get rid of the rows with missing Solar.R:
airquality[which(airquality$Solar.R < 10), ]


Return to index

logistic logit

In R, logistic regression is considered a special case of a broader group of statistical models known as generalized linear models. The function to fit those models is glm (for 'generalized linear model').

Examples:

# generate binary variable abort = TRUE if induced > 0 or spontaneous > 0
# STATA generate abort = induced > 0 | spontaneous > 0
infert$abort <- infert$induced > 0 | infert$spontaneous > 0
# crosstab
# STATA tab abort case
table(infert$abort, infert$case)
# STATA logit case abort, or
model.1 <- glm(case ~ abort, data = infert, family = "binomial")
summary(model.1)
# Get odds ratio
exp(coef(model.1))
# Get confidence intervals
exp(confint(model.1))


Return to index

lowess lfit

To generate a scatter plot with a smoothing or regression line you first have to generate the plot. Afterwards you can add the lines. Note on lowess smoother: if you have missing data in one of the variables you should first remove those observations (or use the alternative loess which is slightly more complicated to handle).

Examples:

# STATA lowess Temp Wind
plot(airquality$Wind, airquality$Temp)
lines(lowess(airquality$Wind, airquality$Temp))

# STATA graph twoway (lfit Temp Wind) (scatter Temp Wind)
plot(airquality$Wind, airquality$Temp)
abline(lm(Temp ~ Wind, data = airquality))


Return to index

mean

Stata's mean calculates the mean together with its standard error and confidence interval. In R, the confidence interval can be estimated with a one-sample t-test, and a constant-only linear regression returns the standard error. For geometric and harmonic means see ameans.

Examples:

# STATA mean Temp
#
# one-sample t-test for mean and 95% CI
t.test(airquality$Temp)
#
# linear regression for mean and std. err.
summary(lm(Temp ~ 1, data = airquality))


Return to index

merge

R has a basic merge function but it is recommended to use the dplyr package. Have a look at the documentation:
?dplyr::join
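As a rough sketch of the basic merge function, the example below attaches a small lookup table to airquality. The table month.info and its column MonthName are invented for this illustration.

```r
# Hypothetical lookup table with one row per month
month.info <- data.frame(Month = 5:9,
                         MonthName = month.name[5:9])

# STATA merge m:1 Month using monthinfo, keep(master match)
merged <- merge(airquality, month.info, by = "Month", all.x = TRUE)
head(merged)

# Roughly the same with dplyr (if installed):
# merged <- dplyr::left_join(airquality, month.info, by = "Month")
```

The argument all.x = TRUE keeps every row of the first dataset even if it has no match in the second one. Note that merge sorts the result by the key variable.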


Return to index

mhodds

The Mantel-Haenszel test is implemented in mantelhaen.test.

Examples:

# generate binary variable abort = TRUE if induced > 0 or spontaneous > 0
# STATA generate abort = induced > 0 | spontaneous > 0
infert$abort <- infert$induced > 0 | infert$spontaneous > 0
# crosstab
# STATA tab abort case
table(infert$abort, infert$case)
# Mantel-Haenszel test stratified by educational level
# STATA mhodds case abort education
mantelhaen.test(infert$case, infert$abort, infert$education)


Return to index

mvdecode mvencode

Handling missing data in R is a bit tricky. The function is.na checks if a value is missing.

Examples:

# STATA mvdecode Month, mv(9)
airquality$Month[airquality$Month == 9] <- NA
table(airquality$Month, useNA = "ifany")
# STATA mvdecode _all, mv(999)
airquality[airquality == 999] <- NA

# STATA mvencode Ozone, mv(999)
airquality$Ozone[is.na(airquality$Ozone)] <- 999
table(airquality$Ozone, useNA = "ifany")


Return to index

predict

Predicted values in regression models can be estimated via predict. Residuals are stored in mymodel$residuals.

Examples:

# Predicted values with no missing data
# STATA regress Wind Temp
# STATA predict pred, xb
model.1 <- lm(Wind ~ Temp, data = airquality)
summary(model.1)
airquality$pred <- predict(model.1)
# STATA predict resid, residuals
airquality$resid <- model.1$residuals
summary(airquality)

# Predicted values with missing data
model.2 <- lm(Ozone ~ Solar.R, data = airquality)
summary(model.2)
airquality$pred <- predict(model.2, newdata = airquality)
# reset the old residuals, then fill in the rows used by model.2
airquality$resid <- NA
airquality$resid[-c(model.2$na.action)] <- model.2$residuals
summary(airquality)


Return to index

qnorm

Use qqnorm to generate a normal Q-Q plot. See regress for a normal Q-Q plot of the residuals from a regression model.

Examples:

# STATA qnorm Wind
qqnorm(airquality$Wind)


Return to index

ranksum signrank ttest

R's function for Student's t-test is t.test. Note that Stata assumes equal variances by default whereas R assumes unequal variances. Stata's ranksum and signrank translate to wilcox.test in R.

Examples:

# unpaired t-test (equal var)
# STATA ttest age, by(case)
t.test(age ~ case, data = infert, var.equal = TRUE)
# unpaired t-test (unequal var, Satterthwaite degrees of freedom)
# STATA ttest age, by(case) unequal
t.test(age ~ case, data = infert)
# nonparametric
# STATA ranksum age, by(case)
wilcox.test(age ~ case, data = infert)

# paired t-test
# STATA ttest spontaneous == induced
t.test(infert$spontaneous, infert$induced, paired = TRUE)
# nonparametric Wilcoxon matched-pairs signed-ranks test
# STATA signrank spontaneous = induced
wilcox.test(infert$spontaneous, infert$induced, paired = TRUE)


Return to index

recode

There is no function in R which has the same flexibility as Stata's recode. A useful function is recode in the dplyr package. Note that numeric values have to be enclosed in backticks ` and that replacements have to be specified exhaustively (other values will be set to NA). If you would like to recode only a single value or a few values see replace.
Stata also has a recode function which can be used together with generate to split numerical values into groups. If you would like to split a numerical variable into quantiles see xtile.

Examples:

# Recoding numerical values
# STATA recode Month (5 9 = 0) (6/8 = 1), generate(Summer)
library(dplyr)
airquality$Summer <- recode(airquality$Month, `5` = 0, `6` = 1, `7` = 1, `8` = 1, `9` = 0)
airquality$Summer

# Categorize Ozone into <=50, <=100, 101+
# STATA generate OzoneCat = recode(Ozone, 50, 100, 200)
airquality$OzoneCat <- cut(airquality$Ozone, breaks = c(0, 50, 100, 200), include.lowest = TRUE)
airquality$OzoneCat


Return to index

regress rvfplot

R's function for linear regression is lm (for 'linear model'). Regression diagnostic plots can be obtained with the generic plot function.

Examples:

# STATA regress Temp Ozone
model.1 <- lm(Temp ~ Ozone, data = airquality)
summary(model.1)

# Get confidence intervals
confint(model.1)

# Residuals vs fitted plot
plot(model.1, which = 1)

# normal Q-Q plot of the residuals
plot(model.1, which = 2)


Return to index

rename

Variable names can be manipulated with colnames(mydata).

Examples:

# STATA rename Solar.R solar
colnames(airquality)
colnames(airquality)[2] <- "solar"
colnames(airquality)
# STATA rename _all, lower
colnames(airquality) <- tolower(colnames(airquality))
colnames(airquality)


Return to index

replace

replace is usually used with the conditional statement if. In R this is either done via conditional indexing, the ifelse function or mutate in the dplyr package.

Examples:

# with indexing
# STATA replace age = 40 if age > 40
infert$age[infert$age > 40] <- 40
table(infert$age)
# with ifelse
# STATA replace age = 35 if age > 35
infert$age <- ifelse(infert$age > 35, 35, infert$age)
table(infert$age)
# with mutate
# STATA replace age = 30 if age > 30
library(dplyr)
infert <- mutate(infert, age = ifelse(age > 30, 30, age))
table(infert$age)


Return to index

reshape

A function with the same name is available in R. If you often transform data, consider the package reshape.

Examples:

# STATA reshape wide Ozone Solar.R Wind Temp, i(Day) j(Month)
air.wide <- reshape(airquality, direction = "wide", idvar = "Day", timevar = "Month")
head(air.wide)


Return to index

sample

A sub sample of the dataset can be obtained with the sample function but the dplyr package provides more features including stratified sampling.

Examples:

# STATA sample 4, count by(Month)
library(dplyr)
sub.sample <- airquality %>%
  group_by(Month) %>%
  sample_n(4)
head(sub.sample, 10)


Return to index

save

To save a dataset in a delimited text format use the function write.table(mydata, "mydata.txt"). In addition, there are the functions write.csv and write.csv2, which do the same with different default settings for the arguments. However, it is recommended to specify the arguments individually (see the example). Note: R's save function does something else. It saves one or several R objects to an Rdata file.
Examples:
write.table(infert, "mydata.txt", sep = "\t", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE)
# The example below only works on Windows:
write.table(infert, "clipboard", sep = "\t", na = "NA", dec = ".", row.names = FALSE, col.names = TRUE)
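To check that a saved file can be read back, a round trip through a temporary file is a quick test. This is only a sketch; the object names out.file, infert.back and rds.file are chosen for illustration.

```r
# Write to a temporary file and read it back
out.file <- tempfile(fileext = ".txt")
write.table(infert, out.file, sep = "\t", na = "NA", dec = ".",
            row.names = FALSE, col.names = TRUE)
infert.back <- read.table(out.file, header = TRUE, sep = "\t",
                          na.strings = "NA", dec = ".")
dim(infert.back)

# R's own binary format keeps variable types such as factors
rds.file <- tempfile(fileext = ".rds")
saveRDS(infert, rds.file)
identical(readRDS(rds.file), infert)
```

A text file loses the information that education is a factor, so it has to be converted again after reading. saveRDS and readRDS avoid this, at the price of a format that only R can read.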



Return to index

scatter

R's generic plot function will produce a scatter-plot if the first 2 arguments are numeric variables.

Examples:

# STATA scatter Temp Ozone
plot(airquality$Ozone, airquality$Temp)

# STATA twoway (scatter Temp Ozone if Month == 5) ... (scatter Temp Ozone if Month == 9)
plot(airquality$Ozone, airquality$Temp, col = airquality$Month)


Return to index

sort

Sorting a dataset in R is not really intuitive. Therefore, many people prefer the R package dplyr.

Examples:

# STATA sort Temp
airquality <- airquality[order(airquality$Temp), ]
head(airquality)
# using package dplyr
library(dplyr)
airquality <- arrange(airquality, Temp)
head(airquality)


Return to index

tabulate tab1

Useful functions in R include table, prop.table and addmargins. Note: R will omit missing values unless the argument useNA="ifany" is specified. Unlike in Stata, the Chi2 and Fisher's exact tests are not options of the tabulation but separate functions (chisq.test and fisher.test).

Examples:

# STATA tabulate Ozone
table(airquality$Ozone)
# STATA tabulate Ozone, missing
table(airquality$Ozone, useNA = "ifany")

# generate binary variable abort = TRUE if induced > 0 or spontaneous > 0
# STATA generate abort = induced > 0 | spontaneous > 0
infert$abort <- infert$induced > 0 | infert$spontaneous > 0
# STATA tabulate abort case
table(infert$abort, infert$case)
addmargins(table(infert$abort, infert$case))
# STATA tabulate abort case, row
prop.table(table(infert$abort, infert$case), margin = 1)
# STATA tabulate abort case, chi2 exact
# correct = FALSE switches off the continuity correction R applies to 2x2 tables
chisq.test(infert$abort, infert$case, correct = FALSE)
fisher.test(infert$abort, infert$case)

# STATA tab1 _all
lapply(infert, table)


Return to index

xi

Stata's xi prefix can be used for different purposes. Particularly important is the xi: ... i.myvariable notation, which tells Stata that myvariable is a categorical variable. Categorical variables are called factors in R, and the first step with a new dataset should be to explore which variables are numeric and which are categorical using str. A numeric variable can be converted to a factor with the function as.factor.

Examples:

# Inspect the variable types of data infert
str(infert)
# Logistic regression to assess the effect of education on case
# Note: this variable is a factor
# STATA xi: logit case i.education
model.1 <- glm(case ~ education, data = infert, family = "binomial")
summary(model.1)
# Logistic regression to assess the effect of induced on case
# Note: this variable is numeric
# STATA xi: logit case i.induced
model.2 <- glm(case ~ as.factor(induced), data = infert, family = "binomial")
summary(model.2)


Return to index

xtile

To split a numeric variable into quantiles you need to combine cut and quantile. Alternatively, you can use ntile (dplyr package).

Examples:

# split Ozone into quartiles
# STATA xtile Ozone4 = Ozone, nq(4)
airquality$Ozone4 <- cut(airquality$Ozone,
          breaks = quantile(airquality$Ozone, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE),
          include.lowest = TRUE)
table(airquality$Ozone4)
#
# split Temp into quartiles using the dplyr package
# STATA xtile Temp4 = Temp, nq(4)
library(dplyr)
airquality <- mutate(airquality, Temp4 = ntile(Temp, 4))
table(airquality$Temp4)


Return to index