Skip to contents

Command block overview

In this document, we will show how the different command blocks in datadaptor work. There are the following keywords that you can use to enter the command blocks in your Excel mapping file:

keyword command_block sheet
#RECNA cmd_recna_xcpt
#ACROSS cmd_across Free
#AVALL cmd_add_labs Free
#ADDFILE cmd_addfile Free
#COMP cmd_comp Free
#DEBUG cmd_debug Free
#DIC cmd_dic Free
#FILTER cmd_filter Free
#IF cmd_if Free
#KG cmd_kg Free
#MERGE cmd_merge Free
#R cmd_r Free
#REC cmd_rec Free
#RENAME cmd_rename Free
#RFUN cmd_rfun Free
#RMVAL cmd_rmval Free
#SELECT cmd_select Free
#VARL cmd_set_lab Free
#VALL cmd_set_labs Free
#NEWVALL cmd_newvall Label
#SUMVAR cmd_sumvar Label
#AUTOREC cmd_autorec Variables
#DROP cmd_drop Variables
#NEWLAB cmd_newlab Variables
#RENAME_varsheet cmd_rename_varsheet Variables
#STR2NUM cmd_str_to_num Variables
#verbatim cmd_verbatim Verbatims
cmd_verbatim_custom cmd_verbatim_custom Verbatims

The column "command_block" shows the class, which determines how datadaptor parses the command block (with parse_command_args()) and then applies it on the data (with apply_command()).

Command block examples

Illustration of labelled datasets

In this article we’ll talk about labelled datasets. Therefore, we’ll first explain how we’ll illustrate these datasets:

q1 q2 q3
values

1


2 2 I'm a string


3 variable
labels
varlab hello how are you?
1 a

2 b c
3
d

The table shows a dataset containing the variables q1, q2& q3, each containing 3 observations. In the values section, the data is shown. q1 & q2 are numeric and q3 is a string variable. Below follows the section containing the labels in the data. The varlab row in green shows the variable labels. q1 is labelled with “hello”, and q2 is labelled with “how are you?”. Below, the labelled values are shown. In q1 value 1 is labelled by “a” and 2 is labelled by “b”. In q2 value 2 is labelled by “c” and 3 is labelled by “d”. The variable q3 is not labelled.

In the following sections we’ll explain, how an initial dataset is then modified by command blocks from an Excel mapping file resulting in the modified dataset.

Free sheets

Here we’ll look at examples how various commands in the Free sheets can look like.

#VARL

q1
values

4

5

6
labels
varlab
&
A B C
#VARL q1 assigned variable label
=>
q1
values

4

5

6
labels
varlab assigned variable label

Actually, this command does not change the values, but it assigns the variable label specified in column C of the command to the variable specified in column B.

#VALL

Assign value labels

q1
values

1

1

2
labels
varlab
&
A B C
#VALL q1 yes/no question

1 Yes

2 No
=>
q1
values

1

1

2
labels
varlab yes/no question
1 Yes
2 No

This command also does not change the values. Just like #VARL, it assigns the variable label specified in the first row of column C (if empty, it keeps the variable label, if already existing) to the variable specified in the first row in column B. It also assigns the value labels in column C to the values in column B in the following rows of the command.

#AVALL

Add value labels

q1
values

1

2

3
labels
varlab
1 a
&
A B C
#AVALL q1

2 b

3 c
=>
q1
values

1

2

3
labels
varlab
1 a
2 b
3 c

#AVALL does the same as #VALL, except that it keeps the value values of the original variable, if they are not overwritten.

#RMVAL

Remove values and their labels

q1
values

1

2

3
labels
varlab abc
1 a
2 b
3 c
&
A B C D
#RMVAL q1 q2 def

2

3
=>
q1 q2
values

1 1

2

3
labels
varlab abc def
1 a a
2 b
3 c

Here, the #RMVAL command makes a copy of q1, named q2, and removes the specified values. (In the first line of the command block, column B specifies the variable to be recoded, and in column C the name of the recoded variable. Column D assigns the new variable label.) In the following rows you specify the values and their labels to be removed in column B.

#COMP

Example

q1
values

1

2

3
labels
varlab
&
A B C
#COMP q2 5
=>
q1 q2
values

1 5

2 5

3 5
labels
varlab

This command computes a new variable (specified in column B) with the value in column C.

Example

You can also assign a variable to the values of the new variable. In this case, the labels of the original variable are also copied:

q1
values

1

2

3
labels
varlab abc
1 a
2 b
3 c
&
A B C
#COMP q2 q1
=>
q1 q2
values

1 1

2 2

3 3
labels
varlab abc abc
1 a a
2 b b
3 c c

#IF

Example

q1
values

1

2

3
labels
varlab
&
A B C
#IF q1 >= 2 q2 = 5
=>
q1 q2
values

1

2 5

3 5
labels
varlab

This command takes a condition (more precisely a logical vector of length of the variables in the data) in column B and an assignment in column C (of the form <variable name> = <assigned values>). When the condition is TRUE, the expression (<assigned values>) is assigned to <variable name>. In this case, the value 5 is assigned to q2 where q1 is greater or equal to 2. In this case q2 didn’t exist in the data before, and thus is initialized to NA before.

Example

If the variable existed before, the values where the condition evaluates to FALSE are kept, as can be seen in the next example:

q1 q2
values

1 7

2 8

3 9
labels
varlab
&
A B C
#IF q1 >= 2 q2 = 5
=>
q1 q2
values

1 7

2 5

3 5
labels
varlab

Here the value of 7 in the first row is kept from the original data.

Example

The #IF command preserves labels, if an already existing variable is modified:

q1
values

1

2

3
labels
varlab abc
1 a
2 b
3 c
&
A B C
#IF q1 >= 2 q1 = 5
=>
q1
values

1

5

5
labels
varlab abc
1 a
2 b
3 c

#REC

Example

q1
values

1

2

3

4

5

99
labels
varlab abcde
1 a
2 b
3 c
4 d
5 e
99 No answer
&
A B C D E
#REC q1 kq1 fgh
1 2 1 a & b
3 2 c
4 5 3 d & e
=>
q1 kq1
values

1 1

2 1

3 2

4 3

5 3

99
labels
varlab abcde fgh
1 a a & b
2 b c
3 c d & e
4 d
5 e
99 No answer

The #REC command recodes variables. In this example q1 is recoded to kq1. (In the first line of the command block, column B specifies the variable to be recoded, and in column C the name of the recoded variable. Column D assigns the new variable label.) In the following rows you specify the intervals to be recoded in column B and C, column D specifies the new value and column E its label. So for instance, the second line specifies, that all values in q1 between 1 and 2 are recoded to 1 in kq1 with the value label a & b.

Example

If you don’t specify the variable label (column D, first row), the one of the original variable is copied:

q1
values

1

2

3
labels
varlab abc
1 a
2 b
3 c
&
A B C D E
#REC q1 kq1

1 2 1 a & b
=>
q1 kq1
values

1 1

2 1

3
labels
varlab abc abc
1 a a & b
2 b
3 c

Example

If the name of the recoded variable is the same as the original variable, the values that are not recoded are kept:

q1
values

1

2

3
labels
varlab abc
1 a
2 b
3 c
&
A B C D E
#REC q1 q1

1 2 0 a & b
=>
q1
values

0

0

3
labels
varlab abc
0 a & b

#KG

Example

q1 q2
values

1 1

2 2

1 3

2 1

1 2

2 3
labels
varlab How are you? Country
1 good UK
2 bad US
3
FR
4
DE
&
A B C
#KG q1 q2
=>
q1 q2 q1xq2k10 q1xq2k20 q1xq2k30 q1xq2k40
values

1 1 1



2 2
2


1 3

1

2 1 2



1 2
1


2 3

2
labels
varlab How are you? Country UK: How are you? US: How are you? FR: How are you? DE: How are you?
1 good UK good good good good
2 bad US bad bad bad bad
3
FR



4
DE



The #KG command is useful to calculate cross-tabulations for subsets. It computes new variables for all the value labels of the variable q2 (in column C). The value labels are copied from q1 (the variable in column B). The new variable labels are a combination of the value labels of q2 and the variable label of q1. The values are copied of q1 if q2 takes the value specified in the variable label and are NA otherwise.

#SELECT

This command block passes the arguments in column B to dplyr::select(). However, renaming doesn’t work (see #RENAME to do this). It can be used to remove and reorder variables in the dataset.

Example

This removes the variable a2:

a1 b1 b2 a2
values

1 3 1 5

2 4 2 6
labels
varlab
&
A B
#SELECT -a2
=>
a1 b1 b2
values

1 3 1

2 4 2
labels
varlab

Example

a1 b1 b2 a2
values

1 3 1 5

2 4 2 6
labels
varlab
&
A B
#SELECT matches("a\\d")
=>
a1 a2
values

1 5

2 6
labels
varlab

Example

In order to keep all variables, and only reorder those named before, you can pass everything() in the end:

a1 b1 b2 a2
values

1 3 1 5

2 4 2 6
labels
varlab
&
A B
#SELECT matches("a\\d")
everything()
=>
a1 a2 b1 b2
values

1 5 3 1

2 6 4 2
labels
varlab

#RENAME

Example

This command renames variables in column B to the names in column C:

a1 b1 b2 a2
values

1 3 1 5

2 4 2 6
labels
varlab
&
A B C
#RENAME a2 new_name_for_a2
b2 new_name_for_b2
=>
a1 b1 new_name_for_b2 new_name_for_a2
values

1 3 1 5

2 4 2 6
labels
varlab

#ACROSS

This command block has the same tidyselect selection semantics as #SELECT. It then applies the function in column C by executing df |> dplyr::mutate(dplyr::across(var_selection, fun)) where df is the current data, var_selection the selected variables in column B, and fun the function.

Example

This takes all numeric variables from a1 to b2 and calculates the deviation to their mean :

a1 b1 b2 a2
values

1 a 4 5

2 b 3 6
labels
varlab
&
A B C
#ACROSS where(is.numeric) & a1:b2

function(x) x - mean(x)
=>
a1 b1 b2 a2
values

-0.5 a 0.5 5

0.5 b -0.5 6
labels
varlab

Instead of "function(x) x - mean(x)" you can also use the short form "\(x) x - mean(x)", or write intermediate steps and by using curly braces and separating statements with semi-colons like "\(x) {res <- x - mean(x); res}" (also see ?"{").

Example

Here is a very clunky function applied on a1 & b2 that sets the value -2 to NA and removes its value label.

a1 b1 b2 a2
values

1 a -2 1

-2 b -2 -2

-2 c 1 1
labels
varlab a1 b1 a2
-2 FILTER
FILTER FILTER
1 a
a a
2 b
b b
3 c
c c
&
A B C
#ACROSS where(is.numeric) & a1:b2

function(x) {x[x == -2] <- NA; vall <- attr(x, "labels"); attr(x, "labels") <- vall[vall != -2]; x}
=>
a1 b1 b2 a2
values

1 a
1


b
-2


c 1 1
labels
varlab a1 b1 a2
-2


FILTER
1 a
a a
2 b
b b
3 c
c c

This function does the same as #RMVAL.

Example

It is also possible to provide the .names argument in the first row of column C. And you can provide a list of multiple functions to the .fns argument in the following rows. You can also give names to the functions in column B.

a1 b2
values

1 -2

-2 -2

-2 1
labels
varlab a1 b1
-2 FILTER FILTER
1 a a
2 b b
3 c c
&
A B C
#ACROSS where(is.numeric) & a1:b2 {.col}___{.fn}
rm_val function(x) {x[x == -2] <- NA; vall <- attr(x, "labels"); attr(x, "labels") <- vall[vall != -2]; x}
mean mean
=>
a1 b2 a1___rm_val a1___mean b2___rm_val b2___mean
values

1 -2 1 -1
-1

-2 -2
-1
-1

-2 1
-1 1 -1
labels
varlab a1 b1 a1 b1
-2 FILTER FILTER



1 a a a
a
2 b b b
b
3 c c c
c

See the documentation of dplyr::across() for a more thorough documentation.

curlychop()

The idea of the function curlychop() is to prevent you from writing redundant code by turning one line into several commands. Let’s have a first look at it with the help of an example.

Example

Let’s assume you want to make copies of several variables, all ending with the same suffix “1”. You can do this by using several #COMP commands:

a1 b1
values

1 3

2 4
labels
varlab
&
A B C
#COMP ka1 a1
#COMP kb1 b1
=>
a1 b1 ka1 kb1
values

1 3 1 3

2 4 2 4
labels
varlab

The function curlychop() lets you write these commands in a more succinct form:

X1 X2 X3
#COMP k{a b}1 {a b}1

It replaces the curly braces by each of the parts inside (separated by spaces). You can put as many parts inside the curly braces as you want, but in the examples we’ll only write 2 parts to save the eye.

Example

You can also write multiple parts with curly braces inside one cell

a1 b1
values

1 3

2 4
labels
varlab
&
A B C
#IF {a b}1 == {1 4} k{a b}1 = {5 6}
=>
a1 b1 ka1 kb1
values

1 3 5

2 4
6
labels
varlab

The above command block is equivalent to these 2 #IF commands:

A B C
#IF a1 == 1 ka1 = 5
#IF b1 == 4 kb1 = 6

Example

For commands spanning multiple rows, you can also use curlychop() in the first row:

q_1 q_2
values

1 2

2 3

3 4
labels
varlab varlab q1 varlab q2
&
A B C D E
#REC q_{1 2} kq{_1 _2}


1 2 1 value label of 1

3 4 2 value label of 2
=>
q_1 q_2 kq_1 kq_2
values

1 2 1 1

2 3 1 2

3 4 2 2
labels
varlab varlab q1 varlab q2 varlab q1 varlab q2
1

value label of 1 value label of 1
2

value label of 2 value label of 2

The above command block is equivalent to these 2 #REC commands:

A B C D E
#REC q_1 kq_1


1 2 1 value label of 1

3 4 2 value label of 2
#REC q_2 kq_2


1 2 1 value label of 1

3 4 2 value label of 2

Variables sheets

To enter commands here, you don’t need to specify the commands, as in the Free sheets. Instead you need to put the information in one of the columns op, new_name or new_label:

#RENAME_varsheet

Example

This command renames the variables specified in the var column (column A) to the names in the new_name column. It does the same as the #RENAME command.

a1 b1
values

1 3

2 4
labels
varlab
&
var new_name
a1 c1
b1 d1
=>
c1 d1
values

1 3

2 4
labels
varlab

#NEWLAB

Example

This command assigns new variable labels (specified in the new_label column) to the variables specified in the var column (column A). It does the same as the #VARL command.

a1 b1 c1
values

1 1 3

2 2 4
labels
varlab original variable label original variable label of b1
&
var new_label
a1 I'm new
b1
c1
=>
a1 b1 c1
values

1 1 3

2 2 4
labels
varlab I'm new original variable label of b1

#DROP

Example

This command removes the variables marked with d (for “drop”) in the op column:

a1 b1
values

1 3

2 4
labels
varlab
&
var op
a1 d
b1
=>
b1
values

3

4
labels
varlab

The result is the same as specifying #SELECT -a1 on a Free sheet.

#STR2NUM

Example

This command turns string variables into numeric ones by putting an n (for “numeric”) in the op column:

a1 b1
values

1 3

0033 4
labels
varlab
&
var op
a1 n
b1
=>
a1 b1
values

1 3

33 4
labels
varlab

#AUTORECODE

Example

This command autorecodes string variables by putting a (for “autorecode”) in the op column. It assigns labelled values (1, 2, 3, …), ordered alphabetically:

a1
values

auto

recode

auto

hello
labels
varlab
&
var op
a1 a
=>
a1
values

1

3

1

2
labels
varlab
1 auto
2 hello
3 recode

Label sheets

Here, you can modify value labels or create new variable

#SUMVAR

Example

This command recodes variables specified in the var column (column A) to new variables, by attaching a “k” prefix. It is very similar to the #REC command.

a1
values

1

2

3

4

5
labels
varlab original variable label
1 A
2 B
3 C
4 D
5 E
&
var nv sum_var_label sum_var_value sum_var_vallab
a1 1 new varlab 1 a & b
a1 2
1
a1 3
2 c
a1 4
3 d & e
a1 5
3
=>
a1 ka1
values

1 1

2 1

3 2

4 3

5 3
labels
varlab original variable label new varlab
1 A a & b
2 B c
3 C d & e
4 D
5 E

All values of a1 (specified by nv) that are assigned to the same value under sum_var_value are mapped onto that value (labelled by the string in the first row where this sum_var_value occurs). The assigned variable label of the summary variable is specified in the first row of sum_var_label (if empty the variable label of the original variable is copied).

#NEWVALL

Example

This command replaces value labels like #AVALL:

a1
values

1

2

3

4

5
labels
varlab original variable label
1 A
2 B
3 C
4 D
5 E
&
var nv new_label
a1 1 new value label of value 1
a1 2
a1 3
a1 4
a1 5
=>
a1
values

1

2

3

4

5
labels
varlab original variable label
1 new value label of value 1
2 B
3 C
4 D
5 E

#RECNA

This command is executed per default (unless you set the parameter na_to_filter to FALSE when creating the Mapping object with

Mapping$new(na_to_filter = FALSE)

for instance). It:

  • recodes missing values in all numeric variables in the dataset to -2 (cf. parameter miss_rec_val in get_mapping_options()), and
  • labels this value by FILTER (cf. parameter miss_rec_lab in get_mapping_options()) in these variables

Under the hood, this will call the function set_na_to_filter() on all the numeric variables. You can make exceptions for the variable names (space-separated & defined in the parameter string not_miss_to_filter_vars; also see get_mapping_options()).

a b c
values

1 1


2 I'm a string

1
variable
labels
varlab
1
a
2
b
&
A B C
#IF a == 1 d = 2
=>
a b c d
values

1 1
2

-2 2 I'm a string

1 -2 variable 2
labels
varlab
-2 FILTER FILTER

1
a

2
b

The missing values in the variables a & b were changed to -2 and the value label of -2 = FILTER was added. The variable d that was generated by the #COMP command is not touched, as the #RECNA command is executed in the very beginning of all commands in the mapping and thus only affects the numeric variables present in the original dataset.