Documentation of the Excel mapping command blocks

Command block overview

In this document, we will show how the different command blocks in datadaptor work. There are the following keywords that you can use to enter the command blocks in your Excel mapping file:

keyword	command_block	sheet
#RECNA	cmd_recna_xcpt
#ACROSS	cmd_across	Free
#AVALL	cmd_add_labs	Free
#ADDFILE	cmd_addfile	Free
#COMP	cmd_comp	Free
#DEBUG	cmd_debug	Free
#DIC	cmd_dic	Free
#FILTER	cmd_filter	Free
#IF	cmd_if	Free
#KG	cmd_kg	Free
#MERGE	cmd_merge	Free
#R	cmd_r	Free
#REC	cmd_rec	Free
#RENAME	cmd_rename	Free
#RFUN	cmd_rfun	Free
#RMVAL	cmd_rmval	Free
#SELECT	cmd_select	Free
#VARL	cmd_set_lab	Free
#VALL	cmd_set_labs	Free
#NEWVALL	cmd_newvall	Label
#SUMVAR	cmd_sumvar	Label
#AUTOREC	cmd_autorec	Variables
#DROP	cmd_drop	Variables
#NEWLAB	cmd_newlab	Variables
#RENAME_varsheet	cmd_rename_varsheet	Variables
#STR2NUM	cmd_str_to_num	Variables
#verbatim	cmd_verbatim	Verbatims
cmd_verbatim_custom	cmd_verbatim_custom	Verbatims

The column "command_block" shows the class, which determines how datadaptor parses the command block (with parse_command_args()) and then applies it on the data (with apply_command()).

Command block examples

Illustration of labelled datasets

In this article we’ll talk about labelled datasets. Therefore, we’ll first explain how we’ll illustrate these datasets:

	q1	q2	q3
values
	1
	2	2	I'm a string
		3	variable
labels
varlab	hello	how are you?
1	a
2	b	c
3		d

The table shows a dataset containing the variables q1, q2& q3, each containing 3 observations. In the values section, the data is shown. q1 & q2 are numeric and q3 is a string variable. Below follows the section containing the labels in the data. The varlab row in green shows the variable labels. q1 is labelled with “hello”, and q2 is labelled with “how are you?”. Below, the labelled values are shown. In q1 value 1 is labelled by “a” and 2 is labelled by “b”. In q2 value 2 is labelled by “c” and 3 is labelled by “d”. The variable q3 is not labelled.

In the following sections we’ll explain, how an initial dataset is then modified by command blocks from an Excel mapping file resulting in the modified dataset.

`Free` sheets

Here we’ll look at examples how various commands in the Free sheets can look like.

`#VARL`

	q1
values
	4
	5
	6
labels
varlab

A	B	C
#VARL	q1	assigned variable label

	q1
values
	4
	5
	6
labels
varlab	assigned variable label

Actually, this command does not change the values, but it assigns the variable label specified in column C of the command to the variable specified in column B.

`#VALL`

Assign value labels

	q1
values
	1
	1
	2
labels
varlab

A	B	C
#VALL	q1	yes/no question
	1	Yes
	2	No

	q1
values
	1
	1
	2
labels
varlab	yes/no question
1	Yes
2	No

This command also does not change the values. Just like #VARL, it assigns the variable label specified in the first row of column C (if empty, it keeps the variable label, if already existing) to the variable specified in the first row in column B. It also assigns the value labels in column C to the values in column B in the following rows of the command.

`#AVALL`

Add value labels

	q1
values
	1
	2
	3
labels
varlab
1	a

A	B	C
#AVALL	q1
	2	b
	3	c

	q1
values
	1
	2
	3
labels
varlab
1	a
2	b
3	c

#AVALL does the same as #VALL, except that it keeps the value values of the original variable, if they are not overwritten.

`#RMVAL`

Remove values and their labels

	q1
values
	1
	2
	3
labels
varlab	abc
1	a
2	b
3	c

A	B	C	D
#RMVAL	q1	q2	def
	2
	3

	q1	q2
values
	1	1
	2
	3
labels
varlab	abc	def
1	a	a
2	b
3	c

Here, the #RMVAL command makes a copy of q1, named q2, and removes the specified values. (In the first line of the command block, column B specifies the variable to be recoded, and in column C the name of the recoded variable. Column D assigns the new variable label.) In the following rows you specify the values and their labels to be removed in column B.

`#COMP`

Example

	q1
values
	1
	2
	3
labels
varlab

A	B	C
#COMP	q2	5

	q1	q2
values
	1	5
	2	5
	3	5
labels
varlab

This command computes a new variable (specified in column B) with the value in column C.

Example

You can also assign a variable to the values of the new variable. In this case, the labels of the original variable are also copied:

	q1
values
	1
	2
	3
labels
varlab	abc
1	a
2	b
3	c

A	B	C
#COMP	q2	q1

	q1	q2
values
	1	1
	2	2
	3	3
labels
varlab	abc	abc
1	a	a
2	b	b
3	c	c

`#IF`

Example

	q1
values
	1
	2
	3
labels
varlab

A	B	C
#IF	q1 >= 2	q2 = 5

	q1	q2
values
	1
	2	5
	3	5
labels
varlab

This command takes a condition (more precisely a logical vector of length of the variables in the data) in column B and an assignment in column C (of the form <variable name> = <assigned values>). When the condition is TRUE, the expression (<assigned values>) is assigned to <variable name>. In this case, the value 5 is assigned to q2 where q1 is greater or equal to 2. In this case q2 didn’t exist in the data before, and thus is initialized to NA before.

Example

If the variable existed before, the values where the condition evaluates to FALSE are kept, as can be seen in the next example:

	q1	q2
values
	1	7
	2	8
	3	9
labels
varlab

A	B	C
#IF	q1 >= 2	q2 = 5

	q1	q2
values
	1	7
	2	5
	3	5
labels
varlab

Here the value of 7 in the first row is kept from the original data.

Example

The #IF command preserves labels, if an already existing variable is modified:

	q1
values
	1
	2
	3
labels
varlab	abc
1	a
2	b
3	c

A	B	C
#IF	q1 >= 2	q1 = 5

	q1
values
	1
	5
	5
labels
varlab	abc
1	a
2	b
3	c

#REC

Example

	q1
values
	1
	2
	3
	4
	5
	99
labels
varlab	abcde
1	a
2	b
3	c
4	d
5	e
99	No answer

A	B	C	D	E
#REC	q1	kq1	fgh
	1	2	1	a & b
	3		2	c
	4	5	3	d & e

	q1	kq1
values
	1	1
	2	1
	3	2
	4	3
	5	3
	99
labels
varlab	abcde	fgh
1	a	a & b
2	b	c
3	c	d & e
4	d
5	e
99	No answer

The #REC command recodes variables. In this example q1 is recoded to kq1. (In the first line of the command block, column B specifies the variable to be recoded, and in column C the name of the recoded variable. Column D assigns the new variable label.) In the following rows you specify the intervals to be recoded in column B and C, column D specifies the new value and column E its label. So for instance, the second line specifies, that all values in q1 between 1 and 2 are recoded to 1 in kq1 with the value label a & b.

Example

If you don’t specify the variable label (column D, first row), the one of the original variable is copied:

	q1
values
	1
	2
	3
labels
varlab	abc
1	a
2	b
3	c

A	B	C	D	E
#REC	q1	kq1
	1	2	1	a & b

	q1	kq1
values
	1	1
	2	1
	3
labels
varlab	abc	abc
1	a	a & b
2	b
3	c

Example

If the name of the recoded variable is the same as the original variable, the values that are not recoded are kept:

	q1
values
	1
	2
	3
labels
varlab	abc
1	a
2	b
3	c

A	B	C	D	E
#REC	q1	q1
	1	2	0	a & b

	q1
values
	0
	0
	3
labels
varlab	abc
0	a & b

#KG

Example

	q1	q2
values
	1	1
	2	2
	1	3
	2	1
	1	2
	2	3
labels
varlab	How are you?	Country
1	good	UK
2	bad	US
3		FR
4		DE

A	B	C
#KG	q1	q2

	q1	q2	q1xq2k10	q1xq2k20	q1xq2k30	q1xq2k40
values
	1	1	1
	2	2		2
	1	3			1
	2	1	2
	1	2		1
	2	3			2
labels
varlab	How are you?	Country	UK: How are you?	US: How are you?	FR: How are you?	DE: How are you?
1	good	UK	good	good	good	good
2	bad	US	bad	bad	bad	bad
3		FR
4		DE

The #KG command is useful to calculate cross-tabulations for subsets. It computes new variables for all the value labels of the variable q2 (in column C). The value labels are copied from q1 (the variable in column B). The new variable labels are a combination of the value labels of q2 and the variable label of q1. The values are copied of q1 if q2 takes the value specified in the variable label and are NA otherwise.

#SELECT

This command block passes the arguments in column B to dplyr::select(). However, renaming doesn’t work (see #RENAME to do this). It can be used to remove and reorder variables in the dataset.

Example

This removes the variable a2:

	a1	b1	b2	a2
values
	1	3	1	5
	2	4	2	6
labels
varlab

A	B
#SELECT	-a2

	a1	b1	b2
values
	1	3	1
	2	4	2
labels
varlab

Example

	a1	b1	b2	a2
values
	1	3	1	5
	2	4	2	6
labels
varlab

A	B
#SELECT	matches("a\\d")

	a1	a2
values
	1	5
	2	6
labels
varlab

Example

In order to keep all variables, and only reorder those named before, you can pass everything() in the end:

	a1	b1	b2	a2
values
	1	3	1	5
	2	4	2	6
labels
varlab

A	B
#SELECT	matches("a\\d")
	everything()

	a1	a2	b1	b2
values
	1	5	3	1
	2	6	4	2
labels
varlab

#RENAME

Example

This command renames variables in column B to the names in column C:

	a1	b1	b2	a2
values
	1	3	1	5
	2	4	2	6
labels
varlab

A	B	C
#RENAME	a2	new_name_for_a2
	b2	new_name_for_b2

	a1	b1	new_name_for_b2	new_name_for_a2
values
	1	3	1	5
	2	4	2	6
labels
varlab

#ACROSS

This command block has the same tidyselect selection semantics as #SELECT. It then applies the function in column C by executing df |> dplyr::mutate(dplyr::across(var_selection, fun)) where df is the current data, var_selection the selected variables in column B, and fun the function.

Example

This takes all numeric variables from a1 to b2 and calculates the deviation to their mean :

	a1	b1	b2	a2
values
	1	a	4	5
	2	b	3	6
labels
varlab

A	B	C
#ACROSS	where(is.numeric) & a1:b2
		function(x) x - mean(x)

	a1	b1	b2	a2
values
	-0.5	a	0.5	5
	0.5	b	-0.5	6
labels
varlab

Instead of "function(x) x - mean(x)" you can also use the short form "\(x) x - mean(x)", or write intermediate steps and by using curly braces and separating statements with semi-colons like "\(x) {res <- x - mean(x); res}" (also see ?"{").

Example

Here is a very clunky function applied on a1 & b2 that sets the value -2 to NA and removes its value label.

	a1	b1	b2	a2
values
	1	a	-2	1
	-2	b	-2	-2
	-2	c	1	1
labels
varlab	a1		b1	a2
-2	FILTER		FILTER	FILTER
1	a		a	a
2	b		b	b
3	c		c	c

A	B	C
#ACROSS	where(is.numeric) & a1:b2
		function(x) {x[x == -2] <- NA; vall <- attr(x, "labels"); attr(x, "labels") <- vall[vall != -2]; x}

	a1	b1	b2	a2
values
	1	a		1
		b		-2
		c	1	1
labels
varlab	a1		b1	a2
-2				FILTER
1	a		a	a
2	b		b	b
3	c		c	c

This function does the same as #RMVAL.

Example

It is also possible to provide the .names argument in the first row of column C. And you can provide a list of multiple functions to the .fns argument in the following rows. You can also give names to the functions in column B.

	a1	b2
values
	1	-2
	-2	-2
	-2	1
labels
varlab	a1	b1
-2	FILTER	FILTER
1	a	a
2	b	b
3	c	c

A	B	C
#ACROSS	where(is.numeric) & a1:b2	{.col}___{.fn}
	rm_val	function(x) {x[x == -2] <- NA; vall <- attr(x, "labels"); attr(x, "labels") <- vall[vall != -2]; x}
	mean	mean

	a1	b2	a1___rm_val	a1___mean	b2___rm_val	b2___mean
values
	1	-2	1	-1		-1
	-2	-2		-1		-1
	-2	1		-1	1	-1
labels
varlab	a1	b1	a1		b1
-2	FILTER	FILTER
1	a	a	a		a
2	b	b	b		b
3	c	c	c		c

See the documentation of dplyr::across() for a more thorough documentation.

`curlychop()`

The idea of the function curlychop() is to prevent you from writing redundant code by turning one line into several commands. Let’s have a first look at it with the help of an example.

Example

Let’s assume you want to make copies of several variables, all ending with the same suffix “1”. You can do this by using several #COMP commands:

	a1	b1
values
	1	3
	2	4
labels
varlab

A	B	C
#COMP	ka1	a1
#COMP	kb1	b1

	a1	b1	ka1	kb1
values
	1	3	1	3
	2	4	2	4
labels
varlab

The function curlychop() lets you write these commands in a more succinct form:

X1	X2	X3
#COMP	k{a b}1	{a b}1

It replaces the curly braces by each of the parts inside (separated by spaces). You can put as many parts inside the curly braces as you want, but in the examples we’ll only write 2 parts to save the eye.

Example

You can also write multiple parts with curly braces inside one cell

	a1	b1
values
	1	3
	2	4
labels
varlab

A	B	C
#IF	{a b}1 == {1 4}	k{a b}1 = {5 6}

	a1	b1	ka1	kb1
values
	1	3	5
	2	4		6
labels
varlab

The above command block is equivalent to these 2 #IF commands:

A	B	C
#IF	a1 == 1	ka1 = 5
#IF	b1 == 4	kb1 = 6

Example

For commands spanning multiple rows, you can also use curlychop() in the first row:

	q_1	q_2
values
	1	2
	2	3
	3	4
labels
varlab	varlab q1	varlab q2

A	B	C	D	E
#REC	q_{1 2}	kq{_1 _2}
	1	2	1	value label of 1
	3	4	2	value label of 2

	q_1	q_2	kq_1	kq_2
values
	1	2	1	1
	2	3	1	2
	3	4	2	2
labels
varlab	varlab q1	varlab q2	varlab q1	varlab q2
1			value label of 1	value label of 1
2			value label of 2	value label of 2

The above command block is equivalent to these 2 #REC commands:

A	B	C	D	E
#REC	q_1	kq_1
	1	2	1	value label of 1
	3	4	2	value label of 2
#REC	q_2	kq_2
	1	2	1	value label of 1
	3	4	2	value label of 2

`Variables` sheets

To enter commands here, you don’t need to specify the commands, as in the Free sheets. Instead you need to put the information in one of the columns op, new_name or new_label:

#RENAME_varsheet

Example

This command renames the variables specified in the var column (column A) to the names in the new_name column. It does the same as the #RENAME command.

	a1	b1
values
	1	3
	2	4
labels
varlab

var	new_name
a1	c1
b1	d1

	c1	d1
values
	1	3
	2	4
labels
varlab

#NEWLAB

Example

This command assigns new variable labels (specified in the new_label column) to the variables specified in the var column (column A). It does the same as the #VARL command.

	a1	b1	c1
values
	1	1	3
	2	2	4
labels
varlab	original variable label	original variable label of b1

var	new_label
a1	I'm new
b1
c1

	a1	b1	c1
values
	1	1	3
	2	2	4
labels
varlab	I'm new	original variable label of b1

#DROP

Example

This command removes the variables marked with d (for “drop”) in the op column:

	a1	b1
values
	1	3
	2	4
labels
varlab

var	op
a1	d
b1

	b1
values
	3
	4
labels
varlab

The result is the same as specifying #SELECT -a1 on a Free sheet.

#STR2NUM

Example

This command turns string variables into numeric ones by putting an n (for “numeric”) in the op column:

	a1	b1
values
	1	3
	0033	4
labels
varlab

var	op
a1	n
b1

	a1	b1
values
	1	3
	33	4
labels
varlab

#AUTORECODE

Example

This command autorecodes string variables by putting a (for “autorecode”) in the op column. It assigns labelled values (1, 2, 3, …), ordered alphabetically:

	a1
values
	auto
	recode
	auto
	hello
labels
varlab

var	op
a1	a

	a1
values
	1
	3
	1
	2
labels
varlab
1	auto
2	hello
3	recode

`Label` sheets

Here, you can modify value labels or create new variable

#SUMVAR

Example

This command recodes variables specified in the var column (column A) to new variables, by attaching a “k” prefix. It is very similar to the #REC command.

	a1
values
	1
	2
	3
	4
	5
labels
varlab	original variable label
1	A
2	B
3	C
4	D
5	E

var	nv	sum_var_label	sum_var_value	sum_var_vallab
a1	1	new varlab	1	a & b
a1	2		1
a1	3		2	c
a1	4		3	d & e
a1	5		3

	a1	ka1
values
	1	1
	2	1
	3	2
	4	3
	5	3
labels
varlab	original variable label	new varlab
1	A	a & b
2	B	c
3	C	d & e
4	D
5	E

All values of a1 (specified by nv) that are assigned to the same value under sum_var_value are mapped onto that value (labelled by the string in the first row where this sum_var_value occurs). The assigned variable label of the summary variable is specified in the first row of sum_var_label (if empty the variable label of the original variable is copied).

#NEWVALL

Example

This command replaces value labels like #AVALL:

	a1
values
	1
	2
	3
	4
	5
labels
varlab	original variable label
1	A
2	B
3	C
4	D
5	E

var	nv	new_label
a1	1	new value label of value 1
a1	2
a1	3
a1	4
a1	5

	a1
values
	1
	2
	3
	4
	5
labels
varlab	original variable label
1	new value label of value 1
2	B
3	C
4	D
5	E

`#RECNA`

This command is executed per default (unless you set the parameter na_to_filter to FALSE when creating the Mapping object with

Mapping$new(na_to_filter = FALSE)

for instance). It:

recodes missing values in all numeric variables in the dataset to -2 (cf. parameter miss_rec_val in get_mapping_options()), and
labels this value by FILTER (cf. parameter miss_rec_lab in get_mapping_options()) in these variables

Under the hood, this will call the function set_na_to_filter() on all the numeric variables. You can make exceptions for the variable names (space-separated & defined in the parameter string not_miss_to_filter_vars; also see get_mapping_options()).

	a	b	c
values
	1	1
		2	I'm a string
	1		variable
labels
varlab
1		a
2		b

A	B	C
#IF	a == 1	d = 2

	a	b	c	d
values
	1	1		2
	-2	2	I'm a string
	1	-2	variable	2
labels
varlab
-2	FILTER	FILTER
1		a
2		b

The missing values in the variables a & b were changed to -2 and the value label of -2 = FILTER was added. The variable d that was generated by the #COMP command is not touched, as the #RECNA command is executed in the very beginning of all commands in the mapping and thus only affects the numeric variables present in the original dataset.

Command block overview

Command block examples

Illustration of labelled datasets

Free sheets

#VARL

#VALL

#AVALL

#RMVAL

#COMP

Example

Example

#IF

Example

Example

Example

#REC

Example

Example

Example

#KG

Example

#SELECT

Example

Example

Example

#RENAME

Example

#ACROSS

Example

Example

Example

curlychop()

Example

Example

Example

Variables sheets

#RENAME_varsheet

Example

#NEWLAB

Example

#DROP

Example

#STR2NUM

Example

#AUTORECODE

Example

Label sheets

#SUMVAR

Example

#NEWVALL

Example

#RECNA

`Free` sheets

`#VARL`

`#VALL`

`#AVALL`

`#RMVAL`

`#COMP`

`#IF`

`curlychop()`

`Variables` sheets

`Label` sheets

`#RECNA`