## Introduction

`rquery` is a data wrangling system designed to express complex data manipulation as a series of simple data transforms. This is in the spirit of `R`’s `base::transform()` or `dplyr`’s `dplyr::mutate()`, and it uses a pipe in the style popularized in `R` with `magrittr`. The operators themselves follow the selections in Codd’s relational algebra, with the addition of the traditional `SQL` “window functions.” More on the background and context of `rquery` can be found here.

The `R`/`rquery` version of this introduction is here, and the `Python`/`data_algebra` version of this introduction is here.

In transform formulations, data manipulation is written as transformations that produce new `data.frame`s, instead of as alterations of a primary data structure (as is the case with `data.table`). Transform systems *can* use more space and time than in-place methods. However, in our opinion, transform systems have a number of pedagogical advantages.
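
The transform style can be seen with base `R`'s `transform()`, which the introduction cites as a model. The following is a minimal sketch only: the data frame `d` and its column names are made up for illustration and do not come from the `rquery` documentation.

```r
# Hypothetical example data (not from the original text).
d <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# Transform style: compute a new data.frame from the old one.
# The input `d` is not altered; the result is a separate value.
d2 <- transform(d, z = x + y)

print(colnames(d))   # the original still has only the columns x and y
print(colnames(d2))  # the result has x, y, and the new column z
```

Because `d` is still available unchanged after the step, each intermediate result can be inspected or reused, which is one reason transform pipelines tend to be easy to teach and reason about. The cost is the extra copy, as noted above.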

In `rquery`’s case the primary set of data operators is as follows:

- `drop_columns`
- `select_columns`
- `rename_columns`
- `select_rows`
- `order_rows`
- `extend`
- `project`
- `natural_join`
- `convert_records` (supplied by the `cdata` package).

These operations break into a small number of themes:

- Simple column operations (selecting and re-naming columns).
- Simple row operations (selecting and re-ordering rows).
- Creating new columns or replacing columns with new calculated values.
- Aggregating or summarizing data.
- Combining results between two `data.frame`s.
- General conversion of record layouts (supplied by the `cdata` package).
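
To make these themes concrete, here is a small sketch of the first five in base `R`. It deliberately uses only base functions, not the `rquery` operators themselves, and the data frames `d` and `labels` are hypothetical. The sixth theme, record layout conversion, is left out because it is supplied by the `cdata` package.

```r
# Hypothetical example data (not from the original text).
d <- data.frame(
  id    = c(1, 2, 3, 4),
  group = c("a", "a", "b", "b"),
  value = c(5, 3, 8, 2)
)
labels <- data.frame(group = c("a", "b"), label = c("first", "second"))

# 1. Simple column operations: select columns, then rename one.
cols <- d[, c("group", "value")]
colnames(cols)[colnames(cols) == "value"] <- "v"

# 2. Simple row operations: select rows, then re-order them.
rows <- d[d$value > 2, , drop = FALSE]
rows <- rows[order(rows$value), , drop = FALSE]

# 3. Create a new calculated column.
ext <- transform(d, value_sq = value^2)

# 4. Aggregate or summarize: total value per group.
agg <- aggregate(value ~ group, data = d, FUN = sum)

# 5. Combine results between two data.frames (a join on `group`).
joined <- merge(d, labels, by = "group")
```

Each step takes one or two `data.frame`s and returns a new one, so the steps compose freely. That closure property is what lets a small operator set cover a wide range of tasks.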

The point is: Codd worked out that a great number of data transformations can be decomposed into a small number of the above steps. `rquery` supplies a high performance implementation of these methods that scales from in-memory scale up through big data scale (to just about anything that supplies a sufficiently powerful `SQL` interface, such as PostgreSQL, Apache Spark, or Google BigQuery).

We will work through simple examples/demonstrations of the `rquery` data manipulation operators.