Pipelines

A sequence of operators, connected by pipes, is a pipeline. While marcel usage often involves just running one pipeline and seeing the results, you can also store a pipeline for later use, much as you would define a function in a programming language.


For example, here is a pipeline that searches the current directory recursively, and lists the files that have been modified in the past day:

M 0.18.3 jao@loon ~/git/marcel$ recent = (| ls -fr | select (f: now() - f.mtime < days(1)) |)

This creates a pipeline, and assigns it to the environment variable recent. Running it:

M 0.18.3 jao@loon ~/git/marcel$ recent
-rw-rw-r--   jao    jao       13314   2023 Oct 25 11:14:13   env.py
-rw-rw-r--   jao    jao       10895   2023 Oct 25 11:14:13   main.py
-rw-rw-r--   jao    jao        4793   2023 Oct 25 11:14:13   object/color.py
-rw-rw-r--   jao    jao        8763   2023 Oct 25 11:14:13   object/file.py
-rw-rw-r--   jao    jao       39519   2023 Oct 25 11:14:13   parser.py
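
For readers who think in terms of functions, the stored pipeline behaves roughly like the Python function below. This is only an analogy, not how marcel implements pipelines: the function name recent mirrors the variable above, while SECONDS_PER_DAY, the root argument, and the temporary files are assumptions made up for the demonstration.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 24 * 60 * 60

def recent(root):
    """Yield files under root that were modified less than one day ago."""
    now = time.time()
    # rglob('*') walks the tree recursively, like ls -r; is_file() keeps files only, like -f.
    for path in sorted(Path(root).rglob('*')):
        # Same test as the select predicate: now() - f.mtime < days(1)
        if path.is_file() and now - path.stat().st_mtime < SECONDS_PER_DAY:
            yield path

# Demonstration on a throwaway directory: one fresh file, one aged by a week.
with tempfile.TemporaryDirectory() as tmp:
    fresh = Path(tmp, 'fresh.py')
    stale = Path(tmp, 'sub', 'stale.py')
    stale.parent.mkdir()
    fresh.write_text('')
    stale.write_text('')
    week_ago = time.time() - 7 * SECONDS_PER_DAY
    os.utime(stale, (week_ago, week_ago))  # push the modification time back
    found = [p.name for p in recent(tmp)]

print(found)  # ['fresh.py']
```

As with the function, defining the pipeline does nothing by itself; the work happens each time it is run by name.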

You can generalize this pipeline by parameterizing it. For example, here is a pipeline that searches the current directory recursively and lists the files that have been modified within a given number of days:

M 0.18.3 jao@loon ~$ recent = (| n: ls -fr | select (f: now() - f.mtime < days(int(n))) |)

You can specify a value for the parameter positionally:

M 0.18.3 jao@loon ~/git/marcel/marcel$ recent 1
-rw-rw-r--   jao    jao       13314   2023 Oct 25 11:14:13   env.py
-rw-rw-r--   jao    jao       10895   2023 Oct 25 11:14:13   main.py
-rw-rw-r--   jao    jao        4793   2023 Oct 25 11:14:13   object/color.py
-rw-rw-r--   jao    jao        8763   2023 Oct 25 11:14:13   object/file.py
-rw-rw-r--   jao    jao       39519   2023 Oct 25 11:14:13   parser.py

Note that in the definition of recent, n is cast to int. This is because recent 1 binds the string '1' to the pipeline parameter n, while the days function requires an int.
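
The effect of the cast can be reproduced in plain Python. In this sketch, days is a hypothetical stand-in that returns a number of seconds (marcel's own days function may be defined differently), and the timestamps are invented values.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def days(n):
    # Hypothetical stand-in for marcel's days(): the length of n days in seconds.
    return n * SECONDS_PER_DAY

def modified_within(mtime, now, n):
    # Mirrors the pipeline's predicate: now() - f.mtime < days(int(n))
    return now - mtime < days(int(n))

now = 1_000_000.0                            # invented "current time"
one_hour_ago = now - 3600
three_days_ago = now - 3 * SECONDS_PER_DAY

# n arrives as the string '1', exactly as it would from "recent 1".
print(modified_within(one_hour_ago, now, '1'))    # True
print(modified_within(three_days_ago, now, '1'))  # False

# Without the cast, the string reaches the arithmetic and the comparison fails.
try:
    now - one_hour_ago < days('1')
except TypeError as error:
    print('without the cast:', type(error).__name__)  # without the cast: TypeError
```

In this stand-in, days('1') silently builds a long string rather than a number, so the failure only appears at the comparison. Casting at the point of use avoids that.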


If the parameter's name is a single character, you can also pass its value with a short flag:

M 0.18.3 jao@loon ~/git/marcel/marcel$ recent -n 1
-rw-rw-r--   jao    jao       13314   2023 Oct 25 11:14:13   env.py
-rw-rw-r--   jao    jao       10895   2023 Oct 25 11:14:13   main.py
-rw-rw-r--   jao    jao        4793   2023 Oct 25 11:14:13   object/color.py
-rw-rw-r--   jao    jao        8763   2023 Oct 25 11:14:13   object/file.py
-rw-rw-r--   jao    jao       39519   2023 Oct 25 11:14:13   parser.py

A long option works for any parameter, regardless of the length of its name, e.g. recent --n 1.

Pipelines can also be used as arguments to operators. This is especially useful when combined with pipelines that store streams in variables. Example:

M 0.18.3 jao@loon ~/git/marcel$ ps | ifelse (p: p.username == 'root') (| >$ root |) >$ other

The ps operator generates a stream of Process objects, each representing one current process. The ifelse operator evaluates a predicate for each input, in this case checking whether the Process, p, is owned by root. If the predicate evaluates to true, p is passed to the bracketed pipeline immediately following the predicate, (| >$ root |), which stores its input stream in a variable named root. If the predicate evaluates to false, p is passed downstream, to be stored in a variable named other. In other words, the stream of Processes has been split into two streams, each stored in its own variable.

There is also an ifthen operator. It is just like ifelse, except that all inputs are passed downstream. That is, an item for which the predicate evaluates to true is passed both to the pipeline argument and downstream.
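
The routing done by ifelse and ifthen can be sketched in Python with generators. This is an illustration of the behavior described above, not marcel's implementation: the Process tuple, the then_sink list standing in for the pipeline argument, and the sample data are all assumptions.

```python
from collections import namedtuple

# Hypothetical stand-in for marcel's Process objects.
Process = namedtuple('Process', ['pid', 'username'])

def ifelse(stream, predicate, then_sink):
    """Send matching items to then_sink; pass only the others downstream."""
    for item in stream:
        if predicate(item):
            then_sink.append(item)
        else:
            yield item

def ifthen(stream, predicate, then_sink):
    """Send matching items to then_sink; pass every item downstream."""
    for item in stream:
        if predicate(item):
            then_sink.append(item)
        yield item

processes = [Process(1, 'root'), Process(812, 'jao'),
             Process(2, 'root'), Process(990, 'jao')]

def owned_by_root(p):
    return p.username == 'root'

# ifelse splits the stream in two.
root = []
other = list(ifelse(processes, owned_by_root, root))
print([p.pid for p in root])    # [1, 2]
print([p.pid for p in other])   # [812, 990]

# ifthen copies the matches aside and still passes everything downstream.
root_copy = []
everything = list(ifthen(processes, owned_by_root, root_copy))
print([p.pid for p in root_copy])   # [1, 2]
print([p.pid for p in everything])  # [1, 812, 2, 990]
```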

Pipelines storing and loading variables are particularly useful with marcel's set operators. For example, suppose that we explore a directory recursively, once to find recently modified files, and once to find .py files:

M 0.18.3 jao@loon ~/git/marcel$ ls -fr | select (f: now() - f.mtime < days(1)) >$ recent

M 0.18.3 jao@loon ~/git/marcel$ ls -fr | select (f: f.suffix == '.py') >$ py

We can now use set operators with these variables as inputs. For example, to find recently updated .py files:

M 0.18.3 jao@loon ~/git/marcel$ recent <$ intersect (| py <$ |)

-rw-rw-r--   jao    jao       13314   2023 Oct 25 13:53:07   env.py
-rw-rw-r--   jao    jao       10895   2023 Oct 25 13:53:07   main.py
-rw-rw-r--   jao    jao        4793   2023 Oct 25 13:53:07   object/color.py
-rw-rw-r--   jao    jao        8763   2023 Oct 25 13:53:07   object/file.py
-rw-rw-r--   jao    jao       39519   2023 Oct 25 13:53:07   parser.py

This passes the recent files to the intersect operator. The second input to intersect comes from the pipeline argument, which loads the contents of the stream stored in the py variable. (Marcel also has the other set operators, difference and union. There is also a join operator, inspired by relational algebra.)
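
The idea of intersecting two stored streams can be sketched in Python as follows. The file names are invented, and the function treats its inputs as plain sets and drops duplicates, which is a simplifying assumption; marcel's intersect may handle repeated items differently.

```python
def intersect(left, right):
    """Yield items of left that also occur in right, in left's order, once each."""
    right_items = set(right)
    seen = set()
    for item in left:
        if item in right_items and item not in seen:
            seen.add(item)
            yield item

# Invented stand-ins for the streams stored in the recent and py variables.
recent = ['env.py', 'main.py', 'notes.txt']
py = ['main.py', 'env.py', 'parser.py']

both = list(intersect(recent, py))
print(both)  # ['env.py', 'main.py']
```

Here the left argument plays the role of the stream arriving from recent <$, and the right argument plays the role of the pipeline argument (| py <$ |).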
