top of page
Search
  • Writer's pictureJack Orenstein

Database Access

Updated: Oct 28

Marcel pipes streams of tuples between operators. Relational databases operate on tables, which are sets of tuples, and queries yield sets of tuples. This similarity makes it very easy to add database access to marcel.


You configure database access in the configuration file, ~/.marcel.py. For example:

define_db(name='mgdb',
          driver='psycopg2',
          dbname='megacorp',
          user='hr',
          password='workers_R_us')

This specifies a database profile named mgdb. It accesses a Postgres database using the psycopg2 driver, (only psycopg2 is supported for now). The Postgres database is named megacorp, and we login using the specified user name and password.


We could configure other profiles, to access other databases, and to connect as different users. We can specify a default profile, to avoid having to specify a profile on every database access:

DB_DEFAULT = 'mgdb'

Now that the database is configured, we can access the database using the sql operator. For example, to retrieve all rows from the Employee table:

sql 'select * from Employee'

The query yields all rows from the Employee table. These are turned into Python tuples by the database driver, then the sql operator writes these tuples into its output pipe. This allows the piping of database output to other marcel operators. E.g., we could pipe the rows to another operator which writes them to a file:


sql 'select * from Employee' \
| write --csv /home/hr/Documents/all_emps.csv

The write operator generates CSV-formatted output, and writes to the specified file.


You can also load data using marcel. For example, suppose that you want to record in a database, all processes running on your machine, every second, in a process table, with columns storing time (seconds since the epoch), process pid, and the process command line. You can add data to this table as follows:


timer 1 \
| map (t: (t, processes()) \
| expand 1 \
| map (t, p: (t, p.pid, p.cmdline)) \
| sql --commit 1000 'insert into process(%s, %s, %s)'

  • timer 1: Generate a timestamp every 1 second.

  • map (t: (t, processes()): For each timestamp t, generate a tuple containing t and the list of all current processes.

  • expand 1: Expand each (t, [p1, p2, ...]) tuple into distinct tuples for each process, (t, p1), (t, p2), ...

  • map (t, p: ...): Map each (t, process) tuple into (t, process pid, process cmdline).

  • sql ...: Insert a tuple containing time, pid and command line into the process table, committing after every 1000 rows.


63 views0 comments

Recent Posts

See All

A number of ideas floating around in my head are coming together around the idea of adding sessions to marcel. Here are the pieces: Similar to bash, there is one file that accumulates the history of e

Wow! Somebody submitted another project of mine, Object Shell to Hacker News. Marcel supercedes Object Shell, and I guess someone on HN then submitted this website. And that submission reached the fro

bottom of page