Data stream processing toolkit

This toolkit is written in C++ and is intended for efficiently processing large amounts of data (on the order of gigabytes). The structure of the data files is user-defined, and queries are written in a declarative style. The algorithms currently included range from simple predicate-based filtering to aggregation and sorting. The toolkit also includes some convenience classes (e.g. for accessing Berkeley DB databases).

An example query:

size_t n = until_eof(
  filter(
    merge(
      typed_binary_istream(is1),
      typed_binary_istream(is2)),
    ptr_fun(is_even)),
  ptr_fun(print_int)) ();

Here, is1 and is2 are binary files, in native machine format, containing sorted unsigned integers. The query merges the two data streams into a single sorted stream (the merge functor), drops the odd integers by keeping only the values that satisfy is_even (the filter functor), and finally prints the resulting data stream (until_eof combined with print_int). The is_even and print_int functions are defined elsewhere; neither is shown here.
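To make the data flow of the query concrete, here is a minimal sketch of the same merge, filter, and consume steps written against the standard library only. It does not use the toolkit's API: merge_filter_consume is a hypothetical name, the inputs are in-memory vectors rather than binary files, and returning the number of elements passed to the sink is only an assumption about what the n in the query above might hold.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the predicate used in the query (assumed definition).
bool is_even(unsigned x) { return x % 2 == 0; }

// Hypothetical helper, not part of the toolkit: merges two sorted inputs
// one element at a time, keeps the elements that satisfy pred, and hands
// each kept element to sink. Returns how many elements reached the sink.
template <typename Pred, typename Sink>
std::size_t merge_filter_consume(const std::vector<unsigned>& a,
                                 const std::vector<unsigned>& b,
                                 Pred pred, Sink sink)
{
    // Merging only yields a sorted result if both inputs are sorted.
    assert(std::is_sorted(a.begin(), a.end()));
    assert(std::is_sorted(b.begin(), b.end()));

    std::size_t i = 0, j = 0, n = 0;
    while (i < a.size() || j < b.size()) {
        unsigned x;
        // Take the smaller head; if one input is exhausted, take the other.
        if (j == b.size() || (i < a.size() && a[i] <= b[j]))
            x = a[i++];
        else
            x = b[j++];

        if (pred(x)) {   // the "filter" step
            sink(x);     // the "print" (consume) step
            ++n;
        }
    }
    return n;
}
```

For the inputs {1, 2, 4, 7} and {2, 3, 6, 8}, with is_even as the predicate and a sink that collects values, the sink receives 2, 2, 4, 6, 8 and the function returns 5. Each element is processed as soon as it is merged, so nothing beyond the current element needs to be buffered; presumably the toolkit's stream functors take the same approach in order to handle files larger than memory.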

The toolkit is released under a BSD-style license.

Downloads: code (.tar.bz2; ~64 kB), PGP signature, documentation (PDF; ~100 kB)