Data stream processing toolkit
This toolkit is written in C++ and is intended for efficient processing of large amounts of data (gigabytes). The structure of the data files is user-defined, and queries are written in a declarative style. The algorithms currently included range from simple filtering on predicates to aggregation and sorting. The toolkit also includes some convenience classes (e.g. for accessing Berkeley DB databases).
An example query:
    size_t n = until_eof(
        filter(
            merge(typed_binary_istream(is1), typed_binary_istream(is2)),
            ptr_fun(is_even)),
        ptr_fun(print_int))();
Here is1 and is2 are binary files in native machine format containing
sorted unsigned integers. The query merges the two data streams into a single
sorted stream (the merge functor), filters out odd integers (the filter
functor) and finally prints the resulting data stream (the until_eof functor
applied to print_int). The is_even and print_int functions are defined elsewhere.