dendrite.core

col

(col type)(col type encoding)(col type encoding compression)
Returns a column specification. Takes one to three arguments:
- type         the column type symbol (e.g. int)
- encoding     the column encoding symbol (default: plain)
- compression  the column compression symbol (default: none)

See README for all supported encoding/compression types.
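
As a minimal sketch of the three arities, the forms below build column specifications using only the symbols mentioned in these docs (int, plain, deflate). The schema map, its field names, and the string type symbol are assumptions made for illustration.

```clojure
(require '[dendrite.core :as d])

;; One-, two-, and three-argument forms of col.
(def age-default  (d/col 'int))                  ; encoding plain, compression none
(def age-plain    (d/col 'int 'plain))           ; explicit encoding
(def age-deflated (d/col 'int 'plain 'deflate))  ; explicit encoding and compression

;; Hypothetical schema that uses a column spec in place of a bare type symbol.
(def schema {:name 'string
             :age  age-deflated})
```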

custom-types

(custom-types reader)
Returns a map of custom-type to base-type.

eduction

(eduction xform* view)
Returns a seqable, reducible, and foldable view of the application of the transducers to the
records. Transducers are applied in order as if combined with comp. Note that the transducers are applied in
parallel on each bundle (see assembly docs for full explanation), so this function can produce unexpected
results for stateful transducers such as `partition-all` or `distinct`. However, for stateless transducers
such as `map` or `filter` the results will be identical to calling `clojure.core/eduction` on the view, only
faster because the transducers are applied in parallel.
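
The following sketch applies two stateless transducers to a view. It assumes that records are written with the writer's .writeAll method (not documented on this page) and uses a hypothetical schema and temporary file.

```clojure
(require '[dendrite.core :as d])

;; Hypothetical temp file and schema, used only for this example.
(def path (java.io.File/createTempFile "people" ".den"))

;; Assumption: the writer exposes a .writeAll method taking a list of records.
(with-open [w (d/file-writer {:name 'string :age 'int} path)]
  (.writeAll w [{:name "Alice" :age 30}
                {:name "Bob"   :age 17}
                {:name "Carol" :age 45}]))

;; filter and map are stateless, so parallel application is safe here.
(def adult-names
  (with-open [r (d/file-reader path)]
    (into [] (d/eduction (filter #(>= (:age %) 18))
                         (map :name)
                         (d/read r)))))
```

The view is fully realized with into before the reader is closed, since a lazy result would otherwise outlive the open file.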

file-reader

(file-reader file)(file-reader opts file)
Returns a dendrite reader for the provided file.

If provided, the options map supports the following keys:
:custom-types  - a list of custom-type specifications. Default: nil. See docs for full explanation.
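
A minimal sketch of opening a reader and consuming it inside with-open. The schema, the temporary file, and the .writeAll call used to produce the file are assumptions for illustration.

```clojure
(require '[dendrite.core :as d])

;; Hypothetical temp file, written first so that there is something to read.
(def path (java.io.File/createTempFile "people" ".den"))

;; Assumption: the writer exposes a .writeAll method taking a list of records.
(with-open [w (d/file-writer {:name 'string} path)]
  (.writeAll w [{:name "Alice"} {:name "Bob"}]))

;; Open the reader, count the records, and realize them before the reader closes.
(def summary
  (with-open [r (d/file-reader path)]
    {:count   (d/num-records r)
     :records (into [] (d/read r))}))
```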

file-writer

(file-writer schema file)(file-writer opts schema file)(file-writer opts xform schema file)
Returns a dendrite writer that outputs to a file according to the provided schema.

If the xform argument is passed, the records are striped subject to the provided transducer. Note that this
transducer is applied during parallel record striping, so it can produce unexpected results for stateful
transducers such as `partition-all` or `distinct`. However, stateless transducers such as `map` or
`filter` will produce the expected result.

If provided, the options map supports the following keys:

:data-page-length         The length in bytes of the data pages (default 262144)

:record-group-length      The length in bytes of each record group (default 134217728)

:optimize-columns?        Either :all, :none or :default. If :all, will attempt to optimize the
                          encoding and compression for each column; if :default, will only optimize
                          columns with the default encoding & compression (i.e., plain/none); if :none,
                          disables all optimization.

:compression-thresholds   A map of compression method (e.g., deflate) to the minimum compression ratio
                          (e.g., 2) below which the overhead of compression is not deemed worthwhile.
                          Default: {'deflate 1.5}

:invalid-input-handler    A function with two arguments: record and exception. If an input record does
                          not conform to the schema, it will be passed to this function along with the
                          exception it triggered. By default, this option is nil and exceptions
                          triggered by invalid records are not caught.

:custom-types             A list of custom-type specifications. See docs for full explanation.

:ignore-extra-fields?     If true (default), ignore record fields that are not part of the schema upon
                          writing to file. If false, will throw an exception if a record contains a field
                          not defined in the schema.
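
The four-argument arity can be sketched as follows, combining an options map with a stateless transducer. The option values, schema, and temporary file are illustrative, and the .writeAll method is an assumption not documented on this page.

```clojure
(require '[dendrite.core :as d]
         '[clojure.string :as string])

(def path (java.io.File/createTempFile "people" ".den"))

;; Collects any records rejected by the schema instead of letting the exception propagate.
(def rejected (atom []))

(def opts
  {:data-page-length       131072
   :optimize-columns?      :all
   :compression-thresholds {'deflate 2}
   :ignore-extra-fields?   true
   :invalid-input-handler  (fn [record exception]
                             (swap! rejected conj record))})

;; The transducer upper-cases :name before each record is striped.
;; Assumption: the writer exposes a .writeAll method taking a list of records.
(with-open [w (d/file-writer opts
                             (map #(update % :name string/upper-case))
                             {:name 'string}
                             path)]
  (.writeAll w [{:name "alice"} {:name "bob"}]))

(def written
  (with-open [r (d/file-reader path)]
    (into [] (d/read r))))
```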

files-reader

(files-reader files)(files-reader opts files)
Returns a dendrite reader on the provided files (a seq of files or string paths). Reads will query each
file in the provided order, opening and closing them as needed. Note that the files-reader should still be
closed to guarantee that all resources are properly released. Accepts all the same options as file-reader.
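
As a sketch of reading several files as one, the example below writes two small files and reads them back in order. File names and schema are hypothetical, and .writeAll is an assumed writer method.

```clojure
(require '[dendrite.core :as d])

(def schema {:name 'string})
(def path-a (java.io.File/createTempFile "part-a" ".den"))
(def path-b (java.io.File/createTempFile "part-b" ".den"))

;; Assumption: the writer exposes a .writeAll method taking a list of records.
(with-open [w (d/file-writer schema path-a)]
  (.writeAll w [{:name "Alice"}]))
(with-open [w (d/file-writer schema path-b)]
  (.writeAll w [{:name "Bob"}]))

;; with-open closes the files-reader so that all resources are released.
(def all-records
  (with-open [r (d/files-reader [path-a path-b])]
    (into [] (d/read r))))
```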

full-schema

(full-schema reader)
Returns this reader's schema with all encoding and compression annotations.

index-by

(index-by f view)
Returns a view of the records as the output of (f index record), where index goes from 0 (first record) to
num-records - 1 (last record) and record is the assembled record. This works just like a parallelized version
of `clojure.core/map-indexed`. Use this if you need the record's index for further processing. As with read,
this view is seqable, reducible, and foldable. A view can only have a single index-by function applied to it
and that function must be applied before any transducer.
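
A minimal sketch that attaches each record's index to the record itself. The schema, the temporary file, and the .writeAll call are assumptions for illustration.

```clojure
(require '[dendrite.core :as d])

(def path (java.io.File/createTempFile "people" ".den"))

;; Assumption: the writer exposes a .writeAll method taking a list of records.
(with-open [w (d/file-writer {:name 'string} path)]
  (.writeAll w [{:name "Alice"} {:name "Bob"}]))

;; f receives the zero-based index and the assembled record.
(def indexed
  (with-open [r (d/file-reader path)]
    (into [] (d/index-by (fn [index record] (assoc record :index index))
                         (d/read r)))))
```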

metadata

(metadata reader)(metadata reader opts)
Returns the user-defined metadata for this reader. opts is a map as per clojure.edn/read.

num-records

(num-records reader)
Returns the number of records in the file.

pprint

(pprint schema)
Pretty-prints the schema.

read

(read reader)(read opts reader)
Returns a view of all the records in the reader. This view is seqable (lazy), reducible, and foldable (per
clojure.core.reducers, in which case the folding is done as part of record assembly).

If provided, the options map supports the following keys:

:missing-fields-as-nil?  Set to true (default) or false. If true, then fields that are specified in the
                         query but are not present in this reader's schema will be read as nil values. If
                         false, querying for fields not present in the schema will throw an exception.

:query                   The query. Default: '_. See docs for full explanation.

:sub-schema-in           Path to the desired sub-schema. The value should be a sequence of keys that cannot
                         contain any keys to repeated elements. If both :sub-schema-in and :query are
                         defined, the query applies to the specified sub-schema. See docs for full
                         explanation.

:readers                 A map of query tag symbol to tag function. Default: nil. See docs for full
                         explanation.
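
The :query and :sub-schema-in options might be used as sketched below. The nested schema is hypothetical, the query shapes (a map mirroring the schema with '_ as a wildcard) are an assumption based on the documented default of '_, and .writeAll is an assumed writer method.

```clojure
(require '[dendrite.core :as d])

(def path (java.io.File/createTempFile "people" ".den"))

(def schema {:name {:first 'string :last 'string}
             :age  'int})

;; Assumption: the writer exposes a .writeAll method taking a list of records.
(with-open [w (d/file-writer schema path)]
  (.writeAll w [{:name {:first "Alice" :last "Smith"} :age 30}
                {:name {:first "Bob"   :last "Jones"} :age 17}]))

(def results
  (with-open [r (d/file-reader path)]
    {;; Select a single top-level field.
     :ages        (into [] (d/read {:query {:age '_}} r))
     ;; Select a single nested field.
     :first-names (into [] (d/read {:query {:name {:first '_}}} r))
     ;; Read only the sub-records found under :name.
     :names       (into [] (d/read {:sub-schema-in [:name]} r))}))
```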

read-schema-string

(read-schema-string s)
Parses an edn-formatted dendrite schema string.
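
A short sketch of parsing a schema from a string and pretty-printing it. The schema text is hypothetical and assumes that type names appear as bare symbols in the edn form.

```clojure
(require '[dendrite.core :as d])

;; Hypothetical edn schema string with two fields.
(def schema (d/read-schema-string "{:name string :age int}"))

;; Pretty-print the parsed schema to *out*.
(d/pprint schema)
```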

req

(req x)
Marks the enclosed schema element as required.

sample

(sample f view)
Returns a view containing only the records for which (f index) returns a truthy value, where index
goes from 0 (first record) to num-records - 1 (last record). The sampling occurs before record assembly,
thereby entirely skipping assembly for unselected records. As with read, this view is seqable, reducible,
and foldable. A view can only have a single sample function applied to it and that function must be applied
before any indexing function or transducer.
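
As a sketch, the following keeps only the records at even indices. The schema, the temporary file, and the .writeAll call are assumptions for illustration.

```clojure
(require '[dendrite.core :as d])

(def path (java.io.File/createTempFile "people" ".den"))

;; Assumption: the writer exposes a .writeAll method taking a list of records.
(with-open [w (d/file-writer {:name 'string} path)]
  (.writeAll w [{:name "Alice"} {:name "Bob"} {:name "Carol"}]))

;; even? is called with the record index (0, 1, 2), never with the record itself.
(def every-other
  (with-open [r (d/file-reader path)]
    (into [] (d/sample even? (d/read r)))))
```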

schema

(schema reader)
Returns this reader's schema.

set-metadata!

(set-metadata! writer metadata)
Sets the user-defined metadata for this writer.
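
A minimal round trip through set-metadata! and metadata could look like this. The metadata map, schema, and temporary file are hypothetical, and .writeAll is an assumed writer method.

```clojure
(require '[dendrite.core :as d])

(def path (java.io.File/createTempFile "people" ".den"))

;; Attach metadata while the writer is still open.
;; Assumption: the writer exposes a .writeAll method taking a list of records.
(with-open [w (d/file-writer {:name 'string} path)]
  (.writeAll w [{:name "Alice"}])
  (d/set-metadata! w {:source "example" :version 1}))

(def stored-metadata
  (with-open [r (d/file-reader path)]
    (d/metadata r)))
```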

stats

(stats reader)
Returns a map containing all the stats associated with this reader. The three top-level keys
are :global, :record-groups, and :columns, which, respectively, contain stats summed over the entire file,
summed across all column-chunks in the same record-groups, and summed across all column-chunks belonging to
the same column.
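
The sketch below only inspects the top-level keys of the stats map, since the nested contents are not described here. The schema, the temporary file, and the .writeAll call are assumptions for illustration.

```clojure
(require '[dendrite.core :as d])

(def path (java.io.File/createTempFile "people" ".den"))

;; Assumption: the writer exposes a .writeAll method taking a list of records.
(with-open [w (d/file-writer {:name 'string} path)]
  (.writeAll w [{:name "Alice"} {:name "Bob"}]))

;; Collect the top-level keys of the stats map.
(def stats-keys
  (with-open [r (d/file-reader path)]
    (set (keys (d/stats r)))))
```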

tag

(tag tag elem)
Tags the enclosed query element with the provided tag. Meant to be used in combination with the :readers
option.
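
A tentative sketch of pairing tag with the :readers option. The tag name upper is hypothetical, and it is an assumption that the '_ wildcard can be tagged this way and that the tag function is applied to each value read at that position.

```clojure
(require '[dendrite.core :as d]
         '[clojure.string :as string])

(def path (java.io.File/createTempFile "people" ".den"))

;; Assumption: the writer exposes a .writeAll method taking a list of records.
(with-open [w (d/file-writer {:name 'string} path)]
  (.writeAll w [{:name "Alice"} {:name "Bob"}]))

;; The :name element is tagged with 'upper, and :readers maps that tag to a function.
(def shouted
  (with-open [r (d/file-reader path)]
    (into [] (d/read {:query   {:name (d/tag 'upper '_)}
                      :readers {'upper string/upper-case}}
                     r))))
```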