Available on crate feature io_parquet only.

APIs to read from Parquet format.

Re-exports

pub use parquet2::fallible_streaming_iterator;
pub use schema::infer_schema;

Modules

API to perform page-level filtering (also known as indexes)

APIs to handle Parquet <-> Arrow schemas.

APIs exposing parquet2’s statistics as arrow’s statistics.

Structs

A FallibleStreamingIterator that decompresses CompressedPage into DataPage.

Metadata for a column chunk.

A descriptor for leaf-level primitive columns. This encapsulates information such as definition and repetition levels and is used to re-assemble nested data.

A CompressedDataPage is a compressed, encoded representation of a Parquet data page. It holds actual data, so cloning it is expensive.

Decompressor that allows re-using the page buffer of PageIterator.

Metadata for a Parquet file.

An iterator of Chunks coming from row groups of a parquet file.

A fallible Iterator of CompressedDataPage. This iterator reads pages back to back until all pages have been consumed. For pages from this iterator, crate::page::CompressedDataPage::selected_rows() always returns None, since filter pushdown is not supported without a pre-computed page index.

A MutStreamingIterator of pre-read column chunks

An Iterator of Chunk that (dynamically) adapts a vector of iterators of Array into an iterator of Chunk.

Metadata for a row group.

An Iterator<Item=RowGroupDeserializer> from row groups of a parquet file.
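
The idea of adapting a vector of per-column iterators into a single iterator of chunks can be illustrated with the standard library alone. The following is a minimal sketch, not this crate's implementation: `Column`, `ChunkAdapter`, and `collect_chunks` are hypothetical names, plain `Vec<i64>` stands in for an array, and `String` stands in for the crate's error type.

```rust
/// Hypothetical stand-in for an array: one column's values for a batch of rows.
type Column = Vec<i64>;

/// Adapts one fallible iterator per column into an iterator of chunks,
/// where each chunk holds one batch from every column.
struct ChunkAdapter<I: Iterator<Item = Result<Column, String>>> {
    columns: Vec<I>,
}

impl<I: Iterator<Item = Result<Column, String>>> Iterator for ChunkAdapter<I> {
    type Item = Result<Vec<Column>, String>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.columns.is_empty() {
            return None;
        }
        let mut chunk = Vec::with_capacity(self.columns.len());
        for column in self.columns.iter_mut() {
            match column.next() {
                Some(Ok(array)) => chunk.push(array),
                // Surface the first error encountered in any column.
                Some(Err(e)) => return Some(Err(e)),
                // Stop as soon as any column is exhausted.
                None => return None,
            }
        }
        Some(Ok(chunk))
    }
}

/// Helper for illustration: builds the adapter from in-memory batches
/// (outer index = column, inner index = batch) and collects every chunk.
fn collect_chunks(columns: Vec<Vec<Column>>) -> Result<Vec<Vec<Column>>, String> {
    let iters = columns
        .into_iter()
        .map(|batches| batches.into_iter().map(Ok::<Column, String>))
        .collect::<Vec<_>>();
    ChunkAdapter { columns: iters }.collect()
}
```

Each call to `next` pulls exactly one batch from every column, so the chunk boundaries are whatever the underlying column iterators produce.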

Enums

A Page is an uncompressed, encoded representation of a Parquet page. It may hold actual data and thus cloning it may be expensive.

Errors generated by this crate

Representation of a Parquet type describing primitive and nested fields, including the top-level schema of the parquet file.

The set of all physical types representable in Parquet

Traits

Trait describing a MutStreamingIterator of column chunks.

A fallible, streaming iterator.

A special kind of fallible streaming iterator where advance consumes the iterator.

Trait describing a FallibleStreamingIterator of Page
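
To make the "fallible, streaming" pattern concrete, here is a minimal sketch using only the standard library. `StreamingIter` is a local, simplified stand-in written for illustration, not the trait re-exported by this module; `UpperLines` and `upper_lines` are likewise hypothetical. The point is that `advance` can fail and that `get` lends a reference into a buffer the iterator reuses between items.

```rust
/// Local, simplified stand-in for a fallible streaming iterator (an assumption
/// for illustration; the real trait may differ in its methods and bounds).
trait StreamingIter {
    type Item: ?Sized;
    type Error;

    /// Moves to the next item; may fail.
    fn advance(&mut self) -> Result<(), Self::Error>;

    /// Borrows the current item, if any.
    fn get(&self) -> Option<&Self::Item>;

    /// Advances and then borrows the current item.
    fn next(&mut self) -> Result<Option<&Self::Item>, Self::Error> {
        self.advance()?;
        Ok(self.get())
    }
}

/// Yields each input line upper-cased, reusing a single buffer.
/// Fails on non-ASCII input.
struct UpperLines<'a> {
    lines: std::slice::Iter<'a, &'a str>,
    buffer: String,
    valid: bool,
}

fn upper_lines<'a>(lines: &'a [&'a str]) -> UpperLines<'a> {
    UpperLines { lines: lines.iter(), buffer: String::new(), valid: false }
}

impl<'a> StreamingIter for UpperLines<'a> {
    type Item = str;
    type Error = String;

    fn advance(&mut self) -> Result<(), String> {
        self.valid = false;
        match self.lines.next() {
            Some(line) if line.is_ascii() => {
                // Reuse the allocation instead of creating a new String per item.
                self.buffer.clear();
                self.buffer.push_str(line);
                self.buffer.make_ascii_uppercase();
                self.valid = true;
                Ok(())
            }
            Some(line) => Err(format!("non-ASCII input: {}", line)),
            None => Ok(()),
        }
    }

    fn get(&self) -> Option<&str> {
        if self.valid { Some(&self.buffer) } else { None }
    }
}
```

Because each item borrows from the iterator itself, only one item can be held at a time. That restriction is what lets a page decompressor reuse its buffer across pages.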

Functions

Reads the column indexes of all ColumnChunkMetaData and deserializes them into Index. Returns an empty vector if indexes are not available.

Reads a FileMetaData from the reader, located at the end of the file.

Asynchronously reads the file’s metadata

An iterator adapter that maps multiple iterators of Pages into an iterator of Arrays.

Decompresses the page, using buffer for decompression. If page.buffer.len() == 0, there was no decompression and the buffer was moved. Otherwise, decompression took place.

Returns a ColumnIterator of column chunks corresponding to field.

Returns all ColumnChunkMetaData associated to field_name. For non-nested parquet types, this returns a single column

Returns all ColumnChunkMetaData associated to field_name. For non-nested parquet types, this returns a single column

Creates a new iterator of compressed pages.

Returns a stream of compressed data pages

Reads all columns that are part of the parquet field field_name

Reads all columns that are part of the parquet field field_name

Returns a vector of iterators of Array (ArrayIter) corresponding to the top-level parquet fields whose names match those in fields.

Returns a vector of iterators of Array corresponding to the top-level parquet fields whose names match those in fields.

Reads a parquet file’s metadata synchronously.

Reads a parquet file’s metadata asynchronously.

Reads the PageLocations from the ColumnChunkMetaData. Returns an empty vector if indexes are not available.

Converts a vector of columns associated with the parquet field described by Field into an iterator of Array (ArrayIter) that yields arrays of chunk size chunk_size.
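
The lookup of all column chunks belonging to a field name, described above, can be sketched with the standard library. This is a sketch under stated assumptions, not this crate's code: `ColumnMeta`, `meta`, and `field_columns` are hypothetical, and it assumes each column records its path in the schema, with the first path element being the top-level field name.

```rust
/// Hypothetical stand-in for a column chunk's metadata.
#[derive(Debug, PartialEq)]
struct ColumnMeta {
    /// Path from the schema root to this leaf column, e.g. ["point", "x"].
    path_in_schema: Vec<String>,
}

/// Convenience constructor used only for illustration.
fn meta(path: &[&str]) -> ColumnMeta {
    ColumnMeta {
        path_in_schema: path.iter().map(|s| s.to_string()).collect(),
    }
}

/// Returns every column whose top-level path element equals `field_name`.
/// A non-nested field yields one column; a nested field yields one per leaf.
fn field_columns<'a>(columns: &'a [ColumnMeta], field_name: &str) -> Vec<&'a ColumnMeta> {
    columns
        .iter()
        .filter(|c| c.path_in_schema.first().map(String::as_str) == Some(field_name))
        .collect()
}
```

With columns `id`, `point.x`, and `point.y`, looking up `"id"` returns a single column, while `"point"` returns both leaves of the nested field.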

Type Definitions

Type def for a sharable, boxed dyn Iterator of arrays

Type declaration for a page filter