Refactor Suggestion: Rename ColumnChunk / RowGroup, and their read<...> to ColumnChunkMeta / RowGroupMeta #183

@Eiko-Tokura

In DataFrame.IO.Parquet.Thrift,

we currently have ColumnMetaData and FileMetadata (whose names make it clear that they are metadata), alongside ColumnChunk and RowGroup:

data ColumnChunk = ColumnChunk
    { columnChunkFilePath :: String
    , columnChunkMetadataFileOffset :: Int64
    , columnMetaData :: ColumnMetaData
    , columnChunkOffsetIndexOffset :: Int64
    , columnChunkOffsetIndexLength :: Int32
    , columnChunkColumnIndexOffset :: Int64
    , columnChunkColumnIndexLength :: Int32
    , cryptoMetadata :: ColumnCryptoMetadata
    , encryptedColumnMetadata :: BS.ByteString
    }
    deriving (Show, Eq)

data RowGroup = RowGroup
    { rowGroupColumns :: [ColumnChunk]
    , totalByteSize :: Int64
    , rowGroupNumRows :: Int64
    , rowGroupSortingColumns :: [SortingColumn]
    , fileOffset :: Int64
    , totalCompressedSize :: Int64
    , ordinal :: Int16
    }
    deriving (Show, Eq)

The issue:
These records do not contain actual bytes; they are also metadata, but their names suggest otherwise. Likewise, readRowGroup and readColumnChunk suggest that they load an entire row group / column chunk into memory as a RowGroup / ColumnChunk (this confused me a bit).

It would be conceptually cleaner to rename them (e.g. to ColumnChunkMeta / RowGroupMeta) so that the names signal they are just metadata and contain no actual bytes.

The risk is that there is currently no export control and the entire module is listed under exposed-modules, so a rename would be a public API change.
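One possible way to soften that API break, sketched here only as an illustration: introduce the new name and keep the old one as a deprecated type synonym. The record below is a trimmed-down stand-in with two fields, not the real definition, and RowGroupMeta is the name proposed in this issue rather than anything that exists in the library today.

```haskell
module Main where

import Data.Int (Int64)

-- Hypothetical renamed record (trimmed to two fields for the example).
-- It describes a row group; it does not hold the row group's bytes.
data RowGroupMeta = RowGroupMeta
    { rowGroupNumRows :: Int64
    , totalByteSize :: Int64
    }
    deriving (Show, Eq)

-- The old type name survives as an alias, so existing type signatures
-- in downstream code keep compiling but get a deprecation warning.
type RowGroup = RowGroupMeta
{-# DEPRECATED RowGroup "Use RowGroupMeta; this record holds metadata only" #-}

main :: IO ()
main = do
    let meta = RowGroupMeta{rowGroupNumRows = 3, totalByteSize = 128}
        legacy = meta :: RowGroup -- the old name still type-checks
    print (legacy == meta) -- prints True
```

Note that a type synonym only covers the type name. Code that uses the old RowGroup data constructor would still need updating (or a pattern synonym), so this reduces the breakage rather than removing it.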

This is a simple change, and I can submit a PR if needed. Alternatively, we can add Haddock documentation to the readColumnChunk / readRowGroup functions and their data types to reduce confusion.
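For the documentation-only alternative, a rough sketch of what the Haddock comments might look like. Again the record is trimmed to two fields for brevity, and the wording of the comments is my own suggestion, not existing documentation.

```haskell
module Main where

import Data.Int (Int64)

{- | Metadata describing one row group of a Parquet file.

This record holds row counts, sizes and offsets only. It does not
contain the row group's data pages; those must be read separately.
-}
data RowGroup = RowGroup
    { rowGroupNumRows :: Int64
    -- ^ Number of rows in this row group.
    , totalByteSize :: Int64
    -- ^ Total byte size of the uncompressed column data in this row group.
    }
    deriving (Show, Eq)

main :: IO ()
main = print (RowGroup{rowGroupNumRows = 3, totalByteSize = 128})
```

This keeps the public API unchanged, at the cost of the names themselves still being ambiguous to anyone who does not read the docs.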
