Refactor Suggestion: Rename ColumnChunk / RowGroup, and their read<...> to ColumnChunkMeta / RowGroupMeta

In `DataFrame.IO.Parquet.Thrift`, we currently have `ColumnMetaData` and `FileMetadata`, whose names make it clear that they are metadata, alongside `ColumnChunk` and `RowGroup`:

```haskell
data ColumnChunk = ColumnChunk
    { columnChunkFilePath :: String
    , columnChunkMetadataFileOffset :: Int64
    , columnMetaData :: ColumnMetaData
    , columnChunkOffsetIndexOffset :: Int64
    , columnChunkOffsetIndexLength :: Int32
    , columnChunkColumnIndexOffset :: Int64
    , columnChunkColumnIndexLength :: Int32
    , cryptoMetadata :: ColumnCryptoMetadata
    , encryptedColumnMetadata :: BS.ByteString
    }
    deriving (Show, Eq)

data RowGroup = RowGroup
    { rowGroupColumns :: [ColumnChunk]
    , totalByteSize :: Int64
    , rowGroupNumRows :: Int64
    , rowGroupSortingColumns :: [SortingColumn]
    , fileOffset :: Int64
    , totalCompressedSize :: Int64
    , ordinal :: Int16
    }
    deriving (Show, Eq)
```

The issue:
these records do not contain any actual data bytes; they are also metadata, but their names suggest otherwise. Likewise, `readRowGroup` and `readColumnChunk` suggest that they load an entire row group / column chunk into memory as a `RowGroup` / `ColumnChunk`. (This confused me a bit.)

It would be conceptually cleaner to rename them so that the names make clear they are just metadata (and contain no actual bytes).

The risk is that there is currently no export control and the entire module is an exposed module, so the rename would be a public API change.
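One possible way to soften that break, as a rough sketch: keep the old type name as a deprecated synonym for the new one. The record here is a cut-down stand-in with only two of the real fields, and `ColumnChunkMeta` is the proposed name, not something that exists in the module today.

```haskell
import Data.Int (Int64)

-- Cut-down stand-in for the renamed record (the real one has more fields).
data ColumnChunkMeta = ColumnChunkMeta
    { columnChunkFilePath :: String
    , columnChunkMetadataFileOffset :: Int64
    }
    deriving (Show, Eq)

-- Old name kept as a synonym so existing type signatures keep compiling.
-- GHC warns at use sites in modules that import it.
type ColumnChunk = ColumnChunkMeta
{-# DEPRECATED ColumnChunk "Use ColumnChunkMeta instead" #-}
```

Note that a type synonym only covers type signatures. Code that constructs or pattern-matches with the old `ColumnChunk` constructor would still need updating (or a pattern synonym on top of this).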

This is a simple change, and I can submit a PR if needed. Alternatively, we can add Haddock documentation to `readColumnChunk` / `readRowGroup` and their data types to reduce confusion.
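For the documentation-only option, something along these lines might be enough. This is a sketch on a reduced copy of `RowGroup` with two of its fields; the comment wording is my suggestion, not existing text in the module.

```haskell
import Data.Int (Int64)

-- | Footer metadata describing one row group.
--
-- This record holds sizes and counts only. It does not contain
-- the row group's page data, so building one loads no column bytes.
data RowGroup = RowGroup
    { rowGroupNumRows :: Int64
    -- ^ Number of rows in this row group.
    , totalByteSize :: Int64
    -- ^ Total byte size of the uncompressed column data in this row group.
    }
    deriving (Show, Eq)
```

This keeps the public names unchanged, at the cost of the names still reading as if they hold data.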


