[python] Support PyJindo in pyarrow_file_io#7410

Open
timmyyao wants to merge 3 commits into apache:master from timmyyao:pypaimon-use-pyjindo

@timmyyao
Contributor

Purpose

Implement an OSS filesystem backed by PyJindo (https://aliyun.github.io/alibabacloud-jindodata/jindosdk/jindosdk_download/), which provides higher performance and stability when accessing OSS.

from pyarrow.fs import FileInfo, FileSelector, FileType

try:
    import pyjindo.fs as jfs
Contributor

Where can pyjindo be installed from? Installing it fails with: Could not find a version that satisfies the requirement pyjindo

Contributor Author

pyjindo has not been published to PyPI yet. For now I install it from a local wheel (.whl) downloaded from https://aliyun.github.io/alibabacloud-jindodata/jindosdk/jindosdk_download/.

Contributor

It should be published to PyPI first.
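
Until the package is published, one way to keep it an optional dependency is to detect it at runtime and fall back to the existing PyArrow-based path. The following is only a minimal sketch of that idea, not the code in this PR: the option key `fs.oss.use-pyjindo` and the helper names are hypothetical, invented for illustration, and the only assumption about PyJindo is that it installs a top-level `pyjindo` package.

```python
import importlib.util

# Hypothetical option key, invented for this sketch; not an existing Paimon option.
USE_PYJINDO_KEY = "fs.oss.use-pyjindo"


def pyjindo_installed() -> bool:
    """Return True if a top-level `pyjindo` package can be found, without importing it."""
    return importlib.util.find_spec("pyjindo") is not None


def choose_oss_backend(options, available=None):
    """Pick "pyjindo" or "pyarrow" for OSS access.

    `available` can be passed explicitly (for example in tests); when None,
    availability is detected from the current environment.
    """
    if available is None:
        available = pyjindo_installed()
    # Users can opt out even when the package is installed.
    wanted = str(options.get(USE_PYJINDO_KEY, "true")).strip().lower() == "true"
    return "pyjindo" if (available and wanted) else "pyarrow"
```

With this shape, an environment without pyjindo silently keeps the current behavior, and a user who has it installed can still switch it off through the (hypothetical) option.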

Contributor

@JingsongLi left a comment

I'm not sure whether this approach (providing a Python InputStream) performs well; we should do some performance testing.
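
A performance test of this kind could start from a small harness that reads the same object through each implementation and compares throughput. The sketch below is an assumption about how such a test might look, not something from the PR: `measure_read_throughput` is a hypothetical helper, and an in-memory `io.BytesIO` stands in for the real streams so the example runs without OSS access.

```python
import io
import time


def measure_read_throughput(open_stream, chunk_size=1 << 20, repeats=3):
    """Read a stream to the end `repeats` times.

    Returns (bytes_read_per_run, best_elapsed_seconds).
    """
    best = float("inf")
    total = 0
    for _ in range(repeats):
        stream = open_stream()
        total = 0
        start = time.perf_counter()
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
        elapsed = time.perf_counter() - start
        stream.close()
        best = min(best, elapsed)
    return total, best


# Stand-in data source so the sketch runs anywhere. A real comparison would
# pass one factory per filesystem implementation, each opening the same OSS object.
payload = b"x" * (8 << 20)
size, seconds = measure_read_throughput(lambda: io.BytesIO(payload))
print(f"read {size} bytes, best run {seconds:.4f}s")
```

Taking the best of several runs reduces noise from caching and network jitter; a fuller benchmark would also vary the chunk size and include random-access reads, since columnar file formats do not read strictly sequentially.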

Contributor

@JingsongLi left a comment

Comments summary:

  1. Blocker: pyjindo has not been published to PyPI.
  2. JindoOutputFile.__exit__ resource leak: if JindoOutputFile is used in a 'with' statement, the underlying stream is not closed on exit.
  3. Benchmark data is missing.
  4. The boolean flag is not elegant enough, and there is no user-facing switch.
  5. No test coverage (tests to be added after the PyJindo release).
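
For the resource leak in item 2, the usual fix is to have `__exit__` close the underlying stream on every exit path. The class below is a hypothetical stand-in written for illustration, not the PR's JindoOutputFile; it assumes only that the wrapped stream exposes `write`, `close`, and a `closed` attribute, as `io.BytesIO` does.

```python
import io


class OutputFileSketch:
    """Hypothetical stand-in for a file wrapper; not the PR's JindoOutputFile."""

    def __init__(self, stream):
        self._stream = stream

    def write(self, data):
        return self._stream.write(data)

    def close(self):
        # Guard against double close so close() is safe to call repeatedly.
        if not self._stream.closed:
            self._stream.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Close on every exit path so the stream is released even on error.
        self.close()
        return False  # do not swallow exceptions raised inside the block


raw = io.BytesIO()
with OutputFileSketch(raw) as out:
    out.write(b"hello")
```

After the `with` block, `raw.closed` is true, which is the property a regression test for this item could assert.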

2 participants