diff options
author | gal salomon <gal.salomon@gmail.com> | 2021-04-12 07:54:37 +0200 |
---|---|---|
committer | gal salomon <gal.salomon@gmail.com> | 2022-01-12 22:15:21 +0100 |
commit | e3254b630601f454d349c79b2486403fb99470e5 (patch) | |
tree | 56b71e26ac417871112b356391d77d26310a9432 /src/s3select | |
parent | Merge pull request #44473 from johnbent/patch-1 (diff) | |
download | ceph-e3254b630601f454d349c79b2486403fb99470e5.tar.xz ceph-e3254b630601f454d349c79b2486403fb99470e5.zip |
parquet implementation:
(1) adding arrow/parquet to make(install is missing)
(2) s3select-operation contains 2 flows CSV and Parquet
(3) upon parquet-flow s3select processing engine is calling (via callback) to get-size and range-request, the range-requests are a-sync, thus the caller is waiting until notification.
(4) flow : execute --> s3select --(arrow layer)--> range-request --> GetObj::execute --> send_response_data --> notify-range-request --> (back-to) --> s3select
(5) on parquet flow the s3select is handling the response (using call-backs) because of aws-response-limitation (16mb)
add unique pointer (rgw_api); verify magic number for parquet objects; s3select module update
fix buffer-over-flow (copy range request)
change the range-request flow. now,it needs to use the callback parametrs (ofs & len) and not to use the element length
refactoring. seperate the CSV flow from the parquet flow, a phase before adding conditional build(depend on arrow package installation)
adding arrow/parquet installation to debian/control
align s3select repo with RGW (missing API"s, such as get_error_description)
undefined reference to arrow symbol
fix comment: using optional_yield by value
fix comments; remove future/promise
s3select: a leak fix
s3select: fixing result production
s3select,s3tests : parquet alignments
typo: git-remote --> git_remote
s3select: remove redundant comma(end of projections); bug fix in parquet flow upon aggregation queries
adding arrow/parquet
editorial. remove blank lines
s3select: merged with master(output serialization,presto alignments)
merging(not rebase) master functionlities into parquet branch
(*) a dedicated source-files for s3select operation.
(*) s3select-engine: fix leaks on parquet flows, enabling allocate csv_object and parquet_object on stack
(*) the csv_object and parquet object allocated on stack (no heap allocation)
move data-members from heap to stack allocation, refactoring, separate flows for CSV and parquet. s3select: bug fix
conditional build: upon arrow package is installed the parquet flow become visable, thus enables to process parquet object. in case the package is not installed only CSV is usable
remove redundant try/catch, s3select: fix compile warning
arrow-devel version should be higher than 4.0.0, where arrow::io::AsyncContext become depecrated
missing sudo; wrong url;move the rm -f arrow.list
replace codename with $(lsb_release -sc)
arrow version should be >= 4.0.0; iocontext not exists in namespace on lower versions
RGW points to s3select/master
s3select submodule
sudo --> $SUDO
Signed-off-by: gal salomon <gal.salomon@gmail.com>
Diffstat (limited to 'src/s3select')
m--------- | src/s3select | 0 |
1 files changed, 0 insertions, 0 deletions
diff --git a/src/s3select b/src/s3select -Subproject f118a761300d7d10b910bc1ba24935e093c064e +Subproject 1609bb2ab5441d2314f56858f4f98fc2be509f8 |