author: gal salomon <gal.salomon@gmail.com> 2021-04-12 07:54:37 +0200
committer: gal salomon <gal.salomon@gmail.com> 2022-01-12 22:15:21 +0100
commit: e3254b630601f454d349c79b2486403fb99470e5
tree: 56b71e26ac417871112b356391d77d26310a9432 /src/s3select
parent: Merge pull request #44473 from johnbent/patch-1
parquet implementation:
(1) adding arrow/parquet to make (install is missing)
(2) s3select-operation contains 2 flows: CSV and Parquet
(3) upon parquet-flow, the s3select processing engine calls (via callback) get-size and range-request; the range-requests are async, thus the caller waits until notification.
(4) flow: execute --> s3select --(arrow layer)--> range-request --> GetObj::execute --> send_response_data --> notify-range-request --> (back-to) --> s3select
(5) on parquet flow, s3select handles the response (using callbacks) because of the aws-response-limitation (16mb)
add unique pointer (rgw_api); verify magic number for parquet objects; s3select module update
fix buffer-over-flow (copy range request)
change the range-request flow. now, it needs to use the callback parameters (ofs & len) and not the element length
refactoring. separate the CSV flow from the parquet flow, a phase before adding conditional build (depends on arrow package installation)
adding arrow/parquet installation to debian/control
align s3select repo with RGW (missing APIs, such as get_error_description)
undefined reference to arrow symbol
fix comment: using optional_yield by value
fix comments; remove future/promise
s3select: a leak fix
s3select: fixing result production
s3select,s3tests: parquet alignments
typo: git-remote --> git_remote
s3select: remove redundant comma (end of projections); bug fix in parquet flow upon aggregation queries
adding arrow/parquet
editorial. remove blank lines
s3select: merged with master (output serialization, presto alignments)
merging (not rebase) master functionalities into parquet branch
(*) dedicated source-files for s3select operation.
(*) s3select-engine: fix leaks on parquet flows, enabling allocation of csv_object and parquet_object on stack
(*) the csv_object and parquet object are allocated on stack (no heap allocation)
move data-members from heap to stack allocation, refactoring, separate flows for CSV and parquet.
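The "verify magic number for parquet objects" step above can be illustrated with a minimal sketch. It relies on the Parquet file format placing the 4-byte ASCII magic "PAR1" at both the start and the end of a file with a plaintext footer. The helper name `looks_like_parquet` and the idea of passing the first and last bytes as two separate views are assumptions made for illustration, not the code this commit adds to RGW.

```cpp
#include <string_view>

// Parquet files (plaintext footer) start and end with the ASCII magic "PAR1".
constexpr std::string_view kParquetMagic = "PAR1";

// Hypothetical helper, not the RGW implementation.
// `head` holds the first bytes of the object and `tail` the last bytes,
// for example as returned by two small range requests.
constexpr bool looks_like_parquet(std::string_view head, std::string_view tail) {
  return head.size() >= kParquetMagic.size() &&
         tail.size() >= kParquetMagic.size() &&
         head.substr(0, kParquetMagic.size()) == kParquetMagic &&
         tail.substr(tail.size() - kParquetMagic.size()) == kParquetMagic;
}
```

A check of this kind lets the parquet flow reject a CSV or otherwise malformed object early, before any footer metadata is parsed.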
s3select: bug fix
conditional build: when the arrow package is installed, the parquet flow becomes visible, which enables processing parquet objects. if the package is not installed, only CSV is usable
remove redundant try/catch, s3select: fix compile warning
arrow-devel version should be higher than 4.0.0, where arrow::io::AsyncContext became deprecated
missing sudo; wrong url; move the rm -f arrow.list
replace codename with $(lsb_release -sc)
arrow version should be >= 4.0.0; iocontext does not exist in the namespace on lower versions
RGW points to s3select/master
s3select submodule
sudo --> $SUDO
Signed-off-by: gal salomon <gal.salomon@gmail.com>
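The conditional build described in the message can be sketched roughly as follows. The macro name `HAVE_ARROW_PARQUET` and both function names are hypothetical and chosen only for illustration; the macro actually used by the Ceph and s3select build files may differ. The sketch assumes the build system defines the macro only when the arrow/parquet development package is found.

```cpp
#include <string_view>

// HAVE_ARROW_PARQUET is a hypothetical macro name (an assumption): the build
// system would define it only when the arrow/parquet package is installed.
constexpr bool parquet_enabled() {
#ifdef HAVE_ARROW_PARQUET
  return true;
#else
  return false;
#endif
}

// Hypothetical dispatcher check: CSV is always usable, Parquet only when the
// parquet flow was compiled in, and any other format is rejected.
constexpr bool can_process(std::string_view format) {
  if (format == "CSV") return true;
  if (format == "Parquet") return parquet_enabled();
  return false;
}
```

Keeping the decision in one compile-time function means the CSV flow builds and behaves the same whether or not arrow is available.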
Diffstat (limited to 'src/s3select')
m--------- src/s3select 0
1 files changed, 0 insertions, 0 deletions
diff --git a/src/s3select b/src/s3select
-Subproject f118a761300d7d10b910bc1ba24935e093c064e
+Subproject 1609bb2ab5441d2314f56858f4f98fc2be509f8