refactor: centralize TopK heap boundary handling#23091
Conversation
|
run benchmark topk_sorted_tpch |
|
🤖 Benchmark running (GKE) | trigger CPU Details (lscpu)Comparing refactor/topk-heap-boundary (2a96e34) to 9e68a86 (merge-base) diff using: topk_sorted_tpch File an issue against this benchmark runner |
|
🤖 Benchmark completed (GKE) | trigger Instance: CPU Details (lscpu)Details
Resource Usagetopk_sorted_tpch — base (merge-base)
topk_sorted_tpch — branch
File an issue against this benchmark runner |
|
Nice cleanup. I'm wondering if That would make it clearer that the sort-key bytes, scalar values, and prefix row all come from the same row. |
Thank you @geoffreyclaude for the feedback! |
| ) -> Result<Vec<ScalarValue>> { | ||
| let mut scalar_values = Vec::with_capacity(sort_exprs.len()); | ||
| for sort_expr in sort_exprs { | ||
| let expr = Arc::clone(&sort_expr.expr); |
There was a problem hiding this comment.
Small cleanup suggestion: since expr is only used for this one evaluate call, the Arc clone can be avoided.
This could be written directly as:
let value = sort_expr.expr.evaluate(&batch_entry.batch.slice(self.row.index, 1))?;
Which issue does this PR close?
Rationale for this change
TopK derives local heap-boundary data in multiple places. This refactor names that boundary and keeps full sort-key bytes, scalar threshold values, and prefix comparison tied to the same heap row.
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
No