Spec: Clarify behavior of special geo objects for lower/upper bounds #12956
@jiayuasu @paleolimbot fyi, let me know if this captures the meaning.
Branch updated from dc2b842 to 1cc8d28
paleolimbot left a comment
With Jia's edit this makes sense to me...thanks!
Co-authored-by: Jia Yu <jiayu@wherobots.com>
Had an offline sync with @jiayuasu and @rdblue; we simplified it:
Branch updated from a3d1d21 to b4ef6a4
For `geometry` and `geography` types, `lower_bounds` and `upper_bounds` are both points with coordinates X, Y, Z, and M (see [Appendix G](#appendix-g-geospatial-notes)) that are, respectively, the lower and upper bounds of all objects in the file. For the X values only, xmin may be greater than xmax, in which case an object in this bounding box may match if it contains an X such that `x >= xmin` OR `x <= xmax`. In geographic terminology, `xmin`, `xmax`, `ymin`, and `ymax` are also known as `westernmost`, `easternmost`, `southernmost`, and `northernmost`, respectively. For `geography` types, these points are further restricted to the canonical ranges of [-180, 180] for X and [-90, 90] for Y.
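To illustrate the X-only wraparound rule, here is a minimal Python sketch. `x_in_bounds` and `in_geography_range` are hypothetical helpers, not part of the Iceberg spec or any Iceberg library. They only show how a single X value could be tested against `xmin`/`xmax`, assuming that `xmin > xmax` means the interval wraps (for example, across the antimeridian).

```python
def x_in_bounds(x: float, xmin: float, xmax: float) -> bool:
    """Return True if x lies inside the X interval described by xmin/xmax.

    Assumption: when xmin > xmax the interval wraps around (for example,
    across the antimeridian), so it covers x >= xmin OR x <= xmax.
    """
    if xmin <= xmax:
        # Ordinary, non-wrapping interval.
        return xmin <= x <= xmax
    # Wrapping interval: only X is allowed to have xmin > xmax.
    return x >= xmin or x <= xmax


def in_geography_range(x: float, y: float) -> bool:
    """Check a bound point against the canonical geography ranges."""
    return -180.0 <= x <= 180.0 and -90.0 <= y <= 90.0


# Bounds for objects straddling the antimeridian: westernmost 170, easternmost -170.
print(x_in_bounds(175.0, 170.0, -170.0))   # True
print(x_in_bounds(-175.0, 170.0, -170.0))  # True
print(x_in_bounds(0.0, 170.0, -170.0))     # False
print(in_geography_range(170.0, -95.0))    # False: Y is outside [-90, 90]
```

The Y, Z, and M dimensions never wrap, so they would always use the ordinary `min <= v <= max` test.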
As with other types, null or invalid `geometry` and `geography` objects are skipped when calculating the upper and lower bounds. In contrast, a null or invalid (NaN) coordinate value within a `geometry` or `geography` does not cause the entire object to be skipped; only that coordinate value is omitted from the calculation. Note that no bounding box is produced if all X values or all Y values in the file are invalid.
For other types, only null and NaN values are omitted from the calculation, so I would rephrase this. It doesn't quite work to replace "invalid" with "NaN", though, since I think you're talking about objects without coordinates. I think I'd just call out the two cases directly:
When calculating upper and lower bounds for `geometry` and `geography`, null and NaN values in a coordinate dimension are skipped; for example, `POINT (1 NaN)` contributes no value to the Y, Z, or M dimension bounds. If a dimension has no non-null or non-NaN values, that dimension is omitted from the bounding box. If either the X or Y dimension is missing then the bounding box itself is not produced.
Changed:
When calculating upper and lower bounds for `geometry` and `geography`, null or NaN values in a coordinate dimension are skipped; for example, `POINT (1 NaN)` contributes a value to X but no values to the Y, Z, or M dimension bounds. If a dimension has only null or NaN values, that dimension is omitted from the bounding box. If either the X or Y dimension is missing, then the bounding box itself is not produced.
- clarified your example a little more (maybe redundant, but I thought we should be explicit in the example)
- removed double negatives
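The agreed wording can be sketched in Python to make the per-dimension skipping concrete. `compute_bounds` is a hypothetical function, not an Iceberg API. It assumes each object has already been flattened into coordinate tuples `(x, y[, z[, m]])` and that a null object is represented as `None`. It uses plain min/max for X and does not attempt to choose a wrapping interval (`xmin > xmax`) for `geography`.

```python
import math
from typing import Dict, Iterable, Optional, Sequence, Tuple

DIMS = ("x", "y", "z", "m")
Bounds = Tuple[Dict[str, float], Dict[str, float]]


def compute_bounds(
    objects: Iterable[Optional[Sequence[Sequence[Optional[float]]]]],
) -> Optional[Bounds]:
    """Return (lower, upper) per-dimension bounds, or None if X or Y is missing.

    Assumption: each object is already flattened to a list of coordinate
    tuples (x, y[, z[, m]]); a null object is represented as None.
    """
    lower = {d: math.inf for d in DIMS}
    upper = {d: -math.inf for d in DIMS}
    for obj in objects:
        if obj is None:
            # Null objects are skipped entirely, as for other types.
            continue
        for coord in obj:
            for dim, value in zip(DIMS, coord):
                # Null and NaN values are skipped per dimension;
                # the other values of the same coordinate still count.
                if value is None or math.isnan(value):
                    continue
                lower[dim] = min(lower[dim], value)
                upper[dim] = max(upper[dim], value)
    # A dimension with only null/NaN values was never updated: omit it.
    present = [d for d in DIMS if lower[d] <= upper[d]]
    # Without both X and Y, no bounding box is produced.
    if "x" not in present or "y" not in present:
        return None
    return {d: lower[d] for d in present}, {d: upper[d] for d in present}


print(compute_bounds([[(1.0, math.nan)]]))  # None: POINT (1 NaN) has no Y value
print(compute_bounds([[(1.0, math.nan)], None, [(3.0, 2.0), (-1.0, 5.0)]]))
# ({'x': -1.0, 'y': 2.0}, {'x': 3.0, 'y': 5.0})
```

In the second call, `POINT (1 NaN)` still contributes `x = 1.0`, the null object is ignored, and Z and M are omitted because no coordinate supplied them.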
Branch updated from 25a5859 to 1fc6ba6
LGTM
paleolimbot left a comment
Thank you for iterating on this!
Merged to master, thanks for all the reviews! Vote thread for reference: https://lists.apache.org/thread/g7rz2kt12ytd5j2xnbdlk696cxm0d3s2
This matches the corresponding clarification in parquet-format: https://github.com/apache/parquet-format/pull/494/files