  1. [#SPARK-29042] Sampling-based RDD with unordered input …

    We have found and fixed the correctness issue when RDD output is INDETERMINATE. One missing part is sampling-based RDDs. This kind of RDD is order-sensitive to its input. A …

  2. OutOfMemory issue due to temporary buffer allocation when …

    ASF subversion and git services added a comment - 20/Sep/25 12:24 Commit cdc06e6e4774905c6dd690f59f8494e80f5f4a8f in asterixdb's branch refs/heads/master from …

  3. Block Sampling should adjust number of reducers accordingly to …

    Description: The number of reducers for block sampling is currently not adjusted, so queries like: select c from tab tablesample (1 percent) group by c; can generate a huge number of reducers …

  4. TABLESAMPLE with PERCENT throws FAILED: SemanticException …

    Description FAILED: SemanticException 1:68 Percentage sampling is not supported in org.apache.hadoop.hive.ql.io.HiveInputFormat. Error encountered near token '20' when I …

  5. [SPARK-23173] from_json can produce nulls for fields which are …

    The from_json function uses a schema to convert a string into a Spark SQL struct. This schema can contain non-nullable fields. The underlying JsonToStructs expression does not check if a …

  6. [SPARK-22947] SPIP: as-of join in Spark SQL - ASF JIRA

    This approach suffers in performance if sampling data is expensive. For instance, when the data to be sampled is the output of an expensive computation, sampling the data would cause the …

  7. [SPARK-22867] Add Isolation Forest algorithm to MLlib - ASF JIRA

    Sampling data from a Dataset. Data instances are sampled and grouped for each iTree. As indicated in the paper, the number of samples for constructing each tree is usually not very large …

  8. [SPARK-15689] Data source API v2 - ASF JIRA

    Nice-to-have: support additional common operators, including limit and sampling. Note that both 1 and 2 are problems that the current data source API (v1) suffers.

  9. PoissonSampler single use speed improvements using a cache

    That improved the PoissonSampler by splitting it into two samplers for the different algorithms (SmallMeanPoissonSampler and LargeMeanPoissonSampler) and moving initialisation …

  10. [SPARK-46094] Support Executor JVM Profiling - ASF JIRA

    Nov 24, 2023 · This feature is to add a low overhead sampling profiler like async-profiler as a built in capability to the Spark job that can be turned on using only user configurable parameters …