Apache Spark 1.6.2 has been released. Apache Spark is an open-source cluster computing environment similar to Hadoop, but the two differ in some useful ways that make Spark better suited to certain workloads: Spark keeps distributed datasets in memory, so in addition to serving interactive queries it can also optimize iterative workloads.

Spark is implemented in Scala and uses Scala as its application framework. Unlike Hadoop, Spark is tightly integrated with Scala, which lets Scala code manipulate distributed datasets as easily as local collection objects (a minimal sketch of this style follows the changelog below).

Although Spark was created to support iterative jobs on distributed datasets, it is in practice a complement to Hadoop and can run in parallel on the Hadoop file system.

The changelog is as follows:

Sub-task

[SPARK-15613] - Incorrect days to millis conversion
[SPARK-15723] - SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

Bug

[SPARK-8428] - TimSort Comparison method violates its general contract with CLUSTER BY
[SPARK-10722] - Uncaught exception: RDDBlockId not found in driver-heartbeater
[SPARK-11327] - spark-dispatcher doesn't pass along some spark properties
[SPARK-11507] - Error thrown when using BlockMatrix.add
[SPARK-12655] - GraphX does not unpersist RDDs
[SPARK-12712] - test-dependencies.sh script fails when run against empty .m2 cache
[SPARK-12941] - Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype
[SPARK-13023] - Check for presence of 'root' module after computing test_modules, not changed_modules
[SPARK-13207] - _SUCCESS should not break partition discovery
[SPARK-13227] - Risky apply() in OpenHashMap
[SPARK-13242] - Moderately complex `when` expression causes code generation failure
[SPARK-13327] - colnames()<- allows invalid column names
[SPARK-13352] - BlockFetch does not scale well on large block
[SPARK-13444] - QuantileDiscretizer chooses bad splits on large DataFrames
[SPARK-13522] - Executor should kill itself when it's unable to heartbeat to the driver more than N times
[SPARK-13566] - Deadlock between MemoryStore and BlockManager
[SPARK-13622] - Issue creating level db file for YARN shuffle service if URI is used in yarn.nodemanager.local-dirs
[SPARK-13631] - getPreferredLocations race condition in spark 1.6.0?
[SPARK-13642] - Properly handle signal kill of ApplicationMaster
[SPARK-13648] - org.apache.spark.sql.hive.client.VersionsSuite fails NoClassDefFoundError on IBM JDK
[SPARK-13652] - TransportClient.sendRpcSync returns wrong results
[SPARK-13697] - TransformFunctionSerializer.loads doesn't restore the function's module name if it's '__main__'
[SPARK-13705] - UpdateStateByKey Operation documentation incorrectly refers to StatefulNetworkWordCount
[SPARK-13711] - Apache Spark driver stopping JVM when master not available
[SPARK-13755] - Escape quotes in SQL plan visualization node labels
[SPARK-13772] - DataType mismatch about decimal
[SPARK-13803] - Standalone master does not balance cluster-mode drivers across workers
[SPARK-13806] - SQL round() produces incorrect results for negative values
[SPARK-13845] - BlockStatus and StreamBlockId keep on growing result driver OOM
[SPARK-13850] - TimSort Comparison method violates its general contract
[SPARK-13901] - We get wrong logdebug information when jump to the next locality level.
[SPARK-13958] - Executor OOM due to unbounded growth of pointer array in Sorter
[SPARK-14006] - Builds of 1.6 branch fail R check
[SPARK-14074] - Use fixed version of install_github in SparkR build
[SPARK-14159] - StringIndexerModel sets output column metadata incorrectly
[SPARK-14187] - Incorrect use of binarysearch in SparseMatrix
[SPARK-14204] - [SQL] Failure to register URL-derived JDBC driver on executors in cluster mode
[SPARK-14219] - Fix `pickRandomVertex` not to fall into infinite loops for graphs with one vertex
[SPARK-14232] - Event timeline on job page doesn't show if an executor is removed with multiple line reason
[SPARK-14243] - updatedBlockStatuses does not update correctly when removing blocks
[SPARK-14261] - Memory leak in Spark Thrift Server
[SPARK-14298] - LDA should support disable checkpoint
[SPARK-14322] - Use treeAggregate instead of reduce in OnlineLDAOptimizer
[SPARK-14357] - Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure
[SPARK-14363] - Executor OOM due to a memory leak in Sorter
[SPARK-14368] - Support python.spark.worker.memory with upper-case unit
[SPARK-14454] - Better exception handling while marking tasks as failed
[SPARK-14468] - Always enable OutputCommitCoordinator
[SPARK-14495] - Distinct aggregation cannot be used in the having clause
[SPARK-14563] - SQLTransformer.transformSchema is not implemented correctly
[SPARK-14665] - PySpark StopWordsRemover default stopwords are Java object
[SPARK-14671] - Pipeline.setStages needs to handle Array non-covariance
[SPARK-14679] - UI DAG visualization causes OOM generating data
[SPARK-14739] - Vectors.parse doesn't handle dense vectors of size 0 and sparse vectors with no indices
[SPARK-14757] - Incorrect behavior of Join operation in Spqrk SQL JOIN : "false" in the left table is joined to "null" on the right table
[SPARK-14915] - Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job to never complete
[SPARK-14965] - StructType throws exception for missing field
[SPARK-15165] - Codegen can break because toCommentSafeString is not actually safe
[SPARK-15209] - Web UI's timeline visualizations fails to render if descriptions contain single quotes
[SPARK-15260] - UnifiedMemoryManager could be in bad state if any exception happen while evicting blocks
[SPARK-15262] - race condition in killing an executor and reregistering an executor
[SPARK-15528] - conv function returns inconsistent result for the same data
[SPARK-15601] - CircularBuffer's toString() to print only the contents written if buffer isn't full
[SPARK-15736] - Gracefully handle loss of DiskStore files
[SPARK-15754] - org.apache.spark.deploy.yarn.Client changes the credential of current user
[SPARK-15892] - Incorrectly merged AFTAggregator with zero total count
[SPARK-15975] - Improper Popen.wait() return code handling in dev/run-tests
[SPARK-16017] - YarnClientSchedulerBackend now registers backends as IPs instead of Hostnames which causes all tasks to run with RACK_LOCAL locality.
[SPARK-16035] - The SparseVector parser fails checking for valid end parenthesis
[SPARK-16086] - Python UDF failed when there is no arguments
[SPARK-16173] - Can't join describe() of DataFrame in Scala 2.10

Documentation

[SPARK-14618] - RegressionEvaluator doc out of date
[SPARK-15223] - spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes

Improvement

[SPARK-13599] - Groovy-all ends up in spark-assembly if hive profile set
[SPARK-13601] - Invoke task failure callbacks before calling outputstream.close()
[SPARK-13663] - Upgrade Snappy Java to 1.1.2.1
[SPARK-13810] - Add Port Configuration Suggestions on Bind Exceptions
[SPARK-14058] - Incorrect docstring in Window.orderBy
[SPARK-14107] - PySpark spark.ml GBT algs need seed Param
[SPARK-14149] - Log exceptions in tryOrIOException
[SPARK-14242] - avoid too many copies in network when a network frame is large
[SPARK-14787] - Upgrade Joda-Time library from 2.9 to 2.9.3
[SPARK-15205] - Codegen can compile the same source code more than twice
[SPARK-15827] - Publish Spark's forked sbt-pom-reader to Maven Central

New Feature

[SPARK-11515] - QuantileDiscretizer should take random seed
[SPARK-13465] - Add a task failure listener to TaskContext
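To illustrate the collection-style API described above, here is a minimal sketch in Scala (not taken from the release notes); the object name, application name, and input path are placeholder assumptions. It caches an RDD in memory and then makes two passes over it, operating on the distributed dataset with the same map/filter/reduce operators used on local Scala collections.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch: treat a distributed dataset like a local collection.
// "WordLengths" and "input.txt" are placeholders, not part of the release notes.
object WordLengths {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordLengths").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    // Build an RDD from a text file (a local path or a Hadoop file system URI).
    val lines = sc.textFile("input.txt")

    // Transform it with the same operators used on a local Seq[String].
    val words = lines.flatMap(_.split("\\s+")).filter(_.nonEmpty)

    // cache() keeps the dataset in memory, so the two actions below do not
    // re-read the input -- the property the announcement highlights for
    // interactive queries and iterative workloads.
    words.cache()

    val totalWords   = words.count()
    val totalLetters = words.map(_.length.toLong).reduce(_ + _)
    println(s"average word length: ${totalLetters.toDouble / totalWords}")

    sc.stop()
  }
}
```

The same sketch would run on a cluster by replacing local[*] with the cluster's master URL and pointing textFile at a distributed path.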