
News: jsoup 1.8.2 released, HTML parser download

Posted by 漂亮的石头 on 2015-04-15. Forum: Software News

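    jsoup 1.8.2 has been released. This version improves performance on Android and in HTML parsing, HTML generation and selector queries. It also adds features such as file upload and W3C DOM interoperability, along with other improvements and bug fixes.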

    Changes


    Improvements


    • Performance improvements of 1.5x to 1.9x for parsing HTML on Android, with larger documents seeing bigger gains; around 1.1x to 1.2x on non-Android JREs.


    • A dramatic 115x performance improvement in HTML serialization on Android (KitKat and later).


    • Improved performance by working around a character set encoding speed regression in Android.


    • A 2.5x to 14x performance improvement for the class name selector (.class) on Android; around 1.2x on non-Android JREs.


    • File upload support. Added the ability to specify input streams for POST data, which will upload content in MIME multipart/form-data encoding.


    • Add a meta-charset element to documents when setting the character set, so that the document's charset is unambiguous.


    • Added ability to disable TLS (SSL) certificate validation. Helpful if you're hitting a host with a bad cert, or your JDK doesn't support SNI.


    • Added ability to further tweak the canned Cleaner Whitelists by removing existing settings.


    • Added option in Cleaner Whitelist to allow linking to in-page anchors (#).


    • Use a lowercase doctype tag for HTML5 documents.


    • Added support for 201 Created with redirect, and other status codes: any HTTP 2xx or 3xx status code is treated as an OK response, and redirects are followed whenever there is a Location header.


    • Added support for HTTP method verbs PUT, DELETE, and PATCH.


    • Added support for overriding the default POST data character set (UTF-8) in Connection.


    • W3C DOM support: added ability to convert from a jsoup document to a W3C document, with the W3CDom helper class.


    • In the HtmlToPlainText example program, added the ability to filter using a CSS selector. Also clarified the usage documentation.


    • Improved the equals() and hashCode() methods in Node to consider all of their child content, for DOM tree comparisons.


    • Improved performance in Selector when searching multiple roots.
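The new Connection options above can be exercised without touching the network. These are file upload from an input stream, the PUT/DELETE/PATCH verbs, and overriding the POST data charset. A jsoup Connection only sends a request when execute(), get() or post() is called.

The sketch below assumes jsoup 1.8.2 or a later 1.x release is on the classpath. ConnectionFeaturesSketch, the example.com URLs and the form field names are illustrative. The method names follow our understanding of the 1.8.x Connection API, so check the Javadoc for the version you use.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class ConnectionFeaturesSketch {

    // Builds, but does not send, a POST that uploads a file as multipart/form-data.
    static Connection buildUpload(String url) {
        InputStream content = new ByteArrayInputStream(
                "hello from jsoup".getBytes(StandardCharsets.UTF_8));
        return Jsoup.connect(url)
                .data("comment", "sketch upload")      // ordinary form field
                .data("file", "notes.txt", content)    // file part: key, filename, stream
                .method(Connection.Method.POST);
    }

    // Builds, but does not send, a PUT whose form data is encoded as ISO-8859-1
    // instead of the default UTF-8.
    static Connection buildPut(String url) {
        return Jsoup.connect(url)
                .method(Connection.Method.PUT)
                .postDataCharset("ISO-8859-1")
                .data("name", "value");
    }

    public static void main(String[] args) {
        Connection upload = buildUpload("http://example.com/upload");
        for (Connection.KeyVal kv : upload.request().data()) {
            String kind = kv.hasInputStream() ? "file part" : "text field";
            System.out.println(kv.key() + " = " + kv.value() + " (" + kind + ")");
        }

        Connection put = buildPut("http://example.com/resource");
        System.out.println(put.request().method() + " with " + put.request().postDataCharset());

        // Calling upload.execute() or put.execute() would actually send the requests.
    }
}
```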
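The sketch below covers the two Cleaner Whitelist additions listed above: removing settings from a canned whitelist, and allowing in-page # anchors. It assumes jsoup 1.8.2 is on the classpath.

Passing "#" to addProtocols is how we understand the anchor option to be enabled. WhitelistTweaks and the sample HTML are hypothetical. Later jsoup releases renamed Whitelist to Safelist.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Whitelist;

public class WhitelistTweaks {

    // Start from the canned "basic" whitelist and adjust it.
    static Whitelist customWhitelist() {
        return Whitelist.basic()
                .removeTags("b")                  // remove a tag the canned list allowed
                .addProtocols("a", "href", "#");  // additionally accept in-page anchors such as "#top"
    }

    public static void main(String[] args) {
        String dirty = "<p>Jump to <a href='#top'>top</a>, <b>bold</b> text</p>";
        // With the plain basic whitelist, "#top" matches none of the allowed protocols.
        System.out.println(Jsoup.clean(dirty, Whitelist.basic()));
        // With the tweaked whitelist the anchor is kept and <b> is dropped (its text survives).
        System.out.println(Jsoup.clean(dirty, customWhitelist()));
    }
}
```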
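The W3CDom helper mentioned above converts a jsoup Document into a standard org.w3c.dom.Document. That result can then be handed to JDK APIs such as XPath.

This is a sketch, assuming jsoup 1.8.2 or later is on the classpath. W3CDomSketch and toW3c are hypothetical names.

```java
import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;
import org.w3c.dom.NodeList;

public class W3CDomSketch {

    // Parses HTML with jsoup, then converts the result into an org.w3c.dom.Document.
    static org.w3c.dom.Document toW3c(String html) {
        org.jsoup.nodes.Document jsoupDoc = Jsoup.parse(html);
        return new W3CDom().fromJsoup(jsoupDoc);
    }

    public static void main(String[] args) {
        org.w3c.dom.Document w3cDoc = toW3c("<p>One</p><p>Two</p>");
        // From here on, standard W3C DOM calls apply.
        NodeList paragraphs = w3cDoc.getElementsByTagName("p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            System.out.println(paragraphs.item(i).getTextContent());
        }
    }
}
```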

    Bug fixes


    • Fixed validation of cookie names in HttpConnection cookie methods.


    • Fixed an issue where option tags would be missed when preparing a form for submission if missing a selected attribute.


    • Fixed an issue where submitting a form would incorrectly include radio and checkbox values without the checked attribute.


    • Fixed an issue where Element.classNames() could return a set containing an empty class name, or class names with extraneous whitespace.


    • Fixed an issue where attributes selected by value were not correctly space normalized.


    • In head+noscript elements, treat content as character data, instead of jumping out of head parsing.


    • Fixed performance issue when parsing HTML with elements with many children that need re-parenting.


    • Fixed an issue where a server returning an unsupported character set in its response would cause a runtime UnsupportedCharsetException, instead of falling back to the default UTF-8 charset.


    • Fixed an issue where Jsoup.Connection would throw an IOException when reading a page with a zero content-length.
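The charset fix above amounts to validating the server-declared charset name before using it. The JDK-only sketch below illustrates that fallback rule; charsetOrDefault is a hypothetical helper, not jsoup's own code.

Charset.isSupported returns false for a legal but unknown name, and throws IllegalCharsetNameException for a syntactically illegal one. The helper handles both cases.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

public class CharsetFallback {

    // Returns the charset named by the server, or UTF-8 if the name is missing,
    // illegal, or not supported by this JVM. Illustrates the described behaviour only.
    static Charset charsetOrDefault(String declared) {
        if (declared == null || declared.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        String name = declared.trim();
        try {
            if (Charset.isSupported(name)) {
                return Charset.forName(name);
            }
        } catch (IllegalCharsetNameException e) {
            // The name contains characters a charset name may not contain.
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetOrDefault("ISO-8859-1"));   // supported: used as-is
        System.out.println(charsetOrDefault("x-made-up-cs")); // unsupported: falls back
        System.out.println(charsetOrDefault("bad name!"));    // illegal name: falls back
    }
}
```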

    For more details, see the release notes.

    OSChina uses jsoup to parse HTML.

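    jsoup is a Java HTML parser that can parse HTML directly from a URL or from HTML text. It provides a very convenient API for extracting and manipulating data using DOM methods, CSS selectors, and jQuery-like operations.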
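Below is a minimal sketch of the parse-then-select workflow, assuming jsoup is on the classpath. It parses a string; Jsoup.connect(url).get() would fetch and parse a live page instead. JsoupQuickStart, newsLinks and the sample markup are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupQuickStart {

    // Parses an HTML string and returns the absolute URLs of links inside div.news.
    static List<String> newsLinks(String html, String baseUri) {
        // baseUri is used to resolve relative links such as href="/a".
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<String>();
        // CSS selector: every <a> that has an href, inside a <div class="news">.
        for (Element link : doc.select("div.news a[href]")) {
            urls.add(link.attr("abs:href"));   // the "abs:" prefix resolves against baseUri
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Demo</title></head><body>"
                + "<div class='news'><a href='/a'>First</a><a href='/b'>Second</a></div>"
                + "<a href='/elsewhere'>Not news</a>"
                + "</body></html>";
        System.out.println(Jsoup.parse(html).title());
        System.out.println(newsLinks(html, "http://example.com/"));
    }
}
```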

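    The main features of jsoup are:


    1. Parse HTML from a URL, a file, or a string;


    2. Find and extract data using DOM traversal or CSS selectors;


    3. Manipulate HTML elements, attributes, and text.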
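The third feature, editing the parsed tree, can be sketched as follows. This assumes jsoup is on the classpath; JsoupManipulate, tidy and the id/class names are hypothetical.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupManipulate {

    // Rewrites an HTML fragment: sets an attribute, removes elements, adds an element.
    static String tidy(String bodyHtml) {
        Document doc = Jsoup.parseBodyFragment(bodyHtml);
        Element content = doc.getElementById("content");
        if (content != null) {
            content.attr("class", "article");                        // set or overwrite an attribute
            content.select("span.ad").remove();                      // remove unwanted elements
            content.appendElement("p").text("Appended paragraph");   // add a new child with text
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String in = "<div id='content'><p>Hello</p><span class='ad'>Buy now</span></div>";
        System.out.println(tidy(in));
    }
}
```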

    jsoup is released under the MIT license, so it can safely be used in commercial projects.
    jsoup 1.8.2 released, HTML parser download link
     