jsoup 1.8.2 has been released. This version improves performance on Android and in HTML parsing, HTML generation, and queries. It also adds file upload support and W3C DOM interoperability, along with other improvements and bug fixes.

Changes

Improvements

- Performance improvements for parsing HTML on Android, of 1.5x to 1.9x, with larger parses getting a bigger speed increase. For non-Android JREs, around 1.1x to 1.2x.
- Dramatic performance improvement in HTML serialization on Android (KitKat and later), of 115x, achieved by working around a character set encoding speed regression in Android.
- Performance improvement for the class name selector (.class) on Android, of 2.5x to 14x. Around 1.2x on non-Android JREs.
- File upload support. Added the ability to specify input streams for POST data, which will upload content in MIME multipart/form-data encoding.
- Add a meta-charset element to documents when setting the character set, so that the document's charset is unambiguous.
- Added the ability to disable TLS (SSL) certificate validation. Helpful if you're hitting a host with a bad cert, or your JDK doesn't support SNI.
- Added the ability to further tweak the canned Cleaner Whitelists by removing existing settings.
- Added an option in the Cleaner Whitelist to allow linking to in-page anchors (#).
- Use a lowercase doctype tag for HTML5 documents.
- Add support for 201 Created with redirect, and other status codes. Any HTTP status code 2xx or 3xx is treated as an OK response, and redirects are followed whenever there is a Location header.
- Added support for the HTTP method verbs PUT, DELETE, and PATCH.
- Added support for overriding the default POST character set of UTF-8 in Connection.
- W3C DOM support: added the ability to convert from a jsoup document to a W3C document, with the W3CDom helper class.
- In the HtmlToPlainText example program, added the ability to filter using a CSS selector. Also clarified the usage documentation.
- Improved the equals() and hashCode() methods in Node to consider all their child content, for DOM tree comparisons.
- Improved performance in Selector when searching multiple roots.
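The new response-handling rule described above (any 2xx or 3xx status counts as OK, and a redirect is followed whenever a Location header is present) can be sketched in plain Java. This is only an illustration of the rule as stated in the changelog: the class name `ResponseRules` and both method names are hypothetical, and none of this is jsoup's actual code.

```java
/** Illustrative sketch only: hypothetical names, not jsoup's internal code. */
final class ResponseRules {
    private ResponseRules() {}

    /** Any HTTP status in the 2xx or 3xx range is treated as an OK response. */
    static boolean isOk(int status) {
        return status >= 200 && status < 400;
    }

    /** A redirect is followed whenever a non-blank Location header is present. */
    static boolean shouldFollowRedirect(String locationHeader) {
        return locationHeader != null && !locationHeader.trim().isEmpty();
    }

    public static void main(String[] args) {
        // 201 Created with a Location header: OK, and the redirect is followed.
        System.out.println(isOk(201) && shouldFollowRedirect("/items/42"));
        // 404 Not Found is outside the 2xx/3xx range, so it is not OK.
        System.out.println(isOk(404));
    }
}
```

Under this rule, a 201 Created response that carries a Location header is handled like any other redirect, which is the case the changelog calls out.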
Bug fixes

- Fixed validation of cookie names in HttpConnection cookie methods.
- Fixed an issue where option tags would be missed when preparing a form for submission if they were missing a selected attribute.
- Fixed an issue where submitting a form would incorrectly include radio and checkbox values without the checked attribute.
- Fixed an issue where Element.classNames() would return a set containing an empty class, and might contain extraneous whitespace.
- Fixed an issue where attributes selected by value were not correctly space normalized.
- In head+noscript elements, treat content as character data, instead of jumping out of head parsing.
- Fixed a performance issue when parsing HTML with elements that have many children needing re-parenting.
- Fixed an issue where a server returning an unsupported character set in its response would cause a runtime UnsupportedCharsetException, instead of falling back to the default UTF-8 charset.
- Fixed an issue where Jsoup.Connection would throw an IOException when reading a page with zero content-length.

For more details, see the release notes.

OSChina uses jsoup to parse HTML.

jsoup is an HTML parser for Java that can directly parse a URL or a string of HTML. It provides a very convenient API for extracting and manipulating data using DOM traversal, CSS selectors, and jQuery-like methods.

jsoup's main features:

- Parse HTML from a URL, file, or string.
- Find and extract data using DOM traversal or CSS selectors.
- Manipulate HTML elements, attributes, and text.

jsoup is released under the MIT license, so it can be used safely in commercial projects.

Download: jsoup 1.8.2 (HTML parser)
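The charset fallback mentioned in the bug fixes (use UTF-8 when the server declares a character set the JVM does not support) can be sketched with the standard `java.nio.charset` API. This is a minimal illustration under assumptions, not jsoup's implementation: the class `CharsetFallback` and the method `resolve` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only: hypothetical names, not jsoup's internal code. */
final class CharsetFallback {
    private CharsetFallback() {}

    /** Returns the declared charset if the JVM supports it, otherwise UTF-8. */
    static Charset resolve(String declared) {
        if (declared == null || declared.trim().isEmpty()) {
            return StandardCharsets.UTF_8; // nothing declared: use the default
        }
        String name = declared.trim();
        try {
            // isSupported() rejects syntactically illegal names with an exception,
            // and returns false for legal names that this JVM does not provide.
            return Charset.isSupported(name) ? Charset.forName(name) : StandardCharsets.UTF_8;
        } catch (IllegalCharsetNameException e) {
            return StandardCharsets.UTF_8; // malformed name: fall back
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("ISO-8859-1"));        // a charset every JVM supports
        System.out.println(resolve("no-such-charset-x")); // unknown name, falls back
    }
}
```

Checking `Charset.isSupported` before calling `Charset.forName` avoids the `UnsupportedCharsetException` path entirely, so only malformed names need a catch block.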