htmlcleaner - HTML parser written in Java

Distribution: Mageia 6.0
Repository: Mageia Core i586
Package name: htmlcleaner
Package version: 2.2.1
Package release: 8.mga6
Package architecture: noarch
Package type: rpm
Installed size: 121.45 KB
Download size: 115.95 KB
Official Mirror:

HtmlCleaner is an open-source HTML parser written in Java. HTML found on the Web is usually dirty, ill-formed and unsuitable for further processing. For any serious consumption of such documents, it is necessary to first clean up the mess and bring order to tags, attributes and ordinary text. For a given HTML document, HtmlCleaner reorders individual elements and produces well-formed XML. By default, it follows rules similar to those most web browsers use to build the Document Object Model (DOM). However, the user may provide a custom tag and rule set for tag filtering and balancing.
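
To make "tag balancing" concrete, here is a minimal sketch of the general idea in Java, using a stack of open tags. It is a hypothetical illustration: the `TagBalancer` class and `balance` method are not part of HtmlCleaner, and the logic is far simpler than HtmlCleaner's real rule set. It drops attributes, comments and doctypes, copies text verbatim without escaping, treats a small fixed list of void elements as self-closing, discards close tags that match nothing, and does not reopen formatting elements the way browsers do.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;

// Hypothetical illustration of stack-based tag balancing; NOT HtmlCleaner's implementation.
public class TagBalancer {
    // Small assumed list of elements that never have content; emitted as self-closing tags.
    private static final Set<String> VOID = Set.of("br", "hr", "img", "input", "meta", "link");

    public static String balance(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // stack of currently open tag names
        int i = 0;
        while (i < html.length()) {
            int lt = html.indexOf('<', i);
            int gt = lt < 0 ? -1 : html.indexOf('>', lt);
            if (lt < 0 || gt < 0) {              // no more complete tags: the rest is text
                out.append(html.substring(i));
                break;
            }
            out.append(html, i, lt);             // text before the tag, copied verbatim
            String body = html.substring(lt + 1, gt).trim();
            i = gt + 1;
            if (body.isEmpty() || body.startsWith("!") || body.startsWith("?")) {
                continue;                        // skip comments, doctype, processing instructions
            }
            boolean closing = body.startsWith("/");
            String name = tagName(closing ? body.substring(1) : body);
            if (name.isEmpty()) continue;
            if (!closing) {
                if (VOID.contains(name)) {
                    out.append('<').append(name).append("/>");
                } else {
                    out.append('<').append(name).append('>');
                    open.push(name);
                }
            } else if (open.contains(name)) {
                // Close every tag opened after the matching one, then the match itself.
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
            }
            // A closing tag with no matching open tag is simply dropped.
        }
        while (!open.isEmpty()) {                // close anything still open at end of input
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    // Lower-cased tag name: the leading letters and digits of the tag body (attributes ignored).
    private static String tagName(String body) {
        int end = 0;
        while (end < body.length() && Character.isLetterOrDigit(body.charAt(end))) end++;
        return body.substring(0, end).toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(balance("<p>Hello <b>bold <i>both</b> text<br>"));
    }
}
```

For the input `<p>Hello <b>bold <i>both</b> text<br>` the sketch prints `<p>Hello <b>bold <i>both</i></b> text<br/></p>`: the inner `<i>` is closed before `</b>`, `<br>` becomes self-closing, and the unclosed `<p>` is closed at the end.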

    Provides

  • htmlcleaner == 2.2.1-8.mga6
  • mvn(net.sourceforge.htmlcleaner:htmlcleaner) == 2.2.1
  • mvn(net.sourceforge.htmlcleaner:htmlcleaner:pom:) == 2.2.1

    Install Howto

    1. Enable the Mageia Core repository in "Install and Remove Software"
    2. Update the package list:
      # urpmi.update -a
    3. Install the htmlcleaner RPM package:
      # urpmi htmlcleaner

    Files

    • /usr/share/doc/htmlcleaner/licence.txt
    • /usr/share/java/htmlcleaner/htmlcleaner.jar
    • /usr/share/maven-metadata/htmlcleaner.xml
    • /usr/share/maven-poms/htmlcleaner/htmlcleaner.pom
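
After installation, one way to confirm that the jar listed above is usable from Java is to load it in an isolated class loader and look up `org.htmlcleaner.HtmlCleaner`. This is a minimal sketch using only the JDK; the `HtmlCleanerJarCheck` class and `jarProvides` helper are hypothetical names, and the check only proves that the class is present in the jar, not that the library behaves correctly.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical helper for checking an installed jar; not part of the htmlcleaner package.
public class HtmlCleanerJarCheck {
    // Path taken from the package file list.
    static final String DEFAULT_JAR = "/usr/share/java/htmlcleaner/htmlcleaner.jar";

    /** Returns true if the jar exists and contains the named class. */
    static boolean jarProvides(String jarPath, String className) {
        File jar = new File(jarPath);
        if (!jar.isFile()) return false;
        // Parent loader is null, so classes resolve only from this jar and the bootstrap
        // loader, never from the application classpath.
        try (URLClassLoader loader = new URLClassLoader(new URL[] { jar.toURI().toURL() }, null)) {
            Class.forName(className, false, loader); // look up the class without initializing it
            return true;
        } catch (ClassNotFoundException | LinkageError | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String jar = args.length > 0 ? args[0] : DEFAULT_JAR;
        System.out.println(jar + " provides org.htmlcleaner.HtmlCleaner: "
                + jarProvides(jar, "org.htmlcleaner.HtmlCleaner"));
    }
}
```

Compile and run it with `javac HtmlCleanerJarCheck.java && java HtmlCleanerJarCheck`; pass another jar path as the first argument to check a different location.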

    Changelog

    2017-05-19 - neoclust <neoclust> 2.2.1-8.mga6 + Revision: 1103187 - Fix BuildRequires

    2016-03-02 - neoclust <neoclust> 2.2.1-7.mga6 + Revision: 982616 - Second rebuild of the java stack

    2016-02-28 - umeabot <umeabot> 2.2.1-6.mga6 + Revision: 980456 - Mageia 6 Mass Rebuild

    2014-10-15 - umeabot <umeabot> 2.2.1-5.mga5 + Revision: 749665 - Second Mageia 5 Mass Rebuild - Mageia 5 Mass Rebuild

    2014-05-24 - dmorgan <dmorgan> 2.2.1-3.mga5 + Revision: 625383 - imported package htmlcleaner