htmlcleaner - HTML parser written in Java

Property Value
Distribution Mageia 6.1
Repository Mageia Core i586
Package name htmlcleaner
Package version 2.2.1
Package release 8.mga6
Package architecture noarch
Package type rpm
Installed size 121.45 KB
Download size 115.95 KB
HtmlCleaner is an open-source HTML parser written in Java. HTML found on the Web is
usually dirty, ill-formed, and unsuitable for further processing.
For any serious consumption of such documents, it is necessary to first
clean up the mess and bring order to the tags, attributes, and ordinary text.
For a given HTML document, HtmlCleaner reorders individual elements and
produces well-formed XML. By default, it follows rules similar to those that most
web browsers use to create the Document Object Model. However, the user
may provide a custom tag and rule set for tag filtering and balancing.


Package Version Architecture Repository
htmlcleaner-2.2.1-8.mga6.noarch.rpm 2.2.1 noarch Mageia Core
htmlcleaner - - -


Requires

Name Value
java -
java-headless >= 1:1.6
javapackages-tools -
mvn(org.jdom:jdom) -
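
The dependency entries above pair a capability name with either `-` (no version constraint) or an operator and a version; in `1:1.6`, the `1` before the colon is the RPM epoch. As a rough illustration, the Python sketch below splits such a line into its parts. The `Requirement` type and the `parse_requirement` function are hypothetical names introduced here and are not part of any RPM tooling; the sketch only parses the text and does not implement RPM's version comparison.

```python
import re
from typing import NamedTuple, Optional


class Requirement(NamedTuple):
    """One row of the dependency table (hypothetical helper type)."""
    name: str
    operator: Optional[str]  # None when the row is unversioned ("-")
    epoch: int               # RPM treats a missing epoch as 0
    version: Optional[str]   # None when the row is unversioned ("-")


# name, then either a bare "-" or "<operator> [epoch:]version"
_ROW = re.compile(
    r"^(?P<name>\S+)\s+"
    r"(?:-|(?P<op>>=|<=|==|=|<|>)\s*(?:(?P<epoch>\d+):)?(?P<version>\S+))$"
)


def parse_requirement(line: str) -> Requirement:
    """Split a 'name operator [epoch:]version' row into its fields."""
    match = _ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised dependency row: {line!r}")
    return Requirement(
        name=match["name"],
        operator=match["op"],
        epoch=int(match["epoch"] or 0),
        version=match["version"],
    )


if __name__ == "__main__":
    for row in ("java -", "java-headless >= 1:1.6", "mvn(org.jdom:jdom) -"):
        print(parse_requirement(row))
```

Keeping the epoch as a separate integer field makes it explicit that `1:1.6` is not the same version string as `1.6`.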


Provides

Name Value
htmlcleaner == 2.2.1-8.mga6
mvn(net.sourceforge.htmlcleaner:htmlcleaner) == 2.2.1
mvn(net.sourceforge.htmlcleaner:htmlcleaner:pom:) == 2.2.1


Type URL
Binary Package htmlcleaner-2.2.1-8.mga6.noarch.rpm
Source Package htmlcleaner-2.2.1-8.mga6.src.rpm
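
The file names above follow the usual `name-version-release.arch.rpm` layout, which matches the package name, version, release, and architecture listed in the property table. A minimal Python sketch of that decomposition is shown below; `RpmFileName` and `split_rpm_filename` are hypothetical names chosen for this example, and the approach assumes that neither the version nor the release contains a hyphen.

```python
from typing import NamedTuple


class RpmFileName(NamedTuple):
    """Fields encoded in an RPM file name (hypothetical helper type)."""
    name: str
    version: str
    release: str
    arch: str


def split_rpm_filename(filename: str) -> RpmFileName:
    """Split 'name-version-release.arch.rpm' into its four fields."""
    suffix = ".rpm"
    if not filename.endswith(suffix):
        raise ValueError(f"not an RPM file name: {filename!r}")
    stem = filename[: -len(suffix)]
    # Work from the right: the package name itself may contain hyphens.
    stem, arch = stem.rsplit(".", 1)
    stem, release = stem.rsplit("-", 1)
    name, version = stem.rsplit("-", 1)
    return RpmFileName(name, version, release, arch)


if __name__ == "__main__":
    print(split_rpm_filename("htmlcleaner-2.2.1-8.mga6.noarch.rpm"))
    print(split_rpm_filename("htmlcleaner-2.2.1-8.mga6.src.rpm"))
```

Splitting from the right is the design choice that matters here, since names such as `htmlunit-core-js-javadoc` contain hyphens of their own.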

Install Howto

  1. Enable the Mageia Core repository in Install and Remove Software
  2. Update the package list:
    # urpmi.update -a
  3. Install the htmlcleaner rpm package:
    # urpmi htmlcleaner




2017-05-19 - neoclust <neoclust> 2.2.1-8.mga6
+ Revision: 1103187
- Fix BuildRequires
- Fix BuildRequires
2016-03-02 - neoclust <neoclust> 2.2.1-7.mga6
+ Revision: 982616
- Second rebuild of the java stack
2016-02-28 - umeabot <umeabot> 2.2.1-6.mga6
+ Revision: 980456
- Mageia 6 Mass Rebuild
2014-10-15 - umeabot <umeabot> 2.2.1-5.mga5
+ Revision: 749665
- Second Mageia 5 Mass Rebuild
- Mageia 5 Mass Rebuild
2014-05-24 - dmorgan <dmorgan> 2.2.1-3.mga5
+ Revision: 625383
- imported package htmlcleaner

See Also

Package Description
htmlcleaner-javadoc-2.2.1-8.mga6.noarch.rpm API documentation for htmlcleaner
htmldoc-1.8.29-1.mga6.i586.rpm Convert HTML documents into PDF or PS format
htmldoc-nogui-1.8.29-1.mga6.i586.rpm Convert HTML documents into PDF or PS format
htmlunit-2.20-1.mga6.noarch.rpm A headless web browser for automated testing
htmlunit-core-js-2.17-5.mga6.noarch.rpm Rhino fork for htmlunit
htmlunit-core-js-javadoc-2.17-5.mga6.noarch.rpm Javadoc for htmlunit-core-js
htmlunit-javadoc-2.20-1.mga6.noarch.rpm API documentation for htmlunit
htop-2.0.2-1.mga6.i586.rpm Interactive text-mode process viewer for Linux
htrace-3.1.0-1.mga6.noarch.rpm Tracing framework for java based distributed systems
httpcomponents-asyncclient-4.1.1-5.mga6.noarch.rpm Apache components to build asynchronous client side HTTP services
httpcomponents-asyncclient-cache-4.1.1-5.mga6.noarch.rpm Apache HttpAsyncClient Cache
httpcomponents-asyncclient-javadoc-4.1.1-5.mga6.noarch.rpm Javadoc for httpcomponents-asyncclient
httpcomponents-asyncclient-parent-4.1.1-5.mga6.noarch.rpm Apache HttpAsyncClient Parent POM
httpcomponents-client-4.5.2-4.mga6.noarch.rpm HTTP agent implementation based on httpcomponents HttpCore
httpcomponents-client-cache-4.5.2-4.mga6.noarch.rpm Cache module for httpcomponents-client