Version 0.3.1
Updated 29/06/04
Application view

The point of KHTMLScrub is to remove unwanted tags and attributes from HTML files. Once the options are set you can apply them to as many HTML files as you want, and KHTMLScrub will show you a preview of how each altered document looks before you save it.

This program came about from a simple request at Groklaw about a GUI frontend for html_scrub.

I started coding one up, but had problems making my program behave as I wanted it to. So after a couple of rewrites, I ended up implementing it on its own, as a standalone program rather than a frontend.

Since 0.3, KHTMLScrub is a KPart application, which means it can be used as a plugin for Quanta.



The currently available files are:


To install, follow these steps:

	$ tar xzf khtmlscrub-version.tar.gz
	$ cd khtmlscrub-version
	$ ./configure
	$ make
	$ su
	# make install

By default this will install KHTMLScrub under the /usr/local/kde prefix. To change where it is installed, add --prefix=thepathtoinstall as an option to configure. Usually this is something like /usr or /opt/kde.


The basic operation of KHTMLScrub is very simple. You set the options you want, open the document and hit "Apply options". KHTMLScrub will then process the document for you and provide an HTML preview and a Document Source preview, so you can see exactly what KHTMLScrub has done.

You can play around with the options and keep hitting apply until you're happy with the result. Then simply save the document. You can also save and load the options, as well as set them as defaults.

KHTMLScrub tries as much as possible to preserve the original formatting of the HTML source, including the case of tags and attributes, even though it uses case-insensitive searches itself.
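As a rough illustration of the idea (not KHTMLScrub's actual code), the Python sketch below matches a tag name case-insensitively while leaving everything it does not remove exactly as written. The function name `remove_tag` and the regular expression are assumptions made for this example.

```python
import re

def remove_tag(html: str, tag: str) -> str:
    """Strip the opening and closing markers of `tag`, whatever their case.

    Hypothetical helper for illustration only; text that is not matched
    is returned untouched, so the case of other tags is preserved.
    """
    pattern = re.compile(r"</?" + re.escape(tag) + r"(\s[^>]*)?>", re.IGNORECASE)
    return pattern.sub("", html)

source = '<P>Some <FONT color="red">red</FONT> and <b>bold</b> text</P>'
result = remove_tag(source, "font")
print(result)  # <P>Some red and <b>bold</b> text</P>
```

The search for "font" finds the upper-case FONT tags, while the upper-case P and lower-case b tags come through unchanged.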


The most important part of KHTMLScrub is the options. This is where you set the behaviour of the program.

Options window

On the left is a list of currently set options. It lists the tag/attribute name and what action should be taken when the program finds that tag/attribute.

Next to the list are three buttons, which add, edit or remove options from the list.

At the bottom is the default action setting, which lets you select what happens to tags that aren't listed in the options list.

The other button on this page is Apply Options. This applies the currently set options to the open document. This is not a cumulative action: you can play about with the options as much as you like, adding and removing them, and the options are always applied to the original document.
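The non-cumulative behaviour can be pictured with a small Python sketch. It is only a model of the idea, assuming a hypothetical `ScrubSession` class that keeps the original text and recomputes the preview from it on every apply; it is not how KHTMLScrub is written.

```python
from typing import Callable, List

class ScrubSession:
    """Hypothetical model: every apply starts again from the original text."""

    def __init__(self, original: str) -> None:
        self.original = original  # never modified after loading
        self.preview = original   # what the preview panes would show

    def apply(self, transforms: List[Callable[[str], str]]) -> str:
        text = self.original      # not self.preview, so results never pile up
        for transform in transforms:
            text = transform(text)
        self.preview = text
        return text

def strip_bold(text: str) -> str:
    return text.replace("<b>", "").replace("</b>", "")

def strip_italic(text: str) -> str:
    return text.replace("<i>", "").replace("</i>", "")

session = ScrubSession("<b>bold</b> and <i>italic</i>")
first = session.apply([strip_bold])     # bold tags removed
second = session.apply([strip_italic])  # bold tags are back, italic tags removed
print(first)
print(second)
```

Because the second apply starts from the original again, removing the bold option brings the bold tags back in the preview.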

Add/Edit Options Dialog

When you press add or edit option, this is the dialog that appears.

Edit options dialog

Starting at the top, the first thing is the tag itself, followed by the action that should be taken when the tag is encountered.

Keep tags: This simply tells the program not to do anything to these tags. If you select this action you can then also add options for the attributes.

Warn about tags: When the program is applying the options to the document, it will pop up a dialog asking you what you want to do with each occurrence of the tag.

Remove just the tags: All tags of this type are removed, but the contents of the tags are left intact. For instance, if you ask for the b (bold) tag to be removed, then

	<b>This was bold </b>

becomes

	This was bold

Remove the tags, and contents: This action will remove the tags and anything that was between the start and end tags.
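The difference between the actions, together with the default action for unlisted tags, can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the constants, `apply_action` and `scrub` are hypothetical names, the matching is done with regular expressions that do not cope with nested tags of the same name, and the interactive "Warn about tags" action is left out.

```python
import re

# Hypothetical action names for this sketch.
KEEP = "keep"
REMOVE_TAGS = "remove just the tags"
REMOVE_ALL = "remove the tags and contents"

def apply_action(html: str, tag: str, action: str) -> str:
    """Apply one action to every occurrence of `tag` (case-insensitive)."""
    name = re.escape(tag)
    if action == REMOVE_ALL:
        # Drop the start tag, the end tag and everything in between.
        pattern = rf"<{name}(\s[^>]*)?>.*?</{name}\s*>"
        return re.sub(pattern, "", html, flags=re.IGNORECASE | re.DOTALL)
    if action == REMOVE_TAGS:
        # Drop only the tag markers and leave the contents in place.
        return re.sub(rf"</?{name}(\s[^>]*)?>", "", html, flags=re.IGNORECASE)
    return html  # KEEP: nothing to do

def scrub(html: str, options: dict, default: str = KEEP) -> str:
    """Look up each tag in `options`; unlisted tags get the default action."""
    names = {n.lower() for n in re.findall(r"<([A-Za-z][A-Za-z0-9]*)", html)}
    for name in sorted(names):
        html = apply_action(html, name, options.get(name, default))
    return html

source = "<p><b>This was bold </b><script>alert(1)</script></p>"
options = {"b": REMOVE_TAGS, "script": REMOVE_ALL}
print(scrub(source, options))
# <p>This was bold </p>
print(scrub(source, {"p": KEEP}, default=REMOVE_TAGS))
# <p>This was bold alert(1)</p>
```

The second call shows why the default action matters: with "remove just the tags" as the default, the unlisted script tag loses its markers but its contents stay in the document.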

If you select "Keep tags" or "Warn about tags", you can also set options for the attributes. To deal with attributes at all, you first need to select the check box "Separate attributes". You can then select the default action to be taken with attributes you haven't listed.

To add an attribute, type its name in at the bottom, select the action you would like to be taken, then press add. To remove an attribute, select it in the list and hit remove.
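To make the attribute options concrete, here is a minimal Python sketch of per-attribute actions with a default for unlisted attributes. The function `filter_attributes`, the action strings "keep" and "remove", and the regular expressions are assumptions for this example, not KHTMLScrub's implementation.

```python
import re

# One attribute: leading whitespace, a name, and an optional value.
ATTR = re.compile(
    r"""\s+([A-Za-z_:][-A-Za-z0-9_:.]*)(\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+))?"""
)

def filter_attributes(html, tag, attr_actions, default="keep"):
    """Keep or remove the attributes of every `tag` start tag.

    `attr_actions` maps lower-case attribute names to "keep" or "remove";
    attributes that are not listed get the `default` action.
    """
    tag_re = re.compile(rf"<({re.escape(tag)})((?:\s[^>]*)?)>", re.IGNORECASE)

    def fix_tag(match):
        def fix_attr(attr):
            action = attr_actions.get(attr.group(1).lower(), default)
            return attr.group(0) if action == "keep" else ""
        # Rebuild the start tag with its original spelling and case.
        return "<" + match.group(1) + ATTR.sub(fix_attr, match.group(2)) + ">"

    return tag_re.sub(fix_tag, html)

source = '<A HREF="x.html" onclick="evil()" Class="nav">link</A>'
print(filter_attributes(source, "a", {"href": "keep"}, default="remove"))
# <A HREF="x.html">link</A>
print(filter_attributes(source, "a", {"onclick": "remove"}))
# <A HREF="x.html" Class="nav">link</A>
```

The first call keeps only the listed attribute and removes the rest by default; the second keeps everything except the one attribute listed for removal.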


The settings menu has three entries for the options: Load, Save, and Save as default.

You can load and save different options files for KHTMLScrub with the Load and Save entries. Save as default takes the current options and saves them so that they are loaded every time the program starts.

You can also load and save configuration files for html_scrub. In the load/save options dialog, change the file type to html_scrub configuration.