About JTidy

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, so you can effectively use JTidy as a DOM parser for real-world HTML.

JTidy was written by Andy Quick, who later stepped down from the maintainer position. Now JTidy is maintained by a group of volunteers.

More information on JTidy can be found on the JTidy SourceForge project page.

News

November, 2004

New subproject: jtidyservlet

The JTidy Servlet library is an open source suite of custom tags and servlets that integrate JTidy's HTML syntax checking and pretty-printing functionality into a Servlet/JSP container.

JTidyservlet is managed by Vlad Skarzhevskyy, who recently joined the JTidy project.

September 13, 2004

Two major new features expected for the next release have finally been committed to CVS!

  • You can now use any supported Java character encoding for input or output: the classic Tidy encoding handling has been replaced by a new implementation that takes advantage of Java's built-in character encoding support.
  • You no longer need to parse text output to extract JTidy messages in your application: you can simply attach a listener using Tidy.setMessageListener() and be notified of errors, warnings and summary messages.
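As a rough illustration of what "any supported Java character encoding" means in practice, the sketch below uses only the JDK's java.nio.charset API to check whether an encoding name is known to the running JVM and to re-encode bytes from one charset to another. It does not call JTidy at all: the class name EncodingCheck and its two helper methods are hypothetical names made up for this example, and exactly how JTidy accepts encoding names is not shown here.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only; not part of JTidy.
public class EncodingCheck {

    /** Returns true if the running JVM knows a charset with this name. */
    public static boolean isUsable(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // Syntactically invalid names (for example, names containing spaces).
            return false;
        }
    }

    /** Decodes the bytes with one charset and encodes the text with another. */
    public static byte[] transcode(byte[] input, String from, String to) {
        String text = new String(input, Charset.forName(from));
        return text.getBytes(Charset.forName(to));
    }

    public static void main(String[] args) {
        // "cafe" with an accented e, encoded as ISO-8859-1.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = transcode(latin1, "ISO-8859-1", "UTF-8");

        System.out.println("UTF-8 usable: " + isUsable("UTF-8"));
        System.out.println("bytes in/out: " + latin1.length + " / " + utf8.length);
    }
}
```

Which charsets are available beyond the standard ones (such as UTF-8 and ISO-8859-1) depends on the JVM, so checking a name before using it is a reasonable precaution.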

What is missing before a release?

  • Clean up the support for new charsets (still to be refined)
  • Doctype handling (seems to be quite different from the current Tidy C release)
  • Some more tests, especially for the DOM parser/pretty printer
  • A working Ant task and command line interface

August 20, 2004

More than 50% of the tests are now working, and hundreds of fixes and new features have been ported from the C version. XML/XHTML output is now noticeably more robust. Check out a nightly build and report any bugs you find!

March 12, 2004

Nightly builds are now generated automatically every day, and the whole website is refreshed at the same time. One third of the implemented tests are now working. Two years of reported bugs are difficult to catch up on, but the change log is starting to become "important"...

Want to play with a recent build? Get the source or binary distribution from the nightly builds page.

January 21, 2004

Site updated using the latest Maven version: the test report is a lot more readable now (formatting has been fixed in the latest junit-report plugin). There is also a new site layout, built with a tweaked version of the Maven xdoc plugin (XHTML + tableless CSS).

January 16, 2004

183 test cases are now fully implemented. All the test cases from Tidy and some new tests for JTidy have been added.

All the test cases which caused JTidy to crash or loop have been fixed! Priority (1) is done; there are now another 139 tests failing. Note that most of these tests fail within the first few lines because of differences from Tidy in doctype handling and formatting (the latest Tidy release has been used to produce the output files for comparison).

These are the priorities before a release:

  • Doctype handling (needed for tests)
  • Formatting (needed for tests)
  • XML output: making JTidy always produce valid XML (the well-known "duplicate attributes" bug)

Hmm, formatting in the Maven-generated JUnit report is really bad. I just submitted a bug report to Maven: error messages are escaped twice, newlines are not preserved and random whitespace is added. I think I should spend some time fixing junit report plugin bugs if I want to be able to fix JTidy bugs...

January 8, 2004

179 test cases for JTidy have been partially implemented and added!

All the test cases for the non-Java version of Tidy have been integrated. "Partially" because most of them don't yet check the output or warnings produced by Tidy, but simply test that JTidy doesn't crash or loop.

Well, actually, as you can see in the JUnit report, we have 1 test causing an NPE and 4 causing infinite loops! These bugs will take precedence over any incorrect-output bug (fixing them will probably be worth a new release; you don't want your software to hang when using JTidy, right?).

Anyway, in TidyCrashingBugsTest (tests that crashed the C version of Tidy), 21 of the 24 tests work without problems... not as bad as expected.

See testcases if you want to help JTidy by supplying tests or fixes.

Thanks to the Clover team for the free license for the JTidy project!

January 6, 2004

The new JTidy website is online!

The project is starting again after two years without a release. I (Fabrizio Giustina) just joined the project as the new administrator and developer.

The main targets are now:

  • Migrate to Maven as a build system (done)
  • Old code cleanup: remove unused code, clean up everything with the help of checkstyle and pmd, and update the code to use the new coding conventions
  • Add JUnit test coverage (started, see the junit report and clover report). I'm trying to integrate all the standard Tidy test cases to check that JTidy behaves like its non-Java cousin
  • Finally: integrate all the patches supplied by users over the years and the fixes made in the non-Java version

A note about mailing lists: there are two new mailing lists specific to JTidy; see project mailing lists. You can find previous discussions in the html-tidy@w3.org archives (common to Tidy and JTidy).