Technorama

An omnibus of tech posts by a Futurologist on software development primarily.

Monday, 11 December 2006

 

WebDoc - A document format for all?

We've had PDF and PostScript for many years. They work okay, but they aren't as accessible as web pages. What we really need is a format that combines XHTML and images into a single file. I call this proposal WebDoc. It could be a compressed ZIP archive with a .webdoc extension (MIME type application/webdoc).
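To make the packaging idea concrete, here is a minimal sketch in Python. The function name build_webdoc is hypothetical, and the check for an index.html entry point is an assumption about how such a format might be validated; nothing here is a specification.

```python
import io
import zipfile


def build_webdoc(files):
    """Pack a mapping of archive path -> bytes into an in-memory ZIP.

    Hypothetical helper: the WebDoc proposal only says "a compressed ZIP
    archive"; requiring index.html as the entry point is an assumption.
    """
    if "index.html" not in files:
        raise ValueError("a WebDoc needs an index.html entry point")
    buffer = io.BytesIO()
    # ZIP_DEFLATED gives the "compressed" part of the proposal.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path, data in files.items():
            archive.writestr(path, data)
    return buffer.getvalue()
```

Writing the returned bytes to a file ending in .webdoc would give the single-file document described above; images simply become further entries in the same mapping.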

It would be simple to use: click on the file to open it in the web browser, which then displays the index.html within. This is better than a PDF or an OpenDocument file because existing web browsers would be able to display, navigate, bookmark and copy/paste from the WebDoc; no extra PDF viewer software such as Adobe Reader is required.
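Until a browser handles the format natively, the "click to open" step could be approximated by a small helper. The sketch below is only an illustration: unpack_webdoc is a hypothetical name, and unpacking to a temporary directory is one possible approach, not part of the proposal.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def unpack_webdoc(data, target_dir=None):
    """Extract WebDoc bytes to a directory and return the path to index.html.

    Hypothetical helper; a real browser would more likely read the
    archive directly rather than unpack it to disk.
    """
    target = Path(target_dir or tempfile.mkdtemp(prefix="webdoc-"))
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        if "index.html" not in archive.namelist():
            raise ValueError("not a WebDoc: index.html is missing")
        archive.extractall(target)
    return target / "index.html"


# Possible usage (the file name report.webdoc is made up):
#   import webbrowser
#   index = unpack_webdoc(Path("report.webdoc").read_bytes())
#   webbrowser.open(index.resolve().as_uri())
```

Because the browser is pointed at ordinary files, relative links to images and other pages inside the archive keep working without any changes to the XHTML.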

Because WebDoc is a collection of open formats in a ZIP archive, it opens the door to accessibility products such as screen readers or braille displays for the blind. Automated translation is also possible, keeping the flow of the document and leaving the result as complete as the original; not nearly as difficult as dealing with PDF files at present! Let's see where we are with this development in a few years' time; a vendor might have popularised their own equivalent proposal by then! ;)



Comments:
Have you checked into the ISO Standard for archiving yet? It includes information on why current HTML renderers may be transitory, the difficulties of many package formats, more:
http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=38920

You may want to check into the actual screenreader story for PDF files too... here's an entrypoint:
http://www.adobe.com/accessibility/

jd/adobe
 
John, thanks for the links. I looked at the ISO draft and couldn't spot an HTML rendering section, but yes, I agree that HTML layout is often transitory because it is not fully prescribed. This means a reflow of the document is possible for a small mobile screen. The opportunity to display the document in a different layout is a great asset IMHO. Page size guidance could be included in the WebDoc metadata, so users could view the document in its original format if their screen was suitable.

Moving on to the related topic of PDF:
Reprocessing documents in open, text-based formats is very easy, which is another reason XHTML is a good option for my WebDoc proposal. Importing a PDF and decomposing it into a different format is more difficult in my experience, and even then the results are far from equivalent.
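As a rough illustration of how little is needed to reprocess XHTML, the sketch below pulls the paragraph text out of a page using only the Python standard library. The function name extract_paragraphs is hypothetical, and the code assumes well-formed XHTML in the standard namespace without named entities such as &nbsp; (the plain XML parser would reject those unless they are declared).

```python
import xml.etree.ElementTree as ET

# Standard XHTML namespace, in ElementTree's {uri}tag notation.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def extract_paragraphs(xhtml):
    """Return the text of each non-empty <p> element, whitespace-normalised."""
    root = ET.fromstring(xhtml)
    paragraphs = []
    for element in root.iter(XHTML_NS + "p"):
        # itertext() also yields text inside inline children such as <em>.
        text = " ".join("".join(element.itertext()).split())
        if text:
            paragraphs.append(text)
    return paragraphs
```

The words come out whole and in reading order because the mark-up states where each paragraph starts and ends, which is the information that tends to go missing when working back from a PDF.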

Take, for example, the Gowers Review of Intellectual Property. It would be great to convert this into a commentable wiki on QuickTopic. However, using the tools available results in a document which has spaces within words and line-wrapped text. Perhaps some of the original mark-up is lost when it is converted to PDF. Unfortunately, even Adobe's Online PDF to HTML Converter was unable to convert the Gowers PDF into HTML. Using Adobe Reader and copying and pasting the contents into OpenOffice loses all the images and font colours.

Perhaps this is just a problem with tools not being able to decompose the PDF document fully? There do not appear to be as many PDF editing tools as there are for XHTML.

The other problem I have seen is that the PDF minor version number is incremented almost yearly. When it changes, viewers like KPDF and Xpdf display corrupted text until they are updated (viz. Power Inquiry Executive Summary). XHTML, by contrast, is a locked standard.
Jon
 