Tuesday 16 September 2008

Digitization blues

Die paper die!!

We all know that it is very important to adopt digital ways to create and store our work (documents, reports, etc.) in public administrations, so that we can build, step by step, a digital library... build our institutional memory, so that we can easily access and share our work... both inside the administration and among institutions, and with citizens of course... in the present and the future... Amen!

That is absolutely the way to go ...

As we said in another post, this is often NOT the case...

Or let's say that while we have mostly adopted digital ways to create our work (almost no one writes documents by hand anymore), we have still missed the point of how to benefit further from the digital form all this information is in, and in many cases we quickly convert it back into its analogue equivalent... printing it (paper stays, remember?), filing it in our binders, piling it on our desks, even still faxing copies to colleagues or circulating printed documents by hand through internal mail... to eventually bin most of it in our nice yellow "paper recycle" bins.

[By the way, do we know how many forests we consume a year?]

Nevertheless, there are policies and measures that try to remedy this situation, and to encourage people to do more of their work digitally rather than on paper...

Among the tools put at staffers' disposal to achieve this were the state-of-the-art, massive and very expensive photocopier/printer/scanner machines... which, beyond printing... can also scan documents into PDFs and... even mail the file to someone!

One would say... ok... that's a start...
At least those who want to can now take the paper-only documents they have and convert them into digital form, so that they will never need to make another photocopy of them but can just mail them!

Great !! So??

I recently realized that there is a huuuge misunderstanding of what we mean by "digitizing"... and not only at staff level, but at the level of those who decide policy on information and document management...

What these versatile machines actually do is take in a "paper only" document (most of the time a printed version of a document produced in MS Word somewhere in the administration) and generate a digital copy of it... BUT... as an image!!

That means that these newly "digitized" or "re-digitized" documents are in fact turned into pictures, completely useless and "unreadable" for any modern knowledge / content management / content indexing application... unless of course you pass them through an OCR system... to convert them back into text...

That means that we are massively "stupidifying" (I don't know how else to put it...) our valuable knowledge and information, and moreover making it huge in size (cos image files tend to be big... really big...)
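Just to put a rough number on "really big", here is a back-of-the-envelope sketch in Python. Everything in it is an assumption made for illustration: an A4 page, a 300 dpi scan, no compression at all, and about 3000 characters of typed text on the page. The names `raw_scan_bytes` and `text_bytes` are made up for this example and do not come from any scanner software.

```python
# Back-of-the-envelope comparison: one A4 page as a scanned image vs. as plain text.
# All figures are illustrative assumptions, not measurements from a real scanner.

A4_WIDTH_IN = 210 / 25.4   # A4 is 210 mm wide
A4_HEIGHT_IN = 297 / 25.4  # A4 is 297 mm tall
DPI = 300                  # assumed scan resolution
CHARS_PER_PAGE = 3000      # assumed number of characters on a typed page


def raw_scan_bytes(bits_per_pixel: int, dpi: int = DPI) -> int:
    """Uncompressed size in bytes of one A4 page scanned at `dpi`."""
    width_px = round(A4_WIDTH_IN * dpi)
    height_px = round(A4_HEIGHT_IN * dpi)
    return width_px * height_px * bits_per_pixel // 8


def text_bytes(chars: int = CHARS_PER_PAGE) -> int:
    """Size in bytes of the same page stored as single-byte plain text."""
    return chars


if __name__ == "__main__":
    colour = raw_scan_bytes(24)  # 24-bit colour scan
    mono = raw_scan_bytes(1)     # 1-bit black-and-white scan
    text = text_bytes()
    print(f"24-bit colour scan:     {colour / 1e6:.1f} MB")
    print(f"1-bit black/white scan: {mono / 1e6:.1f} MB")
    print(f"plain text:             {text / 1e3:.1f} KB")
    print(f"colour scan is roughly {colour // text}x the size of the text")
```

Real machines usually compress what they scan, so the actual files are a lot smaller than these raw figures, but the gap with the plain text stays enormous... and the text inside the picture is still not searchable.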

Here we are in 2008, creating masses of documents that we eventually store in our shared drives and document management systems (...) and that will need to be OCR-ed in order to be used in any intelligent way... someday...

What did you say ??

XML ??? Interoperability??? Transparency ???

oh ... yes ! absolutely !