On a screen, for posterity

18th February, 2019

Natraj Kumar knows what William Shakespeare’s scribbled writing looks like. He is duty-bound not to reveal what he read, but when you ask him to describe the handwriting itself, he sums it up in one word: “Difficult. Not only because of the slanted cursive scrawl, but also because of the words he used, old English words,” says the Vice President of HTC Global Services, whose team in Chennai was responsible for digitising that draft of William Shakespeare’s work for archival purposes.

Whom the archive was being built for, Kumar is not at liberty to say. What he can say, however, is that his clientele includes the British Library, the National Library of Australia, the National Library of the Netherlands and the United States’ Library of Congress. So gems like a draft by Shakespeare, though rare, are not quite unheard of, and keep cropping up amid HTC’s large volumes of daily data.

“We scan about three to four million pages a week,” says Kumar. These include everything from daily editions of some of the UK’s leading newspapers and journals published by colleges to full-fledged, humongous cartographic maps. And it’s not just a matter of conversion to digital form: each piece of writing, heading, photograph and caption has to be individually tagged and made responsive to search algorithms. But these are just some of the more run-of-the-mill assignments. Digitisation is a service that can come in handy in numerous sectors.

Even in genealogy. How, you ask? By performing a task as seemingly routine as the digitisation of students’ photographs from old college yearbooks. “The uploaded photographs were used for facial recognition,” explains Kumar, adding that the project helped people in the US trace relatives who had moved between multiple cities over the years.

The process itself is a long one. It involves cleaning up scanned data that comes in from around the world, enhancing poor-quality photographs, and segmenting each article to identify and capture its different components. “Segmentation is of two types. One is of routine material, and the other is for specialised matter that has to be read and understood,” says Kumar. For the latter, HTC hires students of the subject to comb through the matter and help categorise it.

The next step is character recognition, which helps capture keywords and metadata. “Once all the data is collected, it is assembled into a deliverable file, which goes through strict quality control measures,” says Kumar, before being sent back to the client. The entire process for, say, a newspaper edition takes six to seven days. “But that is more of a comfortable estimate,” says Kumar. “There are US-based and Singapore-based newspapers for whom we do the job in two hours.”

This is the simpler procedure. Things get more complex when the product is an ancient manuscript or a valuable book. Kumar explains the challenges that his teams go through when trying to scan the contents of national libraries (“some of the papers are so old that they might crumble if there is too much light, or if they aren’t handled gently enough”), even as he weaves in and out of a maze of staircases, computer bays and private offices to a large, dark room at the end of a series of smaller ones. At the far end of the room sits a man in utter darkness, wearing blue rubber gloves, working with the soft light emitted by a specialised scanning machine. He carefully stows away the project he was working on, to demonstrate the machine itself. It is big enough to hold an open cartographic map. The scanning surface is split into two halves, the height of each of which can be adjusted. “Sometimes, we scan large books that cannot be unbound or cut. But the curve near the spine is not easy to scan, so we raise one face of the book to flatten out the other,” explains Kumar.

He seems proud of the machine as he describes it, but the project that remains closest to his heart has nothing to do with impressive publications. “It is always good to receive direct feedback on your work,” he says. “I got a chance to do that when one of the projects we worked on was launched in Australia recently. We had digitised children’s songs into vocal and visual versions, so that special needs children could learn them without having to read them. Their teachers were delighted.”
