[NTG-context] Bad PDF to text crawlers

Kip Warner kip at thevertigo.com
Wed Aug 19 23:05:51 CEST 2015


Hey list,

I have an important document online that I would prefer to keep as a PDF 
rather than convert to another format. Unfortunately, bots frequently find 
it and offer people looking for it a text version that they extract 
themselves, which is beyond my control. The extracted text looks absolutely 
awful, and it has been a major pain: readers come away with a really poor 
understanding of the document's contents.

I was thinking there must be some way of tricking these bots, depending on 
how they are implemented. Assuming they will always find the PDF, could I 
get them to extract only a small invisible layer containing hidden text 
that directs the reader to where they can download the original, 
high-quality ConTeXt PDF?

Any suggestions?

-- 
Kip Warner -- Senior Software Engineer
OpenPGP encrypted/signed mail preferred
http://www.thevertigo.com