Post-processing ConTeXt's output for text search
Hello list,

I would like to implement online text search for a book I am currently publishing with ConTeXt, but without making the book itself available online: something like the snippet view in the text search on the Google Books site. I don't even need a snippet view; page numbers alone would be sufficient to begin with. I am not asking for a complete solution, of course, just advice on which direction to take.

My first vague idea is the following. To make queries fast, I think it would be useful to obtain a list of words together with the numbers of the pages on which they appear, something very similar to a plain index. So perhaps it would be possible to force the indexing engine to treat every word in the text as if it were an argument of the \index command, and to write out a list in text format that would be easy to feed into a database. This is just a blind guess and far from perfect by design, only something that seems easiest to implement. Please tell me what other approaches would be more promising.

Thanks in advance,
Piotr

-- 
Piotr Kopszak, Ph.D.
Polish Art Gallery, National Museum in Warsaw
http://kopszak.mnw.art.pl/
http://www.magnatune.com/artists/altri_stromenti
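To make the word-list idea concrete, here is a minimal Python sketch. It assumes the text of each page is already available as a plain string; how to get that out of ConTeXt or the PDF is left open. The names build_index, index_lines and pages_for are made up for this example, and the tokenisation (lowercased runs of word characters) is only one possible choice.

```python
import re
from collections import defaultdict

# Assumption: a "word" is any run of letters, digits or underscores.
WORD_RE = re.compile(r"\w+")


def build_index(pages):
    """Map each lowercased word to the sorted page numbers it occurs on.

    `pages` is a list of page texts; numbering starts at 1 (assumption).
    """
    index = defaultdict(set)
    for number, text in enumerate(pages, start=1):
        for word in WORD_RE.findall(text.lower()):
            index[word].add(number)
    return {word: sorted(numbers) for word, numbers in index.items()}


def index_lines(index):
    """Yield 'word<TAB>page' lines, one per occurrence pair,
    in a shape that is easy to bulk-load into a database table."""
    for word in sorted(index):
        for page in index[word]:
            yield f"{word}\t{page}"


def pages_for(index, word):
    """Return the pages on which `word` appears (case-insensitive)."""
    return index.get(word.lower(), [])
```

Calling build_index on the list of page texts and then pages_for(index, "gallery") gives the page numbers for one word; the output of index_lines can be written to a file and imported into whatever database serves the queries.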
Hi,

you could split your PDF into separate pages, use a full-text search engine such as swish-e to index each page, and store the results together with the page number (taken from the split pages). IMO this would be the easiest approach.

Patrick
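A rough sketch of the per-page route follows, with swish-e swapped for a plain Python scan to keep the example short. It assumes the pdftotext tool from Poppler is installed and relies on pdftotext ending each page with a form feed character. The file name book.pdf and the function names split_pages, page_texts and find_pages are hypothetical.

```python
import subprocess


def split_pages(text):
    """Split pdftotext output into per-page strings.

    pdftotext ends every page with a form feed, so the final piece
    after the last form feed is empty and is dropped. Blank pages in
    the middle are kept so that the numbering stays correct.
    """
    pages = text.split("\f")
    if pages and not pages[-1].strip():
        pages.pop()
    return pages


def page_texts(pdf_path):
    """Run pdftotext on the whole PDF and return a list of page texts.

    Assumption: `pdftotext` is on the PATH; '-' sends output to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return split_pages(result.stdout)


def find_pages(pages, phrase):
    """Return the 1-based numbers of the pages containing `phrase`."""
    needle = phrase.lower()
    return [
        number
        for number, text in enumerate(pages, start=1)
        if needle in text.lower()
    ]
```

With this, find_pages(page_texts("book.pdf"), "gallery") would list the matching pages. Note that these are physical PDF page numbers, which can differ from the printed ones when the book has separately numbered front matter. A real search engine adds stemming and ranking that this scan does not attempt.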
participants (2)
- Patrick Gundlach
- Piotr Kopszak