Good evening, Jannis!

On 2011-11-01 17:16, Jan Heinen wrote:
Today I wrote the function "ConvertToConteXt" which converts special ConTeXt-characters. You can see it below.
The data necessary for converting HTML entities is already in ConTeXt; have a look at “char-ent.lua” if you are interested. Based on this you could write the deentitizer (== your “html_entity_decode” function?) plus a character handler in “pure” ConTeXt (no PHP needed) as follows:

··· deent.cld ···················································
thirddata             = thirddata or { }
thirddata.myfunctions = thirddata.myfunctions or { }
local myfunctions     = thirddata.myfunctions

local entities         = characters.entities
local utf8byte         = unicode.utf8.byte
local lpegmatch        = lpeg.match
local P, R, S, Cs      = lpeg.P, lpeg.R, lpeg.S, lpeg.Cs
local fmt, stringupper = string.format, string.upper

do
  local s_hex = [[{\char"%s}]]
  local s_dec = [[{\char%s}]]

  local replace_hex_entity = function (hexnum)
    return fmt(s_hex, stringupper(hexnum))
  end

  local replace_dec_entity = function (decimal)
    return fmt(s_dec, decimal)
  end

  local replace_named_entity = function (name)
    return fmt(s_dec, utf8byte(entities[name]))
  end

  local replace_unsafe = function (char)
    return fmt(s_dec, utf8byte(char))
  end

  --local backslash    = P[[\]]
  --local escaped      = backslash / "" * 1
  local semicolon      = P";"
  local ucase_letter   = R"AZ"
  local lcase_letter   = R"az"
  local decimal_digit  = R"09"
  local decimal_number = decimal_digit^1
  local hex_digit      = decimal_digit + R"AF" + R"af"
  local hex_number     = hex_digit^1
  local entity_char    = ucase_letter + lcase_letter + decimal_digit
  local entity_chars   = entity_char^1

  local entity = (P"&#x" / "") * (hex_number     / replace_hex_entity)   * (semicolon / "")
               + (P"&#"  / "") * (decimal_number / replace_dec_entity)   * (semicolon / "")
               + (P"&"   / "") * (entity_chars   / replace_named_entity) * (semicolon / "")

  local unsafe = S[[{}\$~%]] / replace_unsafe

  --local p_characters = Cs((escaped + unsafe + entity + 1)^0)
  local p_characters = Cs((unsafe + entity + 1)^0)

  myfunctions.convert_to_context = function (str)
    return lpegmatch(p_characters, str)
  end
end

--- Testing ...
local someinput = [[ a º b • c ° d B e Ł f } g } h &non-well-formed; i { j } k { l \ m $ n + o - p ^ q _ r @ s ` t ~ u ! v % w « x ]]

context.starttext()
context(myfunctions.convert_to_context(someinput))
context.stoptext()
·································································
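For anyone who wants to try the idea without a ConTeXt installation, here is a minimal sketch of just the `unsafe` rule in stock Lua (no lpeg, no `characters.entities`). The name `escape_unsafe` is made up for this illustration, and the character set is assumed to be the same one used above; entities are not handled at all.

```lua
-- Sketch only: stock Lua, no lpeg or ConTeXt libraries required.
-- Assumption: the characters to escape are the ones in the `unsafe`
-- rule above, i.e. { } \ $ ~ %
local unsafe_class = "[{}\\$~%%]"

-- Replace each unsafe character with {\char<decimal code>}.
-- gsub scans the input once, so the backslashes it inserts are
-- never picked up and escaped a second time.
local function escape_unsafe(str)
  local escaped = str:gsub(unsafe_class, function (char)
    return string.format("{\\char%d}", char:byte())
  end)
  return escaped
end

print(escape_unsafe([[50% of {x} costs $5 in C:\tmp]]))
-- prints: 50{\char37} of {\char123}x{\char125} costs {\char36}5 in C:{\char92}tmp
```

The braces around each `\char` keep a following digit in the text from being read as part of the number.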
3. Did I forget to convert a character?
Most of the chars you substituted have no special semantics in the first place.

Philipp
Before I put it into contextgarden.net ...
1. ... please test it.
2. You see three characters, where I don't know the code-number \char??? for ConTeXt. Do you know them?
3. Did I forget to convert a character?

Regards
Jannis
function ConvertToConteXt ( $xstring )
{
    /* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
     * author: Jörg Kopp
     * www.dr-kopp.com
     * 01.11.2011
     *
     * Convert special ConTeXt-characters with php
     * Works with PHP5
     *
     * Call it with the string you want to convert ...
     *     ConvertToConteXt ($xstring);
     *
     * ... and you get back the converted string
     *
     * e.g.:
     * Input:
     *     $string = "My root-Directory: /home/hans";
     *     $string = ConvertToConteXt ( $string );
     *
     * Output/Return:
     *     $string = "My root\\char45Directory\\char58 \\char47home\\char47hans";
     *
     * When you write this into a file ...
     *     file_put_contents ( "example.tex", "My root\\char45Directory\\char58 \\char47home\\char47hans", FILE_APPEND );
     *
     * ... you will find the following in example.tex:
     *     My root\char45Directory\char58 \char47home\char47hans
     *
     * And when you compile example.tex with ConTeXt
     *     context example.tex
     *
     * you can read the following in the resulting example.pdf:
     *     My root-Directory: /home/hans
     * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * */

    $xstring = html_entity_decode ( $xstring );            // convert HTML entities into normal characters
    // The backslash must be replaced first; otherwise the backslashes
    // inserted by the replacements below would be escaped again.
    $xstring = str_replace ( "\\", "\\char92", $xstring ); // backslash
    $xstring = str_replace ( "!", "\\char33", $xstring );  // exclamation mark
    $xstring = str_replace ( "\"", "\\char34", $xstring ); // quotation mark
    $xstring = str_replace ( "#", "\\char35", $xstring );  // number sign
    $xstring = str_replace ( "$", "\\char36", $xstring );  // dollar sign
    $xstring = str_replace ( "%", "\\char37", $xstring );  // percent sign
    $xstring = str_replace ( "&", "\\char38", $xstring );  // ampersand
    $xstring = str_replace ( "'", "\\char39", $xstring );  // apostrophe
    $xstring = str_replace ( "(", "\\char40", $xstring );  // left parenthesis
    $xstring = str_replace ( ")", "\\char41", $xstring );  // right parenthesis
    $xstring = str_replace ( "*", "\\char42", $xstring );  // asterisk
    $xstring = str_replace ( "+", "\\char43", $xstring );  // plus sign
    $xstring = str_replace ( ",", "\\char44", $xstring );  // comma
    $xstring = str_replace ( "-", "\\char45", $xstring );  // hyphen-minus
    $xstring = str_replace ( ".", "\\char46", $xstring );  // period
    $xstring = str_replace ( "/", "\\char47", $xstring );  // slash
    $xstring = str_replace ( ":", "\\char58", $xstring );  // colon
    $xstring = str_replace ( ";", "\\char59", $xstring );  // semicolon
    $xstring = str_replace ( "<", "\\char60", $xstring );  // less-than sign
    $xstring = str_replace ( "=", "\\char61", $xstring );  // equals sign
    $xstring = str_replace ( ">", "\\char62", $xstring );  // greater-than sign
    $xstring = str_replace ( "?", "\\char63", $xstring );  // question mark
    $xstring = str_replace ( "@", "\\char64", $xstring );  // at sign
    $xstring = str_replace ( "[", "\\char91", $xstring );  // left square bracket
    $xstring = str_replace ( "]", "\\char93", $xstring );  // right square bracket
    $xstring = str_replace ( "^", "\\char94", $xstring );  // caret (circumflex)
    $xstring = str_replace ( "_", "\\char95", $xstring );  // underscore
    //$xstring = str_replace ( "°", "\\char", $xstring );  // degree sign   < ------ missing
    $xstring = str_replace ( "`", "\\char96", $xstring );  // grave accent (backtick)
    $xstring = str_replace ( "{", "\\char123", $xstring ); // left curly brace
    $xstring = str_replace ( "|", "\\char124", $xstring ); // vertical bar (pipe)
    $xstring = str_replace ( "}", "\\char125", $xstring ); // right curly brace
    $xstring = str_replace ( "~", "\\char126", $xstring ); // tilde
    //$xstring = str_replace ( "•", "\\char", $xstring );  // ?   < ------ missing
    //$xstring = str_replace ( "º", "\\char", $xstring );  // ?   < ------ missing

    return $xstring;
}
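On the three entries marked as missing: the number after `\char` is simply the Unicode code point of the character, so ° is 176, • is 8226 and º is 186. The following is a rough sketch in stock Lua that derives those numbers from the UTF-8 bytes. The helper names `utf8_codepoint` and `context_char` are invented for this example, and it assumes the output is compiled with a Unicode-aware engine such as the LuaTeX-based MkIV.

```lua
-- Sketch only: decode the first UTF-8 character of a string into its
-- Unicode code point (works in stock Lua 5.1 and later).
local function utf8_codepoint(s)
  local b1 = s:byte(1)
  if not b1 then return nil end        -- empty string
  if b1 < 0x80 then return b1 end      -- plain ASCII
  local n, cp                          -- n = number of continuation bytes
  if     b1 >= 0xF0 then n, cp = 3, b1 - 0xF0
  elseif b1 >= 0xE0 then n, cp = 2, b1 - 0xE0
  elseif b1 >= 0xC0 then n, cp = 1, b1 - 0xC0
  else return nil end                  -- stray continuation byte
  for i = 2, n + 1 do
    local b = s:byte(i)
    if not b or b < 0x80 or b > 0xBF then return nil end
    cp = cp * 64 + (b - 0x80)          -- append the low six bits
  end
  return cp
end

-- Hypothetical helper: format a character the same way as the
-- {\char...} replacements in the Lua code earlier in this thread.
local function context_char(s)
  return string.format("{\\char%d}", utf8_codepoint(s))
end

-- The characters are written as byte escapes so the example does not
-- depend on the encoding of this file.
print(context_char("\194\176"))     -- degree sign °:        {\char176}
print(context_char("\226\128\162")) -- bullet •:             {\char8226}
print(context_char("\194\186"))     -- masculine ordinal º:  {\char186}
```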
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
--
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments