Hi,

Attached is an XML file that describes the hyphenation pattern files. I'd appreciate it if you could check it (some records are incomplete). I'd also like to add, for each language, a couple of tricky hyphenatable words for testing, preferably in UTF-8 encoding. There is room for more comments as well, such as the preferred input and font encodings.

Hans

-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com | www.pragma-pod.nl
-----------------------------------------------------------------
Hans Hagen wrote:
Hi,
Attached is an xml file that describes the hyphenation pattern files. I'd appreciate checking (some records are incomplete). I'd also like to
I do not see how the following entry relates to the Czech section, where it appears:

<copyright>
  <year>1998-2001</year>
  <owner>Walter Schmidt</owner>
  <comment>Adaption to new German orthography</comment>
</copyright>
add (for each language) a couple of tricky hyphenatable words [for testing]. Preferably in utf-8 encoding. There is room for more comments as well, like: preferred input and font encodings etc.
Attached. Vit
Hi Vit,

Sorry for this! It has been corrected in the meantime.

Willi
Hans Hagen wrote:
Hi,
Attached is an xml file that describes the hyphenation pattern files. I'd appreciate checking (some records are incomplete). I'd also like to add (for each language) a couple of tricky hyphenatable words [for testing]. Preferably in utf-8 encoding. There is room for more comments as well, like: preferred input and font encodings etc.
Hans
Leon "Zlajpah should probably be changed to Leon \v{Z}lajpah (or Žlajpah if in Unicode). I suppose the information itself is correct.

There's a comment "Use of code page 852 in patterns", which probably remains from the good old DOS times.

Default encoding? 98% use cp1250, and some Linux people use latin2. (Well, maybe there are still freaks somewhere on this planet using cp852 :) UTF-8 should be(come) the standard, but it is coming pretty slowly. Still, writing UTF-8 as the default encoding in ConTeXt should be OK.

How can I try it (hyphenation)? I did

\enableregime[utf]
\mainlanguage[sl]
\starttext
Železničar
\showhyphens{Železničar}
\showhyphens{zeleznicar}
\showhyphens{mojca pokrajculja}
\stoptext

It should be že-le-zni-čar (ze-le-zni-car) and moj-ca po-kraj-cu-lja (in LaTeX "mojca" is hyphenated wrongly anyway), but in ConTeXt the first two don't get hyphenated at all and the last one becomes mo-j-ca pokra-jcul-ja. I guess the Slovenian patterns are not loaded at all.

One more question about language-specific issues: we always write

1. first section
2. second section
2.1. subsection
2.2. some other subsection
2.2.1. some subsubsection

(with a dot before the space after every (sub)section number). What is the best place to store these default settings (for all the other Slovenian users as well)?

Thank you,
Mojca
Mojca Miklavec wrote:
How can I try it (hyphenation)? I did
\enableregime[utf]
\mainlanguage[sl]
\starttext
Železničar
\showhyphens{Železničar}
\showhyphens{zeleznicar}
\showhyphens{mojca pokrajculja}
\stoptext
can you send me a zip with that test file? my mailer (or rather cut/paste) messes up the encoding

keep in mind that utf is an input encoding regime, while hyphenation is based on font encodings; in principle each font encoding that has all the chars needed for a language can be used, but the patterns need to be loaded as well

also, the names of the pattern files for slovenian have changed (si vs sl), which is one of the reasons for context going to ship its own pattern files

Hans
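To make the point about input encodings concrete, here is a minimal Python sketch (Python is used only for illustration; it is not part of ConTeXt). It encodes the test word in the input encodings mentioned in this thread and prints the resulting bytes. The mapping from the names used in the thread to Python codec names is an assumption of this sketch.

```python
# Sketch: the same word yields different bytes under different input encodings,
# which is why an input regime has to be declared before the text is processed.
WORD = "Železničar"

# Encodings named in the thread -> Python codec names (assumed mapping).
CODECS = {
    "cp1250": "cp1250",
    "latin2": "iso8859_2",
    "cp852": "cp852",
    "utf-8": "utf_8",
}


def encode_all(word):
    """Return {label: bytes} for the word in each input encoding."""
    return {label: word.encode(codec) for label, codec in CODECS.items()}


if __name__ == "__main__":
    for label, raw in encode_all(WORD).items():
        # Print the bytes as space-separated hex pairs.
        print(f"{label:7s} {raw.hex(' ')}")
```

The 8-bit encodings store the word in ten bytes but disagree on the byte used for Ž, while UTF-8 needs two bytes each for Ž and č. Declaring the wrong regime therefore changes which characters TeX sees, independently of which patterns are loaded.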
For my job I need a regime for Latin5 languages (specifically Turkish). I have made two files, regi-lt5.tex and enco-lt5.tex, that seem to work well. Any comments are welcome.

luigi

---begin regi-lt5.tex-------------------------------------------------------------------
%D \module
%D   [       file=regi-lt5,
%D        version=2005.0.0,
%D          title=\CONTEXT\ Encoding Macros,
%D       subtitle=TEST for latin5,
%D         author=Luigi Scarso,
%D           date=\currentdate,
%D      copyright=PRAGMA-ADE]
%C
%C This module is NOT part of the \CONTEXT\ macro||package.
%C rif:
%C www.ecma-internaltional.org/pubblications/standards/Ecma-128.htm
%C This module is NOT part of the \CONTEXT\ macro||package.

\startregime [latin5]

%% \defineactivetoken 32 {} % SPACE
%% \defineactivetoken 33 {} % EXCLAMATION MARK
%% \defineactivetoken 34 {} % QUOTATION MARK
%% \defineactivetoken 35 {} % NUMBER SIGN
%% \defineactivetoken 36 {} % DOLLAR SIGN
%% \defineactivetoken 37 {} % PERCENT SIGN
%% \defineactivetoken 38 {} % AMPERSAND
%% \defineactivetoken 39 {} % APOSTROPHE
%% \defineactivetoken 40 {} % LEFT PARENTHESIS
%% \defineactivetoken 41 {} % RIGHT PARENTHESIS
%% \defineactivetoken 42 {} % ASTERISK
%% \defineactivetoken 43 {} % PLUS SIGN
%% \defineactivetoken 44 {} % COMMA
%% \defineactivetoken 45 {} % HYPHEN-MINUS
%% \defineactivetoken 46 {} % FULL STOP
%% \defineactivetoken 47 {} % SOLIDUS
%% \defineactivetoken 48 {} % DIGIT ZERO
%% \defineactivetoken 49 {} % DIGIT ONE
%% \defineactivetoken 50 {} % DIGIT TWO
%% \defineactivetoken 51 {} % DIGIT THREE
%% \defineactivetoken 52 {} % DIGIT FOUR
%% \defineactivetoken 53 {} % DIGIT FIVE
%% \defineactivetoken 54 {} % DIGIT SIX
%% \defineactivetoken 55 {} % DIGIT SEVEN
%% \defineactivetoken 56 {} % DIGIT EIGHT
%% \defineactivetoken 57 {} % DIGIT NINE
%% \defineactivetoken 58 {} % COLON
%% \defineactivetoken 59 {} % SEMICOLON
%% \defineactivetoken 60 {} % LESS-THAN SIGN
%% \defineactivetoken 61 {} % EQUALS SIGN
%% \defineactivetoken 62 {} % GREATER-THAN SIGN
%% \defineactivetoken 63 {} % QUESTION MARK
%% \defineactivetoken 64 {} % COMMERCIAL AT
%% \defineactivetoken 65 {} % LATIN CAPITAL LETTER A
%% \defineactivetoken 66 {} % LATIN CAPITAL LETTER B
%% \defineactivetoken 67 {} % LATIN CAPITAL LETTER C
%% \defineactivetoken 68 {} % LATIN CAPITAL LETTER D
%% \defineactivetoken 69 {} % LATIN CAPITAL LETTER E
%% \defineactivetoken 70 {} % LATIN CAPITAL LETTER F
%% \defineactivetoken 71 {} % LATIN CAPITAL LETTER G
%% \defineactivetoken 72 {} % LATIN CAPITAL LETTER H
%% \defineactivetoken 73 {} % LATIN CAPITAL LETTER I
%% \defineactivetoken 74 {} % LATIN CAPITAL LETTER J
%% \defineactivetoken 75 {} % LATIN CAPITAL LETTER K
%% \defineactivetoken 76 {} % LATIN CAPITAL LETTER L
%% \defineactivetoken 77 {} % LATIN CAPITAL LETTER M
%% \defineactivetoken 78 {} % LATIN CAPITAL LETTER N
%% \defineactivetoken 79 {} % LATIN CAPITAL LETTER O
%% \defineactivetoken 80 {} % LATIN CAPITAL LETTER P
%% \defineactivetoken 81 {} % LATIN CAPITAL LETTER Q
%% \defineactivetoken 82 {} % LATIN CAPITAL LETTER R
%% \defineactivetoken 83 {} % LATIN CAPITAL LETTER S
%% \defineactivetoken 84 {} % LATIN CAPITAL LETTER T
%% \defineactivetoken 85 {} % LATIN CAPITAL LETTER U
%% \defineactivetoken 86 {} % LATIN CAPITAL LETTER V
%% \defineactivetoken 87 {} % LATIN CAPITAL LETTER W
%% \defineactivetoken 88 {} % LATIN CAPITAL LETTER X
%% \defineactivetoken 89 {} % LATIN CAPITAL LETTER Y
%% \defineactivetoken 90 {} % LATIN CAPITAL LETTER Z
%% \defineactivetoken 91 {} % LEFT SQUARE BRACKET
%% \defineactivetoken 92 {} % REVERSE SOLIDUS
%% \defineactivetoken 93 {} % RIGHT SQUARE BRACKET
%% \defineactivetoken 94 {} % CIRCUMFLEX ACCENT
%% \defineactivetoken 95 {} % LOW LINE
%% \defineactivetoken 96 {} % GRAVE ACCENT
%% \defineactivetoken 97 {} % LATIN SMALL LETTER A
%% \defineactivetoken 98 {} % LATIN SMALL LETTER B
%% \defineactivetoken 99 {} % LATIN SMALL LETTER C
%% \defineactivetoken 100 {} % LATIN SMALL LETTER D
%% \defineactivetoken 101 {} % LATIN SMALL LETTER E
%% \defineactivetoken 102 {} % LATIN SMALL LETTER F
%% \defineactivetoken 103 {} % LATIN SMALL LETTER G
%% \defineactivetoken 104 {} % LATIN SMALL LETTER H
%% \defineactivetoken 105 {} % LATIN SMALL LETTER I
%% \defineactivetoken 106 {} % LATIN SMALL LETTER J
%% \defineactivetoken 107 {} % LATIN SMALL LETTER K
%% \defineactivetoken 108 {} % LATIN SMALL LETTER L
%% \defineactivetoken 109 {} % LATIN SMALL LETTER M
%% \defineactivetoken 110 {} % LATIN SMALL LETTER N
%% \defineactivetoken 111 {} % LATIN SMALL LETTER O
%% \defineactivetoken 112 {} % LATIN SMALL LETTER P
%% \defineactivetoken 113 {} % LATIN SMALL LETTER Q
%% \defineactivetoken 114 {} % LATIN SMALL LETTER R
%% \defineactivetoken 115 {} % LATIN SMALL LETTER S
%% \defineactivetoken 116 {} % LATIN SMALL LETTER T
%% \defineactivetoken 117 {} % LATIN SMALL LETTER U
%% \defineactivetoken 118 {} % LATIN SMALL LETTER V
%% \defineactivetoken 119 {} % LATIN SMALL LETTER W
%% \defineactivetoken 120 {} % LATIN SMALL LETTER X
%% \defineactivetoken 121 {} % LATIN SMALL LETTER Y
%% \defineactivetoken 122 {} % LATIN SMALL LETTER Z
%% \defineactivetoken 123 {} % LEFT CURLY BRACKET
%% \defineactivetoken 124 {} % VERTICAL LINE
%% \defineactivetoken 125 {} % RIGHT CURLY BRACKET
%% \defineactivetoken 126 {} % TILDE

\defineactivetoken 160 {~} % NO-BREAK SPACE
\defineactivetoken 161 {\exclamdown} % INVERTED EXCLAMATION MARK
\defineactivetoken 162 {\textcent} % CENT SIGN
\defineactivetoken 163 {\pound} % POUND SIGN
\defineactivetoken 164 {\textcurrency} % CURRENCY SIGN
\defineactivetoken 165 {\textyen} % YEN SIGN
\defineactivetoken 166 {\textbrokenbar} % BROKEN BAR
\defineactivetoken 167 {\sectionmark} % SECTION SIGN
\defineactivetoken 168 {\textdiaeresis} % DIAERESIS
\defineactivetoken 169 {\copyright} % COPYRIGHT SIGN
\defineactivetoken 170 {\textordfeminine} % FEMININE ORDINAL INDICATOR
\defineactivetoken 171 {\lowerleftdoubleninequote} % LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
\defineactivetoken 172 {\textlognot} % NOT SIGN
\defineactivetoken 173 {\softhyphen} % SOFT HYPHEN
\defineactivetoken 174 {\registered} % REGISTERED SIGN
\defineactivetoken 175 {\textmacron} % MACRON
\defineactivetoken 176 {\textdegree} % DEGREE SIGN
\defineactivetoken 177 {\textpm} % PLUS-MINUS SIGN
\defineactivetoken 178 {\twosuperior} % SUPERSCRIPT TWO
\defineactivetoken 179 {\threesuperior} % SUPERSCRIPT THREE
\defineactivetoken 180 {\textacute} % ACUTE ACCENT
\defineactivetoken 181 {\mathematics{\mu}} % MICRO SIGN
\defineactivetoken 182 {\paragraphmark} % PILCROW SIGN
\defineactivetoken 183 {\periodcentered} % MIDDLE DOT
\defineactivetoken 184 {\textcedilla} % CEDILLA
\defineactivetoken 185 {\onesuperior} % SUPERSCRIPT ONE
\defineactivetoken 186 {\textordmasculine} % MASCULINE ORDINAL INDICATOR
\defineactivetoken 187 {\lowerrightdoubleninequote} % RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
\defineactivetoken 188 {\onequarter} % VULGAR FRACTION ONE QUARTER
\defineactivetoken 189 {\onehalf} % VULGAR FRACTION ONE HALF
\defineactivetoken 190 {\threequarter} % VULGAR FRACTION THREE QUARTERS
\defineactivetoken 191 {\questiondown} % INVERTED QUESTION MARK
\defineactivetoken 192 {\Agrave} % LATIN CAPITAL LETTER A WITH GRAVE
\defineactivetoken 193 {\Aacute} % LATIN CAPITAL LETTER A WITH ACUTE
\defineactivetoken 194 {\Acircumflex} % LATIN CAPITAL LETTER A WITH CIRCUMFLEX
\defineactivetoken 195 {\Atilde} % LATIN CAPITAL LETTER A WITH TILDE
\defineactivetoken 196 {\Adiaeresis} % LATIN CAPITAL LETTER A WITH DIAERESIS
\defineactivetoken 197 {\Aring} % LATIN CAPITAL LETTER A WITH RING
\defineactivetoken 198 {\AEligature} % LATIN CAPITAL LETTER AE
\defineactivetoken 199 {\Ccedilla} % LATIN CAPITAL LETTER C WITH CEDILLA
\defineactivetoken 200 {\Egrave} % LATIN CAPITAL LETTER E WITH GRAVE
\defineactivetoken 201 {\Eacute} % LATIN CAPITAL LETTER E WITH ACUTE
\defineactivetoken 202 {\Ecircumflex} % LATIN CAPITAL LETTER E WITH CIRCUMFLEX
\defineactivetoken 203 {\Ediaeresis} % LATIN CAPITAL LETTER E WITH DIAERESIS
\defineactivetoken 204 {\Igrave} % LATIN CAPITAL LETTER I WITH GRAVE
\defineactivetoken 205 {\Iacute} % LATIN CAPITAL LETTER I WITH ACUTE
\defineactivetoken 206 {\Icircumflex} % LATIN CAPITAL LETTER I WITH CIRCUMFLEX
\defineactivetoken 207 {\Idiaeresis} % LATIN CAPITAL LETTER I WITH DIAERESIS
\defineactivetoken 208 {\Gbreve} % LATIN CAPITAL LETTER G WITH BREVE
\defineactivetoken 209 {\Ntilde} % LATIN CAPITAL LETTER N WITH TILDE
\defineactivetoken 210 {\Ograve} % LATIN CAPITAL LETTER O WITH GRAVE
\defineactivetoken 211 {\Oacute} % LATIN CAPITAL LETTER O WITH ACUTE
\defineactivetoken 212 {\Ocircumflex} % LATIN CAPITAL LETTER O WITH CIRCUMFLEX
\defineactivetoken 213 {\Otilde} % LATIN CAPITAL LETTER O WITH TILDE
\defineactivetoken 214 {\Odiaeresis} % LATIN CAPITAL LETTER O WITH DIAERESIS
\defineactivetoken 215 {\textmultiply} % MULTIPLICATION SIGN
\defineactivetoken 216 {\Ostroke} % LATIN CAPITAL LETTER O WITH STROKE
\defineactivetoken 217 {\Ugrave} % LATIN CAPITAL LETTER U WITH GRAVE
\defineactivetoken 218 {\Uacute} % LATIN CAPITAL LETTER U WITH ACUTE
\defineactivetoken 219 {\Ucircumflex} % LATIN CAPITAL LETTER U WITH CIRCUMFLEX
\defineactivetoken 220 {\Udiaeresis} % LATIN CAPITAL LETTER U WITH DIAERESIS
\defineactivetoken 221 {\Idotaccent} % LATIN CAPITAL LETTER I WITH DOT
\defineactivetoken 222 {\Scedilla} % LATIN CAPITAL LETTER S WITH CEDILLA
\defineactivetoken 223 {\ssharp} % LATIN SMALL LETTER SHARP S (German)
\defineactivetoken 224 {\agrave} % LATIN SMALL LETTER A WITH GRAVE
\defineactivetoken 225 {\aacute} % LATIN SMALL LETTER A WITH ACUTE
\defineactivetoken 226 {\acircumflex} % LATIN SMALL LETTER A WITH CIRCUMFLEX
\defineactivetoken 227 {\atilde} % LATIN SMALL LETTER A WITH TILDE
\defineactivetoken 228 {\adiaeresis} % LATIN SMALL LETTER A WITH DIAERESIS
\defineactivetoken 229 {\aring} % LATIN SMALL LETTER A WITH RING
\defineactivetoken 230 {\aeligature} % LATIN SMALL LETTER AE
\defineactivetoken 231 {\ccedilla} % LATIN SMALL LETTER C WITH CEDILLA
\defineactivetoken 232 {\egrave} % LATIN SMALL LETTER E WITH GRAVE
\defineactivetoken 233 {\eacute} % LATIN SMALL LETTER E WITH ACUTE
\defineactivetoken 234 {\ecircumflex} % LATIN SMALL LETTER E WITH CIRCUMFLEX
\defineactivetoken 235 {\ediaeresis} % LATIN SMALL LETTER E WITH DIAERESIS
\defineactivetoken 236 {\igrave} % LATIN SMALL LETTER I WITH GRAVE
\defineactivetoken 237 {\iacute} % LATIN SMALL LETTER I WITH ACUTE
\defineactivetoken 238 {\icircumflex} % LATIN SMALL LETTER I WITH CIRCUMFLEX
\defineactivetoken 239 {\idiaeresis} % LATIN SMALL LETTER I WITH DIAERESIS
\defineactivetoken 240 {\gbreve} % LATIN SMALL LETTER G WITH BREVE
\defineactivetoken 241 {\ntilde} % LATIN SMALL LETTER N WITH TILDE
\defineactivetoken 242 {\ograve} % LATIN SMALL LETTER O WITH GRAVE
\defineactivetoken 243 {\oacute} % LATIN SMALL LETTER O WITH ACUTE
\defineactivetoken 244 {\ocircumflex} % LATIN SMALL LETTER O WITH CIRCUMFLEX
\defineactivetoken 245 {\otilde} % LATIN SMALL LETTER O WITH TILDE
\defineactivetoken 246 {\odiaeresis} % LATIN SMALL LETTER O WITH DIAERESIS
\defineactivetoken 247 {\textdiv} % DIVISION SIGN
\defineactivetoken 248 {\ostroke} % LATIN SMALL LETTER O WITH STROKE
\defineactivetoken 249 {\ugrave} % LATIN SMALL LETTER U WITH GRAVE
\defineactivetoken 250 {\uacute} % LATIN SMALL LETTER U WITH ACUTE
\defineactivetoken 251 {\ucircumflex} % LATIN SMALL LETTER U WITH CIRCUMFLEX
\defineactivetoken 252 {\udiaeresis} % LATIN SMALL LETTER U WITH DIAERESIS
\defineactivetoken 253 {\dotlessi} % LATIN SMALL LETTER DOTLESS I
\defineactivetoken 254 {\scedilla} % LATIN SMALL LETTER S WITH CEDILLA
\defineactivetoken 255 {\ydiaeresis} % LATIN SMALL LETTER Y WITH DIAERESIS

\stopregime

\endinput
---end regi-lt5.tex-------------------------------------------------------------------

---begin enco-lt5.tex-------------------------------------------------------------------
% temporary module, needed for downward compatibility

\input regi-lt5.tex

\enableregime[latin5]

\endinput
---end enco-lt5.tex-------------------------------------------------------------------
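One way to cross-check a regime table like this is to compare it against an independent implementation of the same code page. The following Python sketch (an illustration only, not part of the submitted module) lists the byte positions in the range 160 to 255 where ISO-8859-9 (Latin-5) differs from ISO-8859-1 (Latin-1), using Python's built-in codecs. The function name is made up for this sketch.

```python
# Sketch: find the slots where Latin-5 (ISO-8859-9) departs from Latin-1,
# so the Turkish-specific entries of a regime file can be checked by eye.


def latin5_differences():
    """Return {code: (latin1_char, latin5_char)} for bytes 160..255 that differ."""
    diffs = {}
    for code in range(160, 256):
        raw = bytes([code])
        latin1 = raw.decode("iso8859_1")
        latin5 = raw.decode("iso8859_9")
        if latin1 != latin5:
            diffs[code] = (latin1, latin5)
    return diffs


if __name__ == "__main__":
    for code, (l1, l5) in sorted(latin5_differences().items()):
        print(code, l1, "->", l5)
```

Only six slots differ (208, 221, 222, 240, 253 and 254), and these are the positions where the regime above uses \Gbreve, \Idotaccent, \Scedilla, \gbreve, \dotlessi and \scedilla. All other entries can be compared directly with an existing Latin-1 regime.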
luigi.scarso wrote:
I need for my job a regime for Latin5 languages (spec. Turkish). I have made two files regi-lt5.tex and enco-lt5.tex that seem to do a good work. Any comment is useful.
do you also need a real font encoding (i.e. an enc file and a real enco-lt5 file)? this depends on the availability of the glyphs you need in other encodings, as well as on hyphenation [for instance, are all the \namedchars you need part of texnansi and/or ec?]
---begin regi-lt5.tex-------------------------------------------------------------------
looks ok to me
---begin enco-lt5.tex-------------------------------------------------------------------
% temporary module, needed for downward compatibility
\input regi-lt5.tex
\enableregime[latin5]
\endinput
this one is not needed

Hans
Hans Hagen wrote:
do you also need a real font encoding (i.e. an enc file and a real enco-lt5 file)? this depends on the availability of the glyphs you need in other encodings, as well as on hyphenation [for instance, are all the \namedchars you need part of texnansi and/or ec?]
I would like to do this myself: what can I do? (I must produce this PDF with a Helvetica-like font. I use \setupbodyfont[pos,ss,11pt] and an old ConTeXt distribution: TeXExec 3.0 - ConTeXt / PRAGMA ADE 1997-2002.)
luigi.scarso wrote:
%C This module is NOT part of the \CONTEXT\ macro||package. %C rif: %C www.ecma-internaltional.org/pubblications/standards/Ecma-128.htm %C This module is NOT part of the \CONTEXT\ macro||package.
do you want this module to be part of the distribution? if so I will change these lines
Hans
absolutely yes: I want this module to be part of the distribution. luigi
On Wed, 23 Feb 2005 18:23:09 +0100, Hans Hagen wrote:
Hi,
Attached is an xml file that describes the hyphenation pattern files. I'd appreciate checking (some records are incomplete). I'd also like to add (for each language) a couple of tricky hyphenatable words [for testing]. Preferably in utf-8 encoding. There is room for more comments as well, like: preferred input and font encodings etc.
Hans
Hi Hans,

FYI: the Vietnamese language uses an empty hyphenation pattern.

Q.
VnPenguin wrote:
FYI: the Vietnamese language uses an empty hyphenation pattern.
ok, so i've now added:

<description language='vn'>
  <comment>Vietnamese needs no patterns.</comment>
</description>
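Since some records in the description file are known to be incomplete, a small script can help find them. The sketch below is only an illustration in Python: the `<descriptions>` root element, the `xx` language code and the placeholder copyright values are assumptions made for this example, as the full layout of the XML file is not shown in this thread. Only the `description`, `copyright`, `year`, `owner` and `comment` element names come from the messages above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the root element and the 'xx' record are invented
# for illustration; the 'vn' record is the one quoted in the thread.
SAMPLE = """\
<descriptions>
  <description language='vn'>
    <comment>Vietnamese needs no patterns.</comment>
  </description>
  <description language='xx'>
    <copyright>
      <year>2005</year>
      <owner>Placeholder Owner</owner>
    </copyright>
  </description>
</descriptions>
"""


def languages_without_copyright(xml_text):
    """Return the language codes of <description> records lacking <copyright>."""
    root = ET.fromstring(xml_text)
    return [
        desc.get("language")
        for desc in root.iter("description")
        if desc.find("copyright") is None
    ]


if __name__ == "__main__":
    print(languages_without_copyright(SAMPLE))
```

A record without a copyright entry is not necessarily wrong (the Vietnamese one has no patterns to credit), so the result is a list of candidates to review rather than a list of errors.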