I do not really understand Unicode. I will try to figure out which Unicode characters need special consideration, and then make up the specs.
In Unicode, the most important dash-like and hyphen-like characters are:

  U+002D HYPHEN-MINUS (-): The “usual” ASCII character, with an ambiguous meaning (hyphen? minus?).
  U+00AD SOFT HYPHEN: Indicates a line break opportunity; no visible glyph.
  U+2010 HYPHEN (‐): Carries the “hyphenation” meaning of hyphen-minus; preferred over the latter to indicate a visible hyphen.
  U+2011 NON-BREAKING HYPHEN (‑): Well ... a hyphen, but non-breaking.
  U+2012 FIGURE DASH (‒): Same ambiguous meaning as hyphen-minus, but has the same width as digits.
  U+2013 EN DASH (–): Used to indicate ranges of values (1910–2007); the equivalent of TeX's “--” ligature.
  U+2014 EM DASH (—): Used to mark a break—like this—in the flow of a sentence; the equivalent of TeX's “---” ligature.

The above is an extract from the “Dashes and Hyphens” part of section 6.2 of the Unicode Standard (http://www.unicode.org/versions/Unicode5.0.0/ch06.pdf). For a complete description, you might also want to look into the Unicode line breaking properties (http://www.unicode.org/reports/tr14/). I can summarize that for you if you want.

Arthur
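
For anyone who wants to inspect these characters programmatically, here is a minimal Python sketch using the standard unicodedata module. The names DASH_LIKE, describe and find_dash_like are hypothetical, chosen only for illustration, and the list covers just the seven code points discussed above, not every dash in Unicode.

```python
import unicodedata

# The seven code points discussed above (an illustrative list, not exhaustive).
DASH_LIKE = "\u002D\u00AD\u2010\u2011\u2012\u2013\u2014"


def describe(ch):
    """Return code point, Unicode name and general category for one character."""
    return "U+%04X %s (%s)" % (ord(ch), unicodedata.name(ch), unicodedata.category(ch))


def find_dash_like(text):
    """Return (index, description) pairs for each dash-like character in text."""
    return [(i, describe(ch)) for i, ch in enumerate(text) if ch in DASH_LIKE]


if __name__ == "__main__":
    # Print one line per character in the list.
    for ch in DASH_LIKE:
        print(describe(ch))
```

Note that SOFT HYPHEN reports general category Cf (format) rather than Pd (dash punctuation), so a filter on Pd alone would miss it. That is one reason the sketch uses an explicit list instead of a category test.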