Dandas are wrapped alone to the beginning of a line #88

Open
r12a opened this issue Feb 5, 2020 · 8 comments
Labels
doc:deva doc:gujr gap i:segmentation Grapheme/word segmentation & selection l:gu Gujurati language & script l:hi Hindi, Devanagari script p:basic s:deva Devanagari script s:gujr Gurajati script x:deva x:gujr

Comments

@r12a
Contributor

r12a commented Feb 5, 2020

When the Devanagari phrase separator । U+0964 DEVANAGARI DANDA (called purna viram in Hindi) or ॥ U+0965 DEVANAGARI DOUBLE DANDA (deergh viram in Hindi) is used, some browsers select it with the preceding word on double-click, while other browsers select it separately.

The properties of purna viram and deergh viram should be the same as those of FullStop or other punctuation marks, and a new line should not begin with a purna viram or deergh viram.
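
As a rough illustration of the comparison with the full stop, the sketch below uses Python's standard `unicodedata` module to print the Unicode name and general category of the two dandas and of U+002E FULL STOP. The helper name `describe` is made up for this example. Note that the general category alone does not decide line-breaking or double-click selection; those depend on other character properties that `unicodedata` does not expose, so this only shows one property the characters have in common.

```python
import unicodedata

# Characters discussed in this issue, plus U+002E FULL STOP for comparison.
MARKS = ["\u0964", "\u0965", "."]


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for mark in MARKS:
    print(describe(mark))
```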

@r12a r12a added i:segmentation Grapheme/word segmentation & selection gap p:basic doc:deva labels Feb 5, 2020
@r12a
Contributor Author

r12a commented Feb 5, 2020

The first comment in this issue contains text that will automatically appear in the Devanagari gap-analysis document as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.

@lianghai

lianghai commented Mar 7, 2020

“FullStop”? Or is there a better categorization that would put danda and double danda in the same category as the question mark (?) and exclamation mark (!), so that a preceding space doesn’t create a line break opportunity either?

@miloush

miloush commented Mar 9, 2020

@lianghai I have seen danda printed at the beginning of a line, although it might be just bad typesetting. Do you have a reference to support forbidding a line break there?

@lianghai

@miloush I believe books printed with metal type show a general preference for avoiding dandas at the beginning of a line. I don’t have good materials at hand now. I was mostly talking about the situation of <…, letter, space, danda, space, letter, …> though, where a danda is surrounded by a pair of space characters when it’s intended to have balanced wide spacing on both sides of the sentence-terminating mark.

@miloush

miloush commented Mar 10, 2020

@lianghai as in here? :)

image

@tiroj

tiroj commented Mar 10, 2020

There are two different (and conflicting) practices used in inputting and displaying danda characters. There are reasonable arguments to be made in favour of either practice, but the fact that both exist and are used seems an issue unlikely to go away soon.

Some users always input a space character before the danda, and like their fonts to space the danda accordingly, i.e. to be narrowly and evenly spaced on both sides.

Some users don't input a space character before the danda, and also like their fonts to space the danda accordingly, i.e. with more space on the left side.

The visual result desired by both sets of users is the same: a danda with a roughly equal amount of space on the left and right. But they are used to achieving it in different ways, largely dependent on the typesetting systems and fonts with which they are most familiar. And yes, this means that text ends up differently encoded depending on what font is used.
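
To make the two encoding practices concrete, here is a minimal sketch that counts how often a danda in a piece of text is preceded by a space and how often it directly follows a letter. The function name `danda_spacing_profile` and the sample sentence are invented for illustration; nothing like this is proposed in the thread.

```python
import re

DANDAS = "\u0964\u0965"  # DEVANAGARI DANDA, DEVANAGARI DOUBLE DANDA
SPACES = " \u00A0"       # SPACE and NO-BREAK SPACE


def danda_spacing_profile(text: str) -> dict[str, int]:
    """Count dandas preceded by a space vs. dandas that follow text directly."""
    counts = {"spaced": 0, "unspaced": 0}
    for match in re.finditer(f"[{DANDAS}]", text):
        i = match.start()
        if i == 0 or text[i - 1] in DANDAS:
            continue  # no preceding letter or space to classify
        key = "spaced" if text[i - 1] in SPACES else "unspaced"
        counts[key] += 1
    return counts


# Hypothetical sample mixing both practices in one string.
sample = "राम घर गया । सीता आई।"
print(danda_spacing_profile(sample))
```

A text that follows one practice consistently should show a count in only one of the two buckets; a mixed result suggests content assembled from sources that were typeset with different fonts or tools.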

In our fonts, we've tended to use the second option, with the space built in on the left side of the danda, because this is how our clients in India encode their text: without a space character before the danda. These clients are newspaper publishers, and this encoding practice is something they've inherited from previous typesetting technologies. One of the benefits, from their perspective, is that this practice prevents the danda from getting separated from the preceding word at line breaks.

Liang Hai has already convinced me that the line-breaking behaviour of the danda should be independent of whether a space character is inserted before it. But in practice, this isn't something one can rely on yet.
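
Until that can be relied on, one possible content-side workaround (an assumption of this example, not something proposed in the thread) is to replace the ordinary space before a danda with U+00A0 NO-BREAK SPACE, so that the space no longer offers a line break opportunity. The helper name `glue_danda` is hypothetical. A no-break space may be rendered or justified differently from an ordinary space in some environments, so the visual result should be checked.

```python
import re

NBSP = "\u00A0"  # NO-BREAK SPACE

# One or more ordinary spaces immediately before a danda or double danda.
_SPACE_BEFORE_DANDA = re.compile(r" +(?=[\u0964\u0965])")


def glue_danda(text: str) -> str:
    """Replace the space run before each danda with a single no-break space.

    Text with no space before the danda is returned unchanged.
    """
    return _SPACE_BEFORE_DANDA.sub(NBSP, text)


# Hypothetical sample: one spaced danda, one unspaced danda.
print(repr(glue_danda("राम घर गया । सीता आई।")))
```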

@xfq
Member

xfq commented Jan 18, 2024

We can add a link to the Devanagari Layout Requirements (we don't have Gujarati Layout Requirements yet): https://www.w3.org/International/ilreq/devanagari/#h_line_breaking

@r12a
Contributor Author

r12a commented Jan 18, 2024

In the Character Usage app I found 13 languages that use । and 8 that use ॥. These languages use 7 scripts that would fit under the IIP purview. My orthography notes only indicate relationships to spaces for 2 languages:

a. Hindi. I looked at a few style guides for Hindi (e.g. for authors at Microsoft, etc.) and my conclusion was: a number of the Hindi style guides consulted require that the danda follow the last letter in the sentence, with no intervening space.

b. Odia. Here a space is expected before । because otherwise it can be confused with a vowel.

Lepcha has its own version of these punctuation marks.

(Several Southeast Asian scripts also have their own punctuation that looks like the dandas, and other punctuation besides which may have a special relationship with spaces, e.g. the Thai repetition marker.)

@r12a r12a added l:hi Hindi, Devanagari script l:gu Gujurati language & script labels May 1, 2024
@r12a r12a added s:gujr Gurajati script s:deva Devanagari script labels Jul 2, 2024
Projects
Status: Under discussion

5 participants