Does QWebView support Unicode characters > 0xFFFF (for Cuneiform fonts)



nickw2066
4th November 2012, 01:53
Hi,

I'm using: PyQt4, OS X 10.7.5

In an editor generated by QWebView I am unable to display five-digit hex Unicode code points properly.

For example, in the editor, the character "𒀀" (U+12000, a Cuneiform sign, using the Neo-Assyrian font at: http://www.hethport.uni-wuerzburg.de/cuneifont/) is displayed as a "𒀀" together with a square box that is shaded on the left and shows the right half of an "A" on the right.

When I put a space (" ") in front of the Cuneiform sign the box disappears, but an empty space (which can't be deleted) appears after the Cuneiform sign.

Other five-digit hex symbols (e.g. "𓀀", U+13000, using the Aegyptus font from: http://users.teilar.gr/~g1951d/) have the same problem.

Four-digit hex symbols (e.g. "ሀ", U+1200) seem to work fine, without any of these issues.
Mac's TextEdit has no trouble displaying the five-digit hex Unicode symbols.

Could the issue be QWebView's handling of Unicode symbols outside the four-digit hex range?

Thanks,

Nick Webb

ChrisW67
6th November 2012, 00:02
This code snippet:


QString cuneiform = QString::fromUtf8("\xF0\x92\x80\x80");
// OR
// uint character = 0x00012000;
// QString cuneiform = QString::fromUcs4(&character, 1);
// OR
// ushort character[] = {0xD808,0xDC00};
// QString cuneiform = QString::fromUtf16(character, 2);
qDebug() << cuneiform;

QWebView w;
w.setHtml(QString("<html><body>++%1++</body></html>").arg(cuneiform));
w.show();

results in the correct glyph in my Linux console and QWebView.

What is the encoding of the HTML file and how are you loading it?
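
For reference, the three alternatives in the snippet above describe the same code point in three encodings: UTF-8 bytes, a single UCS-4 value, and a UTF-16 surrogate pair. A minimal sketch that derives them with the standard library only, assuming Python 3 (the helper name `encodings_of` is made up for illustration):

```python
import struct

def encodings_of(code_point):
    """Return (utf8_bytes, utf16_code_units) for a single code point."""
    ch = chr(code_point)
    utf8 = ch.encode("utf-8")
    raw16 = ch.encode("utf-16-be")
    # Unpack the big-endian byte string into 16-bit code units.
    units = struct.unpack(">%dH" % (len(raw16) // 2), raw16)
    return utf8, units

utf8, units = encodings_of(0x12000)
print(" ".join("%02X" % b for b in utf8))    # F0 92 80 80
print(" ".join("%04X" % u for u in units))   # D808 DC00
```

The printed values match the byte string and the `ushort` array used in the C++ snippet.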

nickw2066
7th November 2012, 20:58
The following displays ++<two empty boxes>++ in a small window for me:



import sys

from PyQt4.QtCore import QString
from PyQt4.QtGui import QApplication, QWidget
from PyQt4.QtWebKit import QWebView

# Constants
CUNE_STR = u"\U00012000"
HTML = r"<html><body>++%1++</body></html>"

# Main

qStr = QString(HTML).arg(CUNE_STR)

app = QApplication(sys.argv)

widget = QWidget()

widget.resize(320, 240)
widget.setWindowTitle("Hello, World!")
widget.show()

webView = QWebView(widget)
webView.setHtml(qStr)
webView.show()

sys.exit(app.exec_())

ChrisW67
7th November 2012, 22:01
Works fine here. What happens if you add:

print type(CUNE_STR), CUNE_STR.encode('utf-8')

Assuming you have a console that handles UTF-8 and has access to the font you should see "<type 'unicode'>" and the correct glyph.

As experiments you could try:


HTML = u"<html><body>++%1++</body></html>"

to make the HTML template a Python unicode string. Also try putting the cuneiform glyph directly into the unicode HTML, avoiding any mangling that QString::arg() might inflict.

There are some rather unclear rules that apply to Python unicode to QString transforms:
http://www.riverbankcomputing.com/static/Docs/PyQt4/html/gotchas.html
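
One more thing that may be worth checking on the Python side: a QString stores UTF-16, so a code point above U+FFFF occupies two code units there, and Python 2 itself could be built "narrow" (`sys.maxunicode == 0xFFFF`) or "wide". The sketch below only reports those two facts; whether either has anything to do with the two boxes is an open question, and the helper name `utf16_units` is invented for this example.

```python
import sys

def utf16_units(text):
    """Number of UTF-16 code units needed to store text."""
    return len(text.encode("utf-16-le")) // 2

# 0xffff on a narrow Python 2 build, 0x10ffff on wide builds and Python 3.3+
print(hex(sys.maxunicode))

print(utf16_units(u"\U00012000"))  # 2: needs a surrogate pair
print(utf16_units(u"\u1200"))      # 1: fits in one code unit
```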

nickw2066
8th November 2012, 06:45
Firstly, thanks for your help.

The following produces "<type 'unicode'>" and the correct glyph:


print type(CUNE_STR), CUNE_STR.encode('utf-8')

This does not fix my problem:


HTML = u"<html><body>++%1++</body></html>"

Finally, putting the cuneiform glyph directly into the unicode HTML does not fix my problem either.

I'm using Python 2.7.2 and I can't find anything on http://www.riverbankcomputing.com/static/Docs/PyQt4/html/gotchas.html that helps solve my problem.
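
Another experiment in the same spirit, offered only as a sketch: keep the HTML pure ASCII by writing the glyph as a numeric character reference, so that no non-BMP character passes through the Python to Qt string conversion at all. This assumes Python 3 for the conversion step; `to_char_refs` is a hypothetical helper, and it is not known whether this changes what QWebView displays.

```python
def to_char_refs(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

HTML = to_char_refs(u"<html><body>++\U00012000++</body></html>")
print(HTML)  # <html><body>++&#73728;++</body></html>
```

The resulting string could then be passed to `setHtml()` in place of the unicode HTML.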

ChrisW67
8th November 2012, 10:14
OK. We know the Python string is OK. If you run:


print qStr.toUtf8()

I will guess that the result is not good.

Perhaps this has a different effect:


qStr = QString(HTML).arg(QString(CUNE_STR))

by forcing CUNE_STR explicitly through the QString constructor.

nickw2066
8th November 2012, 19:16
If I use:


HTML = u"<html><body>++\U00012000++</body></html>"
qStr = QString(HTML)
print qStr.toUtf8()

then the HTML string with the correct glyph in it is printed.

If I use:


#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
Created on 7 November 2012

@author: nick
'''

import sys

from PyQt4.QtCore import QString
from PyQt4.QtGui import QApplication, QWidget
from PyQt4.QtWebKit import QWebView

# Constants
CUNE_STR = u"\U00012000"
HTML = u"<html><body>++%1++</body></html>"

# Main

qStr = QString(HTML).arg(QString(CUNE_STR))

app = QApplication(sys.argv)

widget = QWidget()

widget.resize(320, 240)
widget.setWindowTitle("Hello, World!")
widget.show()

webView = QWebView(widget)
webView.setHtml(qStr)
webView.show()

sys.exit(app.exec_())

then the same problem occurs: a window pops up with: ++<two empty boxes>++.

ChrisW67
8th November 2012, 19:40
Aarrgghh!! It looks like the Python-esque QString::arg() is mangling your character.

Perhaps you could use Python methods to put the strings together then do a single conversion into QString.
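
A minimal sketch of that idea, assuming the template uses Python's own `{0}` placeholder rather than Qt's `%1` (the names `HTML_TEMPLATE` and `build_html` are made up for this example):

```python
CUNE_STR = u"\U00012000"
HTML_TEMPLATE = u"<html><body>++{0}++</body></html>"

def build_html(glyph):
    """Assemble the page with Python string formatting only."""
    return HTML_TEMPLATE.format(glyph)

html = build_html(CUNE_STR)
print(repr(html))
```

The finished `html` string would then be converted once, for example with `QString(html)`, or handed directly to `webView.setHtml(html)`.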

nickw2066
8th November 2012, 20:13
I don't think it's a string-combining problem, as even this:


#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
Created on 7 November 2012

@author: nick
'''

import sys

from PyQt4.QtCore import QString
from PyQt4.QtGui import QApplication, QWidget
from PyQt4.QtWebKit import QWebView

# Constants
CUNE_STR = u"\U00012000"
HTML = u"<html><body>++\U00012000++</body></html>"

# Main

#qStr = QString(HTML).arg(QString(CUNE_STR))

app = QApplication(sys.argv)

widget = QWidget()

widget.resize(320, 240)
widget.setWindowTitle("Hello, World!")
widget.show()

webView = QWebView(widget)
webView.setHtml(HTML)
webView.show()

sys.exit(app.exec_())

where I make no use of QString, still only produces a window with: "++<two empty boxes>++"

With Unicode characters below 0xFFFF, I'm also getting some strange behaviour. For example:


#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
Created on 7 November 2012

@author: nick
'''

import sys

#from PyQt4.QtCore import QString
from PyQt4.QtGui import QApplication, QWidget
from PyQt4.QtWebKit import QWebView

# Constants
#CUNE_STR = u"\U00012000"
HTML = u"<html><body>++؀؀\u1250++</body></html>"

# Main

#qStr = QString(HTML).arg(QString(CUNE_STR))

app = QApplication(sys.argv)

widget = QWidget()

widget.resize(320, 240)
widget.setWindowTitle("Hello, World!")
widget.show()

webView = QWebView(widget)
webView.setHtml(HTML)
webView.show()

sys.exit(app.exec_())

displays the correct character in the window, but if I change the HTML string to:


HTML = u"<html><body>++؀؀\u1200++</body></html>"

only "++<empty box>++" displays (EDIT: My system seems to have the font to display this glyph because it displays correctly in Mac's Character Viewer).
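
To rule out a typo in the escapes, the two code points can be identified from the standard library. This is only a sanity check on the characters themselves (the helper `describe` is hypothetical); it says nothing about which font QWebView picks for them.

```python
import unicodedata

def describe(ch):
    """Return a 'U+XXXX NAME' label for a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))

for ch in (u"\u1200", u"\u1250"):
    print(describe(ch))
```

Both characters belong to the Ethiopic block, so the difference in rendering does not come from the two being in different scripts.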

nickw2066
15th November 2012, 02:23
**bump**

Anyone have any idea about the \u1250 vs. \u1200 problem described immediately above?