Does QWebView support Unicode characters > 0xFFFF (for Cuneiform fonts)
nickw2066
4th November 2012, 01:53
Hi,
I'm using: PyQt4, OS X 10.7.5
In an editor built on QWebView, I am unable to properly display Unicode code points that need five hex digits (i.e. above U+FFFF).
For example, in the editor the character "𒀀" (U+12000, a Cuneiform sign, using the Neo-Assyrian font at: http://www.hethport.uni-wuerzburg.de/cuneifont/) is displayed as both the glyph "𒀀" and a square box, shaded on its left side, with the right half of an "A" on its right side.
When I put a space (" ") in front of the Cuneiform sign, the box disappears, but an empty space (which can't be deleted) appears after the sign.
Other five-digit hex symbols (e.g. "𓀀", U+13000, using the Aegyptus font from: http://users.teilar.gr/~g1951d/) also have the same problem.
Four-digit hex symbols (e.g. "ሀ", U+1200) seem to work fine, without any of these issues.
Mac's TextEdit has no trouble displaying these five-digit hex Unicode symbols.
Could the issue be QWebView's handling of Unicode symbols outside the four-digit hex range?
Thanks,
Nick Webb
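A possible angle on the question above: QString stores text as UTF-16, so any code point above U+FFFF occupies two 16-bit code units, a high and a low surrogate. The Python 3 sketch below computes that pair by hand. It only illustrates the encoding; it does not show that QWebView mishandles the pair, and the function names are hypothetical.

```python
def to_surrogate_pair(code_point):
    """Return the (high, low) UTF-16 surrogates for a code point above U+FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Inverse operation: rebuild the code point from a surrogate pair."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    for cp in (0x12000, 0x13000):  # Cuneiform sign, Egyptian hieroglyph
        high, low = to_surrogate_pair(cp)
        print("U+%X -> 0x%04X 0x%04X" % (cp, high, low))
    # U+12000 -> 0xD808 0xDC00
    # U+13000 -> 0xD80C 0xDC00
```

The pair for U+12000, 0xD808 0xDC00, is the same one used in the ushort array of the C++ snippet in the next reply.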
ChrisW67
6th November 2012, 00:02
This code snippet:
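#include <QApplication>
#include <QWebView>
#include <QDebug>
int main(int argc, char *argv[])
{
    QApplication app(argc, argv);
    QString cuneiform = QString::fromUtf8("\xF0\x92\x80\x80");   // UTF-8 bytes of U+12000
    // OR
    // uint character = 0x00012000;
    // QString cuneiform = QString::fromUcs4(&character, 1);     // one UCS-4 code point
    // OR
    // ushort character[] = {0xD808, 0xDC00};
    // QString cuneiform = QString::fromUtf16(character, 2);     // UTF-16 surrogate pair
    qDebug() << cuneiform;
    QWebView w;
    w.setHtml(QString("<html><body>++%1++</body></html>").arg(cuneiform));
    w.show();
    return app.exec();
}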
results in the correct glyph in my Linux console and QWebView.
What is the encoding of the HTML file and how are you loading it?
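For readers following along in Python, a Python 3 sketch of the same equivalence: the UTF-8 bytes, the single UCS-4 code point and the UTF-16 surrogate pair from the snippet above should all decode to one identical character. It checks string decoding only, not rendering.

```python
import struct

# UTF-8 bytes, as in QString::fromUtf8("\xF0\x92\x80\x80")
from_utf8 = b"\xF0\x92\x80\x80".decode("utf-8")
# One code point, as in QString::fromUcs4(&character, 1)
from_ucs4 = chr(0x00012000)
# Surrogate pair {0xD808, 0xDC00}, packed little-endian, as in QString::fromUtf16
from_utf16 = struct.pack("<2H", 0xD808, 0xDC00).decode("utf-16-le")

assert from_utf8 == from_ucs4 == from_utf16
print(len(from_utf8), hex(ord(from_utf8)))  # 1 0x12000
```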
nickw2066
7th November 2012, 20:58
The following displays ++<two empty boxes>++ in a small window for me:
import sys
from PyQt4.QtCore import QString
from PyQt4.QtGui import QApplication, QWidget
from PyQt4.QtWebKit import QWebView
# Constants
CUNE_STR = u"\U00012000"
HTML = r"<html><body>++%1++</body></html>"
# Main
qStr = QString(HTML).arg(CUNE_STR)
app = QApplication(sys.argv)
widget = QWidget()
widget.resize(320, 240)
widget.setWindowTitle("Hello, World!")
widget.show()
webView = QWebView(widget)
webView.setHtml(qStr)
webView.show()
sys.exit(app.exec_())
ChrisW67
7th November 2012, 22:01
Works fine here. What happens if you add:
print type(CUNE_STR), CUNE_STR.encode('utf-8')
Assuming you have a console that handles UTF-8 and has access to the font, you should see "<type 'unicode'>" and the correct glyph.
As an experiment, you could try:
HTML = u"<html><body>++%1++</body></html>"
to force the HTML template to be a Unicode string. Also try putting the cuneiform glyph directly into the Unicode HTML, avoiding any mangling that QString::arg() might inflict.
There are some rather unclear rules that apply when converting Python unicode strings to QString:
http://www.riverbankcomputing.com/static/Docs/PyQt4/html/gotchas.html
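One rule that may matter here (an assumption, not something established in this thread): Python 2 can be compiled as a "narrow" build, where sys.maxunicode is 0xFFFF and a character above U+FFFF is held as two surrogate code units, so len(u"\U00012000") returns 2. The sketch below, which runs on Python 2.7 and Python 3, reports which kind of interpreter is in use; the thread does not say how the Python 2.7.2 in question was built, and `describe_build` is a hypothetical name.

```python
import sys

CUNE_STR = u"\U00012000"


def describe_build():
    """Summarise how this interpreter stores characters above U+FFFF."""
    return {
        "maxunicode": hex(sys.maxunicode),
        # 0xFFFF indicates a narrow (UTF-16-style) Python 2 build
        "narrow_build": sys.maxunicode == 0xFFFF,
        # 2 on a narrow build (surrogate pair), 1 on wide builds and Python 3.3+
        "len_of_cuneiform_sign": len(CUNE_STR),
    }


if __name__ == "__main__":
    print(describe_build())
```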
nickw2066
8th November 2012, 06:45
Firstly, thanks for your help.
The following produces "<type 'unicode'>" and the correct glyph:
print type(CUNE_STR), CUNE_STR.encode('utf-8')
This does not fix my problem:
HTML = u"<html><body>++%1++</body></html>"
Finally, putting the cuneiform glyph directly into the unicode HTML does not fix my problem either.
I'm using Python 2.7.2 and I can't find anything on http://www.riverbankcomputing.com/static/Docs/PyQt4/html/gotchas.html that helps solve my problem.
ChrisW67
8th November 2012, 10:14
OK. We know the Python string is OK. If you run:
print qStr.toUtf8()
I'd guess that the result is not good.
Perhaps this has a different effect:
qStr = QString(HTML).arg(QString(CUNE_STR))
by forcing CUNE_STR explicitly through the QString constructor.
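When a printed string looks wrong, listing its code points can make the damage visible: a correct result contains a single U+12000, whereas a mangled one might contain U+FFFD replacement characters or two separate surrogate values. The helper `code_points()` is hypothetical; in PyQt4 it could be applied to `unicode(qStr)`, assuming the default QString API of PyQt4 under Python 2.

```python
def code_points(text):
    """Return each character of text in U+XXXX notation."""
    return ["U+%04X" % ord(ch) for ch in text]


if __name__ == "__main__":
    print(code_points(u"++\U00012000++"))
    # ['U+002B', 'U+002B', 'U+12000', 'U+002B', 'U+002B']
```

On a narrow Python 2 build the same call lists U+D808 and U+DC00 in place of U+12000, because the interpreter itself stores the character as a surrogate pair.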
nickw2066
8th November 2012, 19:16
If I use:
HTML = u"<html><body>++\U00012000++</body></html>"
qStr = QString(HTML)
print qStr.toUtf8()
then the HTML string with the correct glyph in it is printed.
If I use:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
Created on 7 November 2012
@author: nick
'''
import sys
from PyQt4.QtCore import QString
from PyQt4.QtGui import QApplication, QWidget
from PyQt4.QtWebKit import QWebView
# Constants
CUNE_STR = u"\U00012000"
HTML = u"<html><body>++%1++</body></html>"
# Main
qStr = QString(HTML).arg(QString(CUNE_STR))
app = QApplication(sys.argv)
widget = QWidget()
widget.resize(320, 240)
widget.setWindowTitle("Hello, World!")
widget.show()
webView = QWebView(widget)
webView.setHtml(qStr)
webView.show()
sys.exit(app.exec_())
then the same problem occurs: a window pops up with: ++<two empty boxes>++.
ChrisW67
8th November 2012, 19:40
Aarrgghh!! It looks like the Python-esque QString::arg() is mangling your character.
Perhaps you could use Python methods to put the strings together, then do a single conversion into QString.
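A minimal sketch of that suggestion, assuming the same CUNE_STR and template as in the earlier posts: the HTML is assembled with Python's own str.format(), so QString.arg() never touches the glyph, and the finished unicode string is handed to Qt once. Only the string-building part is shown, and `build_html` is a hypothetical name.

```python
CUNE_STR = u"\U00012000"
HTML_TEMPLATE = u"<html><body>++{0}++</body></html>"


def build_html(text):
    """Fill the template with plain Python formatting (no QString involved)."""
    return HTML_TEMPLATE.format(text)


html = build_html(CUNE_STR)
# The finished string is then converted to QString exactly once, e.g. by
# passing it to webView.setHtml(html) in the PyQt4 program from the thread.
```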
nickw2066
8th November 2012, 20:13
I don't think it's a string-combining problem, as even this:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
Created on 7 November 2012
@author: nick
'''
import sys
from PyQt4.QtCore import QString
from PyQt4.QtGui import QApplication, QWidget
from PyQt4.QtWebKit import QWebView
# Constants
CUNE_STR = u"\U00012000"
HTML = u"<html><body>++\U00012000++</body></html>"
# Main
#qStr = QString(HTML).arg(QString(CUNE_STR))
app = QApplication(sys.argv)
widget = QWidget()
widget.resize(320, 240)
widget.setWindowTitle("Hello, World!")
widget.show()
webView = QWebView(widget)
webView.setHtml(HTML)
webView.show()
sys.exit(app.exec_())
where I make no explicit use of QString, still only produces a window showing "++<two empty boxes>++".
Regarding Unicode characters below 0xFFFF, I'm also getting some strange behaviour. For example:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''
Created on 7 November 2012
@author: nick
'''
import sys
#from PyQt4.QtCore import QString
from PyQt4.QtGui import QApplication, QWidget
from PyQt4.QtWebKit import QWebView
# Constants
#CUNE_STR = u"\U00012000"
HTML = u"<html><body>++؀؀\u1250++</body></html>"
# Main
#qStr = QString(HTML).arg(QString(CUNE_STR))
app = QApplication(sys.argv)
widget = QWidget()
widget.resize(320, 240)
widget.setWindowTitle("Hello, World!")
widget.show()
webView = QWebView(widget)
webView.setHtml(HTML)
webView.show()
sys.exit(app.exec_())
displays the correct character in the window, but if I change the HTML string to:
HTML = u"<html><body>++؀؀\u1200++</body></html>"
only "++<empty box>++" is displayed (EDIT: my system does seem to have a font for this glyph, because it displays correctly in Mac's Character Viewer).
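One further experiment that was not tried in the thread: rewrite every non-ASCII character as an HTML numeric character reference, so the string passed to setHtml() is plain ASCII and WebKit decodes the code point itself. If the boxes still appear with this version, that would point away from string conversion and towards font selection or rendering. The sketch is Python 3; on a Python 2 narrow build it is worth checking that the output contains the single reference &#73728; (0x12000) rather than two surrogate references.

```python
CUNE_STR = u"\U00012000"
HTML = u"<html><body>++" + CUNE_STR + u"++</body></html>"

# "xmlcharrefreplace" turns each character ASCII cannot encode into &#NNNN;
ascii_html = HTML.encode("ascii", "xmlcharrefreplace").decode("ascii")

if __name__ == "__main__":
    print(ascii_html)  # <html><body>++&#73728;++</body></html>
    # In the PyQt4 program this string would replace HTML in webView.setHtml(...).
```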
nickw2066
15th November 2012, 02:23
**bump**
Does anyone have any idea about the \u1250 vs. \u1200 problem described immediately above?
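A small diagnostic for the \u1250 vs. \u1200 comparison: print the name and general category of every character in each test string, using Python's standard unicodedata module. In this copy of the thread the two characters before the Ethiopic letter appear to be U+0600 ARABIC NUMBER SIGN, a format character (category Cf); whether they play any part in the problem is untested, so trying \u1200 without them would be one way to rule them out. The helper name `describe_chars` is hypothetical.

```python
import unicodedata


def describe_chars(text):
    """Return (code point, general category, name) for each character."""
    return [("U+%04X" % ord(ch),
             unicodedata.category(ch),
             unicodedata.name(ch, "<unnamed>"))
            for ch in text]


if __name__ == "__main__":
    # Assumes the two leading characters in the thread's strings are U+0600.
    for body in (u"++\u0600\u0600\u1250++", u"++\u0600\u0600\u1200++"):
        for row in describe_chars(body):
            print(row)
        print("")
```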