
Storing filenames in QString: a bad idea?



drhex
24th August 2008, 19:08
I'm writing an application that manages files in large file system hierarchies. Until now, I have stored filenames in QStrings and browsed directories using QDir and QFileInfo.

But consider this small example program which simply loops over the file system entries in the current directory and prints whether they are "files" or not:


#include <QApplication>
#include <QDir>
#include <QFileInfo>

int main(int argc, char *argv[])
{
    QApplication app(argc, argv);

    // List every entry in the current directory, including "system"
    // entries, and report whether Qt considers each one a regular file.
    QDir d(".");
    foreach (QFileInfo inf, d.entryInfoList(QDir::AllEntries|QDir::NoDotAndDotDot|QDir::System))
    {
        qDebug("Entry %s: isFile: %s",
               qPrintable(inf.fileName()),
               inf.isFile() ? "yes":"no");
    }

    return 0;
}

I live in Sweden, where we have a few extra characters in the alphabet in addition to A-Z. Some people use ISO-8859-1, where those characters are encoded as single bytes in the 128-255 range. Others use UTF-8, where the "funny" characters are encoded as multi-byte sequences. The files in my "large hierarchies" come from many sources and I cannot guarantee that the filenames have a consistent encoding.

If encoding inconsistencies only resulted in some filenames looking weird on the screen, that would be OK, but my experience is worse: non-conforming files become invisible in Qt!

I've tested the above program on a Linux system with LANG/LC_CTYPE set to "en_US.UTF-8" and with some files in the current directory copied from a Windows VFAT system (which used iso8859 to encode the non-ASCII chars in the filenames).

Browsing the filesystem with QDir::AllEntries, those copied files are not found at all. If I add QDir::System (as in the example above), they are found, but isFile() returns false and the offending characters are removed from the name, meaning I can't use the fileName() to e.g. open() the files.

Is there any way out besides using the OS's native functions to browse directories and storing filenames in a QByteArray?
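
For reference, the "native functions" route might look roughly like the sketch below. It assumes a POSIX system and uses std::string as the byte container instead of QByteArray so that it compiles without Qt. The helper name listRawNames is made up for this example.

```cpp
#include <dirent.h>     // opendir, readdir, closedir (POSIX)
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: returns the entry names of `dir` exactly as the
// kernel reports them, with no decoding step that could drop bytes.
std::vector<std::string> listRawNames(const std::string &dir)
{
    std::vector<std::string> names;
    DIR *d = opendir(dir.c_str());
    if (!d)
        return names;                   // unreadable directory: empty result
    while (struct dirent *e = readdir(d)) {
        if (std::strcmp(e->d_name, ".") == 0 || std::strcmp(e->d_name, "..") == 0)
            continue;                   // same effect as QDir::NoDotAndDotDot
        names.push_back(e->d_name);     // raw bytes, whatever their encoding
    }
    closedir(d);
    return names;
}
```

Because the names are never decoded, an ISO-8859-1 name and a UTF-8 name both come back byte for byte, and either can be passed straight back to the OS.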

wysota
24th August 2008, 19:40
Eventually you'll have to go down to platform-dependent code for accessing files, and that code uses strings, not raw bytes, so this is not likely to work. I suggest you try to detect the encoding of each filename from the bytes of the file path itself.
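
One simple way such detection could start (a sketch only, not necessarily what was meant here) is to check whether the raw bytes of a name are well-formed UTF-8. ISO-8859-1 names containing non-ASCII characters almost always fail this check, although a rare Latin-1 name can happen to pass. The function name isValidUtf8 is an assumption for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isValidUtf8(const std::string &s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { ++i; continue; }        // plain ASCII byte

        std::size_t extra;      // continuation bytes expected after the lead
        unsigned long cp;       // code point being assembled
        unsigned long minCp;    // smallest value allowed for this length
        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; minCp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; minCp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; minCp = 0x10000; }
        else return false;      // stray continuation byte or invalid lead

        if (i + extra >= s.size())
            return false;       // sequence is cut off at the end of the name
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;   // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, surrogates and values beyond U+10FFFF.
        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}
```

A name that fails the check can then be treated as a candidate for a legacy single-byte encoding such as ISO-8859-1.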

drhex
24th August 2008, 22:26
I don't think I quite understand your suggestion about detecting the encoding from the file path.

Do you mean I should read the directory with native OS calls to get the raw filename, guess the encoding based on that byte sequence and figure out some way of making that into a QString? If that succeeds I still can't use QFile to open the file, because it will use UTF-8 when converting the QString filename to char *. So I would ... need to keep the filename as a QByteArray too, so I can open the file with native OS calls and promote the FILE * to a QFile later. This is getting messy :)
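
The "open with native calls" half of that could be as small as the sketch below. It assumes the raw name is held in a std::string (a QByteArray's constData() would serve the same purpose), and openRaw is a made-up helper name. QFile has an open() overload that accepts a FILE *, which is the "promote" step mentioned above.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: opens a file for reading by its raw byte name.
// The name never passes through QString, so no bytes are lost or re-encoded.
std::FILE *openRaw(const std::string &rawPath)
{
    return std::fopen(rawPath.c_str(), "rb");   // nullptr on failure
}
```

The caller stays responsible for closing the returned FILE *.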

drhex
25th August 2008, 18:21
Found a better solution which means I can keep using the high-level classes instead of going down to os-specific code:

I browse directories with QDir/QFileInfo and let the filters include QDir::System. If an entry is neither a file nor a directory, I make another QFileInfo on it using the fileName() from the first QFileInfo. If that works, the entry is most likely a socket/fifo/device node and will be ignored. But if it was not possible to make another QFileInfo, it is probably because the filename was in the wrong encoding and has been stripped of some characters. The program will present a selection of failed filenames and suggest that the user runs e.g.


convmv -f iso-8859-1 -t utf8 -r --notest *

to fix the filenames and then try again.
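
To illustrate what that conversion amounts to for a single name, here is a small sketch of the ISO-8859-1 to UTF-8 byte mapping. It only transforms a string and renames nothing, it assumes the input really is ISO-8859-1, and latin1ToUtf8 is a made-up helper name.

```cpp
#include <string>

// Hypothetical helper: re-encodes ISO-8859-1 bytes as UTF-8.
// Every Latin-1 byte value equals its Unicode code point, so bytes
// 0x80-0xFF become a two-byte UTF-8 sequence and ASCII is unchanged.
std::string latin1ToUtf8(const std::string &in)
{
    std::string out;
    out.reserve(in.size() * 2);
    for (char ch : in) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

For example, the single byte 0xE5 (the letter "å" in ISO-8859-1) becomes the two bytes 0xC3 0xA5.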

wysota
25th August 2008, 22:21
> I don't think I quite understand your suggestion about detecting the encoding from the file path.
>
> Do you mean I should read the directory with native OS calls to get the raw filename, guess the encoding based on that byte sequence and figure out some way of making that into a QString?

I mean using statistics to guess what the encoding of the filename is, and then normalizing the names by converting them to Unicode.

> If that succeeds I still can't use QFile to open the file, because it will use UTF-8 when converting the QString filename to char *. So I would ... need to keep the filename as a QByteArray too, so I can open the file with native OS calls and promote the FILE * to a QFile later. This is getting messy :)

Be aware that "down the well" QFile uses platform-dependent means to access files; that was what I meant in my previous post.