Thread: Storing filenames in QString: a bad idea?

  1. #1
    Join Date
    Jan 2006
    Location
    Knivsta, Sweden
    Posts
    153
    Thanks
    30
    Thanked 13 Times in 12 Posts
    Qt products
    Qt4 Qt5
    Platforms
    Unix/X11

    Storing filenames in QString: a bad idea?

    I'm writing an application that manages files in large file system hierarchies. Until now, I have stored filenames in QStrings and browsed directories using QDir and QFileInfo.

    But consider this small example program which simply loops over the file system entries in the current directory and prints whether they are "files" or not:

    Qt Code:
    #include <QApplication>
    #include <QDir>
    #include <QFileInfo>
    #include <QtDebug>

    int main(int argc, char *argv[])
    {
        QApplication app(argc, argv);

        // List every entry in the current directory, including "system"
        // entries (on Unix: FIFOs, sockets and device files).
        QDir d(".");
        foreach (const QFileInfo &inf, d.entryInfoList(QDir::AllEntries | QDir::NoDotAndDotDot | QDir::System))
        {
            // qPrintable() converts the QString name to a local 8-bit string for printing.
            qDebug("Entry %s: isFile: %s",
                   qPrintable(inf.fileName()),
                   inf.isFile() ? "yes" : "no");
        }

        return 0;
    }

    I live in Sweden, where we have a few extra characters in the alphabet in addition to A-Z. Some people use ISO 8859-1, where those characters are encoded as single bytes in the 128-255 range. Others use UTF-8, where the "funny" characters are encoded as multi-byte sequences. The files in my "large hierarchies" come from many sources, and I cannot guarantee that the filenames have a consistent encoding.

    If encoding inconsistencies only resulted in some filenames looking weird on the screen, that would be OK, but my experience is worse: non-conforming files become invisible in Qt!

    I've tested the above program on a Linux system with LANG/LC_CTYPE set to "en_US.UTF-8" and with some files in the current directory copied from a Windows VFAT system (which used ISO 8859 to encode the non-ASCII characters in the filenames).

    When I browse the file system with QDir::AllEntries alone, those copied files are not found at all. If I add QDir::System (as in the example above), they are found, but isFile() returns false and the offending characters are removed from the name, so I can't use fileName() to, e.g., open() the files.

    Is there any way out besides using the OS's native functions to browse directories and storing filenames in a QByteArray?
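
A minimal sketch of the native route asked about above, assuming a POSIX system. It reads directory entries with opendir/readdir and keeps every name as an untouched byte string (std::string stands in for QByteArray here). The helper names listRawNames and escapeNonAscii are made up for this illustration and are not part of Qt or POSIX.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>
#include <dirent.h>   // POSIX: opendir, readdir, closedir

// Return the entry names of dirPath exactly as the kernel reports them:
// raw byte strings with no decoding applied. Returns an empty vector
// if the directory cannot be opened.
std::vector<std::string> listRawNames(const std::string &dirPath)
{
    std::vector<std::string> names;
    DIR *dir = opendir(dirPath.c_str());
    if (!dir)
        return names;
    while (const dirent *ent = readdir(dir)) {
        const std::string name = ent->d_name;
        if (name != "." && name != "..")
            names.push_back(name);
    }
    closedir(dir);
    return names;
}

// Make a raw name printable on any terminal by writing every byte
// outside printable ASCII as \xNN (hypothetical display helper).
std::string escapeNonAscii(const std::string &raw)
{
    std::string out;
    for (unsigned char c : raw) {
        if (c >= 0x20 && c < 0x7F) {
            out += static_cast<char>(c);
        } else {
            char buf[5];
            const int n = std::snprintf(buf, sizeof buf, "\\x%02X",
                                        static_cast<unsigned>(c));
            assert(n == 4);   // backslash, 'x', two hex digits
            (void)n;
            out += buf;
        }
    }
    return out;
}
```

Because the names are never decoded, an entry with an unexpected encoding is still listed, and the same bytes can be passed back to open() or fopen() unchanged.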

  2. #2
    Join Date
    Jan 2006
    Location
    Warsaw, Poland
    Posts
    33,359
    Thanks
    3
    Thanked 5,015 Times in 4,792 Posts
    Qt products
    Qt3 Qt4 Qt5 Qt/Embedded
    Platforms
    Unix/X11 Windows Android Maemo/MeeGo
    Wiki edits
    10

    Re: Storing filenames in QString: a bad idea?

    Eventually you'll have to go down to platform-dependent code for accessing files, and that code uses strings, not raw bytes, so this is not likely to work. I suggest you try to detect the encoding of each filename from the contents of the file path.
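
One simple way to read this suggestion, sketched under a strong assumption: the only encodings in play are UTF-8 and ISO 8859-1, as described in the first post. Instead of real statistics it uses a cheaper heuristic: a name that is well-formed UTF-8 is kept as is, and anything else is treated as ISO 8859-1 and transcoded. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if s is well-formed UTF-8 (rejects truncated sequences, overlong
// forms, surrogates and code points above U+10FFFF).
bool isValidUtf8(const std::string &s)
{
    const unsigned char *p = reinterpret_cast<const unsigned char *>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = p[i];
        if (c < 0x80) { ++i; continue; }          // plain ASCII
        std::size_t len;
        unsigned long min;                        // smallest code point for this length
        unsigned long cp;
        if ((c & 0xE0) == 0xC0)      { len = 2; min = 0x80;    cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; min = 0x800;   cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; min = 0x10000; cp = c & 0x07; }
        else return false;                        // stray continuation or invalid lead byte
        if (i + len > n) return false;            // sequence cut off
        for (std::size_t k = 1; k < len; ++k) {
            if ((p[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}

// ISO 8859-1 maps byte value N directly to code point U+00NN, so every
// byte >= 0x80 becomes a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string &s)
{
    std::string out;
    for (unsigned char c : s) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Heuristic (an assumption, not a statistical detector): valid UTF-8 is
// kept, everything else is assumed to be ISO 8859-1.
std::string normalizeToUtf8(const std::string &raw)
{
    const std::string out = isValidUtf8(raw) ? raw : latin1ToUtf8(raw);
    assert(isValidUtf8(out));   // both branches must yield well-formed UTF-8
    return out;
}
```

The normalized string is only suitable for display. A short ISO 8859-1 name can, in rare cases, also be well-formed UTF-8, so the result is a guess, and the original bytes remain the name the file system knows the file by.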

  3. #3

    I don't think I quite understand your suggestion about detecting the encoding from the file path.

    Do you mean I should read the directory with native OS calls to get the raw filename, guess the encoding based on that byte sequence, and figure out some way of making that into a QString? If that succeeds, I still can't use QFile to open the file, because it will use UTF-8 when converting the QString filename to char *. So I would ... need to keep the filename as a QByteArray too, so I can open the file with native OS calls and promote the FILE * to a QFile later. This is getting messy.

  4. #4

    Found a better solution, which means I can keep using the high-level classes instead of going down to OS-specific code:

    I browse directories with QDir/QFileInfo and let the filters include QDir::System. If an entry is neither a file nor a directory, I make another QFileInfo on it using the fileName() from the first QFileInfo. If that works, the entry is most likely a socket/FIFO/device node and will be ignored. But if it was not possible to make another QFileInfo, it is probably because the filename was in the wrong encoding and has been stripped of some characters. The program will present a selection of failed filenames and suggest that the user run, e.g.,

    Code:
    convmv -f iso-8859-1 -t utf8 -r --notest *

    to fix the filenames and then try again.
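
The re-check described above can be sketched without Qt, assuming a POSIX system: look the reported name up again and see whether the file system recognizes it. In the Qt version from the post, the lookup is the second QFileInfo; here lstat and stat play that role. EntryKind, classify and describe are names invented for this example.

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>   // POSIX: lstat, stat

enum class EntryKind { File, Directory, Special, Unresolvable };

// Look a path up again and report what the file system says about it.
// Unresolvable means the name, as we hold it, does not match any entry,
// which is the symptom of a name that lost characters during decoding.
EntryKind classify(const std::string &path)
{
    struct stat st;
    if (lstat(path.c_str(), &st) != 0)
        return EntryKind::Unresolvable;
    if (S_ISLNK(st.st_mode)) {
        // Follow the link; a dangling link exists but has no usable target.
        if (stat(path.c_str(), &st) != 0)
            return EntryKind::Special;
    }
    if (S_ISREG(st.st_mode)) return EntryKind::File;
    if (S_ISDIR(st.st_mode)) return EntryKind::Directory;
    return EntryKind::Special;   // socket, FIFO, device node, ...
}

// Short label for messages shown to the user.
const char *describe(EntryKind kind)
{
    switch (kind) {
    case EntryKind::File:         return "file";
    case EntryKind::Directory:    return "directory";
    case EntryKind::Special:      return "special entry, ignored";
    case EntryKind::Unresolvable: return "name probably in the wrong encoding";
    }
    assert(false && "unhandled EntryKind");
    return "";
}
```

Entries classified as Unresolvable are the ones to collect and show to the user together with the rename suggestion.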

  5. #5

    Originally posted by drhex:
        I don't think I quite understand your suggestion about detecting the encoding from the file path.

        Do you mean I should read the directory with native OS calls to get the raw filename, guess the encoding based on that byte sequence, and figure out some way of making that into a QString?

    I mean to use statistics to guess what the encoding of the filename is and then to normalize names by converting them to Unicode.

    Originally posted by drhex:
        If that succeeds, I still can't use QFile to open the file, because it will use UTF-8 when converting the QString filename to char *. So I would ... need to keep the filename as a QByteArray too, so I can open the file with native OS calls and promote the FILE * to a QFile later. This is getting messy.

    Be aware that "down the well" QFile uses platform-dependent means to access files, and that was what I meant in my previous post.
