
Thread: String similarity check

Post #1
Join Date: Jan 2012
Location: Dortmund, Germany
Posts: 159
Qt products: Qt4
Platforms: Windows, Android
Thanks: 69
Thanked 10 times in 8 posts

String similarity check

Is there a ready-made string similarity check in Qt?
I've read about the Levenshtein distance and Soundex; is there anything like that built in?
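Qt itself does not appear to ship either algorithm, but both are short to write. Below is a minimal sketch in standard C++ (no Qt types, so it stays self-contained), assuming ASCII input. `levenshtein`, `levenshteinSimilarity` and `soundex` are hypothetical helper names, and scaling the edit distance by the longer string's length is one common convention, not a standard.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Levenshtein distance: minimum number of single-character insertions,
// deletions and substitutions that turn s into t. Dynamic programming that
// keeps only two rows of the table. Works on bytes (ASCII assumed).
std::size_t levenshtein(const std::string& s, const std::string& t)
{
    std::vector<std::size_t> prev(t.size() + 1), cur(t.size() + 1);
    for (std::size_t j = 0; j <= t.size(); ++j)
        prev[j] = j;                             // "" -> first j chars of t
    for (std::size_t i = 1; i <= s.size(); ++i) {
        cur[0] = i;                              // first i chars of s -> ""
        for (std::size_t j = 1; j <= t.size(); ++j) {
            const std::size_t cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
            cur[j] = std::min({prev[j] + 1,          // deletion
                               cur[j - 1] + 1,       // insertion
                               prev[j - 1] + cost}); // substitution
        }
        std::swap(prev, cur);
    }
    return prev[t.size()];
}

// Hypothetical helper: 100 = identical, scaled by the longer string's length.
double levenshteinSimilarity(const std::string& a, const std::string& b)
{
    const std::size_t longer = std::max(a.size(), b.size());
    if (longer == 0)
        return 100.0;
    return 100.0 * (1.0 - static_cast<double>(levenshtein(a, b)) / longer);
}

// American Soundex of a single word: first letter + three digits, e.g. "R163".
// h and w are skipped without separating equal codes; vowels and y do separate them.
std::string soundex(const std::string& word)
{
    // Code for each letter a..z: digits 1-6, '0' = vowel/y, '-' = h/w.
    static const char codes[] = "0123012-02245501262301-202";
    std::string result;
    char last = 0;
    for (char ch : word) {
        const char c = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
        if (c < 'a' || c > 'z')
            continue;                            // ignore non-letters
        const char code = codes[c - 'a'];
        if (result.empty()) {                    // keep the first letter itself
            result += static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
            last = code;
            continue;
        }
        if (code == '-')
            continue;                            // h, w: transparent
        if (code != '0' && code != last)
            result += code;                      // start of a new consonant group
        last = code;
        if (result.size() == 4)
            break;
    }
    if (!result.empty())
        result.resize(4, '0');                   // pad short codes with zeros
    return result;
}
```

Soundex maps "Markus" and "Marcus" to the same code (M622), but it was designed for single surnames, so full names would have to be split into words first. Levenshtein, by contrast, penalises swapped word order such as "Bertram, Markus" heavily.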

Following this thread (sorry, it's in German), I've implemented a routine that counts matching small substrings. For names (which is what I need to compare), it gives quite reasonable results with substrings of n = 2 characters and an 80% threshold.

I'd like to know whether there is a better ready-made routine that is fast enough to simply drop in and use.
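One commonly used alternative, and a different technique from the one in this post, is the Sørensen-Dice coefficient on character bigrams. The sketch below assumes plain `std::string` input and ASCII-only case folding; `diceSimilarity` is a hypothetical name. Because it counts bigrams as a multiset, the score does not depend on argument order and never exceeds 100.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Sorensen-Dice coefficient over padded character bigrams, in percent:
//   100 * 2 * shared / (bigrams(a) + bigrams(b))
// Bigrams are counted as a multiset: each occurrence in a can be matched
// by at most one occurrence in b, so the score is symmetric and <= 100.
// Case folding uses std::tolower, i.e. ASCII only in the "C" locale.
double diceSimilarity(const std::string& a, const std::string& b)
{
    if (a.empty() || b.empty())
        return 0.0;

    // Count the bigrams of " s " (one space of padding marks word edges).
    auto bigramCounts = [](const std::string& s) {
        std::string t = " " + s + " ";
        for (char& c : t)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        std::map<std::string, int> counts;
        for (std::size_t i = 0; i + 2 <= t.size(); ++i)
            ++counts[t.substr(i, 2)];
        return counts;
    };

    const std::map<std::string, int> ga = bigramCounts(a);
    const std::map<std::string, int> gb = bigramCounts(b);

    int shared = 0, total = 0;
    for (const auto& entry : ga) {
        total += entry.second;
        const auto it = gb.find(entry.first);
        if (it != gb.end())
            shared += std::min(entry.second, it->second);
    }
    for (const auto& entry : gb)
        total += entry.second;

    return 100.0 * 2 * shared / total;
}
```

For "Markus Bertram" vs. "Bertram, Markus" this gives 100 · 2 · 14 / (15 + 16) ≈ 90.3, so a threshold tuned for the substring-counting routine would have to be re-tuned.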

This is what I use right now (corrections on style and speed are welcome!):

header:
```cpp
bool isSimilar(QString a, QString b, qreal percentage = 80, int n = 2,
               Qt::CaseSensitivity caseSense = Qt::CaseInsensitive);
```

implementation:
```cpp
// Iterates over all substrings of n characters in a and looks each one up in b.
// The number of hits is then divided by the number of n-grams in the shorter string.
// To take word beginnings and endings properly into account,
// n-1 spaces are added before and after both strings.
bool MainWindow::isSimilar(QString a, QString b, qreal percentage, int n, Qt::CaseSensitivity caseSense)
{
    if (a.isEmpty() || b.isEmpty() || n < 1)
        return false;

    const QString pad = QString(" ").repeated(n - 1);
    a = pad + a + pad;
    b = pad + b + pad;

    qreal hits = 0;
    // A padded string of length L contains L - (n - 1) substrings of length n.
    for (int i = 0; i < a.length() - (n - 1); ++i) {
        if (b.contains(a.mid(i, n), caseSense))
            ++hits;
    }

    // Normalise by the n-gram count of the shorter padded string.
    // Hits are counted over a's n-grams, so the score can exceed 100
    // when a is the longer string.
    const int shorter = qMin(a.length(), b.length());
    return percentage < 100 * hits / (shorter - (n - 1));
}
```
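To experiment with the scoring outside Qt, here is a sketch that ports the same algorithm to standard C++, assuming ASCII input (case-insensitive mode only lower-cases ASCII letters, unlike Qt::CaseInsensitive). `ngramScore` and this `isSimilar` overload are hypothetical names; `ngramScore` returns the raw percentage instead of a bool. Passing "Markus Bertram" as the first argument reproduces the scores listed below.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Standard-C++ port of the posted scoring (no Qt needed); returns the percentage.
double ngramScore(std::string a, std::string b, std::size_t n = 2,
                  bool caseInsensitive = true)
{
    if (a.empty() || b.empty() || n == 0)
        return 0.0;

    auto lower = [](std::string& s) {
        for (char& c : s)
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    };
    if (caseInsensitive) {
        lower(a);
        lower(b);
    }

    // Pad with n-1 spaces so word beginnings and endings form their own n-grams.
    const std::string pad(n - 1, ' ');
    a = pad + a + pad;
    b = pad + b + pad;

    // Every n-gram of a that occurs anywhere in b counts as one hit.
    std::size_t hits = 0;
    for (std::size_t i = 0; i + n <= a.size(); ++i)
        if (b.find(a.substr(i, n)) != std::string::npos)
            ++hits;

    // As in the post: divide by the n-gram count of the shorter padded string.
    const std::size_t grams = std::min(a.size(), b.size()) - (n - 1);
    return 100.0 * hits / grams;
}

// Same decision as the posted function: similar if the score exceeds the threshold.
bool isSimilar(const std::string& a, const std::string& b,
               double percentage = 80, std::size_t n = 2)
{
    return percentage < ngramScore(a, b, n);
}
```

`ngramScore("abab", "ab")` returns about 133.3: with the longer string first, repeated bigrams in it are each counted as hits, while the denominator comes from the shorter string.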

For the name "Markus Bertram" I get these scores:
• Bertram, Markus: 93.3
• Markus E. Bertram: 100
• Marcus Emil Bertram: 86.7
• marc bertram: 84.6 (case-insensitive)
• Martin Bertram: 73.3 (below the 80% threshold, so isSimilar() returns false)
• Martin Bergmann: 46.7 (below the 80% threshold, so isSimilar() returns false)
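Assuming "Markus Bertram" was passed as `a` (this reproduces every score in the list), the numbers follow directly from the routine. Let $\hat a$, $\hat b$ be the strings padded with $n-1$ spaces on each side, and $h$ the number of $n$-grams of $\hat a$ that occur somewhere in $\hat b$:

```latex
\begin{aligned}
s(a,b) &= 100 \cdot \frac{h}{\min(|\hat a|,\,|\hat b|) - (n-1)} \\[4pt]
s(\text{Markus Bertram},\ \text{Bertram, Markus}) &= 100 \cdot \frac{14}{16 - 1} \approx 93.3 \\
s(\text{Markus Bertram},\ \text{marc bertram}) &= 100 \cdot \frac{11}{14 - 1} \approx 84.6
\end{aligned}
```

In the first case " Markus Bertram " (16 characters, 15 bigrams) is the shorter padded string, and only "m " is missing from " Bertram, Markus ", where the m is followed by a comma. In the second case the shorter string " marc bertram " (14 characters, 13 bigrams) sets the denominator, and "rk", "ku", "us" and "s " are the four bigrams of " markus bertram " without a match.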
    Last edited by sedi; 23rd June 2012 at 17:46. Reason: spelling corrections
