Qt regular expressions!!



notsonerdysunny
29th March 2007, 00:30
I was wondering whether there are ways to access the sub-matches, apart from the complete match, when doing regular-expression matching. For instance,

"[a-z](\d+)" matched against "abcd1234abc" should not only return the complete match, i.e. "d1234", but also give me access to the sub-match "1234", which corresponds to (\d+). This can be done in Perl and other scripting languages that support regular expressions, using variables like $1, $2, etc.

jacek
29th March 2007, 00:35
QRegExp::cap()

fullmetalcoder
29th March 2007, 18:32
QRegExp::capturedTexts() is another solution. Which of the two you choose depends on what you want to do: cap() returns a single captured text at a given index, whereas capturedTexts() returns a list of all captured texts.
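The difference between the two accessors can be sketched without Qt, using std::regex from standard C++ as a stand-in for QRegExp. The helper names cap and capturedTexts below are hypothetical and only borrowed from the QRegExp methods they imitate; index 0 is the whole match and index 1 the first parenthesised group, which is also how QRegExp numbers its captures.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper imitating QRegExp::capturedTexts():
// returns every captured text for the first match of `pattern` in `text`.
// Index 0 is the whole match, index 1 the first group, and so on.
// An empty vector means there was no match.
std::vector<std::string> capturedTexts(const std::string& pattern,
                                       const std::string& text)
{
    std::vector<std::string> result;
    std::smatch match;
    if (std::regex_search(text, match, std::regex(pattern))) {
        for (std::size_t i = 0; i < match.size(); ++i)
            result.push_back(match[i].str());
    }
    return result;
}

// Hypothetical helper imitating QRegExp::cap(n):
// returns one captured text, or an empty string if there is
// no match or no group with that index.
std::string cap(const std::string& pattern, const std::string& text,
                std::size_t index)
{
    const std::vector<std::string> all = capturedTexts(pattern, text);
    return index < all.size() ? all[index] : std::string();
}
```

With the pattern "[a-z](\\d+)" and the text "abcd1234abc", cap(..., 0) gives the whole match and cap(..., 1) gives the digits alone.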

patrik08
29th March 2007, 20:38
I was wondering whether there are ways to access the sub-matches, apart from the complete match, when doing regular-expression matching ... etc.

What are sub-matches ... in Italian or German? Is this like finding each image in a tag, e.g. src="image.png", within a text?

Like this small piece of code, which removes Qt's uncool CSS text styling:



/* Find width, style, lang and class attributes in an HTML string, append them to a list, and remove them at the end. */
QString TidyExternalHtml( QString body )
{
    QString prehtml = TidyCleanhtml(body);  /* base clean-up to minimal standard XHTML, so the attributes can be found */
    QStringList notneed;
    notneed.clear();
    /* attributes to strip, e.g. width="456", lang, class */
    QRegExp expression( "width=[\"\'](.*)[\"\']", Qt::CaseInsensitive );  /* table, td, tr and image widths */
    QRegExp expression2( "style=[\"\'](.*)[\"\']", Qt::CaseInsensitive );
    QRegExp expression3( "lang=[\"\'](.*)[\"\']", Qt::CaseInsensitive );
    QRegExp expression4( "class=[\"\'](.*)[\"\']", Qt::CaseInsensitive );

    /* non-greedy matching, so that (.*) stops at the first closing quote */
    expression.setMinimal(true);
    expression2.setMinimal(true);
    expression3.setMinimal(true);
    expression4.setMinimal(true);

    /* cap(1) is the attribute value; remember it in both quoting styles */
    int iPosition = 0;
    while( (iPosition = expression.indexIn( prehtml , iPosition )) != -1 ) {
        QString semi1 = expression.cap( 1 );
        notneed.append(QString("width=\"%1\"").arg(semi1));
        notneed.append(QString("width='%1'").arg(semi1));
        iPosition += expression.matchedLength();
    }

    iPosition = 0;
    while( (iPosition = expression2.indexIn( prehtml , iPosition )) != -1 ) {
        QString semi2 = expression2.cap( 1 );
        notneed.append(QString("style=\"%1\"").arg(semi2));
        notneed.append(QString("style='%1'").arg(semi2));
        iPosition += expression2.matchedLength();
    }

    iPosition = 0;
    while( (iPosition = expression3.indexIn( prehtml , iPosition )) != -1 ) {
        QString semi3 = expression3.cap( 1 );
        notneed.append(QString("lang=\"%1\"").arg(semi3));
        notneed.append(QString("lang='%1'").arg(semi3));
        iPosition += expression3.matchedLength();
    }

    iPosition = 0;
    while( (iPosition = expression4.indexIn( prehtml , iPosition )) != -1 ) {
        QString semi4 = expression4.cap( 1 );
        notneed.append(QString("class=\"%1\"").arg(semi4));
        notneed.append(QString("class='%1'").arg(semi4));
        iPosition += expression4.matchedLength();
    }

    /* remove every collected attribute from the document */
    for (int i = 0; i < notneed.size(); ++i) {
        const QString fluteremove = notneed.at(i);
        prehtml = prehtml.replace(fluteremove,"", Qt::CaseInsensitive );
    }

    return prehtml;
}
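For comparison, the same clean-up can be sketched as a single regular-expression replace instead of four collect-then-remove loops. This is not the poster's code: it uses std::regex from standard C++ rather than QRegExp, the function name stripAttributes is hypothetical, and it assumes each attribute is preceded by whitespace and closed by the same quote character that opened it.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: remove width, style, lang and class attributes
// (single- or double-quoted, any letter case) from an HTML string in one pass.
std::string stripAttributes(const std::string& html)
{
    // \s+      whitespace in front of the attribute (removed with it)
    // (["'])   opening quote, remembered as group 1
    // [^"']*   attribute value, stops at the next quote
    // \1       closing quote, must equal the opening one
    static const std::regex attr(
        R"re(\s+(?:width|style|lang|class)\s*=\s*(["'])[^"']*\1)re",
        std::regex::icase);
    return std::regex_replace(html, attr, "");
}
```

Like the original, this treats the HTML as plain text, so an attribute-looking string inside ordinary page text would be removed as well.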

fullmetalcoder
30th March 2007, 12:16
You're parsing HTML right? So why don't you use Qt XML processing facilities??? :confused: HTML is nothing but specialized XML...

elcuco
31st March 2007, 01:38
HTML is not XML. See these examples:

Wrong nesting of nodes:


<div><b>a</div></b>


Not closing nodes (fine with HTML 4.01):


<ul>
<li>one
<li>two
</ul>
<br>


Not using quotes on attributes:


<img src=test.jpg border=0></img>

patrik08
31st March 2007, 12:14
HTML is not XML. See this example:
<img src=test.jpg border=0></img>

Right... there are so many HTML variants; the best is XHTML, and tidy can handle that.

Qt class:
https://qt-webdav.svn.sourceforge.net/svnroot/qt-webdav/lib_tidy_src/QT4_doc/qtidy.h
Static lib, check out with svn co:
https://qt-webdav.svn.sourceforge.net/svnroot/qt-webdav/lib_tidy_src/ path...

Whether I grab images with a regex or with DOM XML on the XHTML, both work.

But have a look at the fragments that Word, OpenOffice or other programs put on the clipboard when you copy...
this is horror... :crying:

When I import external text by copy and paste, everything goes through tidy first; otherwise I cannot find the images and links. I also remove external classes and table widths (from A4 or other page formats) and reformat everything anew:


tidiconfigfile.append("output-xhtml: yes");
tidiconfigfile.append("clean: yes");
tidiconfigfile.append("wrap: 550");
tidiconfigfile.append("indent-spaces: 1");
tidiconfigfile.append("char-encoding: utf8");
tidiconfigfile.append("output-encoding: utf8");
tidiconfigfile.append("wrap-attributes: yes");
tidiconfigfile.append("doctype: yes");
tidiconfigfile.append("hide-comments: yes");
tidiconfigfile.append("numeric-entities: yes");
tidiconfigfile.append("drop-proprietary-attributes: yes");
tidiconfigfile.append("word-2000: yes");
tidiconfigfile.append("bare: yes");
//////tidiconfigfile.append("show-body-only: yes"); /* only body checks */





void QVimedit::insertFromMimeData ( const QMimeData * source )
{
    // default handling, disabled: QTextEdit::insertFromMimeData(source);
    if ( source->formats().contains("text/html") ) {
        // qDebug() << "### incoming paste text/html ";
        const QString tidicaches = QString("%1/.qtidy/").arg(QDir::homePath());
        QString draghtml = source->html();
        /* fwriteutf8(QString fullFileName,QString xml) */
        QTidy *tidy = new QTidy();   /* QTidy *tidy; note: never deleted in this snippet */
        tidy->Init(tidicaches);      /* tidy cache, removed on the last event */
        const QString xhtmlnew = tidy->TidyExternalHtml(draghtml);
        // fwriteutf8("copy_in.html",xhtmlnew);
        QTextDocumentFragment fragment = QTextDocumentFragment::fromHtml(xhtmlnew);
        textCursor().insertFragment(fragment);
        emit IncommingHTML();        /* signal to reload images in the QTextDocument resources */
        return;
    }
}

if hasimage ........

if url .....

wysota
31st March 2007, 14:15
AFAIK Qt uses xhtml.

patrik08
31st March 2007, 19:32
AFAIK Qt uses xhtml.


<br> ... is this XHTML? You can reload a QTextEdit 2 or 6 times:

QTextBrowser (in edit mode) or QTextEdit does not handle <br /> ... :(
Only tidy handles that, along with ASCII or other tags...

kiker99
1st April 2007, 13:56
If you just want to use the captured groups in a replacement, you can also use QString::replace ( const QRegExp & rx, const QString & after ). The after string can contain '\\1', '\\2', etc., which will be replaced with the corresponding capture groups.
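The idea of referring to capture groups inside the replacement text can be sketched with std::regex from standard C++ as a stand-in for the Qt call. One difference to keep in mind: std::regex_replace writes the group references as $1, $2, ... where QString::replace uses \\1, \\2, ... The function name replaceWithGroups is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace every match of `pattern` in `text` with `after`.
// Inside `after`, $1, $2, ... stand for the corresponding capture groups
// (the std::regex counterpart of \\1, \\2 in QString::replace).
std::string replaceWithGroups(const std::string& text,
                              const std::string& pattern,
                              const std::string& after)
{
    return std::regex_replace(text, std::regex(pattern), after);
}
```

For example, replaceWithGroups("abcd1234abc", "[a-z](\\d+)", "[$1]") keeps only the digits of the match and wraps them in brackets.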