QRegExp Help; remove all html tag
I want to remove all HTML tags so that I can reformat a document.
Tidy cannot do the job.
I tried QString::remove with QRegExp. The two calls with the closing-tag patterns below remove the closing tags. Now I want to remove the opening tags as well. I tested the second commented-out pattern, but it removes everything.
How can I do this?
Code:
{
    qDebug() << "### start clean tag ";
    body.replace("<br>","##break##");
    body.replace("</br>","##break##");
    body.replace("</p>","##break##");
    body.replace("</td>","##break##");
    body.remove(QRegExp("<head>(.*)</head>"));
    body.remove(QRegExp("<form(.*)</form>"));
    body.remove(QRegExp("</(div|span|tr|td|br|body|html|tt|a|strong|p)>"));
    body.remove(QRegExp("</(DIV|SPAN|TR|TD|BR|BODY|HTML|TT|A|STRONG|P)>"));
    /*body.remove(QRegExp("<(div|span|tr|td|br|body|html|tt|a|strong|p)>"));*/
    /*body.remove(QRegExp("<(div|span|tr|td|br|body|html|tt|a|strong|p)( )(.*)(!>)>"));*/
    qDebug() << "### newbody " << body;
    return body;
}
Re: QRegExp Help; remove all html tag
You need something like:
Code:
body.remove( QRegExp( "<(?:div|span|tr|td|br|body|html|tt|a|strong|p)[^>]*>", Qt::CaseInsensitive ) );
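To see what the pattern above matches, here is a minimal sketch of the same idea. It uses the standard library's std::regex instead of QRegExp so that it compiles without Qt; std::regex::icase stands in for Qt::CaseInsensitive, and the function name stripListedTags is made up for this illustration. `[^>]*` consumes any attributes up to the closing `>`, which is what the commented-out attempts in the question were missing.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: removes the opening tags named in the alternation,
// with or without attributes, in any letter case. Closing tags are left alone.
std::string stripListedTags(const std::string &html)
{
    static const std::regex openTag(
        "<(?:div|span|tr|td|br|body|html|tt|a|strong|p)[^>]*>",
        std::regex::ECMAScript | std::regex::icase);
    return std::regex_replace(html, openTag, "");
}
```

Note that the tag name is followed directly by `[^>]*`, so `p` also matches `<pre>` and `a` also matches `<address>`. That is harmless when the goal is to strip every tag; if only the listed tags should go, a word boundary (`\b`) after the group is one way to restrict it.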
Re: QRegExp Help; remove all html tag
Thanks, the opening tags are gone now. Only this remains:
Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<!--UdmComment-->
Re: QRegExp Help; remove all html tag
Now it runs and cleans all tags:
Code:
{
    qDebug() << "### start clean tag ";
    body.replace("<br>","##break##");
    body.replace("</br>","##break##");
    body.replace("</p>","##break##");
    body.replace("</td>","##break##");
    body.remove(QRegExp("<head>(.*)</head>"));
    body.remove(QRegExp("<form(.*)</form>"));
    body.remove( QRegExp( "<(.)[^>]*>"));
    qDebug() << "### newbody " << body;
    return body;
}
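The catch-all pattern works on the DOCTYPE and the comment because neither contains a `>` before its end. A comment that does contain one (for example `<!-- a > b -->`) would only be cut up to that first `>`. A minimal sketch of one way around this, assuming comments are removed before the generic tag pattern runs; it is written with std::regex rather than QRegExp so it runs without Qt, and the name stripAllTags is invented for the example:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: drop comments first, then every remaining <...> construct.
std::string stripAllTags(const std::string &html)
{
    // [\s\S]*? matches as little as possible, across line breaks, up to "-->".
    static const std::regex comment("<!--[\\s\\S]*?-->");
    // Any tag-like construct, including <!DOCTYPE ...>.
    static const std::regex anyTag("<[^>]*>");
    return std::regex_replace(std::regex_replace(html, comment, ""), anyTag, "");
}
```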
Re: QRegExp Help; remove all html tag
Quote:
Originally Posted by patrik08
body.remove(QRegExp("<form(.*)</form>"));
What if a page contains more than one form?
Re: QRegExp Help; remove all html tag
Quote:
Originally Posted by jacek
What if a page contains more than one form?
I hope that body.remove( QRegExp( "<(.)[^>]*>"));
also removes the tags of a second form. But my CMS only contains news articles, and I only want to reformat their colors and styles. I insert new line breaks and then pass the result to Tidy for checking.
Re: QRegExp Help; remove all html tag
Quote:
Originally Posted by patrik08
I hope that body.remove( QRegExp( "<(.)[^>]*>"));
also removes the tags of a second form.
Then you had better try your code on:
[html]text1
<form>form1</form>
text2
<form>form2</form>
text3[/html]
hint
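The test input above exposes greedy matching: with `<form(.*)</form>` the `.*` runs to the last `</form>`, so everything between the first `<form` and the final `</form>` is deleted, text2 included. Here is a minimal sketch of the difference, written with std::regex rather than QRegExp so that it runs without Qt (both function names are invented for the example). In QRegExp there is no `*?` quantifier; the non-greedy behaviour is switched on for the whole pattern with setMinimal(true).

```cpp
#include <regex>
#include <string>

// Greedy: the "*" quantifier extends the match to the LAST "</form>" in the input.
// [\s\S] is used instead of "." so the match can span line breaks.
std::string removeFormsGreedy(const std::string &html)
{
    static const std::regex rx("<form[^>]*>[\\s\\S]*</form>",
                               std::regex::ECMAScript | std::regex::icase);
    return std::regex_replace(html, rx, "");
}

// Non-greedy: "*?" stops at the FIRST "</form>", so each form is removed on its own.
std::string removeFormsLazy(const std::string &html)
{
    static const std::regex rx("<form[^>]*>[\\s\\S]*?</form>",
                               std::regex::ECMAScript | std::regex::icase);
    return std::regex_replace(html, rx, "");
}
```

On the two-form input, the greedy version keeps only text1 and text3, while the non-greedy version keeps text1, text2 and text3.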
Re: QRegExp Help; remove all html tag
Quote:
Originally Posted by jacek
Then you had better try your code on:
[html]text1
<form>form1</form>
text2
<form>form2</form>
text3[/html]
hint
Now it handles more than one form, as well as JavaScript and style blocks.
It runs like this:
Code:
{
    qDebug() << "### start clean tag ";
    body.replace("&nbsp;"," ");
    body.replace("<br>","##break##");
    body.replace("</br>","##break##");
    body.replace("</p>","##break##");
    body.replace("</td>","##break##");
    /* remove these elements together with their content; setMinimal(true)
       makes ".*" non-greedy, so each block is matched on its own */
    foreach (const QString &tag, QStringList() << "head" << "form" << "script" << "style") {
        QRegExp block("<" + tag + "[^>]*>.*</" + tag + ">", Qt::CaseInsensitive);
        block.setMinimal(true);
        body.remove(block);
    }
    /* remove every remaining tag, including <!DOCTYPE ...> and comments */
    body.remove(QRegExp("<(.)[^>]*>"));
    /* put the line breaks back as valid <br> tags */
    body.replace("##break##","<br>");
    qDebug() << "### newbody " << body;
    return body;
}
html result: