bluetooth LE receive bandwidth



efiLabs
8th September 2018, 07:55
i have upgraded my hardware from a china ble module to a dedicated nordic semiconductor arm cortex-m4f mcu, which now also holds all of the firmware together with the nordic serial data protocol (20 bytes max / gatt pkg)

this hw sends out binary data pkgs, either to an android app running on a nexus-7 2013 or a ubuntu 16.04 or 18.04 desktop computer

data rcve rate is about 30 bytes every 50 ms, so roughly 600 bytes / sec (6 k baud)

for comparison, the android app has the same c++ bin pkg decode code in its JNI portion; java gets the decoded data, stuffs it into its data array and displays it on the tablet

the QT app uses the exact same c++ bin pkg decode code and puts the data into the data structs ... a timer updates this data on the appropriate dialog window

the android app with a humble quad core arm cpu can keep up with a pkg rate of 20 ms / 30 byte pkgs, which was a big improvement after i swapped the china ble modules for the nordic mcus

the QT desktop app dies slowly even at the 50 ms data pkg rate ... by that i mean what i observe in the system monitor cpu load, plus some dbg info (a single char print) for each pkg reception as well as a timestamp every 1000 rcv chars

it all starts out working fine for a few secs, then the cpu load on the 8 cores (evenly spread out, one at a time) goes up and comes down again ... the next time it goes up higher, and after 30 plus secs and 10 plus rounds of this game one cpu stays at 100 % of its slice for too long and the whole thing hangs ... also QT debug mode with breakpoints doesn't seem to like ble based apps, since it never connects

i don't know how else to describe it

on another test app i only count 1000 rcv bytes and print a timestamp and nothing else, and i can reach a 30 ms pkg rate ... if i want to display the data as 2 digit hex in a text edit it dies the same way

is this all i can expect from an amd 8 core or an intel i7 8 core cpu on a QT desktop computer running the above mentioned ubuntu versions ???

at this point i wouldn't even know how to split the whole app up into threads or similar to even out the load

for the simple bandwidth test app i see no part worth putting on a separate thread

for the data processing app i have 2 solutions, one being single threaded and the other one as follows

the "characteristicChanged" slot sticks the data into a very simple queue, and a 2nd thread checks whether there is something in this queue, takes it out, does the pkg decode and sticks the result into the data struct
the main app updates the gui from this data struct based upon a timer
note, it's interesting that the queue never gets full or overflows ... how come the ble rcve in the qt library doesn't run independently in its own thread (environment) and produce the rcv data regardless of the main app ... this data has to come from a system driver ???
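
A minimal sketch of the queue hand-off described above, in plain standard C++ rather than Qt. The names `Packet` and `PacketQueue` are hypothetical and not taken from the app; the receive slot would call `push ()` and the decode thread would loop on `pop ()`. It is only meant to show a hand-off where the worker sleeps instead of polling and the queue is capped so it cannot grow without limit.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical packet type: one raw notification payload.
using Packet = std::vector<std::uint8_t>;

// Bounded queue: the receive side pushes, a worker thread pops.
class PacketQueue {
public:
    explicit PacketQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Receive side. Never blocks: returns false (packet dropped) when the
    // queue is full or already closed.
    bool push(Packet p) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (closed_ || queue_.size() >= capacity_) return false;
            queue_.push_back(std::move(p));
        }
        ready_.notify_one();
        return true;
    }

    // Worker side. Sleeps until a packet arrives or close() is called.
    // Returns false once the queue is closed and fully drained.
    bool pop(Packet &out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    // Wakes the worker so it can finish after draining what is left.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::size_t capacity_;
    std::deque<Packet> queue_;
    std::mutex mutex_;
    std::condition_variable ready_;
    bool closed_ = false;
};
```

The worker loop would then be `Packet p ; while (q.pop (p)) decode (p) ;` with `decode` standing in for the existing pkg decode code. A `push ()` that returns false is a cheap way to notice if the consumer ever falls behind.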

can this QT ble data rcv library be run in its own thread ... right now i wouldn't know how

there seemed to be no performance difference between the two app versions (single vs double threaded)

sorry for the lengthy description

i feel that others might eventually run into similar issues and face the same ble bandwidth dilemma.

i have not found anything applicable to my performance issue, but maybe i'm not good at gooooogling

any thoughts and suggestions are highly appreciated

cheers efiLabs

d_stranz
8th September 2018, 19:01
the QT desktop app dies slowly even at the 50 ms data pkg rate ... by that i mean what i observe in the system monitor cpu load, plus some dbg info (a single char print) for each pkg reception as well as a timestamp every 1000 rcv chars

it all starts out working fine for a few secs, then the cpu load on the 8 cores (evenly spread out, one at a time) goes up and comes down again ... the next time it goes up higher, and after 30 plus secs and 10 plus rounds of this game one cpu stays at 100 % of its slice for too long and the whole thing hangs ... also QT debug mode with breakpoints doesn't seem to like ble based apps, since it never connects


This sounds like a classic case of a memory leak where your app continuously allocates more memory without freeing any up. Eventually your system chokes and dies because all available RAM has been allocated. If you are running a 64-bit app with a 64-bit memory address space, your program could request terabytes of space, far beyond physical RAM.

Look for places where you are pushing data into buffers or other data structures that continue to grow. At a 50ms receive rate, this could happen pretty quickly.

Also look for algorithms where you are either repeatedly copying data structures that get larger and larger, or repeatedly processing large data structures over and over starting from their beginning, etc. The bigger things get, the more time it takes to process them, and eventually your program is spinning its wheels copying data around or rehashing the same old thing millions of times.

efiLabs
8th September 2018, 23:29
i would like to provide some code snippets on the extremely simple byte counting and display app

ble connect to service funct setting up signal / slot for "characteristicChanged", which is the byte rcve part



void TDevice::ConnectToService (void)
{
    setUpdMsg ("Connecting to service ...") ;

    SioChariF = RxChariF = TxChariF = false ;

    /*! [ble-service 2] */
    connect (pService, SIGNAL (error (QLowEnergyService::ServiceError)),
             this, SLOT (SvcError (QLowEnergyService::ServiceError))) ;
    connect (pService, SIGNAL (stateChanged (QLowEnergyService::ServiceState)),
             this, SLOT (SvcDetailsDisc (QLowEnergyService::ServiceState))) ;
    connect (pService, SIGNAL (characteristicChanged (QLowEnergyCharacteristic, QByteArray)),
             this, SLOT (RdyRead (QLowEnergyCharacteristic, QByteArray))) ;
    /*connect (pService, SIGNAL (characteristicWritten (QLowEnergyCharacteristic, QByteArray)),
             this, SLOT (WrData (QLowEnergyCharacteristic, QByteArray))) ; */

    pService->discoverDetails () ;

    setUpdMsg ("Discovering details ...") ;
    /*! [ble-service 2] */
}


i like to have a dedicated read funct in the ble device module and provide rdyRead read signal to outside
contains still some now unused debug statements



void TDevice::RdyRead (const QLowEnergyCharacteristic &chari, const QByteArray &val)
{
    //qDebug () << "dev.ChariValChg" << chari.uuid () << val ;

    if (chari != TxChari) return ;

    //MSG_PRN (".") ;
    emit rdyRead (val) ;

    //qDebug () << "dev.ChariChg" << chari.uuid () << val ;
}


providing the signal / slot connect funct in the module of the data count and display processing


void TTestDlg::Connect (void)
{
    Com.Lim (Sze) ;

    connect (Device, SIGNAL (rdyRead (QByteArray)),
             this, SLOT (Put (QByteArray))) ;
}


this is the final data byte counting and display funct ... there is nothing in it which would, in my opinion, produce a memory leak ... no dynamic "new / delete" operation is used anywhere

" if (!DispF) return ;" determines the counting only or also subsequent data display


void TTestDlg::Put (const QByteArray &data)
{
    //if (!this->isVisible ()) return ;
    //if (!this->hasFocus ()) return ;

    // time of the previous 1000 byte mark
    static qint64 t64 = QDateTime::currentMSecsSinceEpoch () ;

    enum { eCnt = 1000 } ;
    static int cnt = 0 ;
    int len = data.length () ;
    // count every rcv byte, print "<k byte count>-<msec since last mark>" once per eCnt bytes
    for (const char *p = data.constData (),
                    *q = p + len ; p < q ; p++) {
        if (!(cnt++ % eCnt)) {
            qint64 t = QDateTime::currentMSecsSinceEpoch () ;
            // cast assumes MSG_PRN is printf style: %d expects an int, t - t64 is a qint64
            MSG_PRN ("%d-%04d ", cnt / eCnt, (int) (t - t64)) ; t64 = t ;
        }
    }
    if (!DispF) return ;    // counting only, no display

    if (!BinF && data.length () == 1 && data [0] == '\n') return ;

    // qDebug () << data ;

    if (!BinF) ui->MsgTxe->insertPlainText (QString (data)) ;
    else {
        uint sz = data.length () ;
        char buf [0x200], *p = buf ;
        // 3 chars per byte plus the final '\0' have to fit into buf
        if (sz > (sizeof buf - 1) / 3) sz = (sizeof buf - 1) / 3 ;
        for (uint i = 0 ; i < sz ; i++) {
            uint x = data [i] ;
            *p++ = i2hex (x >> 4) ;
            *p++ = i2hex (x) ;
            *p++ = ' ' ;
        } *p = '\0' ;
        ui->MsgTxe->insertPlainText (buf) ;
    }
    QScrollBar *bar = ui->MsgTxe->verticalScrollBar () ;
    bar->setValue (bar->maximum ()) ;
}




static char i2hex (uint x) { return "0123456789abcdef" [x & 0x0f] ; }


i realize that i have 2 signal / slots chained together, but i don't know the penalty for doing this as compared to one

see attached pict :

console is my debug outlet with "1-2144 2-2306 3-2249 ..." the 1st number being the 1 k byte count (1 2 3 ...) and the second (2144 2306 2249) the time in msec it took to rcve 1 k bytes, which is roughly 450 bytes / sec
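
The conversion behind that figure can be sketched as follows; the function names are hypothetical and only illustrate the arithmetic. The three samples above (2144, 2306 and 2249 msec per 1000 bytes) work out to about 466, 434 and 445 bytes / sec.

```cpp
#include <cassert>

// Average receive rate in bytes per second for `bytes` received in `elapsed_ms`.
double bytesPerSecond(long bytes, long elapsed_ms) {
    assert(elapsed_ms > 0);
    return bytes * 1000.0 / elapsed_ms;
}

// Average gap in msec between gui updates if every byte triggers one update.
double msPerByte(double bytes_per_second) {
    assert(bytes_per_second > 0.0);
    return 1000.0 / bytes_per_second;
}
```

At 450 bytes / sec, `msPerByte (450.0)` is a little over 2 msec, which is where the "every 2 ms" figure further down comes from.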

Test-Dlg will print the incoming hex values (bin check-box) if the disp check-box is checked

this code handles the 450 bytes / sec for counting only fine, as soon as i start displaying it it freezes slowly over a period of minutes

at this point i do not see any use of dynamic mem allocation like "new / delete" which would cause mem leaks and eventually consuming all avail mem

i realize that the Test-Dlg gui has to update a hex pair and space 450 times a sec ... so every 2 ms ... is this asking too much ???

this is more or less the whole byte count and display app and a bit glue fluff around it ... it would provide also data transmit, which is not being used here

and why does it take several minutes to freeze


[attachment 12962]

here are pics from the system monitor showing the gradual increase until this simple byte counting / display app freezes

[attachments 12963, 12964, 12965]

i also ran this simple test app in valgrind (i'm new to it) for a short while watching the data coming in and there were no leaks detected while running

upon closing my simple app, 5 leaks showed up, where 4 of them appeared purely within some libs ... basically not directly called by my code

one line pointed out as a leak was my code calling "discoverServices ()" which is a QT bluetooth lib funct to start discovering services


void TDevice::DevConnect (void)
{
    setUpdMsg ("Discovering services ...") ;
    ConnF = true ;
    /*! [ble-service 1] */
    pControl->discoverServices () ;
    /*! [ble-service 1] */
}

here is the valgrind info related to those leaks

[attachments 12966, 12967]

again, i'm new to valgrind, which seems to be a cooooooool tool ;)

why does it freeze after a few minutes receiving data with gradually increasing cpu usage

and if there is a mem leak, how can this be found as long as my code doesn't make unbalanced "new / delete" calls

yes, i use "new / delete" in the apps ctor to instantiate the various dialogs and close (delete) them upon app close

well, i ran valgrind now directly from the cli (not through valkyrie gui) and saved the cli output

and posted it as BleTerm-leaks0.txt and BleTerm-leaks2.txt attachments (one exceeded the upload size limit)

anyway, the only real leak possibly caused by my code would be in the QT ble lib usage setup, and that report seems to be in error, since i delete the allocated "pControl = new QLowEnergyController (DevInfo.getDevice ().address()) ;" later at several places, including the dtor

and now i'm worried that no one will answer after such a lengthy explanation ;)

d_stranz
15th September 2018, 17:11
at this point i do not see any use of dynamic mem allocation like "new / delete" which would cause mem leaks and eventually consuming all avail mem

There may be memory allocations internal to Qt or whatever other libraries you are using where ownership of the allocated memory is transferred to your program. It also doesn't have to be a classic unpaired new / delete problem. It could be that there are buffers being allocated that should be cleared and aren't, so they just continue to grow.
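
As an illustration of the "buffers that just continue to grow" point, here is a minimal sketch in plain standard C++ of a text buffer that is capped at a fixed size. `TailBuffer` is a hypothetical name, and whether an uncapped display buffer is actually what slows this particular app down is an assumption that would have to be verified.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Keeps only the most recent `limit` characters of an ever-growing text log.
class TailBuffer {
public:
    explicit TailBuffer(std::size_t limit) : limit_(limit) {
        assert(limit > 0);
    }

    // Appends new text, then drops the oldest characters beyond the limit.
    void append(const std::string &text) {
        text_ += text;
        if (text_.size() > limit_)
            text_.erase(0, text_.size() - limit_);
    }

    const std::string &text() const { return text_; }

private:
    std::size_t limit_;
    std::string text_;
};
```

With a cap like this, the cost of handling the buffer stays constant no matter how long the program has been receiving data.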


so every 2 ms ... is this asking too much ???

There is a reason why refresh rates for video displays are typically in the 60 Hz range. The human visual system merges anything that changes faster than a few tens of Hz into a continuous stream, like video. Your reaction time to something that happens suddenly is on the order of 0.3 s or greater. So updating a UI with a value that changes at 450 Hz is sort of useless. It would make more sense to display a moving average at less than 30 Hz, and even then, if the value changes each time, all your eye will see is blurred numbers.
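
One way to decouple the receive rate from the display rate is to format incoming bytes into a pending string and hand that string to the UI only when a slower timer fires. The sketch below uses plain standard C++; `HexBatcher` is a hypothetical name, and the timer that would call `take()` (for example every 100 ms) is assumed to exist elsewhere in the app.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Collects hex text for received packets until the display asks for it.
class HexBatcher {
public:
    // Called for every received packet: cheap, touches no UI.
    void append(const std::vector<std::uint8_t> &packet) {
        static const char digits[] = "0123456789abcdef";
        for (std::uint8_t b : packet) {
            pending_ += digits[b >> 4];
            pending_ += digits[b & 0x0f];
            pending_ += ' ';
        }
    }

    // Called at display rate: returns everything accumulated since the
    // previous call and leaves the batcher empty.
    std::string take() {
        assert(pending_.size() % 3 == 0);  // each byte adds exactly 3 chars
        std::string out;
        out.swap(pending_);
        return out;
    }

private:
    std::string pending_;
};
```

At roughly 450 bytes per second and a 100 ms display timer, each `take()` would return about 45 bytes worth of text, so the widget is updated 10 times a second instead of once per packet.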

If the number you are updating is something that occurs in a well-defined range (X +/- Y, where Y is small compared to X), then a histogram might be a more useful display. So instead of displaying a constantly changing X, you count the number of times each X occurs and display the counts vs. X in a histogram. Even then, you don't replot it more than a few times a second at most.
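
A minimal sketch of the counting side of such a histogram, in plain standard C++. The class name and the fixed bin width are assumptions for illustration; the plotting itself is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Counts how often values fall into fixed-width bins.
class Histogram {
public:
    explicit Histogram(int bin_width) : bin_width_(bin_width) {
        assert(bin_width > 0);
    }

    // Adds one sample. Floor division keeps negative values in the right bin.
    void add(int value) {
        int bin = value >= 0
                      ? value / bin_width_
                      : -((-value + bin_width_ - 1) / bin_width_);
        ++counts_[bin * bin_width_];
    }

    // Maps the lower edge of each non-empty bin to its sample count.
    const std::map<int, std::size_t> &counts() const { return counts_; }

private:
    int bin_width_;
    std::map<int, std::size_t> counts_;
};
```

Samples are added as fast as they arrive, while the plot only reads `counts()` a few times a second.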


and now i'm worried that no one will answer after such a lengthy explanation

There is that problem. Sometimes less is more - distilling the problem down to a few lines usually gets a better response than pages of detailed explanation, all in lowercase...

Besides that, you are dealing with a hardware / software configuration that probably very few people who follow this forum have any experience with. I don't. I am just suggesting areas to look, based on my long experience trying to track down these kinds of bugs.