"Since NCOL is 40, I thought I would have 40*8 objects"
Actually, it would be 40 * 40 * 8 if your math had been correct. But I am not sure I can calculate the true number.
So let's see:
For the first row:
mat[0][0] contains 8 copies of mat, none of which have copies of mat.
mat[0][1] contains 8 more copies, one of which (mat[0][0]) also has 8 copies
mat[0][2] - mat[0][38] also each contain 8 copies, one of which (mat[0][n-1]) has 8 more
mat[0][39] has 8 copies plus 16 more (mat[0][0] and mat[0][38]) (I think your grid wraps)
For the second row:
mat[1][0] has 8 copies, plus mat[0][0] and mat[0][1] for 16 more. But mat[0][1] also has mat[0][0] for 8 more.
mat[1][1] has 8, plus mat[0][0], mat[0][1] and mat[0][2] for 24 more. The last two have [0][n-1] for another 16
same for 2 - 38.
mat[1][39] has 8 plus 24 plus 16 plus the 32 from mat[1][0] because of wrapping
and it just gets worse. By the time you get to [39][39] the total is somewhere near a gazillion. :-)
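If you want to see how fast it blows up, here is a rough sketch in Python (not your code, just a model of it). It assumes the cells are filled in row by row, that each cell takes a full deep copy of its 8 neighbours at the moment it is filled in, and that the grid wraps on all sides. The name count_objects is made up for this example. It also follows every nested copy all the way down, so its numbers come out bigger than my hand count above, which stopped after one or two levels.

```python
def count_objects(nrow, ncol):
    """Count cell objects under the assumed model: row-major fill,
    deep copies of the 8 wrapped neighbours, grid wraps on all sides."""
    # size[r][c] = number of cell objects rooted at mat[r][c], itself included.
    # A cell that has not been filled in yet holds no copies, so it counts as 1.
    size = [[1] * ncol for _ in range(nrow)]
    offsets = [(dr, dc)
               for dr in (-1, 0, 1)
               for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    for r in range(nrow):
        for c in range(ncol):
            # A deep copy of a neighbour duplicates everything
            # that neighbour holds at this moment.
            size[r][c] = 1 + sum(size[(r + dr) % nrow][(c + dc) % ncol]
                                 for dr, dc in offsets)
    total = sum(sum(row) for row in size)
    return size, total


if __name__ == "__main__":
    size, total = count_objects(40, 40)
    print("mat[0][0] holds", size[0][0] - 1, "copies")
    print("mat[0][1] holds", size[0][1] - 1, "copies")
    print("digits in the grand total:", len(str(total)))
```

Python integers do not overflow, so the grand total can be printed exactly. Under these assumptions it is far beyond 40 * 40 * 8.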
Glad the simpler solution worked for you.