Duy's page
Better deals with smart code
I have thousands of files and want to summarize the results, showing only the mean and standard deviation of each entry.
My first attempt was a single-core calculation looping over all the files. It worked, but it turned out to be much slower than I expected, and of course it came with annoyingly silly errors.
Then I thought of using multiprocessing with Pool.map to run the calculations in parallel. It seemed like a great idea and should be faster.
Being naturally lazy, I only slightly modified the code from serial to parallel by changing the outer loop to use Pool.map, naively letting the workers modify the numpy array for me.
Unfortunately, it always returned [0,...,0] arrays in Python 3.8.10. You can see the code below.
import os
import numpy
from multiprocessing import Pool

angledistyb=numpy.zeros(len(resinfo))

def calres(cnt):
    # one list of samples per binding state
    bound=[]
    pbound=[]
    unbound=[]
    for trialrun in range(1,6):
        for x in range(1,1000):
            for y in range(1,31):
                datfn=path1      # placeholder paths built from trialrun, x, y
                tmpdatfn=path2
                if os.path.isfile(datfn) and os.path.isfile(tmpdatfn):
                    dat=numpy.loadtxt(datfn,comments=["@","#","%","$"])
                    tmpdat=numpy.loadtxt(tmpdatfn,comments=["@","#","%","$"])
                    # classify this trajectory by its distance range
                    if dat[:,1].max()<=2.0:
                        bound.extend(tmpdat[1:,1])
                    if dat[:,1].min()>2.0 and dat[:,1].max()<=3.5:
                        pbound.extend(tmpdat[1:,1])
                    if dat[:,1].min()>3.5:
                        unbound.extend(tmpdat[1:,1])
    print("Processing "+trial,resinfo[cnt][1],resinfo[cnt][0])   # trial comes from the driver loop below
    angledistyb[cnt]=numpy.nanmean(bound)   # mean over bound samples, ignoring NaNs
    return
for trial in [<trial list>]:
    angledistyb=numpy.zeros(len(resinfo))
    with Pool(16) as p:   # the with-block closes and joins the pool on exit
        angledist=p.map(calres,range(0,len(resinfo)))
Does it work? NOoooooooooo
It seemed that the outer angledistyb was not really passed to the function calres, so I decided to make it a global array by adding the following.
def calres(cnt):
    global angledistyb
    ...
Does it work right now? NOoooooooooo
With multiprocessing, each worker process gets its own separate copy of the array in memory; the processes do not share the parent's memory, so a worker's writes never reach the outer angledistyb.
So declaring the array global is useless in this case. And of course, it took me a day to realize my silly mistake.
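To see the problem in isolation, here is a minimal sketch (the names shared and worker are made up for illustration): a global array mutated inside a Pool worker stays untouched in the parent.

import numpy
from multiprocessing import Pool

shared=numpy.zeros(4)

def worker(i):
    global shared
    shared[i]=i+1              # modifies this worker process's own copy only

if __name__=="__main__":
    with Pool(4) as p:
        p.map(worker,range(4))
    print(shared)              # still [0. 0. 0. 0.] in the parent

Each child process starts from a copy of the parent's memory (or re-imports the module on Windows), so assignments inside worker never travel back.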
In Python multiprocessing, Pool.map collects the return value of each call to the function and assembles them, in order, into a single list.
To take advantage of this, I decided to take the values directly from Pool.map: instead of writing into angledistyb, calres now returns the computed values.
So I receive a 2D array from Pool.map and can smoothly proceed further.
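Here is a minimal sketch of that behavior with a toy function (calc is made up, not my real calres):

import numpy
from multiprocessing import Pool

def calc(cnt):
    return cnt,cnt**2          # whatever this task computed

if __name__=="__main__":
    with Pool(4) as p:
        results=p.map(calc,range(5))
    print(numpy.array(results))   # shape (5,2): one row per cnt, in order

Applying the same idea, calres returns its computed values and the driver simply collects them: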
def calres(cnt):
    ...
    return angledistxb,angledistyb,angledisterrb,angledistypb,angledisterrpb,angledistyub,angledisterrub

with Pool(16) as p:
    angledist=p.map(calres,range(0,len(resinfo)))
numpy.savetxt(trial+"angledist.txt",numpy.array(angledist))
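One last note: assuming calres returns seven scalar values per residue, numpy.array(angledist) has shape (len(resinfo),7), so the individual arrays can be recovered by transposing:

(angledistxb,angledistyb,angledisterrb,
 angledistypb,angledisterrpb,
 angledistyub,angledisterrub)=numpy.array(angledist).T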