As part of creating a tool to edit aggregate data by the hour, I came across an interesting problem while trying to split each hour's total into 4 buckets.
sub_count = total_count // 4  # explicit integer division
rem_count = total_count % 4
counts = [sub_count] * 3 + [sub_count + rem_count]
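To make the behaviour concrete, here is a minimal sketch of the same split wrapped in a function. The name `split_evenly` and the `buckets` parameter are mine for illustration, not part of the original tool, and it is written for Python 3.

```python
def split_evenly(total_count, buckets=4):
    """Split total_count into `buckets` parts; the remainder goes to the last one."""
    sub_count = total_count // buckets
    rem_count = total_count % buckets
    return [sub_count] * (buckets - 1) + [sub_count + rem_count]


print(split_evenly(10))  # [2, 2, 2, 4]
print(split_evenly(3))   # [0, 0, 0, 3]
```

The same input always gives the same output, and any leftover ends up in the final bucket.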
This works, but the remainder always goes to the last bucket, so the generated counts come out almost uniform, with any excess always in the same place. This led me to try randomizing which bucket gets selected.
from collections import defaultdict
import random
count_list = defaultdict(list)
choices = [0, 1, 2, 3]
for i in range(total_count):
    # Drop each unit into a randomly chosen bucket
    count_list[random.choice(choices)].append(i)
result_count_list = [len(items) for items in count_list.values()]
# For lower total_count, some buckets are never created, so pad with 0s
if len(result_count_list) != 4:
    result_count_list += [0] * (4 - len(result_count_list))
return result_count_list
Ugh! A lot of fluff. Why not just initialize the list with 0s?
count_list = [0] * 4
for _ in range(total_count):
    count_list[random.randrange(4)] += 1
return count_list
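For completeness, here is a self-contained sketch of this last version as a function. `random_split` and its `buckets` and `rng` parameters are hypothetical names I introduced for illustration, and it is written for Python 3 (`range` instead of `xrange`).

```python
import random


def random_split(total_count, buckets=4, rng=random):
    """Assign each of total_count units to a uniformly chosen bucket."""
    count_list = [0] * buckets
    for _ in range(total_count):
        count_list[rng.randrange(buckets)] += 1
    return count_list


counts = random_split(10)
print(counts)  # four counts that vary from run to run
assert len(counts) == 4
assert sum(counts) == 10
```

Since every bucket starts at 0, small totals need no padding, and the counts always add up to `total_count`.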