As part of creating a tool to edit aggregate data by the hour, I came across an interesting problem when trying to split the total for each hour into 4 buckets.

    sub_count = total_count // 4
    rem_count = total_count % 4
    counts = [sub_count] * 3 + [sub_count + rem_count]

This works, but any remainder always lands in the last bucket, so the generated counts are almost uniform and the last bucket is predictably the largest. That led me to try randomizing which bucket each count goes into.

    import random
    from collections import defaultdict

    def split_count_verbose(total_count):  # function name is illustrative
        count_list = defaultdict(list)
        choices = [0, 1, 2, 3]
        for i in range(total_count):
            count_list[random.choice(choices)].append(i)
        result_count_list = [len(items) for items in count_list.values()]
        # For a low total_count some buckets are never picked, so pad with 0s
        if len(result_count_list) != 4:
            result_count_list += [0] * (4 - len(result_count_list))
        return result_count_list

Ugh! A lot of fluff. Why not just initialize the list with 0s?
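
    import random

    def split_count(total_count):  # function name is illustrative
        count_list = [0] * 4
        for _ in range(total_count):
            # Each unit of the total picks one of the 4 buckets at random
            count_list[random.randrange(4)] += 1
        return count_list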
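
A quick way to convince yourself that the simplified version behaves as intended is to check that the buckets always add up to the total, even when the total is smaller than the number of buckets. The sketch below is Python 3 and is only an illustration: the name `split_into_buckets` and the `buckets` and `rng` parameters are assumptions that do not appear in the original code. Passing a seeded `random.Random` instance just makes the demo reproducible.

```python
import random


def split_into_buckets(total_count, buckets=4, rng=None):
    """Randomly assign total_count units to `buckets` bins; return bin sizes.

    Hypothetical helper: the name and the `buckets` / `rng` parameters are
    assumptions made for this illustration.
    """
    if rng is None:
        rng = random  # fall back to the module-level generator
    counts = [0] * buckets
    for _ in range(total_count):
        # Each unit independently picks one bucket, uniformly at random.
        counts[rng.randrange(buckets)] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs print the same thing
    for total in (0, 3, 10, 1000):
        counts = split_into_buckets(total, rng=rng)
        # No padding step is needed: buckets that are never picked stay at 0.
        assert len(counts) == 4 and sum(counts) == total
        print(total, counts)
```

Because the list starts out as zeros, a small total simply leaves some buckets empty. Whether empty buckets are acceptable depends on what the hourly data is used for, so treat that as something to check against your own requirements.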