I wrote this after finding a bug in some of my code. Basically, I was iterating over a slice of inputs, processing each one. On occasion this processing would reveal new inputs to test, so I appended them to the slice.
for i := range input {
    if newValue := test(input[i]); newValue != nil {
        input = append(input, newValue)
    }
}
Of course, this doesn't work, and I figured out (and appreciated) why by reading Go's spec.
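A minimal sketch of why, assuming a hypothetical helper called countRangeIterations (not part of the original code): the range expression is evaluated once, before the first iteration, so elements appended inside the body are never visited.

```go
package main

import "fmt"

// countRangeIterations is a hypothetical helper: it appends one element per
// iteration and reports how many iterations ran and the final slice length.
func countRangeIterations(input []int) (iterations, finalLen int) {
	for range input {
		// The range expression was evaluated once, before the first
		// iteration, so this append does not extend the loop.
		input = append(input, 0)
		iterations++
	}
	return iterations, len(input)
}

func main() {
	iterations, finalLen := countRangeIterations([]int{1, 2, 3})
	fmt.Println(iterations, finalLen) // 3 6
}
```

The slice doubles in length, but the loop body still runs only three times.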
Hopefully this brief guide will be of use to someone.
First off, good write up, I'm sure others will find it useful.
To your specific issue, I thought it was good programming etiquette to never modify an object you're iterating over, regardless of how the language handles such a thing. Were my instructors too strict? Is this a common idiom in other environments?
"modify an object" means "change the number of elements", since you obviously want to be able to manipulate the individual elements of a container as you iterate over them. The object here is the container, not the elements themselves.
I can't comment on other languages, but I'd say that guideline is a little too strict for Go.
The classic implementation of Breadth First Search involves iterating over a queue as you fill it.
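For anyone who hasn't seen it, here is a rough sketch of that pattern. The bfsOrder function and the adjacency-list graph are made up for illustration; the point is only that the queue grows while the loop walks it.

```go
package main

import "fmt"

// bfsOrder returns the nodes reachable from start in breadth-first order.
// graph is a hypothetical adjacency list used only for illustration.
func bfsOrder(graph map[string][]string, start string) []string {
	queue := []string{start}
	seen := map[string]bool{start: true}
	// len(queue) is re-evaluated before every iteration, so nodes appended
	// in the body are visited later in the same loop.
	for i := 0; i < len(queue); i++ {
		for _, next := range graph[queue[i]] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, next)
			}
		}
	}
	return queue
}

func main() {
	graph := map[string][]string{
		"a": {"b", "c"},
		"b": {"d"},
		"c": {"d"},
	}
	fmt.Println(bfsOrder(graph, "a")) // [a b c d]
}
```

The inner loop can safely use range because graph[queue[i]] is not modified while it is being iterated.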
"don't modify the RHS while in a range clause" would be a more suitable guideline for Go. Note that it's subtly different from "iterating over" - indeed, the answer to my bug was to iterate without using the range clause:
for i := 0; i < len(input); i++ {
    if newValue := test(input[i]); newValue != nil {
        input = append(input, newValue)
    }
}
I now appreciate the difference between this and the range clause - the length is evaluated every iteration this way. The range clause evaluates it once, at the beginning - rule (1).
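A runnable sketch of the fixed loop, under some assumptions: derive is a hypothetical stand-in for test (it halves a value until it reaches 1), and processAll is just a name picked for this example.

```go
package main

import "fmt"

// derive is a hypothetical stand-in for test: it reports a new input to
// process (half of v) until the value reaches 1.
func derive(v int) (int, bool) {
	if v > 1 {
		return v / 2, true
	}
	return 0, false
}

// processAll visits every input, including the ones discovered on the way.
func processAll(input []int) []int {
	// The condition i < len(input) is checked before every iteration,
	// so the loop keeps going as long as appends add unvisited elements.
	for i := 0; i < len(input); i++ {
		if v, ok := derive(input[i]); ok {
			input = append(input, v)
		}
	}
	return input
}

func main() {
	fmt.Println(processAll([]int{4})) // [4 2 1]
}
```

The loop only terminates because derive eventually stops producing new values; with a test that always finds something new it would run forever.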
After thinking a bit about your examples it makes me appreciate the keyword: the range clause's strictness guarantees iterating only on a certain `range` (self-duh) hence why it's not just called `iterate`.
I found myself making simple mistakes by assuming that ranging over a synchronous channel would cause the goroutine that is sending to the channel to become active. Instead, I wanted to use a for-select statement or a buffered channel, because a length guarantee couldn't be made (or so I assume).
The GP's example is equivalent to pushing to the tail of a queue while you're consuming its head, which is a relatively common (and safe) pattern. It looks like the Go designers made this pattern a bit harder to express, in favour of making the general case a bit harder to mess up.
> the Go designers made this pattern a bit harder to express
GP can just do an old-school `for` loop without a `range` clause (generally everyone learns about `for` before learning about `range`) and this immediately becomes incredibly easy to express.
Thanks for writing this, very informative. You say that adding to a channel's buffer to allow you to append to it while you're reading is a code smell. Would you say the same about just spinning off a goroutine to do the append? It seems like it gets around the problem pretty well, though it's pretty much the same thing. Just wondering what you think because I've used this pattern before.
This seems to me to be the same class of error: not so much a Go thing as a very common gotcha for closures. You're placing x in the scope of the for loop and passing it into a series of anonymous functions, so it gets closed over and referenced as a common variable between each of those functions. Thus, before each goroutine has a chance to run, the loop has completed and x == 3. You'll find this behaviour in any language with closures.
In the second example you're allocating a new local variable on each iteration, so each individual value gets closed over separately. That's probably not what you'd want usually, hence that not being default behaviour.
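Since the two examples aren't quoted here, a rough sketch of the difference, with made-up function names. The shared variable is declared outside the loop on purpose, so the sharing is explicit and does not depend on the Go version (Go 1.22 and later give a := loop variable a fresh instance per iteration).

```go
package main

import "fmt"

// sharedCapture builds closures that all capture the same variable x,
// which is declared once, outside the loop.
func sharedCapture(values []int) []func() int {
	var funcs []func() int
	var x int
	for _, x = range values {
		funcs = append(funcs, func() int { return x })
	}
	return funcs
}

// copiedCapture declares a fresh variable in every iteration, so each
// closure captures its own copy.
func copiedCapture(values []int) []func() int {
	var funcs []func() int
	for _, x := range values {
		x := x // new variable per iteration
		funcs = append(funcs, func() int { return x })
	}
	return funcs
}

func main() {
	for _, f := range sharedCapture([]int{1, 2, 3}) {
		fmt.Print(f(), " ") // 3 3 3
	}
	fmt.Println()
	for _, f := range copiedCapture([]int{1, 2, 3}) {
		fmt.Print(f(), " ") // 1 2 3
	}
	fmt.Println()
}
```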
Right. I'm just saying I think the scope ought to be different. The 'x' in the loop should be a new variable each time because it's not really 'a common variable between each of those functions'.
In the
for i := 0; i < 10; i++ {}
case it's definitely clearer that i should be the same thing between iterations (so you can tinker with i inside the loop). It just seems like they could've done something different for the 'range' for loop.
JavaScript is plagued with this same problem (though it's even worse because it doesn't even follow { } blocks).
After I wrote my reply I played around with some C# and found to my surprise that foreach does in fact provide a new variable to be captured on each iteration, so clearly this is a design decision that varies between languages.
Not all languages with closures work this way. In Objective-C blocks, the default is to capture locals by value, so this sort of error is less likely. In C++11 lambdas, you have to specify the capture type as well.
Capturing variables by value has both safety and performance benefits in a multithreaded world, and it's unfortunate that Go chose not to do that.
This really isn't a gotcha. This happens in every language with closures, except Java. People complain about Java's insistence on labeling every closed-over variable "final", but it sure does nip this problem in the bud.
I was curious if a simple fix for the bug would be wrapping a `defer` statement around the anonymous function. It seems only arguments are evaluated when deferring, not blocks like I had assumed.
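A small sketch of that distinction, using a hypothetical deferDemo function: the arguments of a deferred call are evaluated when the defer statement executes, while the function body only runs when the surrounding function returns.

```go
package main

import "fmt"

// deferDemo returns what two deferred functions observed. The names are
// hypothetical and exist only for this illustration.
func deferDemo() (byArg, byClosure int) {
	x := 1
	// The argument x is evaluated now, when the defer statement executes.
	defer func(v int) { byArg = v }(x)
	// The body is not evaluated now; it reads x when it finally runs.
	defer func() { byClosure = x }()
	x = 2
	return 0, 0
}

func main() {
	fmt.Println(deferDemo()) // 1 2
}
```

The named results are only there so the deferred functions can report what they saw after the return statement has run.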
Yeah, that kind of approach is typically how you resolve this issue. CoffeeScript even has a neat wrapper for (function([args...]) { ... })([args...]), namely do (args...) -> ..., to make this less painful.
Interesting how the Go syntax makes it easy to explicitly pass in arguments. That is very nice.
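For reference, a sketch of what passing the arguments explicitly looks like with goroutines. The squares function is a made-up example; the relevant part is the (i, v) at the end of the function literal.

```go
package main

import (
	"fmt"
	"sync"
)

// squares computes v*v for each value in its own goroutine. Each goroutine
// receives i and v as arguments, so it works on its own copies.
func squares(values []int) []int {
	out := make([]int, len(values))
	var wg sync.WaitGroup
	for i, v := range values {
		wg.Add(1)
		go func(i, v int) {
			defer wg.Done()
			out[i] = v * v // each goroutine writes a distinct index
		}(i, v)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(squares([]int{1, 2, 3})) // [1 4 9]
}
```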
I dislike the fact that the value in range over a slice is a copy, rather than an alias, to the slice value.
It seems to me to be strictly less useful than the alternative (aliasing).
And I also don't really buy the argument that "it is a normal assignment and so has to copy", since:
i, v := range s
isn't a normal assignment. It has special rules to do with looping. Having the additional rule that v aliases to the entry seems to me to be a full win (too late to change now I guess).
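To make the current behaviour concrete, a minimal sketch with two hypothetical helpers: assigning to the range value changes only a copy, while writing through the index changes the slice.

```go
package main

import "fmt"

// doubledCopies doubles the range value v. Since v is a copy of each
// element, s itself is left untouched.
func doubledCopies(s []int) []int {
	var out []int
	for _, v := range s {
		v *= 2
		out = append(out, v)
	}
	return out
}

// doubleInPlace writes through the index, which changes the slice itself.
func doubleInPlace(s []int) {
	for i := range s {
		s[i] *= 2
	}
}

func main() {
	a := []int{2, 3, 4}
	b := doubledCopies(a)
	fmt.Println(a, b) // [2 3 4] [4 6 8]

	doubleInPlace(a)
	fmt.Println(a) // [4 6 8]
}
```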
The reason it's right is that if it were done by aliasing, it would create a weird trap variable, an invisible pointer dereference which would modify something else.
a := 1
b := []int{2,3,4}
for i, c := range b {
// lots more code
a = 5
c = 6
// now a is 5 and c is 6 as you'd expect
// but also by magic, b[i] is 6
}
// now by magic, b is {6,6,6}
This catches people out, and is something that some people on the Go team have expressed regret over - unfortunately it is too late to change due to backwards compatibility promises.
Though I like the copy behavior more than the reference one I wish there was an option to turn on aliasing. Kinda like in C++11's for(:) where you can get both - a copy or a reference.
Having strict control over reference/value semantics is a feature I don't want to miss in a systems language.
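Go has no switch for this in the range clause, but you can get an explicit alias by ranging over the index and taking the element's address. A sketch, with a hypothetical point type:

```go
package main

import "fmt"

// point is a hypothetical element type for this illustration.
type point struct{ x, y int }

// shiftByValue increments the copy held in p; ps is left unchanged.
func shiftByValue(ps []point) {
	for _, p := range ps {
		p.x++
	}
}

// shiftByPointer takes the address of each element, which gives an
// explicit, visible alias to the slice entry.
func shiftByPointer(ps []point) {
	for i := range ps {
		p := &ps[i]
		p.x++
	}
}

func main() {
	ps := []point{{1, 1}, {2, 2}}
	shiftByValue(ps)
	fmt.Println(ps) // [{1 1} {2 2}]

	shiftByPointer(ps)
	fmt.Println(ps) // [{2 1} {3 2}]
}
```

Unlike a C++ reference, the alias is spelled out at the point where it is created, which is arguably the trade-off being discussed here.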
Yep, I suppose this would be the preferable way to deal with this whole situation, although it might confuse new Go programmers and it may produce nasty bugs which are rather hard to find (especially in a larger code base, obviously).
Does Go have a generic iterator interface? When I looked through the docs range seemed the closest, but all the examples seemed tied to an array or an array of the keys of a dictionary.
For example Python has an iterator protocol and language support, and Java has Iterator/Iterable with language support.
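One thing range does support besides slices and maps is channels, so a common workaround is a generator that feeds a channel. A rough sketch, assuming a made-up evens function; it is not a general iterator protocol, just the closest idiom that works with range in any Go version.

```go
package main

import "fmt"

// evens is a hypothetical generator: it sends the even numbers below limit
// on a channel and closes the channel when done, which ends a range loop.
func evens(limit int) <-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for i := 0; i < limit; i += 2 {
			ch <- i
		}
	}()
	return ch
}

func main() {
	for v := range evens(7) {
		fmt.Print(v, " ") // 0 2 4 6
	}
	fmt.Println()
}
```

The catch is that a consumer that stops reading early leaves the sending goroutine blocked forever. Newer releases (Go 1.23 and later) also allow ranging over functions and ship an iter package, which is closer to what Python and Java offer.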
The first one is eight tokens, of which at least three are necessary: 100, for, and i; let's say four. The second one is 13 tokens. That means it has more than twice as much noise to distract you from the signal, and to get right when you write the code. As a result, variations like these require more attention to notice when you're reading the code:
for i = 0; i < 100; i++ { }
for i := 1; i < 100; i++ { }
for i := 0; j < 100; i++ { }
for i := 0; i <= 100; i++ { }
for i := 0; i++; i < 100 { }
The first has no counterpart in Python (where the 8-token version comes from) since Python always has that bug. The others are:
for i in range(1, 100):
i=0; while j < 100: i += 1; ...
for i in range(101):
raise TypeError
In short, every bit of extraneous information you put into your code distracts you from the relevant information, and that extraneous information is something else you can get wrong. Golang does a lot better at this than C does, but it could do better still.
In some cases, where Golang is noisier than Python or Ruby, it's because the extra redundancy is there to catch errors or encourage you to handle failures properly. This is not one of those cases.