Python's glossary defines an iterator as
an object representing a stream of data. Repeated calls to the iterator's
next()
method return successive items in the stream. When no more data are available a StopIteration
exception is raised instead.
In practice, an iterator is an object that implements a next() method, which returns the next data item each time it is called. Once there is no more data to return, the method raises a StopIteration exception. Any such iterator can be used in a for loop.
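To make this concrete, here is a rough sketch of what a for loop does behind the scenes, written as an explicit while loop. It is a simplification (the real loop is implemented inside the interpreter), and the helper name manual_for is hypothetical; it relies only on the built-in iter() and next() functions, which exist in both Python 2.7 and Python 3.

```python
def manual_for(iterable, action):
    """Hypothetical helper: call action(item) for every item, like a for loop."""
    iterator = iter(iterable)      # ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)  # fetch the next item in the stream
        except StopIteration:      # no more data: leave the loop
            break
        action(item)


collected = []
manual_for([10, 20, 30, 40], collected.append)
print(collected)  # [10, 20, 30, 40]
```

The try/except around next() is what lets a for loop stop quietly instead of surfacing the StopIteration to your code.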
In [15]: for n in [10, 20, 30, 40]:
   ....:     print(n)
   ....:
10
20
30
40
In this example, the for
loop iterates over the list and prints each element in it.
In [16]: for n in 'hello world':
   ....:     print(n)
   ....:
h
e
l
l
o

w
o
r
l
d
Here, with a string object, the for loop iterates through the individual characters. Similarly, when used with a dictionary, the for loop iterates through the keys, and with a file object, it iterates through the lines of the file. All of these objects are called iterables, and they allow the for loop to handle the individual items they hold or generate. Other functions consume these iterables as well.
In [18]: map(lambda x: x.upper(), 'Hello world')  # map() and filter() accept any iterable
Out[18]: ['H', 'E', 'L', 'L', 'O', ' ', 'W', 'O', 'R', 'L', 'D']

In [23]: ', '.join({'a': 'A', 'b': 'Message'})
Out[23]: 'a, b'
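A short sketch of the behaviours described above, assuming Python 3. The dictionary contents are made up for illustration, and io.StringIO stands in for a real file object so the example needs no file on disk.

```python
import io

prices = {'apple': 3, 'banana': 1, 'cherry': 5}

# Iterating over a dictionary yields its keys.
keys = sorted(prices)              # sorted() consumes any iterable
print(keys)                        # ['apple', 'banana', 'cherry']

# A file-like object yields one line at a time (newline included).
fake_file = io.StringIO(u"first line\nsecond line\n")
lines = [line.rstrip("\n") for line in fake_file]
print(lines)                       # ['first line', 'second line']

# Other built-ins consume iterables too.
print(sum(prices.values()))        # 9
print(", ".join(prices))           # the keys, joined in iteration order
```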
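An iterator object can be created from an iterable with the iter() function. In the session below, b is such an iterator, created from an iterable holding the values 1, 2, 3 and 4.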
In [38]: next(b)  # The built-in next() function can be used
Out[38]: 1

In [39]: b.next()  # Or the next() method of the iterator object
Out[39]: 2

In [40]: b.next()
Out[40]: 3

In [41]: b.next()
Out[41]: 4

In [42]: b.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-42-573a563d926b> in <module>()
----> 1 b.next()

StopIteration:
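Two further details of the protocol are worth a quick sketch (the calls used here exist in both Python 2.7 and 3): an iterator is its own iterator, so calling iter() on it returns the same object, and once exhausted it stays exhausted. The built-in next() also accepts an optional default, which it returns instead of raising StopIteration.

```python
numbers = [1, 2, 3]
it = iter(numbers)

assert iter(it) is it        # an iterator returns itself from iter()

print(next(it))              # 1
print(list(it))              # [2, 3]: list() drains whatever is left
print(next(it, 'done'))      # done: the default replaces StopIteration

# The list is an iterable, not an iterator: asking it again
# gives a fresh iterator that starts from the beginning.
print(list(iter(numbers)))   # [1, 2, 3]
```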
That is all good, but why should we bother about iterators? Consider this.
$ ulimit -v 102400  # This is executed in the bash shell
$ python            # Start the Python interpreter
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> for i in range(10000000):
...     pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
The bash command ulimit -v 102400
sets the virtual memory limit of the shell and its child processes to 102400 KB (100 MB). The Python interpreter started from that shell inherits the same limit. The for loop then tries to iterate over the integers from 0 to 9999999, but fails because of the memory limit: in Python 2, the range()
function builds the complete list of ten million integers as a single object before the loop starts, and that list does not fit in the available memory. Now consider this example, continued in the same Python shell.
>>> for i in xrange(10000000):
...     pass
...
>>>
This does not cause any memory problem, because the xrange()
function does not return a list. It returns an xrange object, and iterating over it (through the iterator that iter() returns for it) produces the integers one at a time, on demand. This example clearly shows the advantage iterators can give: values are generated dynamically, so they do not need much memory. This mechanism is useful when working with large data sets that cannot be stored completely in memory, or that have to be generated or read from other sources, and we need to iterate over them.
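A rough way to see the difference is sys.getsizeof(), which reports the size of a container object itself (not of the items it refers to). This sketch assumes Python 3, where range() is already lazy, much like Python 2's xrange(), so list() is used to force a real list. The exact byte counts depend on the interpreter, so only their relative size matters here.

```python
import sys

N = 10 ** 6

# Materialising the values builds one big list object in memory.
as_list = list(range(N))

# A generator expression produces the same values lazily, one at a time.
as_gen = (i for i in range(N))

print(sys.getsizeof(as_list))  # grows with N: one pointer slot per element
print(sys.getsizeof(as_gen))   # small, and the same whatever N is

# Both give the same total; only the list keeps every value alive at once.
assert sum(as_list) == sum(as_gen)
```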
An iterator object must implement two methods:
__iter__(): This method must return the object itself.
next(): This method returns the next item if present. It must raise the StopIteration exception when there are no more items to return.
Example:
class irange(object):
    def __init__(self, upperlimit):
        self.limit = upperlimit
        self.current = 0

    def __iter__(self):
        # An iterator returns itself
        return self

    def next(self):
        # Signal the end of the stream once the limit is reached
        if self.current >= self.limit:
            raise StopIteration
        i = self.current
        self.current += 1
        return i

In [10]: list(irange(10))
Out[10]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
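The irange class above follows the Python 2 protocol. Python 3 renamed the next() method to __next__(), so a sketch that should work under both versions keeps the logic in __next__() and aliases the old name to it. Because __iter__() returns self, an irange object can only be traversed once.

```python
class irange(object):
    """Iterator over 0, 1, ..., upperlimit - 1 (Python 2 and 3 compatible sketch)."""

    def __init__(self, upperlimit):
        self.limit = upperlimit
        self.current = 0

    def __iter__(self):
        # An iterator returns itself, so it can be used directly in a for loop.
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

    next = __next__  # Python 2 looks for next(), Python 3 for __next__()


r = irange(5)
print(list(r))  # [0, 1, 2, 3, 4]
print(list(r))  # []: the single iterator is already exhausted
```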
Implementing these two methods is the general protocol for creating an iterator. There is an easier way to do the same thing: generators.
Generator functions are functions that produce a sequence of data items, one at a time, as next() is called on them. Calling one returns an iterator object (to be accurate, a generator object, which has a next() method). It is an almost magical way of creating iterators with plain functions: the interpreter takes care of building the iterator from the function.
A generator function must contain at least one yield statement. Let us see how to build a generator, and how yield behaves, with an example that creates an iterator behaving exactly like our previous example.
In [3]: def irange(limit):
   ...:     current = 0
   ...:     while current < limit:
   ...:         yield current
   ...:         current += 1
   ...:

In [4]: r = irange(10)

In [5]: r
Out[5]: <generator object irange at 0x7f3bf8a2f500>

In [6]: list(r)
Out[6]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
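The generator version is a drop-in replacement for the irange class: it yields the same values, it is its own iterator, and it is exhausted after a single pass. A quick check (the calls used here should behave the same in Python 2.7 and 3):

```python
def irange(limit):
    current = 0
    while current < limit:
        yield current
        current += 1


r = irange(4)
print(iter(r) is r)     # True: a generator is its own iterator
print(list(r))          # [0, 1, 2, 3]
print(list(r))          # []: a second pass yields nothing

# Calling the generator function again creates a fresh, independent generator.
print(list(irange(4)))  # [0, 1, 2, 3]
```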
This might make you wonder how this even works. The whole magic happens in the yield statement. Calling a generator function does not run its body at all: because the function contains a yield statement, the interpreter simply returns a generator object. The body starts running only when next() is first called on that object. When execution reaches a yield statement, the generator hands the yielded value back to the caller and suspends, saving its whole context (all the local variables along with their values, and the point where it stopped).
In [1]: def irange(limit):
   ...:     current = 0
   ...:     while current < limit:
   ...:         yield current
   ...:         current += 1
   ...:

In [2]: d = irange(10)

In [3]: d
Out[3]: <generator object irange at 0x7fcb1d9dc190>
Calling irange(10) only creates the generator object; no value has been produced yet. When the next() method of the generator object, or the built-in next(iterator) function, is called, the interpreter resumes the function body from where it was suspended and runs until it reaches the next yield statement. There it saves the new context and returns the value being yielded. The first call therefore returns 0, the second returns 1, and so on.
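The order of execution is easy to observe by recording each step. In this sketch, the hypothetical generator traced() appends to a list as it runs: nothing is recorded when it is called, and each next() runs the body only as far as the next yield.

```python
log = []


def traced(limit):
    log.append('body started')
    current = 0
    while current < limit:
        log.append('about to yield %d' % current)
        yield current
        log.append('resumed after %d' % current)  # runs on the following next()
        current += 1
    log.append('body finished')


g = traced(2)
print(log)       # []: calling the function only creates the generator
print(next(g))   # 0
print(log)       # ['body started', 'about to yield 0']
print(next(g))   # 1
print(log[-2:])  # ['resumed after 0', 'about to yield 1']
```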
In [4]: d.next()
Out[4]: 0

In [5]: next(d)
Out[5]: 1
Each subsequent call returns the next yielded value. If execution reaches the end of the function without encountering another yield statement, the generator raises the StopIteration
exception.
In [13]: d.next()
Out[13]: 9

In [14]: d.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-14-814402345f13> in <module>()
----> 1 d.next()

StopIteration:
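Running off the end of the generator body, or executing a bare return, ends the iteration. A for loop, list(), or next() with a default all handle the resulting StopIteration for you, which is why generators are rarely driven by hand. A sketch with a hypothetical first_words() generator:

```python
def first_words(text, limit):
    """Yield at most `limit` words from text, then stop."""
    count = 0
    for word in text.split():
        if count == limit:
            return          # ends the generator; the caller sees StopIteration
        yield word
        count += 1


words = first_words('generators stop when the body finishes', 3)
print(next(words))        # generators
print(list(words))        # ['stop', 'when']
print(next(words, None))  # None: exhausted, so the default is returned
```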
Generators are designed to make creating iterators easy. They are also used to create context managers; I have explained how to create context managers in my earlier post.
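For reference, the standard library's contextlib.contextmanager decorator is the usual way to turn a generator into a context manager: the code before the single yield runs on entering the with block, and the code after it runs on leaving. The name tagged and the events list are purely illustrative.

```python
from contextlib import contextmanager

events = []


@contextmanager
def tagged(name):
    events.append('enter ' + name)     # runs when the with block starts
    try:
        yield name                     # value bound by "as"
    finally:
        events.append('exit ' + name)  # runs even if the block raises


with tagged('demo') as label:
    events.append('inside ' + label)

print(events)  # ['enter demo', 'inside demo', 'exit demo']
```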
And finally, we can verify that a function is a generator function with inspect.isgeneratorfunction():
In [16]: import inspect

In [17]: inspect.isgeneratorfunction(irange)
Out[17]: True
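inspect.isgeneratorfunction() checks the function itself; the companion inspect.isgenerator() checks the object that calling it returns. A short sketch contrasting the two, with a hypothetical ordinary function, plain_range(), for comparison:

```python
import inspect


def irange(limit):
    current = 0
    while current < limit:
        yield current
        current += 1


def plain_range(limit):
    return list(range(limit))  # ordinary function returning a list


print(inspect.isgeneratorfunction(irange))       # True
print(inspect.isgeneratorfunction(plain_range))  # False
print(inspect.isgenerator(irange(3)))            # True: the returned object
print(inspect.isgenerator(plain_range(3)))       # False: just a list
```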
Note: All examples here were tested on Python version 2.7.
We develop web applications for our customers using Python/Django/Angular.
Contact us at hello@cowhite.com