Exploring iterators and generators in Python

Iterator

The Python glossary defines an iterator as

an object representing a stream of data. Repeated calls to the iterator's next() method return successive items in the stream. When no more data are available a StopIteration exception is raised instead

Practically, an iterator is an object that implements the next() method, which returns the next data item each time it is called. Once there is no more data to return, the method raises a StopIteration exception. Any such iterator can be used in a for loop.

In [15]: for n in [10, 20, 30, 40]:
....:     print(n)
....:     
10
20
30
40

In this example, the for loop iterates over the list and prints each element in it.

In [16]: for n in 'hello world':
....:     print(n)
....:     
h
e
l
l
o

w
o
r
l
d

Here, with the string object, the for loop iterates through the individual characters. Similarly, when used with a dictionary, the for loop iterates through the keys, and with a file object, it iterates through the lines of the file. All of these objects are called iterables, and they allow the for loop to handle the individual items they hold or generate. There are other functions that consume these iterables.

In [18]: map(lambda x: x.upper(), 'Hello world')  # map and filter functions
Out[18]: ['H', 'E', 'L', 'L', 'O', ' ', 'W', 'O', 'R', 'L', 'D']

In [23]: ', '.join({'a': 'A', 'b': 'Message'})
Out[23]: 'a, b'
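
The claims about dictionaries and files can be checked with a short sketch. The sample data here is hypothetical, and io.StringIO is used only as a stand-in for a real file object so that the example needs no file on disk.

```python
import io

# Hypothetical sample data, not taken from the examples above.
ages = {'alice': 30, 'bob': 25}

# Iterating over a dictionary yields its keys.
keys = sorted(key for key in ages)

# io.StringIO stands in for a real file; iterating over it yields
# one line at a time, just as a file object would.
fake_file = io.StringIO(u'first line\nsecond line\n')
lines = [line.rstrip('\n') for line in fake_file]

print(keys)
print(lines)
```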

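An iterator object can be created from an iterable using the iter() function. In the session below, b is such an iterator (its creation is not shown), and its items are fetched one at a time.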

In [38]: next(b)  # The built-in `next()` function can be used
Out[38]: 1

In [39]: b.next() # Or the `next()` method of the iterator object
Out[39]: 2

In [40]: b.next()
Out[40]: 3

In [41]: b.next()
Out[41]: 4

In [42]: b.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-42-573a563d926b> in <module>()
----> 1 b.next()

StopIteration:
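
Putting iter(), next() and StopIteration together gives a rough picture of what a for loop does behind the scenes. The sketch below is only an approximation written for illustration (manual_for is a made-up helper name), not the interpreter's actual implementation.

```python
def manual_for(iterable):
    """Collect items the way a for loop would, using iter() and next()."""
    collected = []
    iterator = iter(iterable)       # ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)   # fetch the next item
        except StopIteration:       # raised when the stream is exhausted
            break
        collected.append(item)      # this stands in for the loop body
    return collected

print(manual_for([10, 20, 30, 40]))
print(manual_for('abc'))
```

The built-in next() function is used here rather than the next() method, so the same sketch also runs on Python 3.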

That is all good, but why should we bother about iterators? Consider this.

$ ulimit -v 102400 # This is executed in bash shell
$ python # Start the python interpreter
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> for i in range(10000000):
...     pass
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

The bash command ulimit -v 102400 sets the virtual memory limit of the bash shell and its children to 100 MB (the value is given in kilobytes). When you start the Python interpreter, it inherits the same limit as the shell. The for loop tries to iterate over the integers from 0 to 9999999 but fails because of the memory limit: in Python 2, the range() function returns a list holding all of those integers as a single object, which does not fit in the available memory. Now consider this example, continued in the same Python shell.

>>> for i in xrange(10000000):
...     pass
... 
>>>

This does not pose any problem with the memory. This is because the xrange() function does not return a list but an object that produces its values lazily, one at a time, as the loop asks for them (How do we know that? We will see how to confirm that at a later point of time). This example clearly shows the advantage an iterator can give. Values from an iterator are generated dynamically and hence do not need much memory. This mechanism is useful when we need to iterate over large data sets that cannot be stored completely in memory, or that have to be generated or read from other sources.
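
The difference in memory use can be made visible without ulimit. The following is a minimal sketch, assuming a hypothetical count_up_to() generator function as the lazy source; note that sys.getsizeof() reports only the size of the container object itself, not of the integers it refers to, so the real gap is even larger.

```python
import sys

def count_up_to(limit):
    """Lazily produce 0, 1, ..., limit - 1 (hypothetical helper)."""
    current = 0
    while current < limit:
        yield current
        current += 1

n = 100000

as_list = list(count_up_to(n))   # all n integers exist in memory at once
lazy = count_up_to(n)            # nothing has been computed yet

list_size = sys.getsizeof(as_list)   # grows with n
lazy_size = sys.getsizeof(lazy)      # small and independent of n

print(list_size > lazy_size)
print(sum(lazy) == sum(as_list))     # both give the same values
```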

Building custom iterators

An iterator object must implement two methods:

  1. __iter__(): This method must return the object itself.
  2. next(): This method returns the next item if present. It must raise the StopIteration exception when there are no more items to return.

Example:

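class irange(object):
    """Iterator over the integers 0, 1, ..., upperlimit - 1."""
    def __init__(self, upperlimit):
        self.limit = upperlimit
        self.current = 0
    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self
    def next(self):
        # Signal exhaustion once the limit has been reached.
        if self.current >= self.limit:
            raise StopIteration
        i = self.current
        self.current += 1
        return i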

In [10]: list(irange(10))
Out[10]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
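
As a second illustration of the same two methods, here is a sketch of a hypothetical Countdown iterator. It is not part of the original examples. Because Python 3 spells the method __next__() instead of next(), the class defines both names so that it runs on either version. It also shows that an iterator can be consumed only once.

```python
class Countdown(object):
    """Hypothetical iterator that counts from `start` down to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                 # an iterator returns itself

    def __next__(self):             # Python 3 name of the method
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__                 # Python 2 name, same behaviour

c = Countdown(3)
first_pass = list(c)
second_pass = list(c)               # the iterator is already exhausted

print(first_pass)
print(second_pass)
```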

This is the general protocol for creating an iterator: implement these two methods. There is an easier way to do the same with generators.

Generators

Generators are functions that produce a sequence of data items, one at a time, when used with the next() function or method. Calling one returns an iterator object (to be accurate, a generator object that provides the next() method). They are a convenient, almost magical way of creating iterators from functions: the interpreter handles the creation of the iterator from the function.


A generator must use a yield statement. Let us understand how to build a generator and how yield behaves with an example that creates an iterator that behaves exactly like our previous example.

In [3]: def irange(limit):
   ...:     current = 0
   ...:     while current < limit:
   ...:         yield current
   ...:         current += 1
   ...:

In [4]: r = irange(10)

In [5]: r
Out[5]: <generator object irange at 0x7f3bf8a2f500>

In [6]: list(r)
Out[6]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

This might make you wonder how this even works. The whole magic happens with the yield statement. When the Python interpreter finds a yield statement in a function, it treats the function as a generator function: calling it does not run the body but returns a generator object. When the body later runs and reaches a yield statement, the interpreter hands the yielded value to the caller, saves the context of the function (all the local variables along with their values, and the point of execution) and suspends the function.

In [1]: def irange(limit):
   ...:     current = 0
   ...:     while current < limit:
   ...:         yield current
   ...:         current += 1
   ...:

In [2]: d = irange(10)

In [3]: d
Out[3]: <generator object irange at 0x7fcb1d9dc190>

Calling irange() does not execute any of its code; it only creates and returns a generator object. When the next() method of the generator object or the built-in next(generator) function is called, the interpreter runs the function body until it reaches a yield statement, returns the yielded value, and suspends the function with its context saved. The following call resumes execution right after that yield statement.

In [4]: d.next()
Out[4]: 0

In [5]: next(d)
Out[5]: 1
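
A small tracing sketch can show exactly when the body of a generator function runs. The traced_irange() function and the events list below are hypothetical names introduced only for this illustration; the function is the irange() generator with bookkeeping added.

```python
events = []

def traced_irange(limit):
    events.append('body started')
    current = 0
    while current < limit:
        events.append('about to yield %d' % current)
        yield current
        events.append('resumed after %d' % current)
        current += 1
    events.append('body finished')

g = traced_irange(2)
after_call = list(events)     # still empty: no code has run yet

first = next(g)               # runs the body up to the first yield
after_first = list(events)

second = next(g)              # resumes right after the first yield
after_second = list(events)

print(after_call)
print(first, after_first)
print(second, after_second)
```

Nothing is recorded until the first next() call, and each later call picks up from the line after the yield that suspended the function.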

Each call to next() resumes the function and returns the next yielded value. If execution reaches the end of the function without encountering another yield statement, the method raises the StopIteration exception.

In [13]: d.next()
Out[13]: 9

In [14]: d.next()
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-14-814402345f13> in <module>()
----> 1 d.next()

StopIteration:
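
Because values are produced only on demand, a generator does not even need to have an end. As a minimal sketch (the fibonacci() function is a hypothetical example, not from the sessions above), itertools.islice() can take a finite number of items from an endless generator:

```python
from itertools import islice

def fibonacci():
    """Yield Fibonacci numbers forever; each value is computed on demand."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# Take only the first ten values; the rest are never computed.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)
```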

Generators are designed to make the creation of iterators easy. They are also used to create context managers, and I have explained how to create context managers in my earlier post.
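
For readers who have not seen that post, here is a minimal sketch of a generator-based context manager using contextlib.contextmanager from the standard library. The tracked() function and the log list are hypothetical names chosen for this illustration.

```python
from contextlib import contextmanager

log = []

@contextmanager
def tracked(name):
    log.append('enter ' + name)      # runs on entering the with block
    try:
        yield name                   # the value bound by "as"
    finally:
        log.append('exit ' + name)   # runs on leaving, even on error

with tracked('demo') as label:
    log.append('inside ' + label)

print(log)
```

The code before the yield plays the role of the setup step and the code after it plays the role of the cleanup step.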


And finally, we can verify that a function is a generator function with inspect.isgeneratorfunction().

In [16]: import inspect

In [17]: inspect.isgeneratorfunction(irange)
Out[17]: True
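
Related checks exist for the objects themselves. The sketch below, which uses a hypothetical squares() generator function, combines inspect.isgenerator() with the Iterator abstract base class; the import is attempted from collections.abc first because that is where the class lives on Python 3, with a fallback to collections for Python 2.

```python
import inspect

try:
    from collections.abc import Iterator   # Python 3
except ImportError:
    from collections import Iterator       # Python 2

def squares(limit):
    """Hypothetical generator function used only for these checks."""
    for n in range(limit):
        yield n * n

gen = squares(3)

is_gen_function = inspect.isgeneratorfunction(squares)     # the function
is_gen_object = inspect.isgenerator(gen)                   # the call result
gen_is_iterator = isinstance(gen, Iterator)
list_is_iterator = isinstance([1, 2, 3], Iterator)         # only an iterable
listiter_is_iterator = isinstance(iter([1, 2, 3]), Iterator)

print(is_gen_function, is_gen_object, gen_is_iterator)
print(list_is_iterator, listiter_is_iterator)
```

A list is an iterable but not an iterator, while the object returned by iter() on a list is an iterator.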

Note: All examples here were tested on Python version 2.7.
