Streams

Following on from Invariant programming, here is a further development of the functional approach to programming.

There are many different ways to solve a problem. A good example (from Udacity’s Intro to Computer Science) is the task of removing markup from a document and returning just the text as a list of words. The suggested solution is as follows:

def remove_tag(string):
    start = string.find('<')
    while start != -1:
        end = string.find('>', start)
        string = string[:start] + " " + string[end + 1:]
        start = string.find('<')
    return string.split()

It involves repeatedly scanning the input string for the location of the first opening and closing tags and then creating a new string composed of the substrings that precede and follow the two tags.

This solution is perfectly good. However, it is in some ways inefficient. The find function requires repeatedly scanning the source document, discarding snippets of markup, and creating snippets of text in order to splice them back into some output variable.

But this problem can also be solved by seeing the text as a continuous stream of characters, being read sequentually, and being transformed character by character. The resulting function is neat and short and legible.

Programming everything as a stream was probably first made possible in the LISP programming language, created by John McCarthy in 1958, and has subsequently found its way into dataflow programming and new projects, such as NoFlo.

Steps
  1. Visualise the input to your function as a single stream, broken down into its smallest elements.

  2. Break down your function into a series of transformations, performed separately. If your function requires more than one transformation, break it into separate functions with the output of the first function being fed into the input of the next function.

  3. See each element of your input as capable of either producing a given output or of changing the current state of your function. In simple cases, this is not necessary, but try to identify which elements might transform the state of the function.

This is probably too abstract to make much sense of without a specific example. Let’s return to the problem above, and solve it without using find.

  1. Visualise the string of markup as a stream of characters.

  2. There are two transformations: First transform the string of markup into a string stripped of markup, then transform the output string into a list of words.

  3. The characters < and > change the state of the system. After the character < is encountered, no characters should be copied to output. After the > is encountered, copying should resume.

Here is a possible solution to the probem:

def remove_tags(markup):
    text = ''
    recording = True
    for ch in markup:
        if ch == '<':
            recording = False
        elif recording:
            text += ch
        elif ch == '>':
            recording = True
            text += ' '
    return text.split()

The first solution certainly has fewer lines, but arguably it is harder to read and harder to visualise. In the second solution, the variable recording is intended to convey the idea of a recording head that is examining the input stream. When the recording head encounters a <, it stops recording the input stream and when it encounters a >, it resumes again. The penultimate line replaces the removed tag with a single space to prevent words that were previously separated by tags running together.

In the final line, the split() function is used to turn the output into a list of words.

This problem can also be done using invariant programming:

def remove_tags(input, recording=True, output=''):
    if input == []:
        return output.split()
    if input[0] == '<':
        return remove_tags(input[1:], False, output)
    if recording:
        return remove_tags(input[1:], True, output + ch)
    if ch == '>':
        return remove_tags(input[1:], True, output + ' ')