Graph Search

In this guide, we’re going to explore graph search algorithms.

The graph we’ll use throughout this guide is the one from problem 107 in Project Euler, a popular series of programming problems. Please take a few moments to read the problem and download the text file containing the adjacency matrix. Note that the given graph is connected, meaning we can reach any part of the graph from any vertex.

If you’re curious about the problem’s solution, I’ve written up a guide here. I recommend working through it before beginning this one.

But here, we’re going to focus on implementation. You should own and follow along with a good algorithms textbook. I recommend Dasgupta’s text.

To follow along, create a new directory and place the adjacency matrix file in it. Then download Scala, open your command prompt, navigate to the directory, type scala to open your REPL, and enter the code line by line (resist the urge to copy and paste).

Why is this important?

Who are your Facebook friends of friends that share your political views? How can you contact them in the hopes of getting them to vote? The algorithm to implement this web crawler is a straight-forward breadth-first search that stops at a distance of two.

Google Maps needs to calculate the fastest route between two locations, given a large number of paths with different traffic times. It does this using A* search, an adaptation of Dijkstra’s.

What language are we using?

Scala! (Pronounced Scah-laa)

Like Python and Ruby, it allows imperative and functional programming, as well as exposes a great REPL. In addition, unlike those languages, it’s compiled, statically-typed, supports pattern-matching, runs on the JVM, and interacts seamlessly with Java code. It’s basically the perfect language.

If you’d like to learn Scala, I recommend Scala creator Martin Odersky’s Scala By Example. For a reference to quickly glance into if you just want to know something specific, I recommend Alvin Alexander’s Scala Cookbook.

Translating these algorithms into a language like Python should be trivial. I’ve chosen Scala for its expressiveness, and am reasonably confident that code of the same clarity would take more lines in any other language.

Before we begin

I assume you have some basic familiarity with graphs (the kind with vertices and edges, not the sort that track stocks). You should know what vertices, edges (weighted and unweighted), and adjacency matrices are.

I also assume you understand object-oriented programming and know a few basics of functional programming. You should know what a map does, for example, and what first-class functions are. If you know Java, you should be able to grok Scala easily enough.

Go ahead and navigate to the directory containing the adjacency matrix file, then enter scala at your command prompt to open your REPL. Let’s begin!

As a first step, we’d like to read our provided adjacency matrix into in-memory vertices and edges, and switch to an adjacency-list representation, where each Vertex keeps track of the edges it’s part of. For the rest of the guide, we’ll simply use the information in the objects we populate here.

Let’s start by defining our edge and vertex classes.

import collection.mutable.ArrayBuffer
class Edge(val weight: Int, val start: Int, val end: Int)
class Vertex(val id: Int, var neighbors: ArrayBuffer[Edge] = ArrayBuffer[Edge]())

Now, let’s read our adjacency matrix file into memory.

val adjMatrix = io.Source.fromFile("p107_network.txt").getLines.toVector

Next, we’ll define an array of vertices indexed by id.

val vertices = Vector.range(0, adjMatrix.size).map(new Vertex(_))

Now we populate our graph, and the fun begins!

for ((adjMatrixRow, i) <- adjMatrix.zipWithIndex) {
    val rowEdgeWeights = adjMatrixRow.split(",").map(_ match { case "-" => -1 
                                                               case x => x.toInt })
    for ((weight, j) <- rowEdgeWeights.zipWithIndex) {
        if (weight != -1) {
            vertices(i).neighbors += new Edge(weight, i, j)
        }
    }
}

Note that if we now close our REPL, we lose all information from the graph, and the above steps will need to be run again.

Algorithms

If you haven’t already, please complete the steps in “Before we begin”.

Let’s start by augmenting our vertices with a boolean field we can use to check whether they’ve been visited or not.

class VisitableVertex(val id: Int, val neighbors: ArrayBuffer[Edge], 
                      var visited: Boolean = false) {
    def this(v: Vertex) = this(v.id, v.neighbors)
}

val visitableVertices = vertices.map(new VisitableVertex(_))

First, a recursive solution to avoid needing an explicit stack.

def dfs(v: VisitableVertex) = {
    for (e <- v.neighbors) {
        val neighbor = visitableVertices(e.end)
        if (!neighbor.visited) {
            neighbor.visited = true
            dfs(neighbor)
        }
    }
}

If running this multiple times, make sure to reset all visited flags between runs.

visitableVertices.foreach(_.visited = false)

This is guaranteed to visit every vertex in a connected graph like our given one, starting at any vertex. However, the recursive implementation means we create a new stack frame for each vertex, forcing O(V) space. With forty vertices, this isn’t bad, but on larger graphs, we’re just begging for a stack overflow. We can mitigate this by using an explicit stack.

def dfs(v: VisitableVertex) = {
    var dfsStack = collection.mutable.Stack[VisitableVertex]()
    v.visited = true
    dfsStack.push(v)
    while (!dfsStack.isEmpty) {
        for (e <- dfsStack.pop.neighbors) {
            val neighbor = visitableVertices(e.end)
            if (!neighbor.visited) {
                neighbor.visited = true
                dfsStack.push(neighbor)
            }
        }
    }
    visitableVertices.foreach(_.visited = false)
}

If you haven’t already, please complete the steps in “Before We Begin”.

Let’s start by augmenting our vertices with a distance field to track how far it is from the starting vertex, and a predecessor field so we can reconstruct our path at the end of the algorithm.

class DistVertex(val id: Int, val neighbors: ArrayBuffer[Edge], 
                 var distance: Int = Int.MaxValue, 
                 var predecessor: DistVertex = null) {
    def this(v: Vertex) = this(v.id, v.neighbors)
}

val distVertices = vertices.map(new DistVertex(_))

We implement breadth-first search using a queue. Since we want to first consider all neighboring vertices before any neighboring the neighboring vertices, we’ll keep enqueuing neighbors yet to be processed and dequeuing vertices as we process them.

def bfs(s: DistVertex) = {
    s.distance = 0
    var bfsQueue = collection.mutable.Queue[DistVertex](s)
    while (!bfsQueue.isEmpty) {
        val v = bfsQueue.dequeue
        for (e <- v.neighbors) {
            val neighbor = distVertices(e.end)
            if (neighbor.distance == Int.MaxValue) {
                bfsQueue.enqueue(neighbor)
                neighbor.distance = v.distance + 1
                neighbor.predecessor = v
            }
        }
    }
}

Now, looking at our given graph, what is the shortest path from the vertex with id 31 to every other vertex? Thanks to breadth-first search, we can now answer this question!

def printAllPaths(s: DistVertex, searchFxn: DistVertex => Unit) = {
    searchFxn(s)

    def printPath(e: DistVertex) = {
        var path = ArrayBuffer[Int](e.id)
        var v = e
        while (v.predecessor != null) {
            v = v.predecessor
            path += v.id
        }
        println(path.reverse.mkString(","))
    }

    distVertices.foreach(printPath)
    
    def reset(v: DistVertex) = {
        v.predecessor = null
        v.distance = Int.MaxValue
    }

    distVertices.foreach(reset)
}

printAllPaths(distVertices(31), bfs)

Do you understand what we’ve discussed? Great! Now prove it in the language of your choice!

Dijkstra’s

If you haven’t already, please complete the steps in “Before we begin” and “Breadth-first search”. Dijkstra’s is just an adaptation that takes into account edge weights.

Remarkably, we can use the same vertex class as we did in breadth-first search. The only difference is that the distance field now refers to the sum of the weights of the edges on the path leading to the vertex instead of the number of vertices on the path to the vertex.

The trick to understanding Dijkstra’s is to see it as a BFS where we use a priority queue (in which the highest priority goes to the lowest distance) instead of a regular queue, and instead of just adding one to the distance field of each neighboring vertex, we keep checking all edges leading to that vertex and updating its distance accordingly. Otherwise, the code is virtually identical to breadth-first search.

def dijkstra(s: DistVertex) = {
    s.distance = 0
    val dijkstraQueue : collection.mutable.PriorityQueue[DistVertex] = 
    collection.mutable.PriorityQueue(s)(math.Ordering.by(v => -(v.distance)))
    while (!dijkstraQueue.isEmpty) {
        val v = dijkstraQueue.dequeue
        for (e <- v.neighbors) {
            val neighbor = distVertices(e.end)
            if (neighbor.distance > v.distance + e.weight) {
                neighbor.distance = v.distance + e.weight
                neighbor.predecessor = v
                dijkstraQueue.enqueue(neighbor)
            }
        }
    }
}

Please note that while this implementation of Dijkstra’s works, it’s not optimal. Unfortunately, limitations on Scala’s (and Java’s) priority queue implementation prevent us from exploring the real Dijkstra’s algorithm unless we implement our own priority queue, which I presently deem a bit beyond the scope of this guide. I may return to this and implement a proper Dijkstra’s later.

Now, looking at our given graph, what is the lowest-weight path from the vertex with id 31 to every other vertex? Thanks to Dijkstra’s algorithm, we can now answer this question!

printAllPaths(distVertices(31), dijkstra)

To demonstrate the difference between Dijkstra’s and breadth-first search, let’s examine their respective paths from vertex 31 to vertex 28.

First, to easily find the edge connecting two vertices, we’ll turn the adjacency matrix into a 2D array indexed by id.

val edges = adjMatrix.map(_.split(",").map(_ match{case "-" => 0 case x => x.toInt}))

Now, looking at our printed paths, note that BFS from 31 to 28 yields 31,28, while Dijkstra’s yields 31,15,8,34,28. Let’s evaluate the total weight of each of these paths.

val bfsWeight = edges(31)(28)
val dijkstraWeight = edges(31)(15) + edges(15)(8) + edges(8)(34) + edges(34)(28)

While BFS gives us a one-edge path of weight 505, Dijkstra’s finds us a four-edge path with a total weight of 287!

Created on December 4 2016, last modified on October 1 2017.


Twitter
Facebook