**Implementing A* on iOS** Kirk Spaziani # Introduction ## Background (Graph Theory) [Graph Theory](https://en.wikipedia.org/wiki/Graph_theory) is the study of structures that model pairwise relations between objects. _Graph Theory_ is applicable to many problems in Computer Science and aptitude in _Graph Theory_ is positively correlated with passing those pesky tech interviews. One very common problem in the _Graph Theory_ space is finding the _Shortest Path_ between two vertices. A _Vertex_ (sometimes node) is an abstraction of a point. An _Edge_ connects two vertices. A _Path_ between two vertices is a sequence of edges that connect the vertices. In Figure [sample-graph] below, there are 3 paths from `V1` to `V3`. One path, let's call it path `P1` consists of edges `{ e1, e2 }`. Another path, `P2`, consists of edges `{ e3, e7, e8 }`. The final path, `P3`, consists of `{ e3, e4, e5, e6, e9 }`. *********************************************************************** * .--. e2 * .----->| V2 +-------------------. * .--. e1| '--' | * | V1 +----' v * '-+' .+-. * | .------->+ V3 | * | |e8 '-+' * | | ^ * e3| .-+. |e9 .--. * | .---------------->+ V8 | '--+ V9 | * | |e7 '--' '+-' * | | | * | .+-. .--. .--. | * '--->+ V4 +------->+ V5 +------>+ V7 +---------' * '--' e4 '--' e5 '--' e6 *********************************************************************** [Figure [sample-graph]: Sample Graph] Of the paths `{ P1, P2, P3 }`, which is the _Shortest_? Each _Edge_ of a graph has a _Weight_, which is an abstraction for the time required or difficulty in traversing the _Edge_. The [Shortest Path](https://en.wikipedia.org/wiki/Shortest_path_problem) then, is the _Path_ where the sum of all the _Edge Weights_ is the smallest. ## Why is finding the Shortest Path Important? When you ask Google Maps to get you home, it will guide you through the _Shortest Path_. In this context, the _Shortest Path_ is the one that will take you the least amount of time. It will look at traffic, weather, construction, and other road conditions to determine the _Edge Weights_, the time it will take to traverse each road subsection. As described above, the _Shortest Path_ is the sequence of roads with the lowest total _Weight_ (time to get home). Now think of almost any video game. Bad guys need to get from where they are to where you are if they want to kill you. They could just wander around randomly but that wouldn’t be very effective. Bad Guys in “Classic” games often knew their position relative to yours and would attempt to move towards you. An enterprising gamer would soon be able to trick them into getting stuck behind an object. The bad guy was too stupid to navigate to your position because it couldn’t calculate a proper path. (Bad AI is Bad). Any game where you click on a location and expect a character to move there has to calculate the _Shortest Path_ to that location so the character can navigate there. Some games have hundreds or thousands of entities constantly trying to navigate. This forces us to consider efficiency as well as correctness. There are many algorithms to find _Shortest Paths_. Some are designed to find the _Shortest Path_ between two _Vertices_. Some will find all shortest paths from one _Vertex_ to all others. Some will even find the _Shortest Paths_ from all to all in a _Graph_. ## A* I would like to talk a little bit about one algorithm in particular, A* (A Star). One way to increase efficiency in finding the shortest path is by not exploring edges that are unlikely to be part of the actual shortest path. A* works by using a heuristic to determine what vertex to process next. A* is very useful for grid based games because the heuristic is very easy to write against a grid. “But Kirk!” you exclaim, “How is a grid like a graph?” If you consider each square of the grid to be a _Vertex_, the _Edge_ is the infinitely small border between the square and its neighbor squares. This even allows representation of edge weights. In a game you might expect walking through mountains to be more expensive than walking on roads. I’m going to describe a very high level description of the algorithm. There are many places on the web to learn about most of the specifics, one purpose of this article is to elaborate on an important detail that most (even very well written articles) seem to ignore. The algorithm traverses the _Graph_ beginning with the start point. For each _Vertex_ it visits, it adds to the _Closed Set_ and checks to see if it has reached its destination. If it is at the destination it builds the path and returns. If not it explores all the neighboring _Vertices_ that are not in the _Closed Set_ and adds them to the _Open Set_. Then it selects the best option from the _Open Set_ and repeats the visitation process. When the algorithm needs to choose the next node to visit, it consults the _Open Set_. A*’s advantage is that it chooses the node that is most likely based on the information that it currently has to be the next step on the _Shortest Path_. How does it do that? When the _Vertex_ is added to the _Open Set_, the estimated _Weight_ of the final path that includes that _Vertex_ is calculated by adding the current path weight with the value the heuristic returns for the remainder of the trip. When _Vertices_ are being added to the _Open Set_, the _Open Set_ might already contain that _Vertex_ from a different origin. At this point it needs to be _Reprioritized_. Conceptually that’s a simple operation right? You search the _Open Set_ to see if the _Vertex_ is there. You check to see if this path is better, and you update the relevant values. “Ok, So are we done?” Nope. Pathfinding is an expensive operation, obviously accuracy is important, but if the calculations bring your game to a screeching halt, then we have a problem. Do we want less accurate faster pathfinding? Or can we get more performance out of the algorithm I described? Let’s try first to keep the accuracy and find the inefficiency. # Performance The _Open Set_ I described supports these operations: * __Insert Vertex__ * __Remove “Best” Vertex__ * __Update Vertex__ (remove re-add) ### So what happens if we use an NSMutableArray for those operations? * __Insert Vertex__ is O(1), ok so far, so good. * __Remove "Best" Vertex__ is a search O(n), and a remove O(n). Actually the search is an always search everything so best case is n operations to find the index to remove. Then the removal will copy the items in the array after the removal. * __Update Vertex__ is a linear search, which means O(n). In our cases however, it is essentially free because we already located it. ### Can we use CFBinaryHeap? CFBinaryHeap is a bit unfriendly to use but we can wrap it if necessary to make it feel like a Cocoa object! Isn’t this a great idea? Let’s see: * __Insert Vertex__ has an average case of O(1) and a worst case of O(log n). * __Remove "Best" Vertex__ is O(log n) – that’s way better. * __Update Vertex__ - Remove items until the item to be modified is located. - Modify the item. - Re-add all the items O(n). This solution seems a lot better than the NSMutableArray, but the __Update Vertex__ is still a sore spot, being likely effectively slower, if not asymptotically. Can we make it better? ### Best Way is Best. Imagine if we could create a data structure for the _Open Set_ that had these properties: * __Insert Vertex__ has an average case of O(1) and a worst case of O(log n). * __Remove "Best" Vertex__ is O(log n). * __Update Vertex__ is O(log n). Wow! What data structure is it!? It's a custom Priority Queue implemented with a heap and a dictionary. CFBinaryHeap does not provide functions to do what we need so it's necessary to implement the heap from scratch. The heap must be implemented using an array so that each entry can be accessed by index. The dictionary is a mapping of the _Vertex_ (key) to index (value). Whenever the heap updates the location of a _Vertex_, that information also needs to be sent to the dictionary. # Implementations This article was originally published in August 2014, and it described the A* implementation used by [The Lonely Dwarves](https://goo.gl/bmtztf) as written in Objective-C. Times have changed a bit, and this article focuses on the Swift implementation. The legacy Objective-C Implementation may be reposted at a later date. [Swift A* Implementation](astar-swift.md.html)