Computer Science GRE Study Guide, Data structures and implementation techniques

Table of Contents

I.    Software Systems and Methodology – 40%

A.    Data organization

1.      Data types

2.      Data structures and implementation techniques

a.       Definition: A data structure is a way of organizing one or more data types into groups. For example, many languages do not have a string data type. To manipulate a string, you actually create a data structure that organizes multiple character data types into one string data type.

b.      There are times when we place items somewhere and then retrieve them later. If you always retrieve the first item you stored then we call it First-in/First-out or FIFO. If you always retrieve the last item stored, then we call it Last-in/First-out or LIFO.

c.       A linked list is a simple data structure that uses cells (or elements). Each cell has at least two data fields. At least one field holds the information that you want stored in the list. The other field holds a pointer to the next cell. We often call this simple linked list a singly linked list. A doubly linked list has two pointers – one pointer points to the next cell, the other points to the previous cell. If the last cell points to the first cell, then we call it a circularly linked list. Interesting and important properties of lists include:

i.        You cannot move backwards on a singly linked list

ii.      In a singly linked list, to see any given cell you must move through all previous cells

iii.    The first and last cells present unique problems and you must deal with them to avoid errors.

d.      In Pascal, we declare linked lists like this:
type Link = ­Cell;
Cell = record
Info : integer;
Next : Link
end;
The Link type is a pointer. The Cell is a record that has two parts – an information field and a pointer (field) to the next cell. To use linked lists, use the rules for pointers. If the list is sorted, then we call it a sorted list.

e.       We use a hash table to store elements so that we can search them in a constant average time. We do not sort hash tables like binary search trees. The table consists of an array of locations. To store an element, we hash the value of the element using a hash function.

f.       Hash functions use some property of the element to figure out where to store it in the hash table. For example, we could take the numerical value of the element (remember that all data in a computer is ultimately a binary number) and divide by the size of the table (the number of locations in the array). We then discard the dividend and keep the remainder as the result (also called modulo). We also may hash on the memory address of the element.

g.      When programming hash functions, remember that strings are very large numbers and will likely overflow at the modulo operator. You must take care to use a function and execution order that will produce valid and distributed results. The result of the hash function is an index to our array. We place the element in that location (i.e. Array[hash index value]). It is possible that more than one element will hash to the same location (a collision). To avoid this, we try to make the hash table sufficiently large. When two values do hash to the same location, we have many choices. The easiest choice is to place the second element in the next available space (linear probing). Quadratic probing is more complicated and yields better results by distributing the elements into more unique locations.

h.      Data structures that are dynamically allocated memory (e.g. linked lists) are given memory from the heap. From time-to-time, the memory area referenced by the pointer will become unreachable. When this happens, we would like the memory to be deallocated, but we have no way to make that happen (i.e. with code). The process of garbage collection is run by the system to deallocate this memory.

i.        Binary trees are hierarchal graphs that have exactly one root and no more than two branches per node. The height or depth of a tree is the number of nodes from the root to the leaf that is farthest away. The breadth of the tree refers to dealing with the nodes that are the same distance from the root. The top level of the tree is considered level 0. The maximum number of nodes you can have at level n is 2n.

j.        If a node has no sub nodes, then we call it a leaf.

k.      A strictly binary tree is a tree where all non-leaf nodes have two sub nodes.

l.        A complete binary tree is a strictly binary tree of height h where all of the leaf nodes are at level h.

m.    A binary tree is a binary search tree if all of the keys (values of the nodes) on the left side of a node are less than the node and all of the keys on the right side of the node are greater than the node.

n.      Three common methods for traversing the nodes of the tree are preorder, inorder, and postorder. The practice test lists the precise order of traversal for each in the instructions. It is useful to recognize that inorder traversal will properly order the values for a binary search tree. Preorder traversal starts at the top and ends in the lower right. Postorder traversal starts in the lower left and ends at the top.

o.      A balanced binary search tree is a search tree that has a maximum height of log n, where n is the number of nodes in the tree.

p.      An AVL tree is a balanced binary tree with an additional rule. From any node, the height of the left and right sub trees can differ by no more than 1.

q.      A priority queue is any data structure that is only concerned with the highest priority item. It only provides 3 operations: insert, locate the highest priority item, and remove the highest priority item.

r.        Since you do not need to organize all of the elements to make a priority queue, it makes no sense to use highly ordered data structures like binary search trees. A binary heap is a tree where the highest priority node is placed at the top of the tree. You can think of this as throwing the highest priority item on the top of the heap. Without going into painful detail, it is easy to implement a tree like this using an array. The elements in the array start at the root, then the first level, then the second level, etc. When you remove the highest priority item, you do not have to reorder all of the elements, just some of them.

⇒ Next section

Liked it? Take a second to support Hunter Hogan on Patreon!
Become a patron at Patreon!