Space analysis
The expected number of keys at level i is ∑_{j=1}^n (1/2)^i = n / 2^i.
Summing over all levels gives:
∑_{i=0}^h n / 2^i = 2n - n/2^h ≈ 2n
Time analysis
Worst case for search/insert/delete: O(n + h)
if nearly all elements are at the lowest level, but there is an element near the end at a some greater height h.
Expected height:
expect S_j to have n / 2^j keys.
expect 1 key at in level S_{lg n}.
So expected height is O(log n)
So the expected worst case operation time is O(n)
More refined analysis
We count the number of drops down to a lower level, and the number of advances to the right.
Idea: trace the path backwards. We start in some tower T at level i. How long is the expected reverse path before rising k levels?
Two possibilities:
1. T has height i, and the path goes to the left to a tower of height at least i.
-> must still rise k levels
2. The path rises a level in T.
-> must still rise k-1 levels.
Each of these occurs with probability 1/2. Now we can create a recurrence.
Let C(k) be the expected path length that rises k levels on its backward trajectory.
C(k) = 1/2 (cost of going back to a previous tower + C(k)) + 1/2 (cost of going up a tower + C(k-1))
C(k) = 1/2 (1 + C(k)) + 1/2 (1 + C(k-1))
C(k) = 2 + C(k-1)
but C(0) = 0, so:
C(k) = 2k
The expected height is O(log n), so the expected path length is also O(log n).
A skip list with n items has height at most 3 log n with probability at least 1 - 1/n^2.
So all of our operations are O(log n). Skip lists are also fast and simple to implement (unlike some balanced trees).
Move-to-front heuristic
When we find an element, we move it to the front of the list so that it will be quickly found next time. The keys before it are demoted by one position.
Theorem: C_{mtf} ≤ 2 C_{optimal} (if probabilities are static)
Keys k_1, k_2, …, k_n with probabilities p_1 ≥ p_2 ≥ … ≥ p_n.
Assume we are in a steady state C_{opt} = ∑ j p_j.
C_{mtf} = ∑ p_j (1 + number of keys before k_j)
Probability k_i before k_j = p_i / (p_i + p_j).
So if it's a more likely thing to search for, it's more likely to appear sooner.
(the most likely order in the steady state is the optimal one)
Expected number of keys before k_j.
∑_{i ≠ j} p_i / (p_i + p_j).
So
C_{mtf} = ∑ p_j (1 + ∑_{i ≠ j} p_i / (p_i + p_j))
C_{mtf} = ∑_{j=1}^n p_j + ∑_{j=1}^n ∑_{i ≠ j} [p_i p_j / (p_i + p_j)]
C_{mtf} = 1 + 2 ∑_{j=1}^n p_j ∑_{i<j} p_i / (p_i + p_j)
C_{mtf} ≤ 1 + 2 ∑_{j=1}^n p_j (∑_{i<j} 1)
C_{mtf} ≤ 1 + 2 ∑_{j=1}^n p_j (j-1)
C_{mtf} ≤ 1 + 2 C_{opt} - 2 ∑_{j=1}^n p_j
C_{mtf} ≤ 2 C_{opt} - 1
C_{mtf} ≤ 2 C_{opt}
Approaches steady state quickly, but adversely affected by a rare lookup request.