What is a model of computation?
A model of computation specifies:
- what operations an algorithm is allowed to use
- the cost (time, space, ...) of each operation
- cost of an algorithm = sum of its operation costs
The following are two kinds of models of computation.
Random Access Machine (RAM)
- memory is modeled by a big array of words
- the machine has $\Theta(1)$ registers
- In $\Theta(1)$ time, it can
- load a word from memory into a register
- compute (+, -, *, /, &, |, ^) on registers
- store a register $r_j$ into memory
- What is a word?
- a fixed number of bits; assume basic objects (e.g., an int) fit in one word
- The model is realistic and powerful → abstractions can be implemented on top of it
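As a rough illustration of RAM-style thinking (this is my own sketch, not part of the model; Python ints are not fixed-size machine words), memory is one big array and each word operation costs $\Theta(1)$:

```python
# Sketch: the RAM model's "big array" of memory words, with Theta(1)
# word operations.
memory = [0] * 1024        # the big array of words

def ram_demo():
    memory[0] = 7          # store a word: Theta(1)
    memory[1] = 35         # store a word: Theta(1)
    r0 = memory[0]         # load a word into "register" r0: Theta(1)
    r1 = memory[1]         # load a word into "register" r1: Theta(1)
    memory[2] = r0 + r1    # compute on registers and store: Theta(1)
    return memory[2]

print(ram_demo())          # 42
```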
Pointer machine
- can dynamically allocate objects
- object has $O(1)$ fields
- field = word (e.g., int) or pointer to object/null (a.k.a. reference)
- weaker than RAM
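As an illustration of a pointer-machine object, here is a minimal linked-list node in Python; the names Node, val, and next are my own, not from any library:

```python
# Sketch: a pointer-machine object with O(1) fields, each a word (val)
# or a pointer to another object / None (next).
class Node:
    def __init__(self, val, next=None):
        self.val = val      # word field
        self.next = next    # pointer field (reference to an object, or None)

# Dynamically allocate objects to build the list 1 -> 2 -> 3:
head = Node(1, Node(2, Node(3)))

x = head
while x is not None:        # each x = x.next costs Theta(1); n nodes cost Theta(n)
    print(x.val)
    x = x.next
```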
Python model
Python lets you use either mode of thinking, e.g.
- a “list” is actually an array in RAM:
L[i] = L[j] + 5: this operation costs only $\Theta(1)$ (constant) time
- an object with $O(1)$ attributes (including references) is like a pointer-machine object:
x = x.next costs $\Theta(1)$ time
The following are some Python operations and their costs. To determine a cost, imagine the implementation in terms of the two models above (RAM or pointer machine).
- list
(a) L.append(x): amortized $\Theta(1)$ time (via table doubling; see the sketch below)
(b) L = L1 + L2 ≡
L = []: costs $\Theta(1)$ time to build an empty list
for x in L1: L.append(x) costs $\Theta(1)$ per append, $\Theta(|L1|)$ in total over L1
for x in L2: L.append(x) costs $\Theta(1)$ per append, $\Theta(|L2|)$ in total over L2
Therefore, L = L1 + L2 costs $\Theta(1 + |L1| + |L2|)$ time
(c) L1.extend(L2) ≡ L1 += L2 ≡
for x in L2: L1.append(x) costs $\Theta(1)$ per append, $\Theta(|L2|)$ in total
Therefore, it costs $\Theta(1 + |L2|)$ time
(d) L2 = L1[i : j] ≡
L2 = []: $\Theta(1)$
for k in range(i, j): L2.append(L1[k]) costs $\Theta(1)$ per append
Therefore, it costs $\Theta(j − i + 1) = O(|L1|)$ time
(e) len(L): $\Theta(1)$ time, since a list stores its length in a field
(f) L.sort(): $\Theta(|L| \log |L|)$ time, via comparison sort
- tuple, str: similar to list
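As referenced in (a), here is a minimal sketch of the table-doubling idea behind append's amortized $\Theta(1)$ cost; the DynamicArray class is a toy illustration, not CPython's actual implementation:

```python
# Sketch: a dynamic array that doubles its fixed-size backing table when
# full. Over n appends the total copying work is Theta(n), so each append
# costs Theta(1) amortized.
class DynamicArray:
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.slots = [None] * self.capacity   # fixed-size backing array

    def append(self, x):
        if self.size == self.capacity:        # table full: double it
            bigger = [None] * (2 * self.capacity)
            for i in range(self.size):        # Theta(size) copy, but rare
                bigger[i] = self.slots[i]
            self.slots = bigger
            self.capacity *= 2
        self.slots[self.size] = x             # the common Theta(1) case
        self.size += 1

A = DynamicArray()
for i in range(10):
    A.append(i)
print(A.slots[:A.size])                       # [0, 1, ..., 9]
```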
- dict: each operation (e.g., D[key] = val) costs $\Theta(1)$ time, via hashing
- set: similar (think of it as a dict without values)
- heapq: heappush & heappop, via heaps → $\Theta(\log n)$ time each (usage sketch below)
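A small usage sketch of the standard-library heapq module, showing the $O(\log n)$ push and pop:

```python
import heapq

h = []
for x in [5, 1, 4, 2, 3]:
    heapq.heappush(h, x)      # O(log n) per push

while h:
    print(heapq.heappop(h))   # O(log n) per pop; prints 1, 2, 3, 4, 5
```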
- long: multiplication via the Karatsuba algorithm
x + y → $O(|x| + |y|)$ time, where $|x|$ and $|y|$ are the numbers of words in x and y
x * y → $O((|x| + |y|)^{\log_2 3}) \approx O((|x| + |y|)^{1.58})$ time
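For intuition, here is a sketch of Karatsuba multiplication on Python ints; splitting on bits rather than words and the base-case threshold are my own simplifications:

```python
# Sketch: Karatsuba multiplies with three recursive half-size
# multiplications instead of four, giving the O(n^{log2 3}) ~ O(n^1.58)
# bound quoted above.
def karatsuba(x, y):
    if x < 16 or y < 16:                 # small case: multiply directly
        return x * y
    half = max(x.bit_length(), y.bit_length()) // 2
    x_hi, x_lo = x >> half, x & ((1 << half) - 1)   # x = x_hi * 2^half + x_lo
    y_hi, y_lo = y >> half, y & ((1 << half) - 1)   # y = y_hi * 2^half + y_lo
    a = karatsuba(x_hi, y_hi)                       # high * high
    b = karatsuba(x_lo, y_lo)                       # low * low
    c = karatsuba(x_hi + x_lo, y_hi + y_lo)         # cross terms, one multiply
    return (a << (2 * half)) + ((c - a - b) << half) + b

print(karatsuba(123456789, 987654321) == 123456789 * 987654321)   # True
```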
Document Distance Problem: compute $d(D_1, D_2)$
The problem is actually to measure the similarity between documents; it has applications in detecting duplicates and plagiarism, and in web search ($D_2$ = query).
In this problem, we define a word as a sequence of alphanumeric characters, and a document as a sequence of words (ignoring spaces and punctuation).
The idea is to define distance in terms of shared words. Think of a document $D$ as a vector, with one axis per word:
If three axes are defined by the three words the, dog, cat,
then vector $v_1$ could be “the cat”, vector $v_2$ could be “the dog”, and vector $v_3$ could be “cat dog”.
Once documents are viewed as vectors, we can apply mathematical methods to measure the distance between them, e.g., the angle between them:
$d(D_1, D_2) = \arccos\left(\frac{D_1 \cdot D_2}{|D_1|\,|D_2|}\right)$
The algorithm can then be formed as follows (see the sketch below):
- split each document into words
- count word frequencies (document vectors)
- compute the dot product (& divide by the vector lengths to get the angle)
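Putting the three steps together, here is a minimal sketch of the algorithm; the function names and the regular expression used to split words are my own choices:

```python
import math
import re

def word_frequencies(doc):
    """Steps 1 & 2: split doc into alphanumeric words and count them."""
    counts = {}
    for word in re.findall(r"[a-zA-Z0-9]+", doc.lower()):
        counts[word] = counts.get(word, 0) + 1      # Theta(1) dict operations
    return counts

def distance(d1, d2):
    """Step 3: angle between document vectors (arccos of normalized dot product)."""
    v1, v2 = word_frequencies(d1), word_frequencies(d2)
    dot = sum(v1[w] * v2.get(w, 0) for w in v1)     # shared-word dot product
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return math.acos(dot / (norm1 * norm2))

print(distance("the cat", "the dog"))   # ~1.047 rad (60 degrees)
print(distance("the cat", "the cat"))   # ~0.0 (identical documents)
```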