recitation 24: the halting problem & garbage collection
------------------------------------------------------------------------
* is everything computable? what's the halting problem? (from sicp ex 4.15)
* can you define a procedure (halts? <program> <input>) that returns
#t if (<program> <input>) halts (returns a value) and #f if it results
in an error or an infinite loop?
(define (run-forever) (run-forever))
(define (halt-dummy x) 'done)
(define (run-forever-dummy x) (run-forever))
(run-forever) ==> <infinite loop>
(halts? halt-dummy 1) ==> #t
(halts? run-forever-dummy 1) ==> #f
ok, once halts? has been defined, let's do:
(define (try p)
(if (halts? p p)
(run-forever)
'halted))
* what happens when we evaluate (try halt-dummy)?
==> infinite loop
* what happens when we evaluate (try run-forever-dummy)?
==> halted
* what happens when we evaluate (try try)?
Well, we need to guess whether (try try) halts or not.
case 1: (halts try try) ==> #t (it halts)
so (try try) runs forever, a contradiction
case 2: (halts try try) ==> #f (it doesn't halt)
so (try try) halts, also a contradition
thus the procedure halts? must not exist, and cannot be written.
------------------------------------------------------------------------
* Not everything sitting in memory is useful. Garbage is anything
that cannot have any influence on the future computation. Consider
the following four procedures. When applied, will Scheme (with
garbage collection) run out of memory or go into an infinite loop?
(define (foo)
(foo)) ==> infinite loop
(define (foo)
(let ((x (cons 1 2)))
(foo))) ==> infinite loop
(define (foo z)
(let ((x (cons 1 2)))
(foo x))) ==> infinite loop
(define (foo z)
(let ((x (cons 1 z)))
(foo x))) ==> error! out of memory
------------------------------------------------------------------------
* Memory consists of many cells, each cell stores a number and a tag.
The tag is N for number, P for pointer or E for NULL. Using this
implementation, (cons A B) does this:
1. Look for the next FREE location
2. Store A at the-cars[i]
3. Store B at the-cdrs[i]
4. Return Pi, a pointer to the new cons cell.
* Draw the box-and-pointer diagram starting from P5:
0 1 2 3 4 5 6 7 8 9
the-cars N3 N4 P0 N3 N5 P2 N2 N3 P1 N4
the-cdrs E0 E0 P4 P5 P0 P6 P5 P3 P3 N5
so cells 1, 3, 7, 8, & 9 are garbage!
------------------------------------------------------------------------
* Garbage Collection Technique 1: Reference Counting
1. Attach a counter to each pair in memory.
2. When a new pointer is connected to that pair, increment the counter.
3. When a pointer is removed, decrement the counter.
4. Any cell with 0 counter is garbage.
* Consider the following example, and draw the box-and-pointer
diagrams below, keeping a "reference counter" with each cell.
(define a (list 1 2 3))
(set-cdr! a 4)
(set-cdr! a a)
(set! a 1)
+ reference counting is fast and incremental
- reference counting can't handle cyclical data structures!
? reference counting requires ~50% extra memory
(1 integer per cons cell)
------------------------------------------------------------------------
* Garbage Collection Technique 2: Stop and Copy
1. Split memory in half (working memory and copy memory).
2. When out of working memory, stop computation and begin garbage collection.
1. Place scan and free pointers at the start of the copy memory.
2. Copy the Root to copy memory, incrementing free. For each cell
copied, put a "Broken Heart" token in the car and the forwarding
address in the cdr.
3. Starting at the scan pointer, check the car and the cdr. If
either is a pointer, look at the location in working
memory. If it's already been copied (i.e. it has a "Broken
Heart"), update the reference. Otherwise, copy the location
and put the "Broken Heart" in the old location.
4. Repeat until scan = free.
5. Swap the roles of the working and copy memory.
* let's perform stop-and-copy on the following with Root = P5
working memory
0 1 2 3 4 5 6 7 8 9
the-cars N3 N4 P0 N3 N5 P2 N2 N3 P1 N4
the-cdrs E0 E0 P4 P5 P0 P6 P5 P3 P3 N5
copy memory
10 11 12 13 14 15 16 17 18 19
the-cars
the-cdrs
working memory
0 1 2 3 4 5 6 7 8 9
the-cars bh N4 bh N3 bh bh bh N3 P1 N4
the-cdrs P13 E0 P11 P5 P14 P10 P12 P3 P3 N5
copy memory
10 11 12 13 14 15 16 17 18 19
the-cars P11 P13 N2 N3 N5
the-cdrs P12 P14 P10 E0 P11
- stop-and-copy requires a long pause in program execution
+ stop-and-copy can handle cyclical data structures!
- stop-and-copy requires ~100% extra memory
(can only use half your memory)
+ stop-and-copy runs fast if most of your memory is garbage
(it only touches the cells reachable from the root)
+ data is clustered together and memory is "de-fragmented"
------------------------------------------------------------------------
* Garbage Collection Technique 3: Mark-Sweep
1. Add a mark bit to each location in memory.
2. Keep a free pointer to the head of the free list.
3. When memory runs out, stop computation, clear the mark bits and
begin garbage collection.
4. Mark
1. Start at the Root and follow the accessible structure
(keeping a stack of where you still need to go).
2. Mark every cell you visit.
3. Stop when you see a marked cell, so you don't go into a cycle.
5. Sweep
1. Start at the end of memory, and build a new free list.
2. If a cell is unmarked, then it's garbage, so hook it into
the free list.
* let's perform Mark-Sweep on the following with Root = P5
0 1 2 3 4 5 6 7 8 9 10
the-cars N3 N4 P0 N3 N5 P2 N2 N3 P1 N4 P5
the-cdrs E0 E0 P4 P5 P0 P6 P5 P3 P3 N5 N7
marks
0 1 2 3 4 5 6 7 8 9 10
the-cars N3 E0 P0 E0 N5 P2 N2 E0 E0 E0 E0
the-cdrs E0 P3 P4 P7 P0 P6 P5 P8 P9 P10 E0
marks 1 0 1 0 1 1 1 0 0 0 0
free list ==> P1 (which points to P3, P7, P8, P9 & P10)
- mark-sweep requires a long pause in program execution
+ mark-sweep can handle cyclical data structures!
+ mark-sweep requires ~1-2% extra memory
(just one bit per cons cell)
- mark-sweep runs the same speed regardless of how much of
memory is garbage (it must touch all cells in the mark-phase,
and must link together all garbage cells into a free list.
------------------------------------------------------------------------