The core data structure is a list of two-item lists, each giving a person’s name and a count of that person’s movies.
For example, after reading the first seven lines of our shortened hanks.txt file, we would have the list
[ ["Hanks, Jim", 3], ["Hanks, Colin", 1],
["Hanks, Bethan", 1], ["Hanks, Tom", 2] ]
Just like our solution from the sets lectures, we can start from the following code:
imdb_file = raw_input("Enter the name of the IMDB file ==> ").strip()
count_list = []
for line in open(imdb_file):
    words = line.strip().split('|')
    name = words[0].strip()
Like our list solution for finding all IMDB people, this solution is VERY slow. Each line of the file requires a scan of the list to look for the name, so the running time is once again “order of N squared”.
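One way the starting code might be completed is sketched below. It is only a sketch: the sample lines are made-up stand-ins for the first seven lines of hanks.txt (only the "name first, fields separated by |" layout comes from the code above), and the variable found is a hypothetical helper. The inner loop over count_list is what makes the approach slow.

```python
# Made-up stand-in for the first seven lines of hanks.txt.
# The fields after the name are placeholders, not real data.
sample_lines = [
    "Hanks, Jim|Movie 1",
    "Hanks, Jim|Movie 2",
    "Hanks, Colin|Movie 3",
    "Hanks, Jim|Movie 4",
    "Hanks, Bethan|Movie 5",
    "Hanks, Tom|Movie 6",
    "Hanks, Tom|Movie 7",
]

count_list = []
for line in sample_lines:
    words = line.strip().split('|')
    name = words[0].strip()

    # Linear scan of the whole list for an existing [name, count] pair.
    found = False
    for pair in count_list:
        if pair[0] == name:
            pair[1] += 1
            found = True
            break

    # First time we have seen this name: start its count at 1.
    if not found:
        count_list.append([name, 1])

# print with parentheses and one argument runs under Python 2 and 3.
print(count_list)
```

With N lines and up to N distinct names, the inner scan can touch up to N pairs for each line, which is where the "order of N squared" behavior comes from.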
A dictionary is an association between “keys” (like words in an English dictionary) and “values” (like definitions in an English dictionary). The values can be anything.
Examples:
>>> heights = dict() # initialization 1
>>> heights = {} # initialization 2, only one or the other is necessary
>>> heights['belgian horse'] = 162.6
>>> heights['indian elephant'] = 280.0
>>> heights['tiger'] = 91.0
>>> heights['lion'] = 97.0
>>> heights
{'tiger': 91.0, 'belgian horse': 162.6, 'indian elephant': 280.0,
'lion': 97.0}
>>> 'tiger' in heights
True
>>> 'giraffe' in heights
False
>>> heights.keys()
['tiger', 'belgian horse', 'indian elephant', 'lion']
Details:
Just as in sets, the implementation uses hashing of keys.
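One practical consequence of hashing is that a key must be a hashable value, just like an element of a set. A minimal sketch, reusing the heights dictionary; the tuple key and the variable list_key_ok are made up for illustration:

```python
heights = {}
heights['tiger'] = 91.0

# A tuple of strings is hashable, so it can serve as a key.
# (This particular key is hypothetical, not part of the notes' data.)
heights[('lion', 'male')] = 97.0

# A list is mutable and not hashable, so using one as a key fails.
try:
    heights[['lion', 'male']] = 97.0
    list_key_ok = True
except TypeError:
    list_key_ok = False

# Lookups go through the hash of the key rather than a scan of all keys.
tiger_height = heights['tiger']
```

Strings, numbers, and tuples of such values all work as keys; lists, sets, and dictionaries do not.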
Hand-write or type each of the following:
Form a dictionary called countries that associates the population with each of the following countries:
Assuming that all of this has been done, what is the output of the following, when typed into the Python interpreter?
>>> print len(countries)
>>> print countries
>>> print countries.keys()
>>> print sorted(countries.keys()) # can you guess what this does?
Even though our coverage of dictionaries has been brief, we already have enough tools to solve our problem of counting movies.
Once again we’ll use the following as a starting point:
imdb_file = raw_input("Enter the name of the IMDB file ==> ").strip()
count_list = []
for line in open(imdb_file):
    words = line.strip().split('|')
    name = words[0].strip()
We will impose an ordering on the output by sorting the keys.
We’ll test first on our smaller data set and then again later on our larger ones.
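A possible dictionary-based completion is sketched below, under some assumptions: the sample lines are made-up stand-ins for the shortened hanks.txt file, and the dictionary name counts is hypothetical (it takes the place of count_list from the starting code). The membership test and the update each use the hash of the name, so no scan of earlier names is needed.

```python
# Made-up stand-in for the first seven lines of hanks.txt.
# The fields after the name are placeholders, not real data.
sample_lines = [
    "Hanks, Jim|Movie 1",
    "Hanks, Jim|Movie 2",
    "Hanks, Colin|Movie 3",
    "Hanks, Jim|Movie 4",
    "Hanks, Bethan|Movie 5",
    "Hanks, Tom|Movie 6",
    "Hanks, Tom|Movie 7",
]

counts = dict()
for line in sample_lines:
    words = line.strip().split('|')
    name = words[0].strip()

    # Increment the count for a name we have seen, or start it at 1.
    if name in counts:
        counts[name] += 1
    else:
        counts[name] = 1

# Sorting the keys imposes an alphabetical ordering on the output.
# print with parentheses and one argument runs under Python 2 and 3.
for name in sorted(counts.keys()):
    print("%s: %d" % (name, counts[name]))
```

To run this on a real file, the loop over sample_lines would be replaced by the loop over open(imdb_file) from the starting code.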
So far, the values in our dictionaries have been integers and floats.
But values can be of any type.
Here is an example using our IMDB code and a set:
>>> people = dict()
>>> people['Hanks, Tom'] = set()
>>> people['Hanks, Tom'].add('Big')
>>> people['Hanks, Tom'].add('Splash')
>>> people['Hanks, Tom'].add('Forrest Gump')
>>> print people['Hanks, Tom']
set(['Big', 'Splash', 'Forrest Gump'])
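The same idea can be combined with the file-reading loop so that each name maps to the set of that person's movies. This is a sketch under assumptions: the sample lines are made up, and treating words[1] as the movie title is a guess about the file layout that the notes do not confirm.

```python
# Made-up sample lines; the title is ASSUMED to be the second field.
sample_lines = [
    "Hanks, Tom|Big",
    "Hanks, Tom|Splash",
    "Hanks, Tom|Big",        # repeated title, to show the set ignores it
    "Hanks, Colin|Movie 3",  # placeholder title, not real data
]

people = dict()
for line in sample_lines:
    words = line.strip().split('|')
    name = words[0].strip()
    title = words[1].strip()

    # Create an empty set the first time a name appears.
    if name not in people:
        people[name] = set()
    people[name].add(title)

# The number of distinct titles per person is the size of each set.
tom_count = len(people['Hanks, Tom'])
```

Because the values are sets, a title that appears on more than one line for the same person is stored only once.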
Here is another example where we store the continent and the population for a country instead of just the population:
countries.clear()
countries['Algeria'] = (37100000, 'Africa')
countries['Canada'] = (34945200, 'North America')
countries['Uganda'] = (32939800, 'Africa')
countries['Morocco'] = (32696600, 'Africa')
countries['Sudan'] = (30894000, 'Africa')
We access the values in the entries using two consecutive subscripts. For example,
name = "Canada"
print "The population of %s is %d" %(name, countries[name][0])
print "It is in the continent of", countries[name][1]
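The two subscripts also work inside a loop over the keys. The sketch below reuses the five entries above; the variable africa_total is a hypothetical name introduced here, and unpacking the tuple into two names is shown as an alternative to writing [0] and [1].

```python
countries = {}
countries['Algeria'] = (37100000, 'Africa')
countries['Canada'] = (34945200, 'North America')
countries['Uganda'] = (32939800, 'Africa')
countries['Morocco'] = (32696600, 'Africa')
countries['Sudan'] = (30894000, 'Africa')

# Unpack each (population, continent) tuple instead of subscripting twice.
# print with parentheses and one argument runs under Python 2 and 3.
for name in sorted(countries.keys()):
    population, continent = countries[name]
    print("%s (%s): %d" % (name, continent, population))

# Add up the populations of the entries whose continent is Africa.
africa_total = 0
for name in countries:
    if countries[name][1] == 'Africa':
        africa_total += countries[name][0]

print("Total for the African countries listed: %d" % africa_total)
```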