kanji koohii FORUM
Seeking advice from python 2.7 programmers - Printable Version




Seeking advice from python 2.7 programmers - Oniichan - 2013-02-26

EDIT: my goal is to learn python well enough that I can eventually write tools/plugins useful for learning Japanese. I'm using this thread as if it were a Lang-8 journal for the Python language. So if you find something you could improve, please let me know!


I started learning python 2.7 after being inspired by this thread: http://forum.koohii.com/showthread.php?tid=9929

After doing some exercises online I decided to write a simple script that randomly picks what subjects I'll study each day. Here is the first draft:

EDIT: RawToast, just fixed the formatting with 2 'code' blocks.

Code:
# -*- coding: utf-8 -*-

#Import the specific function from the random.py module
#An alternative would be to just import the entire module
#using 'import random'

from random import randrange

#First the program asks how many slots I want to fill
#use input() to save the variable as a number
#or use raw_input() to save it as a string

slots = input ("How many slots do you want to fill today? (max. 10) ")
while (slots > 10):
    slots = input ("Sorry, 10 slots is the maximum today. Please choose again. ")
print "Ok, %d it is then." % slots

#Here are my lists. Note that the first position is "0" not "1".

Subject_list = ['RSH', 'RTK', 'Guitar', 'Reading', 'Grammar', 'Programming'];
Order_list = ['first', 'second', 'third', 'fourth', 'fifth', 'sixth', 'seventh', 'eighth', 'ninth', 'tenth']

#This loop chooses the subjects

count = 0
Code:
j = 0
for count in range(0,slots):  #to iterate between 0 to chosen number of slots to fill
    if j < slots - 1:
        subject = Subject_list[randrange(0,len(Subject_list))]
        order = Order_list[j]
        print "The %s subject is %s" % (order, subject)
        count = count + 1
        j = j + 1
print "The final subject will be %s" % subject
print "Done."



Seeking advice from python 2.7 programmers - Oniichan - 2013-02-26

BTW, I've run it successfully on my windows 7 machine.

My first question is whether this code is decent/efficient? Or is there a more elegant way to achieve the same thing?

Second, what could I do to limit the number of repeats of a subject (ie, limit a subject to only 2 appearances in the result)?

Thank you!


Seeking advice from python 2.7 programmers - chamcham - 2013-02-26

There is an Android app that does the same thing.

It is called "Random".
Just choose the "Raffle Drawing" setting and specify how many random items you want.

Link:
https://play.google.com/store/apps/details?id=com.krauxe.random


Another option is http://www.random.org

I've only been reading about Python 3.
The for loop is not very Pythonic.
You're writing it as if you're writing in C/C++/Java.

Maybe

for count in range(0,slots-1): #to iterate from 0 up to (but not including) slots-1
    subject = Subject_list[randrange(0,len(Subject_list))]
    order = Order_list[count]
    print "The %s subject is %s" % (order, subject)

I don't know if it works in Python 2.7. But count will automatically step through
the values that range() produces, with no manual incrementing. I haven't run the code.

I put "slots-1" because the if statement says "if j < slots-1".
That's true from 0 to slots-2, and range() leaves out its end value, so range(0,slots-1) covers exactly 0 to slots-2. Of course, you'll have to check if slots >= 2.

Unless you meant to say "if j <= slots-1".
In that case, the for loop is "for count in range(0,slots)"

I don't think you even need to increment "count" and "j".
In fact, get rid of the "j" variable, its value is always the same as "count".
So it's not even necessary.
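
The advice above (let the for loop do the counting) could look roughly like this. It is only a sketch: build_schedule is a made-up helper name, random.choice stands in for the randrange indexing, and print() with a single argument is used so the snippet runs on both Python 2.7 and 3.

```python
import random

SUBJECTS = ['RSH', 'RTK', 'Guitar', 'Reading', 'Grammar', 'Programming']
ORDER_WORDS = ['first', 'second', 'third', 'fourth', 'fifth',
               'sixth', 'seventh', 'eighth', 'ninth', 'tenth']


def build_schedule(slots, subjects=SUBJECTS, order_words=ORDER_WORDS):
    """Return one line of text per slot (hypothetical helper, not from the thread)."""
    lines = []
    # Slicing order_words limits the loop to the requested number of slots,
    # so there is no counter to create or increment by hand.
    for order in order_words[:slots]:
        subject = random.choice(subjects)  # picks one item, like indexing with randrange
        lines.append("The %s subject is %s" % (order, subject))
    return lines


if __name__ == "__main__":
    for line in build_schedule(4):
        print(line)
```

Returning the lines instead of printing inside the loop keeps the picking logic separate from the output, which makes it easier to add checks on the result later.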


Seeking advice from python 2.7 programmers - chamcham - 2013-02-26

To make it even shorter, you can use:

import random
Subject_list = ['RSH', 'RTK', 'Guitar', 'Reading', 'Grammar', 'Programming']
print(random.sample(Subject_list,2))

where "2" is how many random items you want.


Seeking advice from python 2.7 programmers - Oniichan - 2013-02-26

chamcham Wrote:I've only been reading about Python 3.
The for loop is not very Pythonic.
You're writing it as if you're writing in C/C++/Java.
This is exactly the sort of style advice I'm looking for, thank you!
After some tinkering I came up with:
Code:
count = 0
while (count < slots - 1):
        subject = Subject_list[randrange(0,len(Subject_list))]
        order = Order_list[count]
        print "The %s subject is %s" % (order, subject)
        count = count + 1
print "The final subject will be %s" % subject
print "Done."
chamcham Wrote:I don't think you even need to increment "count" and "j".
In fact, get rid of the "j" variable, it's value is always the same as "count".
So it's not even necessary.
You are correct. I dumped the 'for' and 'j'.

chamcham Wrote:But there is an Android app that does the same thing...
True, but my intent is to learn how to program in python. Next, I'd like to add some constraints on the output such as limiting the number of repeats so I don't end up with something like this: RSH, shadowing, RSH, RSH, etc.


Seeking advice from python 2.7 programmers - Oniichan - 2013-02-26

chamcham Wrote:To make it even shorter, you can use:

Subject_list = ['RSH', 'RTK', 'Guitar', 'Reading', 'Grammar', 'Programming']
print(random.sample(Subject_list,2))

where "2" is how many random items you want.
I'll give this a try. I think it would work, but I'm worried that it won't allow me to control the output. Thanks for introducing it though. I'm sure it will come in handy at some point in my python journey.

Actually, what if the output is "tested" for repeats before it is printed? Couldn't it regenerate the sample until the output meets my criteria? (I'm thinking: sample > test based on criteria > if it passes, print / else create a new sample until it does.)
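
The "sample, test, retry" idea above can be sketched as follows. This is a rough illustration rather than a finished tool: draw_with_limit and max_repeats are invented names, and the up-front check is an assumption added so the retry loop cannot run forever.

```python
import random
from collections import Counter

SUBJECTS = ['RSH', 'RTK', 'Guitar', 'Reading', 'Grammar', 'Programming']


def draw_with_limit(subjects, slots, max_repeats):
    """Redraw the whole schedule until no subject appears more than max_repeats times."""
    # Without this check the loop below could never succeed.
    if slots > len(subjects) * max_repeats:
        raise ValueError("not enough subjects for that many slots")
    while True:
        # Draw every slot independently, so repeats are possible.
        picks = [random.choice(subjects) for _ in range(slots)]
        # Counter maps each subject to the number of times it was drawn.
        if all(n <= max_repeats for n in Counter(picks).values()):
            return picks


if __name__ == "__main__":
    print(draw_with_limit(SUBJECTS, 8, 2))
```

Throwing away a whole draw is wasteful when the limit is tight, but for a handful of slots it finishes quickly.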


Seeking advice from python 2.7 programmers - Oniichan - 2013-02-26

Here is a rewrite that removes the Order_list list:
Code:
count = 1

while (count < slots):
    subject = Subject_list[randrange(0,len(Subject_list))]
    if count == 1:
        order = "st"
    elif count == 2:
        order = "nd"
    elif count == 3:
        order = "rd"
    else:
        order = "th"
    print "The %s%s subject is %s" % (count, order, subject)
    count = count + 1
print "The final subject will be %s" % subject
print "Done."
It only gives correct suffixes up to 20th (21 would come out as "21th"), but the program currently allows at most 10 slots anyway.

EDIT 1: figured out how to do it with larger numbers
Code:
count = 1

while (count < slots):
    subject = Subject_list[randrange(0,len(Subject_list))]
    if count in [1, 21, 31, 41, 51, 61, 71, 81, 91]:
        order = "st"
    elif count in [2, 22, 32, 42, 52, 62, 72, 82, 92]:
        order = "nd"
    elif count in [3, 23, 33, 43, 53, 63, 73, 83, 93]:
        order = "rd"
    else:
        order = "th"
    print "The %s%s subject is %s" % (count, order, subject)
    count = count + 1
print "The final subject will be %s" % subject
print "Done."
Next will be giving it the ability to render any number without the need of a gigantic list. This is fun!

EDIT 2: Done
Code:
count = 1

while (count < slots):
    subject = Subject_list[randrange(0,len(Subject_list))]
    if count % 100 in [11, 12, 13]:
        order = "th"
    elif count % 10 == 1:
        order = "st"
    elif count % 10 == 2:
        order = "nd"
    elif count % 10 == 3:
        order = "rd"
    else:
        order = "th"
    print "The %s%s subject is %s" % (count, order, subject)
    count = count + 1
print "The final subject will be %s" % subject
print "Done."
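
The suffix rules from EDIT 2 can also be pulled out into a small function so they are reusable and easy to test. A minimal sketch, assuming a helper named ordinal (the name is not from the thread):

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 (and 111, 112, 113, ...) are the exceptions: always "th".
    if n % 100 in (11, 12, 13):
        suffix = "th"
    else:
        # Look up the last digit; anything other than 1, 2, 3 falls back to "th".
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "%d%s" % (n, suffix)


if __name__ == "__main__":
    print(", ".join(ordinal(n) for n in (1, 2, 3, 4, 11, 21, 112)))
```

The dictionary lookup replaces the chain of elif branches without changing the rules.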



Seeking advice from python 2.7 programmers - Oniichan - 2013-02-26

Thanks chamcham for the script.
Code:
import random

Subject_list2 = ['RSH', 'RTK', 'Guitar', 'Reading', 'Grammar', 'Programming']
number = input ("How many slots would you like?" )
while (number > len(Subject_list2)):
    number = input("Sorry, please try again" )
print(random.sample(Subject_list2,number))
Which, if I chose the number '4', would yield something like:

Code:
['RTK', 'Grammar', 'Guitar', 'RSH']
But, if I choose something greater than 6, I get an error because I am limited to the number of entries in the list. Also, the script in this form can only print each subject once. I'm guessing that I'd need to sample it a few times to get more than 6 results?

Also, instead of printing them, I want to assign each one to a new variable. That way I can compare them and look for repeats. How would I do this?

I was thinking:
slotn where n = 1 to number (eg, slot1, slot2,...)
but I don't know how to use a variable within a variable or how to generate a series of variables. Any thoughts?
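
One common way around the slot1, slot2, ... question is to keep the picks in a single list and use the index as the "n". A rough sketch under that assumption; fill_slots is a made-up helper name, and collections.Counter does the repeat counting:

```python
import random
from collections import Counter

SUBJECTS = ['RSH', 'RTK', 'Guitar', 'Reading', 'Grammar', 'Programming']


def fill_slots(subjects, number):
    """Pick one subject per slot, repeats allowed (hypothetical helper)."""
    return [random.choice(subjects) for _ in range(number)]


slots = fill_slots(SUBJECTS, 9)   # more slots than subjects is fine here
first = slots[0]                  # what "slot1" would have been
counts = Counter(slots)           # maps subject -> number of appearances
repeated = [s for s, n in counts.items() if n > 1]

if __name__ == "__main__":
    print(slots)
    print(repeated)
```

Because random.choice draws each slot independently, this also lifts the "only once per subject" limit of random.sample.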


Seeking advice from python 2.7 programmers - Oniichan - 2013-02-26

Here's a leaner version of the evolving script:
Code:
import random

#Define subject list
Subject_list = ['RSH', 'RTK', 'Guitar', 'Reading', 'Grammar', 'Programming']
L = []

#Ask for input
number = input ("How many slots would you like? " )
repeat = input ("How many repeats are permissible? " )

#Output loop

for count in range(number):
    item = (random.sample(Subject_list,1))
    L.append(item)
for item in L:
    print item
print "Done!"
Now I have a list that needs pruning. I'm thinking that while the list is being generated, the script will need to examine the items to see how many times they already occur in the list before adding them. And the number of iterations should be set to the length of list 'L', not to 'number'.

Something like... sample list to get item > check against list L for duplicates and count them to determine if they are below the threshold (repeat) > if they are, add the item to the list and check total number of items against the total requested (number)>etc


Seeking advice from python 2.7 programmers - headphone_child - 2013-02-26

You have the right idea. It could look something like this.

Code:
#Output loop
while len(L) < number:
    item = (random.sample(subject_list,1))[0]
    if L.count(item) < repeat:
        L.append(item)

for i in range(len(L)):
    print "%d. %s" % (i+1, L[i])
One issue with this approach is the potential for infinite loops. For example, if you pick 9 slots with a repeat limit of 1, you'd get an infinite loop with this code, since there are only 6 choices total. You could check that number <= len(subject_list) * repeat as a validation rule when prompting, though.

Even with this solved, the method is also inefficient, since you throw away a draw each time the repeat limit is reached (most noticeable when number == len(subject_list) * repeat). In this case it doesn't matter at all, since the lists are so small that the performance hit won't be noticeable. But if it did matter, here's an alternate approach that avoids that issue at the cost of using potentially a lot more memory ("repeat" times more). There are definitely better approaches, but I just wanted to demonstrate one alternative.

Code:
#Output loop (not a loop anymore)
L = subject_list * repeat
L = random.sample(L, number)

for i in range(len(L)):
    print "%d. %s" % (i+1, L[i])
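
For reference, the pool approach above can be wrapped together with the validation rule in one function. This is a sketch only: sample_with_repeat_limit is an invented name, and raising ValueError is just one way to report a request that cannot be filled.

```python
import random

SUBJECTS = ['RSH', 'RTK', 'Guitar', 'Reading', 'Grammar', 'Programming']


def sample_with_repeat_limit(subjects, number, repeat):
    """Pick `number` subjects, each appearing at most `repeat` times."""
    # The pool holds `repeat` copies of every subject.
    pool = list(subjects) * repeat
    if number > len(pool):
        raise ValueError("number must be <= len(subjects) * repeat")
    # random.sample never reuses a position in the pool, so no subject
    # can come out more often than it appears there.
    return random.sample(pool, number)


if __name__ == "__main__":
    for i, subject in enumerate(sample_with_repeat_limit(SUBJECTS, 8, 2)):
        print("%d. %s" % (i + 1, subject))
```

enumerate() gives the index and the item together, which avoids the range(len(L)) pattern in the print loop.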



Seeking advice from python 2.7 programmers - chamcham - 2013-02-26

If you don't want any repeats at all, then use a set.
Remove random items from the set each time.

Another approach is to randomize the list and just do the top N items.
So if there are 6 items. Randomize the whole list. And then do the first 2 or 3 (or whatever).
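
The "randomize the whole list, then take the top N" idea might look like this. A minimal sketch, assuming a helper called top_n_shuffled (not a name from the thread):

```python
import random

SUBJECTS = ['RSH', 'RTK', 'Guitar', 'Reading', 'Grammar', 'Programming']


def top_n_shuffled(subjects, n):
    """Shuffle a copy of the list and return its first n items (no repeats)."""
    shuffled = list(subjects)   # copy, so the caller's list keeps its order
    random.shuffle(shuffled)    # random.shuffle reorders the list in place
    return shuffled[:n]


if __name__ == "__main__":
    print(top_n_shuffled(SUBJECTS, 3))
```

For a list with no duplicates, this yields distinct items, the same kind of result as random.sample(subjects, n).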


Seeking advice from python 2.7 programmers - tokyostyle - 2013-02-26

Semi-random thoughts from someone who sits in his Tokyo apartment all day and codes from home for a living.

First, if you want to live and work in Japan switch to Ruby and Rails. There's nothing wrong with Python at all but there is a huge culture of rails usage here and that is unlikely to change anytime soon.

Secondly get used to checking out lots of GitHub projects. You'll learn a lot more from the python community than you will from the one or two people who have some python experience here.

Finally pick a framework and go. There's no reason you couldn't have written your app in Rails or Django.

I've kind of been wanting something similar to your script but too much real work has kept me from pursuing it.


Seeking advice from python 2.7 programmers - Panta - 2013-02-26

Here's an alternative to headphone_child's script using dict comprehension to keep track of how many times the subject may appear. It looks rather Pythonic to me (for better or worse).

Code:
import random

subjects = ['RSH', 'RTK', 'Guitar', 'Reading', 'Grammar', 'Programming']
L = []

#Ask for input
number = input ("How many slots would you like? " )
repeat = input ("How many repeats are permissible? " )

#Make a dict via list/dict comprehension, {'RSH':repeat, 'RTK':repeat...}
#This will keep track of the number of times a subject may appear again
subjectDict={subject:repeat for subject in subjects}

#xrange produces its numbers lazily, one at a time, as the for loop asks for them.
#range creates the whole list first and then iterates over it.
#in Python 3.x, xrange becomes the new range.
for slot in xrange(number):
    subject=(random.choice(subjectDict.keys())) #keys() to go through the subject names
    L.append(subject)

    #Entry may reappear one time less. Remove from the dict if it may not appear anymore.
    subjectDict[subject]-=1
    if subjectDict[subject]==0:
        del subjectDict[subject]


for item in L:
    print item



Seeking advice from python 2.7 programmers - Oniichan - 2013-02-26

Panta Wrote:Here's an alternative to chamcham's script using dict comprehension to keep track of how many times the subject may appear. It looks rather Pythonic to me (for better or worse)...
Thank you, that works perfectly. I especially appreciate the comments you added to the code. I added the following to keep it from breaking when the numbers are out of range:
Code:
while (number > len(subjects) * repeat):
    repeat = input ("I'm sorry, please choose a larger number. ")
Now, it's time to read up on Dict and range.
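
As a small companion to the check above, the prompt could tell the user the smallest repeat value that would work. A sketch under the assumption that a helper like min_repeat_needed (an invented name) is acceptable:

```python
def min_repeat_needed(number, subject_count):
    """Smallest repeat limit that can fill `number` slots from `subject_count` subjects."""
    # Ceiling division written with floor division, so it behaves the same
    # on Python 2.7 and 3 and never touches floating point.
    return -(-number // subject_count)


if __name__ == "__main__":
    print(min_repeat_needed(9, 6))
```

With 6 subjects and 9 slots, for instance, a repeat limit of 1 can never work, while 2 can.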


Seeking advice from python 2.7 programmers - Oniichan - 2013-02-26

tokyostyle Wrote:Semi-random thoughts from someone who sits in his Tokyo apartment all day and codes from home for a living.

First, if you want to live and work in Japan switch to Ruby and Rails. There's nothing wrong with Python at all but there is a huge culture of rails usage here and that is unlikely to change anytime soon.
That's great advice. While I was aware that Ruby was developed in Japan, I had no idea of its impact there. I'd like to learn it someday, but I plan on sticking with python for now since I'm really enjoying it. Would it be too confusing to learn them both together (slowly); writing code for one, then figuring out how to accomplish the same in the other (properly)?

tokyostyle Wrote:Secondly get used to checking out lots of GitHub projects. You'll learn a lot more from the python community than you will from the one or two people who have some python experience here.
Love that advice. It reminds me of a saying writers often share: 'If you want to become a better writer, become a better reader.'


Seeking advice from python 2.7 programmers - Oniichan - 2013-02-26

chamcham Wrote:If you don't want any repeats at all, then use a set.
Remove random items from the set each time.
I had to look up 'sets'. From Python docs:

"...(For other containers see the built in dict, list, and tuple classes, and the collections module.). . . Like other collections, sets support x in set, len(set), and for x in set. Being an unordered collection, sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior."

I guess that my next assignment will be to study/play with set, dict, list, tuple, etc. and learn how each works and how they differ from one another. Thanks again.


chamcham Wrote:Another approach is to randomize the list and just do the top N items.
So if there are 6 items. Randomize the whole list. And then do the first 2 or 3 (or whatever).
Yes, I could see how this would work. Especially if I added headphone_child's idea of defining a new list as a multiple of the original like so:
Code:
L = subject_list * repeat
I suppose the difference between this approach and his would be like the difference between a media player randomly picking a song after the previous one has finished vs. creating a new list of songs by shuffling all the songs, then playing them sequentially (i.e., one method shuffles the whole list first, then grabs items from the top or bottom, while the other samples the original list and tests for repeats).


Seeking advice from python 2.7 programmers - Oniichan - 2013-02-26

headphone_child Wrote:...Even with this solved though, this method also suffers from lack of efficiency since you're throwing away calculations for each time the repeat limit is reached (most noticeable when number == len(subject_list) * repeat). In this case it doesn't matter at all since the list sizes are so small that the performance hit won't be noticeable. But if it did matter, here's an alternate approach that avoids that issue at the cost of using potentially a lot more memory ("repeat" times more). There are definitely better approaches, but I just wanted to demonstrate one alternative...
More golden advice in this thread, thank you! As a novice, it would be easy for me to ignore/not notice the cost of using different modules and approaches when writing a script. That gives me two more things to consider as my studies progress. Namely, how to use memory and cpu processing efficiently and how to write readable script.


Seeking advice from python 2.7 programmers - Panta - 2013-02-27

Oniichan Wrote:I guess that my next assignment will be to study/play with set,dict, list, tuple, etc. and learn how each works and how they differ from one another. Thanks again.
Sets are not as essential as lists and dicts. In fact, you can emulate sets with lists by hacking together the same functionality, e.g. checking whether an element is already in the list before adding it. Doing this check on a list, however, is rather slow compared to using a set. Still, unless you need the performance, using lists is the better way (especially for a beginner).

E.g. I once did some analyzing of binary XML files found in a game (a custom format just for the game). Basically at the top of the file there was a list of all keywords in the file, but later on the actual entries used 4 byte hashes instead of the keywords. The game hashed each keyword and used that hash for the entries instead; e.g. "DataContainer" became 726c6ee4. Now as I had no clue about the hashing function I had a script run over all binary XML files and take note of the keywords and hashes in every file. The reasoning was that by going over several files I could filter out the correct hash for a keyword (as a common denominator of several files). So running through all 50k files for every keyword (about 9k) I initially compared the elements of each list for each file against each other, checking if one element was found in the other etc. which took about three hours for a single keyword. As I wasn't going to wait for three years, I replaced the lists with sets and the script was done with the other 9000 in just three more hours.

Long story short, use lists; if there's a performance issue that can be solved with sets, go for it. Tuples are pretty useless in Python to be honest (read-only lists at best). Also, in Python it's more important to have maintainability and get things done/scripted fast than having the script run as fast as possible. Optimize your script when you DO need the performance (e.g. because you don't want to wait three years for the script to finish). If you still want to optimize rather short snippets of code, keep in mind that Python puts so many layers between you and the actual machine code it's hard to judge whether a certain approach is fast or slow so you'll need to use cProfile.
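
To make the list-versus-set point concrete, here is a small sketch of both ways of collecting unique items. The function names are made up for illustration, and no timings are claimed; cProfile or timeit would be the way to measure a real case.

```python
def unique_with_list(items):
    """Emulate a set with a list: check membership before appending."""
    seen = []
    for item in items:
        # "in" on a list scans it element by element, which gets slow
        # as the list grows.
        if item not in seen:
            seen.append(item)
    return seen


def unique_with_set(items):
    """Let the built-in set drop the duplicates (order is not kept)."""
    return set(items)


if __name__ == "__main__":
    data = ['RSH', 'RTK', 'RSH', 'Guitar', 'RTK']
    print(unique_with_list(data))
    print(sorted(unique_with_set(data)))
```

The list version keeps the order of first appearance, which is one practical reason to prefer it when the data is small.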


Seeking advice from python 2.7 programmers - Oniichan - 2013-02-28

Well, I've been studying lists lately and stumbled upon a small problem when generating them with the csv module. It seems that whenever the source file contains Japanese text encoded in utf8, my script either outputs Chinese or breaks.

Code:
import csv

L = []
M = []
N = []

with open('test.tsv', 'rb') as csvfile:
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect='excel-tab')
    for row in reader:
        item = [row[0]]
        L.append(item)
        item = [row[1]]
        M.append(item)
        item = [row[2]]
        N.append(item)

for item in L:
    print item[0]

for item in M:
    print item[0]
  
for item in N:
    print item[0]
The script is supposed to store each of the 3 columns in the tsv file as a list. Then, print the items in each list. This version prints the Japanese containing column as Chinese.

If I add the following to the script:
Code:
import locale

locale.setlocale(locale.LC_ALL, 'jpn')
the script breaks when it tries to print the Japanese column.

Any ideas what is happening here? I've already skimmed about 20 pages I've found online that touched upon encodings, and I've tried various fixes.

Here is a little more information: the tsv file is encoded in utf-8; I've never edited it since it was generated; it displays properly in notepad, firefox, etc.; one time I was able to get the script to read a different file containing Japanese, but it printed the characters like this: \x9a\x84 etc. Thank you.


Seeking advice from python 2.7 programmers - Oniichan - 2013-02-28

Hyperborea Wrote:...As for learning another language, that can be useful to give you a deeper understanding of things and to generalize what you already know. After you have learned a few languages of one type (Python is a procedural object oriented language) then learning others of the same family doesn't take much time. However, at the beginning you would be better served by learning more of the one language and the underlying concepts (how to use a set, how to sort, input, output, etc.) before learning another language. Think of it like learning English. After a couple of weeks of study you would be better served by learning more about English and how to study a language rather than going off and learning German.
Thank you. I'm going to follow your advice and stick with Python for a while. It's providing more than enough challenges for me already.


Seeking advice from python 2.7 programmers - Panta - 2013-03-01

The csv documentation mentions unicode is not really supported. However, the format is simple enough, so you can write your own parser. To open files as unicode: http://docs.python.org/2/howto/unicode.html#reading-and-writing-unicode-data

From there it's just a simple split: http://docs.python.org/2/library/stdtypes.html?highlight=split#str.split
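
The "open as unicode, then split" suggestion above could be sketched like this. Several things here are assumptions: read_tsv_columns is an invented name, io.open is my choice of function (it exists on Python 2.6+ and 3, and is not necessarily what the linked how-to uses), and the parser only handles plain tab-separated text with no quoted fields.

```python
import io


def read_tsv_columns(path, encoding="utf-8"):
    """Read a tab-separated file and return its columns as lists of unicode text."""
    rows = []
    # io.open decodes the bytes while reading, so every line is unicode text.
    with io.open(path, encoding=encoding) as handle:
        for line in handle:
            line = line.rstrip(u"\r\n")   # drop the line ending only
            if line:                      # skip empty lines
                rows.append(line.split(u"\t"))
    # zip(*rows) turns rows into columns; if rows differ in length,
    # the extra fields of longer rows are dropped.
    return [list(column) for column in zip(*rows)]
```

Calling read_tsv_columns('test.tsv') would give the three lists L, M and N from the earlier script in one go. Printing the result in a Windows console may still show wrong characters if the console's code page cannot display Japanese, which is a separate problem from reading the file.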