First of all we have to praise Julian for learning by challenge, submitting his code to github for review, starting out as a coder this takes tremendous courage. Quick round of applause please ...
The flip-side though is that he will learn super fast his way, he is on his way to become a Python ninja :)
As described in the previous article our first challenge was to:
Parse a copy of the loggedin practical JS course page calculating the total course time. We focussed on the html parsing, not scraping for now (we will use BeautifulSoup in a future post for sure!)
The submitted code is here.
Refactoring 1.0 - making it more Pythonic
Enter the code review.
Github's history link lets you browse all the commits. This is very convenient and it shows the benefit (best practice) of making git commits as small as possible. This actually made it very easy to show all refactorings in chronological order, which I will do in a bit.
Before we dive in ...
Note that starting 5a34e5a, I ran the script before each commit to see if my assert would work (regression).
For convenience I use this shortcut in my .vimrc:
nmap ,p :w<CR>:!python3.6 %<CR>
I love this shortcut. This way I can just press comma+p and it saves the script and runs it, this saves a lot of cycles as you do this over and over (run tests -> refactor -> run tests).
Ah and yes, you probably want to try 3.6 by now ;)
Secondly below could be picky (sorry), but I just want to point out as many things as possible to get the most out of this exercise.
$ git log --oneline --reverse 66fb7c7fe..9876f968b $ git show <commit-hash>
Commit by commit
- 1f4dc53 add jul challenge script
When you refactor always have your tests at hand to make sure you don't mess anything up. This was just a small script so an assert was enough, whatever you use (unittest, pytest, ...) the tests need to perform fast, because you run them often (every step).
assert str(total_hours) == '6.841944444444445'
(not the ideal to show a time delta, see update towards the end ...)
I added the Top-level script environment to prevent the prints to run if the module is imported:
if __name__ == "__main__": ...
The search_file() method was called with an argument called 'file', but the constant 'HTML_FILE' was already defined, so could just use that.
time_regex = re.compile(r'\(\d+:\d+\)')
This was defined in search_file, being a constant I moved it to the top of the file and used PEP8's uppercase convention:
TIME_REGEX = re.compile(r'\(\d+:\d+\)')
open(HTML_FILE).read() was used without close(). Best practice for reading files is using a with block (aka contextmanager):
with open(HTML_FILE) as f: ...
This is probably an inherited C (or other language) style:
for i in range(len(durations)): minutes, seconds = durations[i].strip('()').split(':')
No list indexing needed, in Python you can just iterate over a sequence very easily with:
for mm_ss in durations: minutes, seconds = mm_ss.strip('()').split(':')
sum_minutes = sum_minutes + int(minutes)
can be shortened to:
sum_minutes += int(minutes)
I renamed method search_file() to get_all_timestamps() to better express what it does.
time_list =  declaration at the top was redundant.
time_calculation() returned minutes and seconds, it's best practices to have one return value from a function, so I refactored it to count seconds. Of course I had to update the prints in main, but this commit made the design cleaner.
Renamed time_calculation() to calc_duration() which I found a bit more concise.
Deleted 'min(s)' (minutes) from comments as we went for second counting only since commit f862652.
Decided to strip comments completely because the code expresses well what it does.
Oops on the commit message. Renamed time_list to video_timings to better express what the variable stores.
sum_seconds += int(minutes) * SECONDS_IN_MIN sum_seconds += int(seconds)
was still happening twice, so made that a one-liner:
sum_seconds += int(minutes) * SECONDS_IN_MIN + int(seconds)
When I see any literal values, either numeric or strings, I extract them into constants:
MM_SS_SEP = ':'
Lastly I ran flake8, which we mentioned in our PEP8 article, to check for style violations, in this case only some whitespaces and a blank line.
Julian, very well done on the challenge man, you are making good progress.
I hope this inspires you and the readers to think about making code as Pythonic and clean as possible, because the extra time upfront saves a lot of time later on.
Any feedback or questions use the comments below, or if code specific: use the comment box Github has for each commit.
These refactorings are suggestions, I am learning too, so any improvements are welcome ...
About Code challenges
As Julian explained:
Bob and I thought it'd be interesting to do some code challenges. That is, Bob specifies the challenge and I complete it. Bob then goes through my code and makes any necessary edits/improvements to make it more Pythonic.
We plan to do a code challenge here once a week. Stay tuned.
If you like this subscribe below of follow us on Twitter. Thanks for reading.
Update 4th of Feb 2018
6.841944444444445 is not the best way to show a timestamp. We updated the script to:
- have the
assert expect 6:50:31,
divmod in our
calc_duration (OK, we also sneaked an f-string, sorry we could not resist the temptation),
calc_duration_improved to use
timedelta to calculate the total duration of the course.
See the new revision here.
Keep Calm and Code in Python!
See an error in this post? Please submit a pull request on Github.