Intro
First of all we have to praise Julian for learning by challenge, submitting his code to github for review, starting out as a coder this takes tremendous courage. Quick round of applause please …
The flip-side though is that he will learn super fast his way, he is on his way to become a Python ninja 🙂
As described in the previous article our first challenge was to:
Parse a copy of the loggedin practical JS course page calculating the total course time. We focussed on the html parsing, not scraping for now (we will use BeautifulSoup in a future post for sure!)
The submitted code is here.
Refactoring 1.0 – making it more Pythonic
Enter the code review.
Github’s history link lets you browse all the commits. This is very convenient and it shows the benefit (best practice) of making git commits as small as possible. This actually made it very easy to show all refactorings in chronological order, which I will do in a bit.
Before we dive in …
-
Note that starting 5a34e5a, I ran the script before each commit to see if my assert would work (regression).
For convenience I use this shortcut in my .vimrc:
nmap ,p :w
:!python3.6 % I love this shortcut. This way I can just press comma+p and it saves the script and runs it, this saves a lot of cycles as you do this over and over (run tests -> refactor -> run tests).
Ah and yes, you probably want to try 3.6 by now 😉
-
Secondly below could be picky (sorry), but I just want to point out as many things as possible to get the most out of this exercise.
-
Again see the history link or clone our blog_code repo and go through it with:
$ git log --oneline --reverse 66fb7c7fe..9876f968b $ git show
Commit by commit
- 1f4dc53 add jul challenge script
-
First I added the script to our katas folder of our blog_code repo and the copy+paste of the page content.
-
ce498d7 add assert for regression
When you refactor always have your tests at hand to make sure you don’t mess anything up. This was just a small script so an assert was enough, whatever you use (unittest, pytest, …) the tests need to perform fast, because you run them often (every step).
assert str(total_hours) == '6.841944444444445'
(not the ideal to show a time delta, see update towards the end …)
-
5a34e5a add calling code in main
I added the Top-level script environment to prevent the prints to run if the module is imported:
if __name__ == "__main__": ...
-
dad9b55 no need to pass file around, we have a constant
The search_file() method was called with an argument called ‘file’, but the constant ‘HTML_FILE’ was already defined, so could just use that.
-
9d682fa extract time_regex into constant
time_regex = re.compile(r'(d+:d+)')
This was defined in search_file, being a constant I moved it to the top of the file and used PEP8’s uppercase convention:
TIME_REGEX = re.compile(r'(d+:d+)')
-
57334a6 use open in with block to auto-close file handle
open(HTML_FILE).read() was used without close(). Best practice for reading files is using a with block (aka contextmanager):
with open(HTML_FILE) as f: ...
-
1147bd0 no need for range, can just loop over duration iterator
This is probably an inherited C (or other language) style:
for i in range(len(durations)): minutes, seconds = durations[i].strip('()').split(':')
No list indexing needed, in Python you can just iterate over a sequence very easily with:
for mm_ss in durations: minutes, seconds = mm_ss.strip('()').split(':')
-
bc6ee8a can use shortcut += for summing
sum_minutes = sum_minutes + int(minutes)
can be shortened to:
sum_minutes += int(minutes)
-
9ff89d1 better method name for getting all timestamps
I renamed method search_file() to get_all_timestamps() to better express what it does.
-
f5db013 no need to predeclare time_list at module level
time_list = [] declaration at the top was redundant.
-
f862652 one return value from time_calculation, so convert all to seconds
time_calculation() returned minutes and seconds, it’s best practices to have one return value from a function, so I refactored it to count seconds. Of course I had to update the prints in main, but this commit made the design cleaner.
-
752394b match method name last refactoring
Renamed time_calculation() to calc_duration() which I found a bit more concise.
-
026b9c5 update comments after last refactoring
Deleted ‘min(s)’ (minutes) from comments as we went for second counting only since commit f862652.
-
e4fad91 strip comments as code is pretty self explanatory
Decided to strip comments completely because the code expresses well what it does.
-
Oops on the commit message. Renamed time_list to video_timings to better express what the variable stores.
-
d026a7f do adding/summing on one line
sum_seconds += int(minutes) * SECONDS_IN_MIN sum_seconds += int(seconds)
was still happening twice, so made that a one-liner:
sum_seconds += int(minutes) * SECONDS_IN_MIN + int(seconds)
-
e45ce53 extract colon seperator into constant
When I see any literal values, either numeric or strings, I extract them into constants:
MM_SS_SEP = ':'
-
9876f96 removed whitespaces to comply with pep8 (used flake8 vim plugin)
Lastly I ran flake8, which we mentioned in our PEP8 article, to check for style violations, in this case only some whitespaces and a blank line.
Conclusion
Julian, very well done on the challenge man, you are making good progress.
I hope this inspires you and the readers to think about making code as Pythonic and clean as possible, because the extra time upfront saves a lot of time later on.
Any feedback or questions use the comments below, or if code specific: use the comment box Github has for each commit.
These refactorings are suggestions, I am learning too, so any improvements are welcome …
About Code challenges
As Julian explained:
Bob and I thought it’d be interesting to do some code challenges. That is, Bob specifies the challenge and I complete it. Bob then goes through my code and makes any necessary edits/improvements to make it more Pythonic.
We plan to do a code challenge here once a week. Stay tuned.
If you like this subscribe below of follow us on Twitter. Thanks for reading.
Update 4th of Feb 2018
6.841944444444445
is not the best way to show a timestamp. We updated the script to:
– have the assert
expect 6:50:31,
– use divmod
in our calc_duration
(OK, we also sneaked an f-string, sorry we could not resist the temptation),
– added calc_duration_improved
to use datetime
and timedelta
to calculate the total duration of the course.
See the new revision here.
Keep Calm and Code in Python!
— Bob