Bob and I thought it'd be interesting to do some code challenges. That is, Bob specifies the challenge and I complete it. Bob then goes through my code and makes any necessary edits/improvements to make it more Pythonic.
This will not only improve my Python and his code review skills but should also (hopefully!) provide you with something interesting, or at least entertaining, to read.
Feel free to give any feedback or improvements of your own in the comments below!
The problem is that while each module/video displays its own duration, there's no course total time listed anywhere.
Enter the Challenge: Create a web scraper that parses the page and then calculates the total course time.
The main content page is behind a login. How the heck was I supposed to automate a scraper to log into the site with my creds and then pull the page?
I manually right-clicked and selected 'Save As' (on Windows) to save the page as an HTML file, but when I tried to parse the file with BeautifulSoup I consistently hit an error.
I initially wanted to use BeautifulSoup for this but as I kept hitting the aforementioned error and was running out of time (sleep!) I decided to keep it simple, albeit a little manual.
I highlighted the entire page and saved it as plain text into a file titled "content.html".
The program is to be created in the same directory as the content.html file.
#Read in the HTML file and search it using my time regex
def search_file(file):
#Strip out the brackets and the colon to calculate the mins and seconds
def time_calculation(durations):
time_regex = re.compile(r'\(\d+:\d+\)') #Creating the regex
#For loop to strip brackets/colon and assign the mins/seconds
for i in range(len(durations)):
    minutes, seconds = durations[i].strip('()').split(':')
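Pieced together, the fragments above might look something like the following. This is a minimal sketch rather than the exact program: `search_text` and the sample string are made up for illustration, and keeping a running total in seconds is an assumption about how the minutes and seconds get added up.

```python
import re

# Same pattern as in the post: a duration in brackets, e.g. "(12:34)"
time_regex = re.compile(r'\(\d+:\d+\)')


def search_text(text):
    """Return every bracketed duration found in a string (hypothetical helper)."""
    return time_regex.findall(text)


def search_file(file):
    """Read in the saved page and search it using the time regex."""
    with open(file, encoding='utf-8') as f:
        return search_text(f.read())


def time_calculation(durations):
    """Strip the brackets and colon, then total everything in seconds."""
    total_seconds = 0  # assumption: accumulate in seconds
    for i in range(len(durations)):
        minutes, seconds = durations[i].strip('()').split(':')
        total_seconds += int(minutes) * 60 + int(seconds)
    return total_seconds


# Made-up sample text standing in for the saved course page
sample = "Intro (3:15)\nSetup (10:05)\nWrap up (0:40)"
durations = search_text(sample)
print(durations)                    # ['(3:15)', '(10:05)', '(0:40)']
print(time_calculation(durations))  # 840
```

With the real page saved alongside the script, `time_calculation(search_file('content.html'))` would give the course total in seconds.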
The program eventually worked! I was able to calculate that the course took roughly 6.8hrs to complete.
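For a sense of the arithmetic behind a figure like that, here is a small sketch of turning a total in seconds into hours and minutes. `as_hours_and_minutes` is a hypothetical helper, and 24480 is simply 6.8 hours expressed in seconds, not the exact total the program produced.

```python
def as_hours_and_minutes(total_seconds):
    """Split a number of seconds into whole hours and leftover whole minutes."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes = remainder // 60
    return hours, minutes


example_total = 24480  # illustrative only: 6.8 * 3600 seconds
hours, minutes = as_hours_and_minutes(example_total)
print('{}h {}m ({:.1f} hours)'.format(hours, minutes, example_total / 3600))
# prints: 6h 48m (6.8 hours)
```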
I was bummed I didn't actually get the traditional web scrape working at the time. I would like to figure out where I went wrong with that so I can use BeautifulSoup to properly scrape the content.html file. (I'd already found the CSS I needed to search, damnit!)
There are 30 lines of code in this program. I believe there are lines that can be refactored to do multiple assignments and calculations on a single line. E.g. the for loop that strips the brackets and colon also adds the mins and seconds. I'm sure it can be improved.
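One way that loop could be tightened up, as a rough sketch and not necessarily what the review will suggest: iterate over the durations directly instead of indexing with `range(len(...))`, or fold the whole thing into a single `sum()`. Both function names here are hypothetical.

```python
def total_seconds(durations):
    """Total bracketed 'mm:ss' strings such as '(12:34)' in seconds."""
    total = 0
    for duration in durations:  # loop over the items, no index needed
        minutes, seconds = duration.strip('()').split(':')
        total += int(minutes) * 60 + int(seconds)
    return total


def total_seconds_compact(durations):
    """Same calculation as total_seconds, as a single sum() expression."""
    return sum(
        int(minutes) * 60 + int(seconds)
        for minutes, seconds in (d.strip('()').split(':') for d in durations)
    )


print(total_seconds(['(3:15)', '(10:05)']))          # 800
print(total_seconds_compact(['(3:15)', '(10:05)']))  # 800
```

The compact version saves a few lines, though the plain loop is arguably easier to read.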
As annoyed as I got at certain points, I actually enjoyed this. Problem-wise it's as simple as they come, but it forced me to revisit the basics of regex and string manipulation.
As I write this I'm getting GitHub commit notifications of Bob refactoring and commenting, so I know he's hard at work making my code as Pythonic as possible. Tomorrow's post will be his feedback... go easy on me brother!
Keep Calm and Code in Python!