I wanted to explore a dataset and see what I could find out about it. I settled on the small amount of event data that can be collected from the GitHub API without an OAuth token, which limits it to the 300 most recent events.
To Run All of the Cells
You can run the cells one at a time or run them all in sequential order. Selecting a cell and either clicking the Run button in the toolbar or pressing Shift+Enter will run that cell if it contains code.
To run them all you will have to use the menu: Cell > Run All
import json
from collections import Counter
from pathlib import Path
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
import seaborn as sns
from dateutil.parser import parse
from matplotlib import rc
from matplotlib.pyplot import figure
data_location = Path.cwd().joinpath("data")
Retrieving and Importing the Data
The following code loads the three event JSON files from the data directory if that directory exists. If the directory is not found, it is created, the files are pulled down from GitHub, and then loaded into memory.
def retrieve_data():
    if not data_location.exists():
        data_location.mkdir()
    url = "https://api.github.com/repos/pybites/challenges/events?page={}&per_page=100"
    for page in range(1, 4):
        response = requests.get(url.format(page))
        if response.ok:
            file_name = data_location.joinpath(f"events{page}.json")
            try:
                file_name.write_text(json.dumps(response.json()))
                print(f"  Created: {file_name.name}")
            except Exception as e:
                print(e)
        else:
            print(f"Something went wrong [{response.status_code}]: {response.reason}")
def load_data():
    if data_location.exists():
        for page in range(1, 4):
            file_name = data_location.joinpath(f"events{page}.json")
            events.extend(json.loads(file_name.read_text()))
            print(f"  Loaded: {file_name.name}")
    else:
        print("Data directory was not found:")
        retrieve_data()
        load_data()
NOTE: If you want to work with the latest data, just delete the data directory and all its contents; the files will be pulled down again on the next run.
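If you would rather do that cleanup from code than from the file manager, a minimal sketch using the standard library's shutil (data_location here mirrors the Path defined at the top of the notebook):

```python
import shutil
from pathlib import Path

data_location = Path.cwd().joinpath("data")  # same path as defined earlier

# Remove the cached event files so the next load_data() call re-downloads them.
if data_location.exists():
    shutil.rmtree(data_location)
```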
events = []
load_data()
print(f"Total Events Loaded: {len(events)}")
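With the events in memory, a quick sanity check is to tally them by their "type" field (every GitHub event object carries one, e.g. "PushEvent"). A minimal sketch using collections.Counter, shown here against a small hand-made sample rather than the live data:

```python
from collections import Counter

# Hypothetical sample standing in for the loaded `events` list; real GitHub
# event objects carry many more fields, but each one has a "type" key.
events = [
    {"type": "PushEvent"},
    {"type": "WatchEvent"},
    {"type": "PushEvent"},
    {"type": "IssueCommentEvent"},
]

event_counts = Counter(event["type"] for event in events)
print(event_counts.most_common())
# [('PushEvent', 2), ('WatchEvent', 1), ('IssueCommentEvent', 1)]
```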