EDA and Classification of Weather using Weather Station Data


For my third project at the Metis Data Science Bootcamp, I attempted to predict rain using classification. The outcome of those two weeks is rainOne, a classification algorithm whose rain predictions I compare against the forecasts from OpenWeatherMap.org.

Meteorologists generally build a forecast from multiple sources of data. They track the movement of pressure systems, analyze cloud cover, and aggregate information from many different sensors and stations. What if they had only the information provided by a single weather station? Can the weather still be predicted from nothing more than one station's record of past conditions? …
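One common way to frame this question as classification is to turn a station's time-ordered history into labeled examples: the readings from the previous few days become the features, and whether it rained on the following day becomes the label. The sketch below shows that framing. It is not rainOne's actual pipeline; the field names (`pressure_hpa`, `humidity_pct`, `precip_mm`), the lag window, and the rain threshold are all illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical daily readings from one station; field names are assumptions.
Reading = Dict[str, float]


def make_lagged_dataset(
    readings: List[Reading],
    features: List[str],
    lags: int = 3,
    rain_threshold_mm: float = 0.0,
) -> Tuple[List[List[float]], List[int]]:
    """Turn a station's time-ordered history into (X, y) pairs.

    Each row of X holds the chosen features from the previous `lags` days,
    oldest day first. y is 1 if that next day's precipitation exceeds
    `rain_threshold_mm`, otherwise 0.
    """
    X: List[List[float]] = []
    y: List[int] = []
    for t in range(lags, len(readings)):
        window = readings[t - lags:t]  # the `lags` days before day t
        row = [day[name] for day in window for name in features]
        X.append(row)
        y.append(1 if readings[t]["precip_mm"] > rain_threshold_mm else 0)
    return X, y


# Illustrative (made-up) history of four days.
history = [
    {"pressure_hpa": 1015.0, "humidity_pct": 60.0, "precip_mm": 0.0},
    {"pressure_hpa": 1010.0, "humidity_pct": 75.0, "precip_mm": 0.0},
    {"pressure_hpa": 1004.0, "humidity_pct": 90.0, "precip_mm": 2.5},
    {"pressure_hpa": 1008.0, "humidity_pct": 80.0, "precip_mm": 0.0},
]
X, y = make_lagged_dataset(history, ["pressure_hpa", "humidity_pct"], lags=2)
print(X)
print(y)  # [1, 0]
```

The resulting `X` and `y` can be passed to any binary classifier. Because each example only looks backward in time, splitting train and test sets by date (rather than randomly) avoids leaking future conditions into training.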


Using Linear Regression to predict Movie Domestic Gross


Have you ever watched a movie trailer and seen an actor or actress who made you want to see the film even more? What about spotting your favorite director attached to a film? For my second project at Metis, I tried to quantify that feeling to predict a movie's success.

The Data

The data for this project was scraped from BoxOfficeMojo, and I focused on domestic film rankings. From BoxOfficeMojo's yearly tables, I gathered links to every movie on each yearly list from 2000 to 2019. On each movie's landing page, I used BeautifulSoup to scrape the basic details. The cast and crew information was not available in the initial page, so I used Selenium to click through the page and reveal it. …
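The link-gathering step can be sketched as follows. The post uses BeautifulSoup (where `soup.find_all("a", href=True)` would do the same job); to keep this example dependency-free it swaps in Python's built-in `html.parser` instead. The `/release/` path fragment and the sample HTML are assumptions for illustration, not BoxOfficeMojo's confirmed markup, and the Selenium step for cast and crew is not shown.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class MovieLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose path contains `fragment`."""

    def __init__(self, fragment: str) -> None:
        super().__init__()
        self.fragment = fragment
        self.links: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only movie links, and skip duplicates while preserving order.
        if href and self.fragment in href and href not in self.links:
            self.links.append(href)


# Hypothetical snippet of a yearly table; real pages would be fetched first.
sample_html = """
<table>
  <tr><td><a href="/release/rl1/">Movie A</a></td></tr>
  <tr><td><a href="/release/rl2/">Movie B</a></td></tr>
  <tr><td><a href="/studio/xyz/">Some Studio</a></td></tr>
</table>
"""

collector = MovieLinkCollector("/release/")  # assumed URL pattern
collector.feed(sample_html)
print(collector.links)  # ['/release/rl1/', '/release/rl2/']
```

Running this once per yearly table (2000 through 2019) and concatenating the results would produce the list of movie pages to visit next.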


This week I was fortunate enough to work with some peers on exploratory data analysis (EDA) of the publicly available Metropolitan Transportation Authority (MTA) Turnstile Data from New York City. Our mission was to analyze the data and advise an imaginary organization, WomenTechWomenYes (WTWY), on where to solicit email engagement for an upcoming gala. Before we could make any recommendations, we had to get familiar with the data.

The Data

The dataset is published in weekly updates, each containing roughly 200,000 rows. A quick trip to the MTA's site helps decode the information provided on each row.

head of the mta dataframe

Each row of the dataset represents the counts recorded by a single turnstile. Individual turnstiles have unique identifiers built from a combination of their Station, Remote Unit, Control Area (C/A), and Subunit Channel Position (SCP). To enter or exit a subway station, riders usually need to pass through a turnstile. The turnstiles record a running tally of entries and exits, reported once every four hours. …
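Because the counters are running tallies, the number of people who passed through a turnstile in each four-hour window is the difference between consecutive readings for the same turnstile. The sketch below groups rows by the (C/A, Unit, SCP, Station) identifier, sorts each group by time, and takes those differences. It assumes each row already has a combined `timestamp` field (the raw files split date and time into separate columns), and it drops negative or implausibly large jumps, which typically come from counter resets; the `max_plausible` cutoff is an arbitrary assumption. The identifiers and counts are made up.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

# (C/A, Unit, SCP, Station) together identify one physical turnstile.
TurnstileKey = Tuple[str, str, str, str]


def interval_entries(
    rows: List[dict], max_plausible: int = 10_000
) -> Dict[TurnstileKey, List[Tuple[datetime, int]]]:
    """Convert cumulative ENTRIES readings into per-interval entry counts."""
    grouped: Dict[TurnstileKey, List[Tuple[datetime, int]]] = defaultdict(list)
    for row in rows:
        key = (row["C/A"], row["UNIT"], row["SCP"], row["STATION"])
        grouped[key].append((row["timestamp"], row["ENTRIES"]))

    result: Dict[TurnstileKey, List[Tuple[datetime, int]]] = {}
    for key, readings in grouped.items():
        readings.sort()  # chronological order within one turnstile
        deltas = []
        for (_, prev_count), (cur_time, cur_count) in zip(readings, readings[1:]):
            delta = cur_count - prev_count
            # Negative or huge jumps usually mean the counter was reset.
            if 0 <= delta <= max_plausible:
                deltas.append((cur_time, delta))
        result[key] = deltas
    return result


# Hypothetical readings for one turnstile; the last one simulates a reset.
base = {"C/A": "X001", "UNIT": "R999", "SCP": "00-00-00", "STATION": "EXAMPLE ST"}
counts = [(3, 7000), (7, 7015), (11, 7120), (15, 50)]
rows = [{**base, "timestamp": datetime(2020, 1, 4, h), "ENTRIES": c} for h, c in counts]

per_interval = interval_entries(rows)
key = ("X001", "R999", "00-00-00", "EXAMPLE ST")
print([delta for _, delta in per_interval[key]])  # [15, 105]
```

Summing these per-interval counts by station and by day is one way to rank stations by foot traffic; the same approach applies to the EXITS column.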

Andrew Auyeung
