Dragging and clicking

I forced myself to get rid of Windows after starting grad school. I read plenty of blogs and whatnot explaining why not relying on your mouse (point, click, drag) could make you much more productive while working as a data scientist. I thought only Software Engineers (SWEs) did stuff like that and coding in whatever the heck emacs or vim is. Regardless, I decided to give it a try. Switched to Ubuntu. [Read More]

Building a crappy personalized song recommender

Note: if you wanna skip straight to the repository, it’s here. This semester I learned about utilizing APIs and OAuth2 to get data from web services. Time to put this to the test and do something cool - build a crappy song recommender. Here are the steps I followed after some brainstorming. Shoutout to Spotify for having nice data about each song.

1. Ask for several artists that you vibe to.
2. Via Spotify, get a few related artists for each artist that the user input.
3. Get all albums for each artist, as well as their songs.
4. Get song features for each song (valence/musical positiveness, tempo, danceability, speechiness, energy).
5. Via Genius’s API, search for that song and artist combo for each song and get the song’s lyrics.
6. Calculate the sentiment of those lyrics.
7. Store it all in two database tables, one for songs and one for artists.

Now that I have the data, how should I go about recommending songs? [Read More]
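The storage step (two tables, one for songs and one for artists) might look roughly like the sketch below. It assumes SQLite through Python's built-in sqlite3 module; the table layout, column names, and sample rows are hypothetical, picked to match the song features listed in the excerpt, and the actual repository may do this differently.

```python
import sqlite3


def create_tables(conn):
    """Create one table for artists and one for songs (hypothetical schema)."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS artists (
            artist_id TEXT PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS songs (
            song_id          TEXT PRIMARY KEY,
            artist_id        TEXT NOT NULL REFERENCES artists(artist_id),
            title            TEXT NOT NULL,
            valence          REAL,
            tempo            REAL,
            danceability     REAL,
            speechiness      REAL,
            energy           REAL,
            lyrics_sentiment REAL
        );
        """
    )


# In-memory database so the sketch leaves nothing on disk.
conn = sqlite3.connect(":memory:")
create_tables(conn)

# Made-up rows, only to show the shape of the data.
conn.execute("INSERT INTO artists VALUES (?, ?)", ("a1", "Example Artist"))
conn.execute(
    "INSERT INTO songs VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("s1", "a1", "Example Song", 0.8, 120.0, 0.7, 0.05, 0.9, 0.3),
)

# Join songs back to their artist.
rows = conn.execute(
    "SELECT a.name, s.title, s.valence "
    "FROM songs AS s JOIN artists AS a ON a.artist_id = s.artist_id"
).fetchall()
print(rows)
```

Keeping artists in their own table means each song row only carries an artist ID, so a related artist pulled in for several input artists is stored once.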

Understanding OAuth

I should definitely finish the other parts of the NN posts, but here is a brief detour into OAuth (Open Authorization). Note that I am still a novice and so there might be explanations that are missing information or lacking depth. But this is a way for me to synthesize what I know, so please message me if there are any mistakes! So what is the point of OAuth? As I’ve seen it, it’s a way for web services or apps to exchange information without compromising the people using apps. [Read More]

GSOC III

This week was a busy one. I implemented most of the survey summary statistics and the jackknife to estimate their standard errors. This was my first time learning about the jackknife and I thought it was an interesting topic to tell you all about. Within survey data, you tend to have strata, and PSUs within the strata, that make up subgroups. Now, let’s assume that we want to calculate the mean of our survey data - this entails something along the lines of [Read More]
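For a rough idea of what a jackknife standard error for a survey mean can look like, here is a minimal pure-Python sketch of the delete-one-PSU jackknife for stratified designs. It drops one PSU at a time, scales up the weights of the remaining PSUs in that stratum, and recomputes the weighted mean. The function names and the toy data are made up for illustration, and this is not necessarily how the StatsModels implementation works.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def jackknife_mean_se(values, weights, strata, psus):
    """Delete-one-PSU jackknife standard error of the weighted mean.

    For each stratum h with n_h PSUs, drop one PSU at a time, scale the
    weights of the remaining PSUs in that stratum by n_h / (n_h - 1), and
    recompute the mean. The variance estimate is
        sum_h (n_h - 1) / n_h * sum_j (theta_hj - theta)^2
    where theta is the full-sample estimate.
    """
    theta = weighted_mean(values, weights)
    variance = 0.0
    for h in sorted(set(strata)):
        psus_h = sorted({p for s, p in zip(strata, psus) if s == h})
        n_h = len(psus_h)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least two PSUs")
        scale = n_h / (n_h - 1)
        sq_dev = 0.0
        for dropped in psus_h:
            rep_weights = []
            for w, s, p in zip(weights, strata, psus):
                if s != h:
                    rep_weights.append(w)          # other strata untouched
                elif p == dropped:
                    rep_weights.append(0.0)        # deleted PSU
                else:
                    rep_weights.append(w * scale)  # rest of the stratum
            sq_dev += (weighted_mean(values, rep_weights) - theta) ** 2
        variance += (n_h - 1) / n_h * sq_dev
    return theta, math.sqrt(variance)


# Toy data: one stratum, four PSUs with one observation each, equal weights.
theta, se = jackknife_mean_se(
    values=[1.0, 2.0, 3.0, 4.0],
    weights=[1.0, 1.0, 1.0, 1.0],
    strata=[0, 0, 0, 0],
    psus=[0, 1, 2, 3],
)
print(theta, se)
```

In this toy case (one stratum, equal weights, one observation per PSU) the jackknife standard error works out to the familiar s / sqrt(n), which is a handy sanity check.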

GSOC II

Another blog post about my journey with GSOC. This week I’m implementing some summary statistics for survey data. Survey methodology closely resembles ordinary statistical methodology, but there are some pervasive subtleties due to the importance of the study design for computing standard errors, tests, and confidence intervals in a defensible manner. Learning what others have done to make robust analyses with their survey data is quite fun. The hard part is of course the implementation. [Read More]

GSOC I

This summer I’ll be working on implementing Survey Methods in StatsModels thanks to Google Summer of Code (GSOC). I’m excited because I took my first official programming course last semester (C++ seems to stand the test of time) and now get to learn more about open source software development. I still remember the first programming course I signed up for during my freshman year at Rice. I dropped it after the first week. [Read More]