Wednesday, December 31, 2014

End of year post

For a long time now, I have been writing a post about the year that went by. It has become a New Year tradition to take a deep breath, squint and try to recollect the hazy details of yet another year.

The only detail I remember about January is how terrifying and pleasurable it was to ski down the bunny hill at Seven Springs. Sitting in one of those ski lifts (after I progressed to the "green"), I was awestruck by the wide expanse of snow beneath me when a friend of mine yelled "Get ready now and .... jump". Two minutes later, I was jolted out of the lift, while a kid of about five gracefully slid off the ski lift behind me, looking amusedly at the tangled mess of arms and legs that I was.

This year, I happened to walk along Castro Street in Mountain View after two years. Castro Street is truly close to my heart in lots of ways. In years past, I spent many evenings glumly walking the length of this bustling street, thinking about the colossal problem of finding a purpose to make a statement out of, mainly to convince an admission committee to get me into grad school. Walking along Castro Street somehow helped me think about what choices mean in the long run.

This was a year of clambering through brutal coursework. I took 10-701 - Introduction to Machine Learning. When I came to CMU, my friend chuckled delightedly just thinking about me going through 10-701. Between Econometrics II and 10-701, I lived and died on a bleeding edge and was almost always exhausted. I remember waking up every day after five hours of sleep with scorching eyes and feeling monumentally disgruntled with the universe. I shuffled between research, coursework, mid-terms, more coursework and grading strategy papers while looking wistfully at my bed, waiting for a day when I could sink into the sheets with nothing but free time ahead. The day never really came in 2014.

One of the best things that happened this year was learning Pandas. I started analyzing data with the humble pivot tables in Excel a few years ago. Somehow, I never felt the need to get beyond what I now call the "click and drag" life. Though I dabbled in R and Python, I never really thought of either as a strong data analysis tool that would help me do the wicked things to my data that seemed possible only in Excel. This year I figured out how deluded I had been and how much time I could have saved if I had just learned Pandas earlier.

The nice thing about Pandas is that it is extremely addictive - almost like binge-watching House of Cards. I would start doing something at 10:00 PM and realize by 11:00 that there should be a better way to do it. By 1:00 AM I would realize that there should be an even better way to do it and all I had to do was to frame my wishlist in a way that could be Googled. For example, I cannot really search for "Hey, I need 1s in column A which should contain the max value corresponding to the index but it should not include anything that satisfies these conditions". Most of the learning came in figuring out what to search for. By 2:00 AM, I would find user1457 on Stack Overflow who had the exact same issue I did. I would get tickled beyond words to find a single-line command in Pandas fulfilling every need I could blearily think of at 3 AM, on a school night. I would then painstakingly write a script and send it to a friend for review.

This friend, while explaining the nuances of things I had missed, would reflexively add spaces before and after the "=" as he talked, and would convert my thirty lines of code filled with ifs and loops into a pithy little piece. He also always shuddered convulsively at the way I named my variables. I shrugged this off as a piece of programming snobbery. I later learned, the hard way, that naming variables "junk_i", "junk_i_value", "test_k", or a plain "a" because I felt too tired to think of a name leads to undoing days of work. It is astonishing how even a week can completely obliterate the memory of having written some code, let alone help me solve the mystery of the recurring test_k, which seemed to single-handedly reflect the various thought processes that went on in my head at various stages in the code.

As a reformed character, I now name my variables true to the spirit of someone with a 28-character name - I kid you not, it goes like "data_positive_features_negative_adjusted_values_below_threshold_train", so as to make code "read like a story". My friend still scowls at an extra line that I could have done without but I leave that for next year's character building.

Thus this year, I shuffled between Python, R, Matlab and Stata for the different courses, swearing every time I added brackets to a for loop in Python. In 2013, I wouldn't have imagined a scenario that required me to swear about adding brackets in Python. I count this as an improvement to life.

I made new friends, studied with a group for the first time in my life, drank gallons of tea and became a Pennsylvania licensed driver. This is how a day generally looked.




Life has never been tougher and my writing has never been this squiggly. I also read this on Quora about what happens in a learning process. I can approximately mark my location on the graph. There are going to be tougher years.





But knowing that it is all upward slope after the tough bit takes my mind off the fathomless pit which is all the things that I don't know. What I do know is what the inflection point in the picture feels like - I glimpsed it briefly during my tryst with Pandas.


In retrospect, I am okay with the Miltonian notion of bearing the mild yoke to serve him the best.
I am going to like this gig as long as there are moments like holding a paper written in LaTeX that feels exactly how I always thought good work should feel - sturdy, warm off the printer and, thanks to how Information Systems papers are written, very thick.

Here is to 2014 - a tizzy trip with insanity that helped me understand how a banana feels inside a blender.