A Year With the Kindle

Disclosure: I work for an Amazon company and own Amazon stock. My views are biased.

Like many of us juggling a busy work-life schedule, I have very little time to read. For the last few years, I considered myself lucky if I finished a couple of books in a year. After I started commuting by train, I bought myself a Kindle Paperwhite last year to see if having a dedicated reading device would improve my reading.

I ended up buying the 3G version of the Kindle Paperwhite. It costs $70 more than the Wi-Fi-only version, but the data plan is paid for the life of the device. The plan is supposed to work in 100+ countries, though I have not tried it outside the US. Please note that the bundled data plan does not cover delivery of personal documents or library books.

I used the device for over a year and thought I would write a long-term review.

What I Read

  • Obviously, I purchase a lot of books on Amazon. The Kindle editions tend to be cheaper than physical books, and several of the classics are free. Also, if you are an Amazon Prime member, you can borrow one book a month from a pool of eligible books.
  • Library books: if you live in Northern California, several of the local libraries make electronic copies of books available via a company called OverDrive. You just need your library card to check out a book. After you check one out, you can add it to your Kindle library with a click, and Amazon will deliver it to your device for free.
  • Personal documents: mainly Instapaper articles and PDF documents like research papers. You can email a document to your personalized kindle.com email address, and Amazon will convert it to the Kindle format and deliver it to the device.

What I Love About Kindle

  • Battery life: a single charge lasts weeks.
  • The e-ink screen: unlike LCD screens, there is no glare. You can read in bright sunlight.
  • The form factor: the Kindle is really light, and you can carry it in your back pocket. I take it almost everywhere and end up reading whenever I have some dead time (train commutes, doctor’s appointments, etc.).
  • Reading experience: I love holding physical books and was skeptical that I would enjoy reading on an electronic device. I was wrong – personally, I find reading on the Kindle Paperwhite more immersive and enjoyable than reading a physical book.
  • X-Ray and Dictionary: while reading, X-Ray shows the characters and terms that appear on a page, in a chapter, or across the whole book, along with brief descriptions. This is very useful while reading books like Game of Thrones, which has a lot of characters.
  • The apps and ecosystem: if I don’t have the Kindle device with me and want to kill a few minutes, I can open the Kindle app on my phone and start reading where I left off.

The Bad

While the Kindle is great for typical fiction and non-fiction, it is not that great for technical books that contain code or a lot of diagrams. For technical topics, I still prefer physical books. Also, e-ink technology still does not support color.

Books I read in the last year

Over the last year, I read over 40 books on the Kindle. Some were short or medium-length, and some were really long, like the Game of Thrones series. These are the books I read in the past year:


Kitchen Confidential: Adventures in the Culinary Underbelly
4 of 5 stars
A fun look into the culture of restaurant kitchens. Has some useful tips, like “never order seafood on a Monday” – because of the way the seafood supply chain works in the US, that’s when it is least fresh.
Freakonomics: A Rogue Economist Explores the Hidden Side of Everything
4 of 5 stars
Interesting book – provides a lot of data and insights. The last part drags a bit – otherwise it would have been a solid 5 stars.
The Millionaire Next Door
2 of 5 stars
I did not find this book very useful. Long-winded advice that could be summarized in a few sentences.
Have a Little Faith: a True Story
3 of 5 stars
It was okay – not as great as Albom’s previous books like Tuesdays with Morrie.
The Automatic Millionaire: A Powerful One-Step Plan to Live and Finish Rich
3 of 5 stars
A pretty decent book – very simple, common-sense advice, most of which you probably know already.



Arithmetic in Emacs Regular Expression Search and Replace


You have a large number of files on which you want to do a regex search and replace, and you want the replacement string to be an arithmetic expression over part of the regex match. To give a concrete example, say you have a few files that contain times in a particular format and you need to convert the format – for instance, converting “PT30M” to “1800” (30 * 60 seconds).


You can use Emacs regex search and replace with simple arithmetic expressions. The key is to prefix the arithmetic expression with \, (backslash comma), which tells Emacs to evaluate it as a Lisp expression instead of inserting it as a literal string.

For instance, to do the aforementioned conversion, type M-x replace-regexp RET, then PT\([0-9]+\)M RET \,(* 60 \#1) RET.

In the above expression, you need to indicate that the regex group is numeric by writing \#1 instead of just \1. (Otherwise the match is passed to * as a string and you get a type error.)

If you have a large number of files on which you want to do the above search and replace, you can use dired. Type M-x dired, mark the desired files by typing m, and then type Q. This runs query-replace-regexp on each marked file.
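For a one-off conversion outside Emacs, the same idea can be sketched in Python (a hypothetical standalone script, not part of the Emacs workflow): re.sub accepts a function as the replacement, which plays the role of the \, Lisp expression.

```python
import re

def pt_minutes_to_seconds(text):
    """Replace each "PT<n>M" duration with the equivalent number of seconds."""
    # The replacement function receives the match object; group(1) holds the
    # digits captured by ([0-9]+), converted to int so we can do arithmetic.
    return re.sub(r"PT([0-9]+)M", lambda m: str(int(m.group(1)) * 60), text)

print(pt_minutes_to_seconds("timeout=PT30M, retry=PT5M"))
# → timeout=1800, retry=300
```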

Amazon CloudSearch Analytics

Amazon recently announced the availability of Analytics features for Amazon CloudSearch. I spent the last few months on this project (along with my awesome colleagues at A9), helping build this feature from the ground up, so I feel very happy to see it out there.

The Analytics features provide CloudSearch customers with insight into the search activity in their search domains. Some of the metrics provided are:

  • Search Trends: time-series metrics of the number of searches and the number of searches that yielded no results.
  • Top Searches: the most frequent queries and the most frequent queries that produced no results.
  • Top Documents: the documents most frequently surfaced in search results.

So what are the practical use cases for these metrics? Here are some:

  • The Search Trends give you high-level information about the footprint of your search domain over time. One of the cool features of CloudSearch is autoscaling – the search fleet is automatically scaled up or down to keep up with search traffic and data volume – and the trend reports let you see the traffic patterns behind that scaling.
  • The Top Searches report gives you a flavor of what your customers are searching for in your application.
  • The Top No-Result Searches report gives an indication of the recall of the search system. Maybe you have documents that match barbecue grill, but your customers are searching for bbq grill; in that case, you could configure bbq as a synonym for barbecue. In some cases, no-result searches point to a lack of inventory. For instance, if you run a content site, the no-result searches represent what your customers are looking for but your site does not have content for.
  • The Top Documents report gives you the opportunity to see whether irrelevant documents are ranked high in the search results. That can be an indication that the rank functions need to be tuned to provide more relevant search results.
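To make the metrics concrete, here is a toy sketch of how Top Searches and Top No-Result Searches fall out of a query log. This is purely illustrative – the log format and field names are made up, and this is not how CloudSearch computes them internally.

```python
from collections import Counter

# Hypothetical query log: (query string, number of results) pairs.
log = [
    ("bbq grill", 0),
    ("barbecue grill", 12),
    ("bbq grill", 0),
    ("patio set", 3),
]

# Top Searches: frequency count over all queries.
top_searches = Counter(q for q, _ in log).most_common()

# Top No-Result Searches: frequency count over queries with zero hits.
top_no_result = Counter(q for q, n in log if n == 0).most_common()

print(top_searches[0])   # → ('bbq grill', 2)
print(top_no_result)     # → [('bbq grill', 2)]
```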

Installing GNU Octave on Mac OS X

  • Install Homebrew, the best package manager for OS X. Follow the instructions at the Homebrew site.
  • Install Xcode (a 1.6 GB download) and the Xcode command line tools from the Apple developer site or via the Mac App Store. You need to install the command line tools even though Xcode is supposed to be a superset of them – I ran into a Homebrew bug that made the install fail without them.
  • $ brew install octave # Brew will download all dependencies and install Octave. Should take an hour.

Now Octave should be working, but the plotting functions will not be functional yet. For that, you need to install gnuplot separately.

$ brew install gnuplot

Even after setting up gnuplot, you may get the below error when you run a plot function:

octave:4> plot(k, x)

gnuplot> set terminal aqua enhanced title "Figure 1" size[..]
line 0: unknown or ambiguous terminal type; type just 'set terminal' for a list

For this, the workaround is to append the below line to ~/.octaverc:

$ echo 'setenv GNUTERM x11' >> ~/.octaverc

Now Octave should be able to plot correctly.

Aloha from Hawaii

We just got back from a vacation to Maui, Hawaii. It was a 5-hour flight from San Francisco. We had a great time there, especially since Hawaii felt a lot like Kerala, our home state in India. There were a lot of similarities – warm weather, rain, the landscape, the vegetation – they grow pineapples, coconuts, bananas, etc. Also, taro (known as chembu in Kerala) is a staple food here. There is a saying that anyone who claims to like taro is either a Hawaiian or a liar.

Black-sand beach at Hana

Hump-back Whale

Food at the Luau

Clouds over Haleakala volcano

OpenCV Performance and Threads

If you use the OpenCV library, be aware that it spawns threads for image processing. I found this while investigating a performance issue in a web application I was working on. It turns out the default number of threads is equal to the number of CPU cores. So in my dual quad-core box, it was spawning 8 threads per web server process, resulting in poor throughput while serving concurrent requests. This default behavior of OpenCV is probably targeted towards desktop applications where it makes sense to use all the available CPU cores. The performance problem arose from the fact that even under 5 rps, there were 40 threads, all competing for the CPU, so the cost of context switching was significant. In any case, creating threads on the fly per request is not a good idea for a server-side application and it’s not going to scale for high-traffic systems.

Explicitly setting the number of threads to 1 improved the throughput and latency of my application severalfold. Not bad for a one-line code change. Have a look at cv::setNumThreads() if you are using the C++ library and cvSetNumThreads() if you are using the Python wrapper.

Yahoo! Buzz Topic Pages Are Live!

Topic pages are live in the Yahoo! Buzz U.S. site.  This is the project I’ve been working on for the last few months. The idea is to algorithmically generate topic pages for the buzzing topics of the day.  The popular topics can be accessed from the “top topics” nav bar in the Buzz site. For a sample, click here to see the topic page about Chelsea Clinton.  A screenshot:

Yahoo! Buzz Topics

Fun fact: We used these programming languages to build the back-end systems which power the site: Perl, PHP, Python, Java. Not to mention different storage and indexing systems, databases, servers, in-house and external frameworks, libraries etc.

This is the hard work of a lot of smart and dedicated people whom I have the privilege to work with. Please check it out and let me know what you think. Do you find it useful? Did you run into any bugs? Send me an email. Keep watching this space for updates – there is a lot more to come, I promise.

2 + 2 = 4

I’m nearing the end of a two-week vacation to India. The long flight plus the free time gave me the opportunity to read a few books. A couple of them are worth mentioning:

High Fidelity is a movie I enjoyed thoroughly. I watched it several years ago and finally read the book last week. The movie closely follows the book, with some changes – for instance, the story happens in Chicago as opposed to London. Like the movie, the book is humorous and authentic. It’s not often you read a book and laugh out loud on every page.

Then I read Orwell’s Nineteen Eighty-Four. The story is set in a then-future 1984, when the world is divided mostly into three superpowers in a permanent state of war. The protagonist lives in a country called Oceania, which consists of the Americas, the British Isles, and Australia. The government is totalitarian and controls every single aspect of its citizens’ lives. Even thinking unorthodox thoughts is punishable by torture and death, and the government is developing a stripped-down subset of English called Newspeak to make it impossible for people to think such thoughts at all. While reading it, I was reminded of the East German Stasi. My favorite quote from the book:

Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.

It takes amazing foresight to write such a book  in 1949. It’s bitingly sarcastic and haunting.  Go read it if you haven’t already!

Biking to Work

I recently bought a bike and started riding it to work. It usually takes me around 15 minutes to drive to work, and just 20 minutes to ride the bike. I guess biking to work is a convenient way to get in shape without spending too much time.

Cycling Commute Map

gbookmark2delicious 2.1 is Out

A new version of gbookmark2delicious is out. All the credit goes to Yang Zhang who implemented the new features. Some of them include:

  • Incremental synchronization capability for continuous mirroring of Google Bookmarks onto delicious.
  • Updates to work with current Google Bookmarks and delicious interfaces/formats.
  • Handle throttling and persistent retries for delicious’ REST API.
  • More flexibility via beefed-up CLI frontend (more options, etc.)
  • Local cache of the remotely pulled data.