Questions that seem like they have the potential to be asked frequently

What is this?

The U.S. Department of Agriculture provides a free, annually updated database describing the chemical composition of a large variety of foods. To the best of my knowledge, this database is where every website that offers nutrition information gets its data. This site searches and displays data from release 28 of the database (downloaded November 3rd, 2016). Here's the USDA's suggested citation:

US Department of Agriculture, Agricultural Research Service, Nutrient Data Laboratory.
USDA National Nutrient Database for Standard Reference, Release 28.
Version Current: September 2015, slightly revised May 2016.
Internet: /nea/bhnrc/ndl

What should I eat?

Beats me.

Can I get the data?

Sure. The USDA's dump (linked above) is pretty awful -- you can choose between Microsoft Access and a text database format invented by someone who should not have been allowed to invent their own format. To make things easier if you want to do your own analysis without dealing with the USDA's database hierarchy and choice of formats, you can download a several-megabyte CSV file containing my processed version of the data (one row per food, with the first row containing human-readable column names). I've also imported the CSV file into a publicly shared Google Docs spreadsheet.
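
If you'd rather script against the CSV than eyeball it, here's a minimal Go sketch that reads it with the standard library; "foods.csv" is a placeholder for whatever you name the download:

    package main

    import (
        "encoding/csv"
        "fmt"
        "log"
        "os"
    )

    func main() {
        // "foods.csv" is a placeholder name for the downloaded file.
        f, err := os.Open("foods.csv")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        rows, err := csv.NewReader(f).ReadAll()
        if err != nil {
            log.Fatal(err)
        }

        header := rows[0] // first row: human-readable column names
        fmt.Printf("%d foods, %d columns per food\n", len(rows)-1, len(header))
    }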

Some of the numbers look wrong.

The USDA database can include a huge amount of detail for each data point: there are columns for "number of studies", "minimum value", "maximum value", "degrees of freedom", "lower 95% error bound", "upper 95% error bound", and, for good measure, a field for additional statistical comments. Because each published figure may be an average over a different set of studies, individual nutrients don't always sum to the totals you'd expect.

Additionally, most foods completely lack measurements for at least a few nutrients. For example, "Plantains, raw" includes measurements for carbohydrate, fiber, and sugars (31.89 g, 2.30 g, and 15 g per 100 g, respectively), but none for starch. In this particular case, I believe that it's safe to assume that the remaining 15 or so grams are starch.
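
As a quick sanity check on that arithmetic (the USDA's total carbohydrate figure is computed "by difference" and includes both fiber and sugars):

    package main

    import "fmt"

    func main() {
        // Figures from the "Plantains, raw" entry, grams per 100 g.
        carbs := 31.89 // total carbohydrate ("by difference"), includes fiber and sugars
        fiber := 2.30
        sugars := 15.0

        // If everything unaccounted for is starch, it's simply the remainder.
        starch := carbs - fiber - sugars
        fmt.Printf("inferred starch: %.2f g per 100 g\n", starch) // ≈ 14.59 g
    }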

In some cases (in particular, fatty acids), the database includes overlapping fields. Consider food 12155, "Nuts, walnuts, english". It's commonly known that walnuts contain a large amount of ALA (alpha-linolenic acid, an omega-3 fatty acid). ALA is given the designation "18:3 (n-3)", as it has an 18-carbon chain with three double bonds ("n-3" is shorthand for "omega-3" and refers to the position of the first double bond, counted from the methyl end of the chain). The USDA database contains three listings for 18:3 fatty acids: "18:3 n-3 c,c,c (ALA)" (F18D3CN3), "18:3 n-6 c,c,c" (F18D3CN6), and "18:3 undifferentiated" (F18D3). Of these, the first one is clearly ALA, the second is gamma-linolenic acid (an omega-6 isomer of ALA), and the third is... an 18:3 fatty acid of some sort? As best I can tell, the "undifferentiated" value could be either ALA or GLA.

But back to walnuts. For F18D3 (the "undifferentiated" measure), the database lists 9.08 grams per 100 grams of walnuts. For ALA and GLA, no values are present. Given the commonly cited 4:1 ratio of omega-6 to omega-3 fatty acids in walnuts, I've assumed that all of the 18:3 listed here is ALA, but I'm unsure whether that holds in all cases. Where more precise information isn't present, I've made similar assumptions: that undifferentiated 18:2 is linoleic acid, undifferentiated 20:3 is ETA, and undifferentiated 20:4 is arachidonic acid. The resulting numbers look plausible to me, but I'm a software engineer, not a biochemist. If you have suggestions for improvements, please let me know.
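
To make that concrete, here's a sketch of the fallback I'm describing, written in Go for illustration (the real processing happens in the Ruby script mentioned below); the map keys are the USDA nutrient codes quoted above:

    package main

    import "fmt"

    // ala returns a best-effort ALA value for one food, in grams per 100 g.
    // nutrients maps USDA nutrient codes to measured values.
    func ala(nutrients map[string]float64) (float64, bool) {
        if v, ok := nutrients["F18D3CN3"]; ok {
            return v, true // an explicit ALA measurement; use it as-is
        }
        if v, ok := nutrients["F18D3"]; ok {
            // Only an undifferentiated 18:3 figure exists. Treating it as
            // all ALA is the assumption described above, not something the
            // database guarantees.
            return v, true
        }
        return 0, false // no 18:3 data at all
    }

    func main() {
        walnuts := map[string]float64{"F18D3": 9.08} // the walnut case above
        if grams, ok := ala(walnuts); ok {
            fmt.Printf("assumed ALA: %.2f g per 100 g\n", grams)
        }
    }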

I'm a nerd. How's this site implemented?

There's a Ruby script that mangles the text files from the USDA into a bunch of JSON files that are downloaded by the browser (including a trie used for search autocompletion), plus a separate 8-megabyte indexed data file listing the composition of each food as a JSON object. The script does some cleanup and aggregation (e.g., calculating omega-6-to-omega-3 ratios), too.
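
To illustrate the autocompletion piece, here's a minimal prefix trie in Go; the real structure is built by the Ruby script and serialized as JSON, and its exact shape may differ:

    package main

    // node is one trie node: Children is keyed by the next character of a
    // food name, and IDs lists the foods whose names end at this node.
    type node struct {
        Children map[string]*node `json:"c,omitempty"`
        IDs      []int            `json:"f,omitempty"`
    }

    // insert adds one food name to the trie, creating nodes as needed.
    func insert(root *node, name string, id int) {
        n := root
        for _, r := range name {
            if n.Children == nil {
                n.Children = map[string]*node{}
            }
            child, ok := n.Children[string(r)]
            if !ok {
                child = &node{}
                n.Children[string(r)] = child
            }
            n = child
        }
        n.IDs = append(n.IDs, id)
    }

    func main() {
        root := &node{}
        insert(root, "nuts, walnuts, english", 12155) // ID from the example above
        insert(root, "plantains, raw", 9999)          // made-up ID, for illustration
    }

Matching a typed prefix is then just a walk down the trie, one character at a time, collecting IDs from the subtree you land in.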

The index is loaded by a small Google App Engine app written in Go and is binary-searched to figure out which part of the data file needs to be read to answer a particular query.
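
The lookup itself is ordinary binary search over a sorted index; here's a hedged Go sketch (the field names and on-disk layout are my guesses, not the app's actual code):

    package main

    import (
        "fmt"
        "sort"
    )

    // entry records where one food's JSON object lives in the data file.
    // This layout is illustrative only.
    type entry struct {
        FoodID int
        Offset int64 // byte offset into the data file
        Length int64 // number of bytes to read
    }

    // find binary-searches an index sorted by FoodID.
    func find(index []entry, foodID int) (entry, bool) {
        i := sort.Search(len(index), func(i int) bool {
            return index[i].FoodID >= foodID
        })
        if i < len(index) && index[i].FoodID == foodID {
            return index[i], true
        }
        return entry{}, false
    }

    func main() {
        index := []entry{{9999, 0, 4096}, {12155, 4096, 5120}} // toy data
        if e, ok := find(index, 12155); ok {
            fmt.Printf("read %d bytes at offset %d\n", e.Length, e.Offset)
        }
    }

The app can then read just that byte range from the data file instead of loading all 8 megabytes per query.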

On the client side, hacky JavaScript ties the whole thing together.

How can I contact you?

Feel free to email me at dan at eatnum dot com or visit my homepage.