How many lines of code are in Android?

If you just want a number, 2.5 million lines is a reasonable estimate. If you want to know more, keep reading.

The values in this blog post refer to the main branch of the Android Open Source Project (AOSP) as of October 20, 2023, but the methodology should remain relevant for future versions.

This is not the full code that runs on Android devices, as hardware- and vendor-specific code is not open source, but it is the component that is generally considered to be "Android". There are also other components, namely the external open source dependencies (the contents of the ./external folder), that are not part of this figure. If you include them, the number goes up to 7 030 374 lines.

Line count by language
Extension  Without dependencies  With dependencies
---------  --------------------  -----------------
java                  2 263 981          4 928 622
kt                       58 488             75 222
cpp                     136 528            335 740
c                        13 243            427 773
rs                       33 134             77 115
py                        5 384          1 238 275
glsl                        192                547
gradle                      435              9 605
cmake                        67             73 368
Total                 2 522 722          7 030 374

Some of the included files don't run on actual devices, such as build files (.cmake, .gradle, ...) and various tests. Nevertheless, this code was written by AOSP developers and is vital for keeping Android buildable and maintainable.

Methodology

To perform the analysis, I cloned AOSP using the repo tool and wrote a small Rust script.

The algorithm is relatively simple:

  1. Create an index of all files
  2. Read each file
  3. Count the number of lines per file type
  4. Interpret the data
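
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the function name `count_lines_by_extension` is made up for this example, it assumes the index is a text file with one path per line, and it reads each file in one go for brevity.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Sums newline bytes per file extension for every path listed in the index.
/// Assumption: `index_path` is a text file with one file path per line.
fn count_lines_by_extension(index_path: &Path) -> io::Result<HashMap<String, u64>> {
    let index = fs::read_to_string(index_path)?;
    let mut totals: HashMap<String, u64> = HashMap::new();

    for line in index.lines() {
        let path = Path::new(line);
        // Files without an extension (e.g. "Makefile") are skipped in this sketch.
        let ext = match path.extension().and_then(|e| e.to_str()) {
            Some(ext) => ext.to_string(),
            None => continue,
        };
        // Unreadable files are skipped instead of aborting the whole run.
        let bytes = match fs::read(path) {
            Ok(bytes) => bytes,
            Err(_) => continue,
        };
        let newlines = bytes.iter().filter(|&&b| b == b'\n').count() as u64;
        *totals.entry(ext).or_insert(0) += newlines;
    }
    Ok(totals)
}
```

Calling `count_lines_by_extension(Path::new("files.index"))` would return a map from extension to line count, which is the raw material for the interpretation step.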

Creating the index was necessary to exclude directories that contain only binaries up front and to count test files separately. This separation avoided processing irrelevant data, which would have skewed the results and cost performance. After all, the AOSP checkout is over 170 GB in size. Indexing also allowed me to split the work across multiple threads, which let me use the full disk speed.
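
To illustrate how an index makes both filtering and parallelism straightforward, here is a hedged sketch. The excluded prefixes, the function names, and the `count_file` callback are assumptions for this example; the real exclusion list depends on the checkout and is not taken from the actual script.

```rust
use std::thread;

/// Hypothetical exclusion list; the directories to skip depend on the checkout.
const EXCLUDED_PREFIXES: [&str; 2] = ["./prebuilts/", "./out/"];

/// Keeps only index entries that are not below an excluded directory.
fn filter_index(index: &str) -> Vec<&str> {
    index
        .lines()
        .filter(|p| !p.is_empty())
        .filter(|p| !EXCLUDED_PREFIXES.iter().any(|prefix| p.starts_with(*prefix)))
        .collect()
}

/// Splits `paths` into at most `workers` slices and processes each slice
/// on its own thread. `count_file` stands in for the per-file counting.
fn count_parallel<F>(paths: &[&str], workers: usize, count_file: F) -> u64
where
    F: Fn(&str) -> u64 + Sync,
{
    if paths.is_empty() {
        return 0;
    }
    let workers = workers.max(1);
    // Round up so that every path lands in some chunk.
    let chunk_size = (paths.len() + workers - 1) / workers;
    thread::scope(|scope| {
        // Spawn all workers first, then join them.
        let handles: Vec<_> = paths
            .chunks(chunk_size)
            .map(|chunk| {
                let count_file = &count_file;
                scope.spawn(move || chunk.iter().map(|&p| count_file(p)).sum::<u64>())
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}
```

Because the full list of paths is known before any file is read, dividing it into evenly sized slices is a simple arithmetic step; without an index, the directory walk and the counting would have to be interleaved.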

To create the index, I used the well-established and already optimized Unix-era command line program "find".

find . -type f -name "*" > files.index

To avoid getting stuck on extremely large files, I read each file in chunks.

I count lines by counting the number of newline characters ("\n") in each file. While the approach to line counting could be much more elaborate (e.g. ignoring empty lines, or counting comments and boilerplate separately), this one is very intuitive and works across languages and file types.
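
Chunk-wise newline counting could look roughly like this. It is a sketch under assumptions: the function name `count_newlines` and the 64 KiB buffer size are arbitrary choices for illustration and are not taken from the actual script.

```rust
use std::io::{self, Read};

/// Counts `\n` bytes in `reader` using a fixed-size buffer, so memory use
/// stays constant regardless of how large the input is.
fn count_newlines<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut buffer = [0u8; 64 * 1024]; // chunk size is an arbitrary choice
    let mut count = 0u64;
    loop {
        let read = match reader.read(&mut buffer) {
            Ok(0) => return Ok(count), // end of input
            Ok(n) => n,
            // A read interrupted by a signal can simply be retried.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        count += buffer[..read].iter().filter(|&&b| b == b'\n').count() as u64;
    }
}
```

For a file on disk, this would be called as `count_newlines(std::fs::File::open(path)?)`. Note that with this definition a last line without a trailing newline is not counted.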

The analysis boils down to manually selecting some interesting file types (the ones you see in the table) and printing their respective line counts as well as their sum.
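
As a rough sketch of this last step, assuming the per-extension totals are already available in a map: the names `SELECTED`, `summarize`, and `group_thousands` are made up for this example, and the extension list simply mirrors the rows of the table.

```rust
use std::collections::HashMap;

/// Extensions picked by hand; this list mirrors the rows of the table.
const SELECTED: [&str; 9] = [
    "java", "kt", "cpp", "c", "rs", "py", "glsl", "gradle", "cmake",
];

/// Returns one row per selected extension plus the sum over those rows.
/// Extensions missing from `totals` are reported as 0.
fn summarize(totals: &HashMap<String, u64>) -> (Vec<(&'static str, u64)>, u64) {
    let rows: Vec<(&'static str, u64)> = SELECTED
        .iter()
        .map(|&ext| (ext, totals.get(ext).copied().unwrap_or(0)))
        .collect();
    let sum: u64 = rows.iter().map(|&(_, n)| n).sum();
    (rows, sum)
}

/// Formats a number with spaces between groups of three digits, e.g. "2 263 981".
fn group_thousands(n: u64) -> String {
    let digits = n.to_string();
    let mut out = String::new();
    for (i, ch) in digits.chars().enumerate() {
        if i > 0 && (digits.len() - i) % 3 == 0 {
            out.push(' ');
        }
        out.push(ch);
    }
    out
}
```

Printing each row with `group_thousands` applied to its count produces output in the same style as the table.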

If you want to take a look at it, the code is public on GitHub.

What I found helpful for determining whether and how to count a specific directory was this 2012 Stack Overflow answer.