« posts | «« home

Day 7: No Space Left On Device

View puzzle page »

Skip to the code?

If you’d like to read my solution first, the code is available on GitHub.

Puzzle 1

Today’s puzzle gives us a simulated history of shell commands as our input. The commands consist of directory traversal, file listing, and the resultant list of file names and sizes. The goal is to count the total file size of each directory (including subdirs) and then calculate the total of these, individually and cumulatively. The puzzle question asks us to identify the total filesize of all directories whose individual total is < 100,000.

Solution

I really like using Python’s newest functionality when I can, especially when it’s a feature that brings syntax that’s been really needed for a long time. In this case, I’m going to make use of Python’s Structural Pattern Matching - vaguely equivalent to case or switch statements in other languages.

The puzzle input has 6 types of line:

We can define cases for each of these in order. We use our usual file open and loop to get each line, then split the line - each line is simply whitespace delimited.

with open("day7_test.txt", encoding="utf-8") as f:
    for line in f:
        match line.split():
            case '$', 'ls': pass
            case 'dir', _: pass
            case '$', 'cd', '/': pass
            case '$', 'cd', '..': pass
            case '$', 'cd', d: pass

The first two are are the file listing and directory entry in the file listing output. The third and fourth are self-explanatory - traverse to root and traverse to parent. The fifth is when we have a $ cd a and we want to capture out the directory to a variable d.

Now we need to do something with them.

match line.split():
    case '$', 'ls': pass
    case 'dir', _: pass
    case '$', 'cd', '/': current_dir = ['']
    case '$', 'cd', '..': current_dir.pop()
    case '$', 'cd', d: current_dir.append(f'{d}/')

In the case of the $ ls command or dir a entry, we don’t need these for solving the puzzle so can just dump them.

When the root traversal is called, we initialise a current_dir variable to record where we’re currently at. As we traverse each directory, this list will record our location and, as the loop progresses, will look similar to:

['', 'a/']
['', 'a/', 'e/']
['', 'd/']

When we traverse to the parent dir, we pop the current_dir list to remove the rightmost directory and when we traverse to a child dir, we append it to this list. We capture for the root and parent directory traversals first because they have a defined third argument (..) and save us another conditional statement.

The last case is going to do our heavy lifting. The remaining line type is the file entry, prefaced by its size.

case size, _:
    for path in accumulate(current_dir):
        directories[path] += int(size)

We don’t care about the file name so we dump it to _ and capture the size. Then we need to record that size to the correct directory.

In essence, what we want to be left with is a mapping of each directory and its total size. Subdirectories need to be referred to in context of their parent directories so we can leverage Python’s accumulate function from the itertools library. When the argument to accumulate is a list of integers like [1, 2, 3, 4, 5], it will return an iterator containing 1 3 6 10 15. When the argument is instead a list of strings like ['', 'a/', 'e/'], it will produce 'a/e/'. This gives us a unique key for each directory.

Since I want to add to a dictionary without having to pre-define the variables I’m adding to, we can create a defaultdict with factory of type int.

directories = defaultdict(int)

Once the loop has completed, we are left with a dictionary resembling:

{'': 48381165, 'a/': 94853, 'a/e/': 584, 'd/': 24933642}

Now we just need to sum the values which are < 100,000 to get the answer to puzzle 1.

sum(f_size for f_size in directories.values() if f_size <= 100000)

Puzzle 2

Part two of the puzzle asks us to identify the smallest directory which, when subtracted from the outermost (root) directory, will push the value below 40,000,000 (70,000,000-30,000,000).

Solution

The calculation to perform is: find the size of the directory with the smallest size that, if deleted, would reduce the value of directories[''] below 40,000,000.

We can first get the size of all the directories that would satisfy the condition of reducing the value below 40,000,000.

[f_size for f_size in directories.values() if f_size >= directories[''] - 40000000]

This gives us a list of sizes of directories we could choose and now we just need to get the smallest of these. We can use Python’s min function to do exactly that, dropping the list comprehension.

min(f_size for f_size in directories.values() if f_size >= directories[''] - 40000000)

The product of this function is the answer to puzzle 2.

Source Code

View the code on GitHub.

Would you like to read something else by me?

« «

» »

« posts | «« home