Part 1: Invalid Product IDs
Always one to help out an elf in need, the gift shop clerk has asked for assistance identifying the invalid product IDs that were entered into the system by a young elf. I’m almost certain there is going to be a policy disallowing unauthorised use, maybe they missed the training on that one! I’ll let their line manager know if I come across them.
The remaining product IDs to check are in this format:
| |
Each value separated by , represents a product ID range, and each range gives its first ID and last ID separated by -.
| |
Now I’m looking at what I’ve written and once again thinking does this look like python?. I’m not familiar enough with all the pythonic ways to have them all at my fingertips, so it’s written a bit more like I’d write C# in my day-to-day (although I’d probably write it a bit more defensively in practice!).
So how can I make it more pythonic? My first thought is back to list comprehension again as I did in Day 1, but my return type here includes a tuple and I wasn’t entirely sure how to wrestle that into the comprehension. Enter tuple unpacking:
| |
The other key thing here is that there are now two list comprehensions; the first takes the split string data of each range which is unpacked into the start, end values of the second comprehension. This also has the advantage of ensuring that the split results in exactly 2 values, as it would otherwise fail to unpack. Now that looks a bit more like python!
Back to the elves; the task is to sum the invalid product IDs which appear within any of these ranges. An invalid product ID consists of a sequence of ints that occurs exactly twice, i.e. 11, 123123 and 5432154321 would all be invalid.
I chucked together this validate function to test a product ID:
| |
Edit: I initially seemed to fail to do a simple comparison using
id[:len(id)//2] == id[len(id)//2:]which is the solution I used in the end. Thecountsolution actually became more relevant for part 2.
Pretty ugly if I’m totally honest! I’ll come back to that later; does this solve the task at hand? I just need to sum the invalid IDs:
| |
I’m not getting right result here, so what am I missing? If I just print out all the values that contribute to the sum I don’t know if I’ll be able to spot anything, there’s quite a lot of data in the actual input, but there’s no harm in trying.
| |
How fortunate that the last product range points right at the problem, but oh do I feel silly - my validate function appears to count single character IDs as invalid when they don’t in fact meet the criteria. Where has this gone wrong, I explicitly check that occurrences is 2..
Ok so check the length is greater than 1 and it’s sorted. Invalid product IDs have been identified as best I can, let’s hope the clerk is happy.
What went wrong?
Well first things first, why on earth was I using str.count to begin with? Honestly I’m not sure; I first attempted to use something akin to substrings of str[len//2:] == str[:len//2], and trying that again now it gives the correct results - I must have typed something wrong originally which ultimately led me down this dark, dark path…
Let’s take a look at it though, I’m slightly baffled why that was going wrong regardless.
Sanity check:
| |
OK; counting the number of times the letter a occurs in the string "a" is 1. Good. But in my solution I’m indexing a substring; I’m taking the first half of the product ID as my value to count how many times it occurs within the full string.
The intention is to produce something like:
| |
But what happens when len(s) is 1? In that case len(s)//2 would be 0, so I’m effectively doing a count of s[:0] which is…
| |
Intriguing (it’s also worth noting python doesn’t throw any sort of out-of-range exception here either - you can index "123123"[:-9999] and it’s still ''!). So what I’m actually counting is the occurrences of '' within the product ID… but still, why is that 2?
| |
Right, so that starts to kinda make sense in a very weird way. "" occurs within "" exactly once, obviously. Counting "" in any string seems to result in len(x)+1.
But why? It’s like it is counting how many empty strings can be matched? How can there be any in a string that isn’t empty? My gut says it’s counting between the chars but I’m not sure why.
The empty string
""is found at every position between characters, plus at the start and end.For a string like
"abc"(length 3):
1 2 3"" a "" b "" c "" ↑ ↑ ↑ ↑ 0 1 2 3 → 4 positions = len + 1There are
n + 1“slots” where an empty string matches: before each character and after the last one.This is consistent with how slicing and other string operations treat empty strings as existing at every boundary position.
Well now I know.
More Pythonic
That solution was a bit ugly. Can I clean it up a bit and make it more pythonic?
Yes.
Is it better? I’m not sure… but I’ll go with it; sometimes my preferences don’t always match the language norms at first so I like to try and overlook them to get a feel for how people actually use the language.
| |
It’s a generator expression with multiple for clauses, flattening my original nested loop. At first it looks similar to a comprehension but is subtly different, it’s also lazily evaluated. Pass it to the built-in sum function and away you go.
I’ll admit, I asked Claude for the pythonic solution and this is what it suggested with an explanation as to how it works… and yet I still find it very hard to read compared to my original.
I’ll chalk that up to not drinking enough python koolaid, maybe I’ll get used it.
Maybe.
Part 2: More Invalid Product IDs
It turns out that there are still invalid product IDs in the system so the clerk has suggested a more thorough classification: any ID that entirely consists of a repeating sequence.
This is a slight complication on my original implementation, however I accidently touched on the str.count function which seems like the perfect fit for this now.
The repeating sequence can be of any length, so naturally I’m thinking I need a loop here from 1 to the maximum I need to test on a given product ID. There’s a nice optimisation we can throw in there, as we know a given substring can only possibly occur multiple times if it is less than half the length of the total string. That’s simple enough to start with.
| |
After that it’s very similar to what I did before, but with some additional tweaks to fit the new variable length. Essentially instead of checking substring count is 2, I now need to check if it is enough repetitions to saturate the full string.
How do I know how many times the substring can fit exactly? Divide the length of the full string by the length of the substring, nice and easy. It’s then just a case of covering the exact count, so checking remainder is 0.
I actually came across a nice pythonism while reading up on something else in some python docs: divmod. This function returns a tuple containing both the quotient and the remainder, exactly what I need!
| |
And because the range of substring lengths that I check starts at 1, I can thank the lord we avoided the issue I had in part 1 counting the spaces between characters by mistake!
Excellent! The invalid products have been identified, I’m sure the clerk will remove them in good time. And, for their own sake, I hope they remember to lock their computer next time they leave it unattended.
As always, full code is available at github.com/lordmoocow/aoc25.
It’s only day 2 and I’ve already hit some weird stumbling blocks, probably just through faults of my own! I’m starting to get more of a feel for some of the more pythonic idioms but I will admit to disliking it when the language “helps” you out by not erroring or throwing in certain scenarios. Ten more days to go!