This document reconstructs a six-month debugging and integration effort based on support transcripts, logs, and production behavior. It focuses on recurring event synchronization, vendor abstractions, and failure handling across multiple external systems.
There is a special kind of optimism that only software developers have. It is the optimism that says: this third-party API endpoint will work exactly as documented, forever, without surprises, and I should absolutely build my entire sync architecture on top of it.
The Developer had this optimism. He was not alone. He was wrong.
This is that story.
Every good working relationship starts with easy questions.
The Developer's first message to the Vendor's support channel was polite, specific, and reasonable: when doing a calendar PATCH, if notifyAttendees is set to true, do all attendees get an email? And can you filter which ones?
The Rep Dev answered within hours. The notifyAttendees parameter, she explained, is delegated to the underlying calendar provider. Google uses sendUpdates=all. You cannot control which specific attendees get notified — that behavior is handled entirely by the provider.
The Rep then added the finishing touch: Microsoft Graph doesn't support notifyAttendees at all. It just notifies everyone. Always. Regardless of what you pass.
The Developer thanked them. He had learned something. The Vendor had been helpful. Everyone felt good.
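The behavior described in that exchange can be summarized in a small sketch. The `Provider` type and the `effectiveNotification` function are hypothetical names invented for illustration, and the handling of `notifyAttendees: false` on Google (mapped here to `sendUpdates=none`) is an assumption, not something the Vendor confirmed.

```typescript
// Hypothetical summary of how notifyAttendees plays out per provider,
// based only on what was said in the support channel.
type Provider = "google" | "microsoft";

interface NotificationOutcome {
  // Query parameter forwarded to the provider, if any.
  providerParam: string | null;
  // Whether attendees end up being emailed.
  attendeesNotified: boolean;
  // Per-attendee filtering is never available.
  canFilterAttendees: false;
}

function effectiveNotification(
  provider: Provider,
  notifyAttendees: boolean,
): NotificationOutcome {
  if (provider === "microsoft") {
    // Microsoft Graph ignores the flag and notifies everyone.
    return { providerParam: null, attendeesNotified: true, canFilterAttendees: false };
  }
  // Google: true maps to sendUpdates=all (as the Rep Dev explained).
  // Assumption: false maps to sendUpdates=none.
  return {
    providerParam: notifyAttendees ? "sendUpdates=all" : "sendUpdates=none",
    attendeesNotified: notifyAttendees,
    canFilterAttendees: false,
  };
}
```

The practical consequence is that a caller who needs selective notification cannot get it from this flag on either provider.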
A few weeks later the Developer came back with another one. He was trying to create a recurring Office 365 event and getting a 400 error. Invalid JSON value. The field in question was daysOfWeek. He was passing "friday".
The Rep Dev looked at it and delivered the news with the diplomatic precision of someone who has seen this before: the DayOfWeek field uses an enum. The enum expects uppercase. He should send "FRIDAY" and "SUNDAY".
She added that they should fix their docs. Or maybe add case-insensitive support. She thanked him for reporting it.
The Developer changed his strings to uppercase. It worked. He thanked her.
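A defensive way to avoid this class of 400 is to normalize day names before they reach the request body. The sketch below is a guess at what that could look like; `toDayOfWeek` is a hypothetical helper, and the assumption that the enum is exactly the seven uppercase English day names goes beyond what the Rep Dev confirmed (she only mentioned "FRIDAY" and "SUNDAY").

```typescript
// Assumed enum values: the seven English day names in uppercase.
const DAYS_OF_WEEK = [
  "MONDAY",
  "TUESDAY",
  "WEDNESDAY",
  "THURSDAY",
  "FRIDAY",
  "SATURDAY",
  "SUNDAY",
] as const;

type DayOfWeek = (typeof DAYS_OF_WEEK)[number];

// Normalize user or legacy input ("friday", " Sunday ") to the uppercase form.
// Throws on anything that is not a full day name, so bad input fails locally
// instead of as a remote 400.
function toDayOfWeek(input: string): DayOfWeek {
  const candidate = input.trim().toUpperCase();
  const match = DAYS_OF_WEEK.find((day) => day === candidate);
  if (match === undefined) {
    throw new Error(`Not a day of the week: "${input}"`);
  }
  return match;
}
```

With this in place, `["friday", "sunday"].map(toDayOfWeek)` yields the uppercase values the endpoint accepted.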
Two issues. Two fixes. Zero drama. The relationship was young and everyone was happy.
This would not last.
At some point — and this is where the story really begins — the Developer discovered the /series endpoint.
The /series endpoint was the Vendor's abstraction for fetching modified and deleted occurrences from recurring calendar events. It was exactly what the Developer needed. Instead of querying every individual occurrence to figure out what had changed, you could just call /series on a master event and get back a clean list of what was modified and what was deleted.
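To make the appeal concrete, here is a rough sketch of how a caller might apply such a response to the occurrences expanded from a recurrence rule. The shapes are hypothetical: only the `deletedOccurrences` field name appears in this story, while `modifiedOccurrences`, `originalStart`, and `applySeries` are invented for illustration.

```typescript
// Hypothetical shapes; only `deletedOccurrences` is a name from the story.
interface Occurrence {
  originalStart: string; // the slot the recurrence rule generated
  title: string;
}

interface SeriesResponse {
  modifiedOccurrences: Occurrence[]; // assumed field name
  deletedOccurrences: Array<{ originalStart: string }>;
}

// Apply the series exceptions to the rule-generated occurrences:
// drop the deleted ones, swap in the modified ones, keep the rest.
function applySeries(expanded: Occurrence[], series: SeriesResponse): Occurrence[] {
  const deleted = new Set(series.deletedOccurrences.map((d) => d.originalStart));
  const modified = new Map(
    series.modifiedOccurrences.map((m) => [m.originalStart, m] as const),
  );
  return expanded
    .filter((o) => !deleted.has(o.originalStart))
    .map((o) => modified.get(o.originalStart) ?? o);
}
```

One call per master event replaces a query per occurrence, which is why an empty or failing response from this endpoint is so costly for everything built on top of it.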
The Developer adopted it deliberately. The standard sync — the one the Vendor would later suggest he go back to — was not returning deleted occurrences reliably on initial pull. Not sometimes. Consistently. The Developer had tried it, it didn't work for his use case, and /series was the solution the Vendor's own support channel pointed him toward.
So he built his initial sync logic around it. He built his webhook handling around it. The /series endpoint became the load-bearing wall of the Product's calendar integration — not because he was reckless, but because it was the only wall that held.
You already know where this is going.
The first sign of trouble was subtle. deletedOccurrences was coming back as an empty array for Office 365 recurring events. The Developer reported it. The Vendor investigated. A fix was deployed within a week. The Developer confirmed it worked.
Fine. Good. Normal software things.
Then /series started returning 400 errors. The message was specific: "Your request can't be completed. The range between the start and end dates is greater than the allowed range. Maximum number of years: 5."
The Developer was confused because /series does not take a date range as input. He was not passing date parameters. He was just calling the endpoint.
The Rep explained what was happening under the hood. To fetch occurrences, the Vendor was calling Microsoft's /instances endpoint internally, and they were passing a startDateTime going all the way back to 1968. Microsoft Graph looked at the nearly 60-year date range and said no thank you.
The Developer took a moment to appreciate this. Somewhere in the Vendor's codebase, a date was being calculated and the result was 1968. He did not ask follow-up questions. The Vendor said they would fix it. They fixed it by January 29th. The Developer confirmed the sync was passing.
He noted, privately, that he was now aware of what was inside the load-bearing wall. There was a call to Microsoft's /instances in there, making up date ranges. He filed this away and kept building.
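The Vendor never said how they fixed it. One way any caller of a range-limited endpoint might guard against this kind of rejection is to split a long range into windows no longer than the limit and query each window separately. The sketch below assumes the limit is measured in calendar years, takes the 5-year figure from the error message, and uses a hypothetical `splitIntoWindows` helper; it is not the Vendor's fix.

```typescript
interface DateWindow {
  start: Date;
  end: Date;
}

// Limit taken from the error message; how the provider measures it is an assumption.
const MAX_YEARS = 5;

// Split [start, end) into consecutive windows of at most `maxYears` calendar years.
function splitIntoWindows(start: Date, end: Date, maxYears: number = MAX_YEARS): DateWindow[] {
  const windows: DateWindow[] = [];
  let cursor = new Date(start.getTime());
  while (cursor.getTime() < end.getTime()) {
    // Advance the cursor by maxYears, in UTC to avoid local-time surprises.
    const next = new Date(cursor.getTime());
    next.setUTCFullYear(next.getUTCFullYear() + maxYears);
    // Clamp the final window to the requested end.
    const windowEnd = next.getTime() < end.getTime() ? next : new Date(end.getTime());
    windows.push({ start: cursor, end: windowEnd });
    cursor = windowEnd;
  }
  return windows;
}
```

A range starting in 1968 would become a series of small requests rather than one request the provider refuses. Whether that many requests is desirable is a separate question; clamping the start date to something sensible may be the better fix.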
March arrived and with it, a familiar error message.
The 400 was back. Same wording. Different edge case. This time it showed up in the production logs of a real Client user — a calendar that had been syncing fine until it wasn't.
The Developer opened a ticket. He embedded the log. The request ID in the log was the exact same one he had reported to the Vendor's support channel that same day, which is a level of traceability that would have been satisfying under other circumstances.
The Rep's response in the support channel was honest if not reassuring: "I thought we took care of this one last time. We'll investigate."
The Rep Dev found a new edge case and fixed it on March 12th. She sent a cheerful message: "We found an edge case we hadn't accounted for earlier. It's now fixed."
The Developer replied: "thanks."
The next day he was back in the support channel.
Same endpoint. New error. A 500 this time:
"Could not convert data to the Vendor's internal model: fieldMissing - recurrence_info - Could not get recurrence rule from the event."
Worth pausing on that error message. It wasn't failing to talk to Microsoft. It wasn't a network error. The data was coming back from Graph — it just couldn't be parsed into the Vendor's own data model. The Vendor's model. The abstraction layer that was supposed to make this easier. The thing the Developer had named vendorId in his own codebase because the Vendor's name was baked into his variable names at this point.
He posted the error as a reply directly into the Rep Dev's "it's now fixed" message. Whether this was intentional or accidental, it was extremely well-placed.
What happened next is worth documenting carefully because it says something about the Developer's character — specifically that his instinct when faced with a broken third-party system is to debug it himself, thoroughly, until he understands it.
He confirmed the failing event was a series master by checking recurrenceType on the sync object and then verifying "type": "seriesMaster" via the /direct API. He tried the endpoint with and without the exceptions query parameter and determined that exceptions was causing a 400 — a separate issue — but that recurrence and cancelledOccurrences returned fine. He ruled out the data being wrong.
Then he found the pattern.
If you left /series alone for long enough, the next call would succeed. Then every call after that would fail with the 500. Every time. Not randomly: every time after the first.
He documented this. He ran it repeatedly. He captured side-by-side screenshots of identical requests — one returning a full successful response, the next returning the conversion error. He formed a hypothesis: a missing await before a parse operation, or a cache race condition in a deployment fork. In his ticket he wrote: "so yes this is probably in javascript terms — missing await — before a parse operation."
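For readers who have not been bitten by it, this is roughly what a "missing await before a parse operation" looks like. Everything here is a hypothetical stand-in written to illustrate the Developer's hypothesis; none of it is the Vendor's code, and it does not by itself reproduce the succeed-once-then-fail pattern he observed.

```typescript
// Hypothetical stand-ins; not the Vendor's actual code.
interface RawEvent {
  recurrence_info?: { rule: string };
}

// Pretend upstream call that resolves with a well-formed event.
async function fetchRawEvent(): Promise<RawEvent> {
  return { recurrence_info: { rule: "FREQ=DAILY" } };
}

// Accepts `unknown`, so the type checker cannot flag a Promise being passed in.
function parseRecurrence(raw: unknown): string {
  const info = (raw as RawEvent | null)?.recurrence_info;
  if (info === undefined) {
    throw new Error("fieldMissing - recurrence_info");
  }
  return info.rule;
}

// Buggy: the Promise object itself is parsed, so the field is always missing.
function loadWithoutAwait(): string {
  const raw = fetchRawEvent(); // missing await
  return parseRecurrence(raw);
}

// Fixed: the resolved value is parsed.
async function loadWithAwait(): Promise<string> {
  const raw = await fetchRawEvent();
  return parseRecurrence(raw);
}
```

The data is perfectly fine in both versions. The buggy one simply looks for a field on a Promise, which is why the resulting error reads like a data problem rather than a timing problem.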
He also wrote, at one point: "so am I debugging for the Vendor now?"
In a DM the Rep shared what was actually happening. The /series endpoint internally tries to get a special MS Exchange property that describes recurrence and deletedOccurrences. For some reason, Graph was not sending this property for this particular account. The Vendor was going to investigate using Graph properties as a fallback, but those also needed to be computed from the same MS Exchange property — so they might be missing too.
The Developer read this and did what any reasonable developer would do when told the upstream system is unreliable: he wrote a fallback.
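    let overrides: Override[];
    try {
      // Preferred path: ask /series for the modified and deleted occurrences.
      overrides = await generateOverrides(vendorId, requester.providerSyncToken!);
    } catch {
      // Fallback: use whatever the incremental sync already collected for this event.
      overrides = occurrences[vendorId] ?? [];
    }

Note that vendorId. The Vendor's name, living in the variable. The abstraction had leaked all the way into the identifiers.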
He tested it. The fallback kicked in. The logs said "error getting series falling back to incremental" and then "creating [30] activities." The Client's calendar moved again.
Then he checked what the fallback actually returned versus what Microsoft Graph said should be there. The incremental sync brought back the modified occurrences. The deleted occurrences — the three cancelled entries going back to late 2024 — were not there.
The fallback did not work. Not really. The sync did not fail visibly, but the deleted occurrences were simply gone. No error. No warning. Just missing data, moving forward quietly.
This is where the Developer made his most honest mistake. Rather than letting the failure be loud, surfacing an error that said "we could not fetch deletion data for this event, here is everything we know about it," the fallback absorbed the failure and kept going. Three occurrences that had been cancelled in the source calendar stayed silently present in the synced one, with no trace that anything had been missed.
The correct fix, which came later, was to catch the error, store it with the full event data for traceability, and skip those events entirely rather than pretend the sync had succeeded. Honest failure. Recoverable state. Something you can query later and know exactly what was missed and why.
But that came later. In the moment he flagged it in the ticket, documented the gap, and shipped the fallback — because the alternative was leaving the Client user completely broken while waiting for a vendor fix that had already been "fixed" twice.
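The later, honest version of the fix might look something like the sketch below. All names here (`MasterEvent`, `SyncFailure`, `syncSeries`, the `fetchSeries` callback) are hypothetical; only the overall shape comes from the description above: catch the error, keep the full event alongside it, and skip the event instead of pretending it synced.

```typescript
// Hypothetical types and names; only the overall shape comes from the story.
interface MasterEvent {
  id: string;
  title: string;
}

interface Override {
  occurrenceStart: string;
  status: "modified" | "deleted";
}

interface SyncFailure {
  eventId: string;
  event: MasterEvent; // full event data, kept for traceability
  error: string;
  failedAt: string; // ISO timestamp
}

interface SyncResult {
  overrides: Record<string, Override[]>;
  failures: SyncFailure[];
}

async function syncSeries(
  events: MasterEvent[],
  fetchSeries: (eventId: string) => Promise<Override[]>,
): Promise<SyncResult> {
  const result: SyncResult = { overrides: {}, failures: [] };
  for (const event of events) {
    try {
      result.overrides[event.id] = await fetchSeries(event.id);
    } catch (err) {
      // Record the failure and skip the event instead of guessing its state.
      result.failures.push({
        eventId: event.id,
        event,
        error: err instanceof Error ? err.message : String(err),
        failedAt: new Date().toISOString(),
      });
    }
  }
  return result;
}
```

In this sketch the caller is expected to persist `failures` somewhere queryable. The important property is that a failed event never appears in `overrides`, so nothing downstream can mistake "we could not find out" for "nothing was deleted."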
At this point the Rep made a suggestion. He noted that the standard sync plus incremental webhooks should bring all exceptions — the Developer didn't need to call /series at all. The syncUpdated and syncDeleted endpoints bring everything.
The Developer's response was measured but firm. They had moved TO /series specifically because the standard sync was not reliably delivering deleted and modified recurrences on initial pull. This had been discussed in the support channel months earlier. The Rep himself had been part of that conversation. Using /series was not an accident — it was the recommended approach at the time.
The Rep did not remember this conversation. To his credit, he checked, double-checked, and eventually confirmed that yes, the initial sync and incremental syncs should bring all exceptions. He still believed this was the better path.
The Developer's response to this was the most important line in the whole saga:
"Can we fix the /series endpoint?"
Not "fine, we'll refactor." Not "ok you're probably right." Just: can we fix it? Because whether or not there's a better architectural path, a broken endpoint is a broken endpoint and someone else is going to hit it. And by the way — he had already tested the incremental sync. It did not return deleted occurrences. The suggestion the Rep was offering as an exit ramp was a road the Developer had already driven down and turned back from.
The Vendor deployed a debug update on March 24th. By March 25th the Rep reported they were seeing a few cases and could now think about how to deal with them. On April 3rd the fix was in production. On April 6th the Developer closed the ticket.
"so there was a bug xD"
"moving it to done."
Six weeks. One Client user affected. One fallback shipped. One architectural debate that went nowhere. One fix that actually fixed it.
Three days later.
The Developer was reviewing the Product's Google Calendar integration — apparently in preparation for a Google approval process — when he noticed that /series was returning empty deletedOccurrences for Gmail accounts. Not for Office 365. For Google.
He had just spent six weeks on a Microsoft problem. Now Google.
To his credit he did not complain. He opened a new ticket, noted it was related to at least three previous tickets he could find and possibly two more he couldn't locate, and started debugging.
The sequence was by now familiar. He called /series. Empty deletedOccurrences. He called /direct with the event ID. Got sequence: 0. He called /instances. The deleted occurrence for April 9th was not there. The daily recurrence had entries for the 8th and the 10th. The 9th simply did not exist as far as Google's /instances endpoint was concerned.
He reported this. The Rep responded the next day: "It seems Google's /instance method has some bug. Maybe introduced recently. It does not seem to return some modified occurrences either."
This was notable. The Vendor was not saying "we'll look into it." They were saying: Google is broken.
The Developer, who at this point had seen enough broken things that nothing surprised him, accepted this and kept going. He tried showDeleted=true. Same result. He shared the URL. The Rep tried an alternative endpoint — GET /events?iCalUID={guid}&showDeleted=true — and asked the Developer to test it.
The Developer tested it. The April 9th occurrence came back with status: "cancelled".
"that endoiing did return the 9 ocurrnce with status cancelled"
One day later the Rep filed a bug with Google: issuetracker.google.com/issues/XXXXXXX. Title: "Events.instances endpoint fails to return modified and canceled occurrences."
Then he told the Developer: "feel free to add a comment."
This was the moment the relationship became something more than support tickets. The Vendor was not just fixing an issue — they were inviting the Developer to co-report a bug directly to Google engineering. The Developer showed up at the Google Issue Tracker under his own professional email and left his confirmation: "I confirm the discrepancy. Was able to replicate by just deleting an instance of a daily recurrent event."
A small software team and a middleware vendor, together, filing a bug against one of the largest technology companies in the world. Peer review at scale.
The Vendor switched the /series implementation from /instances to /events on April 20th. The Rep Dev sent the notification directly: "We've switched /series from /instances to /events. Please take a look and let us know if it works correctly for you."
The Developer looked. It worked. He confirmed it in the support channel.
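For reference, the /events approach the Rep had asked the Developer to test can be sketched as follows. The query parameters `iCalUID` and `showDeleted` and the `status`, `recurringEventId`, and `originalStartTime` fields belong to the Google Calendar API's events resource; the helper names `buildEventsUrl` and `cancelledOccurrences` are hypothetical, and how the Vendor actually implemented the switch is not known.

```typescript
// Subset of the Google Calendar event resource used in this sketch.
interface GoogleEvent {
  id: string;
  status: "confirmed" | "tentative" | "cancelled";
  recurringEventId?: string;
  originalStartTime?: { dateTime?: string; date?: string };
}

// Build an events.list URL with the two query parameters the Rep suggested.
function buildEventsUrl(calendarId: string, iCalUID: string): string {
  const base = "https://www.googleapis.com/calendar/v3/calendars";
  const query = `iCalUID=${encodeURIComponent(iCalUID)}&showDeleted=true`;
  return `${base}/${encodeURIComponent(calendarId)}/events?${query}`;
}

// Pick out the cancelled exceptions of a recurring series and return
// the original start of each one (falling back to the event id).
function cancelledOccurrences(items: GoogleEvent[]): string[] {
  return items
    .filter((e) => e.status === "cancelled" && e.recurringEventId !== undefined)
    .map((e) => e.originalStartTime?.dateTime ?? e.originalStartTime?.date ?? e.id);
}
```

Filtering on `recurringEventId` keeps a cancelled standalone event from being mistaken for a deleted occurrence of the series.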
The QA tested on preview.
He put a recurring event on his Google calendar, deleted two occurrences, synced with the Product. The deleted events showed up anyway. He reported this in the ticket.
The Developer went into full diagnostic mode. He documented the starting state — screenshots of the unconnected account screen. He connected the account, triggered the sync, photographed the modal. He documented the calendar state before and after. He deleted an occurrence after sync to verify that webhook-driven updates were also working. They were. He asked the QA to follow his exact steps.
The QA restarted his computer.
It worked fine.
"After a computer restart, this worked fine. I am moving it to ready for Prod."
Browser session. Client-side cache. Stale state. Nothing to do with Google, Microsoft, the Vendor, the /series endpoint, MS Exchange properties, date ranges going back to 1968, deletedOccurrences, sequence: 0, or any of the infrastructure that had consumed the previous six months.
The Developer, who had just documented a complete sync flow with nine screenshots across three calendar views, processed this information and moved on.
Then he tested modified occurrences. Nobody asked him to. The ticket was already closing.
"i just tested modified ocurrences and they work also — i confirmed to them the fix worked."
The /series endpoint works now. It calls /events instead of /instances, which handles deleted occurrences correctly for Google. The Microsoft edge cases are fixed. The date range going back to 1968 is no longer a problem for anyone.
The fallback is gone now — replaced by the honest version. When /series fails, the error is caught, the full event data is stored, and the event is skipped cleanly with a traceable record. You can look it up later. You know exactly what was missed and why. That's the right shape for a failure. It took a few iterations to get there, but it got there.
The relationship between the Team and the Vendor came out the other side intact and arguably stronger. The Rep shared internals he didn't have to share. The Rep Dev fixed things quickly when she had enough information to work with. The Vendor filed a bug with Google and brought the Developer along. These are not nothing.
The Developer made real mistakes. He built critical infrastructure on an endpoint because the alternative didn't work — and then kept building until the Vendor's name was living in his variable identifiers, which is how you know the abstraction had already collapsed. His first fallback absorbed failures silently instead of surfacing them, which is the category of bug that shows up three months later as a mystery with no trail. He spent weeks debugging a vendor system that was not his responsibility to debug. He went deep on the QA's issue without first asking "have you tried turning it off and on again."
But he also found the replication pattern nobody else found. He documented the failure mode clearly enough that the Vendor could fix it. He pushed back when told to abandon the endpoint instead of fix it. He tested modified occurrences after the ticket closed.
Good developers are not the ones who never go down rabbit holes. They're the ones who come back up with something useful.