Networking issue producing a delay in tracking
Incident Report for ClickMeter
We are aware of an incident today that affected ClickMeter's ability to track click and conversion traffic in real time, causing a substantial delay in the processing of metrics. The incident started at 2022-07-04 21:42 UTC and was resolved at 2022-07-05 06:00 UTC, when our working day began. Due to technical limitations, no email/SMS alarm is yet in place to cover this kind of event.

Redirection of tracking links and pixels was not affected at all. Postback/piggyback notifications to 3rd-party servers were also delayed, as a direct consequence of the downtime of the Tracking layer: the ClickMeter infrastructure is designed so that events are notified to 3rd-party servers only after they have been committed to the system and successfully tracked.
No data has been lost: all traffic has been processed and reports can be deemed fully reliable.
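
To illustrate why delayed tracking also delays postbacks without losing data, here is a minimal sketch of a commit-then-notify flow. It is not ClickMeter's actual code: the names `Event`, `TrackingPipeline`, `commit` and `notify` are hypothetical, and the in-memory backlog merely stands in for whatever durable queue the real system uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    event_id: str
    kind: str  # e.g. "click" or "conversion"


@dataclass
class TrackingPipeline:
    # Hypothetical stand-in for the tracking layer: returns True once the
    # event is durably committed, False while storage is unreachable.
    commit: Callable[[Event], bool]
    # Hypothetical stand-in for the postback/piggyback sender.
    notify: Callable[[Event], None]
    backlog: List[Event] = field(default_factory=list)

    def receive(self, event: Event) -> None:
        """Queue the event; redirection is assumed to happen elsewhere."""
        self.backlog.append(event)

    def drain(self) -> int:
        """Commit queued events in order and notify only after each commit.

        Stops at the first failed commit, so pending events stay queued
        and no third party is notified about an uncommitted event.
        Returns the number of events committed and notified.
        """
        processed = 0
        while self.backlog:
            event = self.backlog[0]
            if not self.commit(event):
                break  # storage unreachable: keep the event for later
            self.backlog.pop(0)
            self.notify(event)
            processed += 1
        return processed
```

In this sketch, a storage outage makes `drain()` return 0 and leaves the backlog intact; once connectivity returns, the next `drain()` commits and notifies every queued event in its original order.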

This incident is unfortunately related to the incident of Jun 21, 2022.
We have not yet received an answer from our Cloud provider clarifying the origin of these overnight disconnections.
As ClickMeter started doing business before 2015, part of the ClickMeter infrastructure still relies on the EC2-Classic environment, which Amazon Web Services has recently scheduled for retirement. We are preparing this sensitive migration to leave the legacy networking model and move to AWS VPC. However, these latest 2 incidents lead us to believe that a number of changes are being made on the Cloud provider's side which inadvertently break networking components in a way that we can neither predict nor verify. We observe that server instances can enter a problematic state in which they completely lose connectivity with AWS resources such as DynamoDB.

We will do our best to discontinue EC2-Classic in the upcoming weeks and to install the most recent version of the AWS SDK in our codebase, so that our tracking cluster is equipped to detect and react to these connectivity issues in a timely manner. We also commit to designing a proper alarm so that cases like this are resolved sooner.
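
As an indication of what such an alarm could check, the sketch below raises an alert when the oldest unprocessed event is older than a threshold. It is an assumption-laden illustration, not the alarm we will ship: the function names and the 15-minute threshold are hypothetical, and the timestamp of the oldest pending event is assumed to be available from the tracking backlog.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def processing_lag(oldest_pending: Optional[datetime], now: datetime) -> timedelta:
    """Age of the oldest event not yet processed; zero if nothing is pending."""
    if oldest_pending is None:
        return timedelta(0)
    # Clamp to zero so small clock skew never yields a negative lag.
    return max(now - oldest_pending, timedelta(0))


def should_alarm(
    oldest_pending: Optional[datetime],
    now: datetime,
    threshold: timedelta = timedelta(minutes=15),  # hypothetical threshold
) -> bool:
    """True when the processing lag exceeds the threshold."""
    return processing_lag(oldest_pending, now) > threshold


if __name__ == "__main__":
    # Timestamps of this incident, used purely as sample input.
    started = datetime(2022, 7, 4, 21, 42, tzinfo=timezone.utc)
    checked = datetime(2022, 7, 4, 22, 0, tzinfo=timezone.utc)
    print(should_alarm(started, checked))
```

Measuring the lag of the backlog, rather than probing one specific dependency, would catch any cause of delayed processing, including connectivity losses like the one described above.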
Posted Jul 04, 2022 - 21:30 UTC