horizontal lines
  • Steve Schohn

Analyze Telecom Call Detail Record (CDR) Flat Files

Meet Magnus, former Project Manager and Consultant for COLT Telecom in Sweden, who coordinated billing systems divisions for most European nations in the early 2000s.


He walks us through how local and regional telecoms use Gigasheet to take big flat files of call detail record (CDR) data, look for misconfigurations, troubleshoot customer complaints, and identify duplicate billing issues.


Analyze big CDR files

For those unfamiliar, CDRs contain multiple columns of data related to a telecommunication transaction, like phone numbers of both the calling and receiving parties, the start time, and the duration of that call. In addition, CDRs will often contain other metadata related to the call, but it is important to note that CDRs do not include recording or transcription during the call. CDRs are important for teleco’s as they provide billing records, and the providers rely on these CDRs for revenue. Law enforcement investigators may also use call records to help track or identify suspects or establish an individual’s location. However, in this post we’ll stick to analysis of telecommunication network billing, troubleshooting and optimization.


Here’s how it works: "In general, when I make a phone call, the call goes to my operator here in Sweden. And that call will route to a switch. The call would then be switched over to a network like Verizon. And then that call will come to you." he shared on a recent video chat, "Anytime a switch is involved, that switch will spew out a CDR, call data record. This record will include information about where it came from, the originating number, the terminating number, and the mobile operator would be within that record."


These call data records are tens of millions of rows of data that include intricate parts of information, making them perfect for analysis in Gigasheet.


"Running the telecom itself will get all these call data records from your switch and the outgoing switch. And then, you have to put them in a system called a mediation device. This mediation device connects all of the switches. Then that mediation device tries to figure out all the semi-raw call data records. Which ones should I send to the billing system for billing?” he says, “The problem is the device would spew up more records than you want to bill for. There could be five, six records for one phone call. But you only want to bill one."


Imagine having to upload this file to Excel and, even if you got it opened (which you wouldn't), having to group all of these different rows manually would take days on end.


"Sometimes the calls drop, or it could be a three-party call. I call you, and I add another person to the same call. Now I'm going to be billed for two calls in the same call. To sort out, telecom companies needed some sort of logic."


A database expert or someone with deep knowledge of SQL (structured query language) would be able to figure this out relatively quickly. But what about a customer support representative or analyst? Having to work through this type of puzzle is a nightmare.


"To remedy this, I would put these records in a relational database using SQL. The thing is, I was one of the only ones who knew SQL. And it was something more like a manual process. Excel was just, as you say, way too little [capacity]. So we'd have 50 million records and trying to find out what went wrong, why, et cetera."


Excel just isn’t the correct tool to handle tens of millions of call data records.




The problem with Excel is that when you start to push its limits—such as working with large amounts of data while running complex calculations—it will ultimately crash and burn.


Sometimes this happens right away; other times, after a few hours or days (or weeks) of trying to force Excel into doing what you want, your computer will give up the ghost and leave behind a smoking machine with a file that is no longer usable.


"With all of these systems talking to each other, there would always be billing issues. There would be a misconfiguration, double billing, customer complaints, and problems. The customer would call Customer Care and say, ‘I got billed twice for this call. What's going on?’ That becomes a ticket. And if that happened to one customer, you are pretty sure it's happened to many more. So you have to work all over and see the actual issue."


Solving complex data problems like those Magnus outlines is why Gigasheet exists. Gigasheet allows anyone who can use a spreadsheet to open and analyze gigantic CSV files with millions of rows to solve your troublesome telecom challenges - without knowledge of databases or SQL. Once opened, you will be able to work with them just as quickly as you'd work with a much smaller file in Excel or Google Sheets.


It's a fact that the only thing more frustrating than wrangling big data is not being able to even open the file. So if you're facing down a massive, complex data project and all you've got is an “Excel is not responding” error message, we'd like to help you out - for free.


With Gigasheet, we want to ensure you have all the tools to master your big data problems and a community of fellow citizen data scientists and analysts who can help you. And if you'd like to help us with this mission, create a free account!