Personal project to measure detailed internet usage stats

Started by Tom, October 10, 2015, 06:55:17 PM

Tom

Now before anyone suggests MRTG or Cacti, or others like them, they cannot, and will not, actually get me the information I require. I may however write a tool to connect mine to Cacti :o Though I think I'll still need to write my own reporting tool so I can get actual totals and not just averaged time series data.

That said, I've been writing this tool to measure my internet usage statistics. What makes it different from others is that it gathers detailed data, per IP (both IPv4 and IPv6) and per layer 7 protocol (e.g. SSH, HTTP, Gmail, YouTube, SSL/TLS, BitTorrent, Facebook, IMAP, etc.).
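For illustration only, the per-IP, per-protocol accounting described above can be sketched in a few lines of Python. This is not the tool's actual code (which is built on nDPI); the class and method names are hypothetical, and the protocol label is assumed to come from whatever classifier is in use.

```python
from collections import defaultdict
import ipaddress


class UsageTable:
    """Byte counters keyed by host address, then by layer 7 protocol name.

    Hypothetical structure for illustration; not taken from the real tool.
    """

    def __init__(self):
        self._bytes = defaultdict(lambda: defaultdict(int))

    def record(self, ip, protocol, length):
        # Parse the address so different spellings of the same IPv6
        # address end up under a single key.
        key = ipaddress.ip_address(ip)
        self._bytes[key][protocol] += length

    def host_total(self, ip):
        # .get() avoids creating an empty entry for hosts never seen.
        per_proto = self._bytes.get(ipaddress.ip_address(ip), {})
        return sum(per_proto.values())

    def protocol_total(self, protocol):
        return sum(p.get(protocol, 0) for p in self._bytes.values())
```

Keeping running totals like this, rather than sampled rates, is what makes exact per-host and per-protocol sums possible later.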

I'm utilizing a little library called nDPI that was forked from OpenDPI, and a bunch of Linux-specific networking code to get the kernel to slap raw packet data from a given device directly into an mmap'ed range of memory in my address space. It's rather slick, even if the code is horrendous.

No front end yet, but it can do some pretty shmick stuff:


IPv4  150 : xxx.xxx.xxx.xxx: 37:BitTorrent(5.91797 KiB) 81:ICMP(9.19922 KiB)
IPv4    2 : 192.168.1.1: 0:Unknown(37.5752 KiB) 5:DNS(1.43476 MiB) 9:NTP(20.1562 KiB) 14:SNMP(57.8477 KiB) 18:DHCP(2.74316 KiB) 37:BitTorrent(202 B) 70:Yahoo(4.4668 KiB) 81:ICMP(78.5293 KiB) 92:SSH(15.6304 MiB) 119:Facebook(54.7178 KiB) 120:Twitter(9.05371 KiB) 122:GMail(3.09375 KiB) 124:YouTube(51.0117 KiB) 125:Skype(11.0566 KiB) 126:Google(189.024 KiB) 176:Wikipedia(1.57031 KiB) 178:Amazon(3.79492 KiB) 179:eBay(7.99902 KiB) 195:Twitch(5.44824 KiB) 211:Instagram(4.59961 KiB)
IPv4  151 : 192.168.1.2: 0:Unknown(2.20312 KiB) 5:DNS(16.8086 KiB) 6:IPP(61.8789 KiB) 7:HTTP(1.24023 KiB) 8:MDNS(4.41992 KiB) 9:NTP(4.59375 KiB) 10:NetBIOS(7.48828 KiB) 92:SSH(41.6045 MiB) 178:Amazon(914 B)
IPv4  152 : 192.168.1.3: 9:NTP(19.0312 KiB)
IPv4  153 : 192.168.1.4: 5:DNS(4.3125 KiB) 9:NTP(576 B)
IPv4    3 : 192.168.1.24: 0:Unknown(157.8 KiB) 5:DNS(69.7012 KiB) 7:HTTP(2.32561 MiB) 9:NTP(672 B) 91:SSL(7.05304 MiB) 119:Facebook(55.7539 KiB) 124:YouTube(176.608 MiB) 125:Skype(9.93945 KiB) 126:Google(1.24096 GiB) 178:Amazon(171.733 KiB) 195:Twitch(198.894 KiB)
IPv4    4 : 192.168.1.26: 0:Unknown(50.4502 MiB) 5:DNS(173.401 KiB) 7:HTTP(55.5479 KiB) 9:NTP(1.125 KiB) 10:NetBIOS(5.89258 KiB) 18:DHCP(628 B) 28:VMware(18 B) 37:BitTorrent(213.389 MiB) 69:Oscar(846 B) 81:ICMP(188.78 KiB) 125:Skype(61.5566 KiB) 144:Viber(3.54687 MiB) 161:CiscoVPN(770.122 KiB) 163:Tor(15.3887 KiB) 188:Quic(493 B) 218:BitTorrentCustom(90.8869 MiB)
IPv4    5 : 192.168.1.28: 0:Unknown(566.117 KiB) 2:POP3(90.6621 KiB) 5:DNS(484.077 KiB) 7:HTTP(17.7601 MiB) 8:MDNS(288 B) 9:NTP(4.59375 KiB) 51:IMAPS(7.08139 MiB) 70:Yahoo(1.39648 KiB) 81:ICMP(2.69922 KiB) 91:SSL(71.1717 MiB) 92:SSH(15.6635 MiB) 119:Facebook(13.5687 MiB) 120:Twitter(482.492 KiB) 122:GMail(50.0059 KiB) 124:YouTube(14.9639 KiB) 125:Skype(328.65 KiB) 126:Google(950.811 KiB) 131:HTTP_Proxy(20.3252 KiB) 169:UbuntuONE(691.167 KiB) 176:Wikipedia(1.57031 KiB) 179:eBay(257.493 KiB) 211:Instagram(4.59961 KiB)
IPv4 2577 : 192.168.1.207: 12:SSDP(28.7109 KiB)
IPv4  154 : 192.168.1.209: 0:Unknown(108.294 KiB) 5:DNS(22.5615 KiB) 7:HTTP(65.2402 KiB) 8:MDNS(24.3955 KiB) 18:DHCP(321 B) 51:IMAPS(81.4756 KiB) 70:Yahoo(3.07031 KiB) 81:ICMP(2.04199 KiB) 91:SSL(583.567 KiB) 119:Facebook(28.0322 KiB) 125:Skype(70.5049 KiB) 126:Google(46.4707 KiB) 131:HTTP_Proxy(57.0764 MiB) 219:GTalkCusto(628 B)
IPv4    6 : 192.168.1.226: 0:Unknown(11.3564 KiB) 5:DNS(66.4102 KiB) 7:HTTP(96.1094 KiB) 81:ICMP(71.2031 KiB)
IPv4    7 : 192.168.1.243: 217:UBNTAC2(171.782 KiB)
IPv4  155 : 192.168.1.245: 9:NTP(2.34375 KiB) 12:SSDP(28.7109 KiB)
IPv4  156 : 192.168.1.248: 0:Unknown(1.12793 KiB) 8:MDNS(568 B) 91:SSL(13.3301 KiB) 126:Google(34.4131 KiB)
IPv4  157 : 192.168.1.253: 0:Unknown(29.1006 KiB) 8:MDNS(1.19531 KiB) 12:SSDP(83.7539 KiB) 18:DHCP(297 B)
IPv4  158 : 192.168.2.9: 0:Unknown(11.1162 KiB) 5:DNS(2.8916 KiB) 7:HTTP(578.166 KiB) 81:ICMP(76 B) 91:SSL(3.71094 KiB)
IPv4  159 : 192.168.2.10: 5:DNS(129.527 KiB)
IPv4    8 : 192.168.2.113: 0:Unknown(109.767 KiB) 5:DNS(18.0586 KiB) 7:HTTP(92.0898 KiB) 81:ICMP(155 B) 91:SSL(184.146 KiB) 126:Google(812 B)
IPv4    9 : 192.168.2.115: 5:DNS(1.89746 KiB) 9:NTP(18.4688 KiB) 18:DHCP(621 B)
IPv4  160 : 192.168.2.117: 0:Unknown(1.13379 KiB) 5:DNS(471.197 KiB) 14:SNMP(57.8477 KiB) 18:DHCP(621 B) 81:ICMP(6.93066 KiB)
IPv4  161 : 192.168.2.119: 5:DNS(4.25 KiB)
IPv4 3713 : 192.168.2.121: 5:DNS(4.10156 KiB)
IPv4   10 : 192.168.3.255: 0:Unknown(107.062 KiB) 6:IPP(61.8789 KiB) 10:NetBIOS(13.3809 KiB)
IPv6   11 : 2001:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx: 0:Unknown(186.45 KiB) 102:ICMPV6(22.3125 KiB)
IPv6 9319 : 2001:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx: 8:MDNS(243 B)
IPv6  162 : 2001:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx: 0:Unknown(591.798 KiB) 9:NTP(96 B) 102:ICMPV6(34.4453 KiB)
IPv6  163 : 2001:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx: 9:NTP(5.01562 KiB) 102:ICMPV6(6.64062 KiB)
IPv619627 : 2001:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx: 0:Unknown(808 B) 102:ICMPV6(136 B)
IPv6  164 : 2001:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx: 0:Unknown(280.052 KiB) 102:ICMPV6(3.1875 KiB) 188:Quic(23.5469 KiB)
IPv6   12 : fe80::200:24ff:fece:8135: 102:ICMPV6(138.008 KiB)
IPv6  165 : fe80::225:90ff:fe56:fe82: 103:DHCPV6(3.00781 KiB)
IPv6  166 : fe80::225:90ff:fedc:903e: 103:DHCPV6(2.95312 KiB)
IPv6  167 : fe80::3285:a9ff:fe3b:5ba1: 102:ICMPV6(41.5703 KiB)
IPv6 1504 : fe80::3285:a9ff:fe4d:50d9: 102:ICMPV6(4.11719 KiB)
IPv6  168 : fe80::5054:ff:fe5a:124e: 102:ICMPV6(13.0156 KiB)
IPv6  169 : fe80::c2ee:fbff:fe21:b7ac: 0:Unknown(76 B) 102:ICMPV6(20.2188 KiB)
int:36 ext:4997 down: 14.2588 KiB/s (1.68757 GiB) up: 8.42578 KiB/s (420.138 MiB) totalPkt: 2261455 droppedPkt: 0
int:36 ext:4998 down: 3.95605 KiB/s (1.68758 GiB) up: 8.9248 KiB/s (420.147 MiB) totalPkt: 2261540 droppedPkt: 0

That's current output from the monitoring "daemon".
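Until there is a front end, totals can be pulled out of the text output above. The following is a rough sketch of a parser for one host line, assuming the line layout shown in the listing; the function name is hypothetical. Because the daemon prints sizes rounded to about six significant digits, the recovered byte counts are approximate for the larger units.

```python
import re

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024 ** 2, "GiB": 1024 ** 3}
# One entry looks like "5:DNS(4.3125 KiB)": protocol id, name, size, unit.
ENTRY = re.compile(r"(\d+):(\w+)\(([\d.]+) (B|KiB|MiB|GiB)\)")


def parse_host_line(line):
    """Return (address, {protocol_name: approx_bytes}) for one host line."""
    # Drop the "IPv4  150" prefix, then split the address from the entries.
    # IPv6 addresses contain ':' but never ': ', so this split is safe.
    _, rest = line.split(" : ", 1)
    address, entries = rest.split(": ", 1)
    usage = {}
    for _proto_id, name, value, unit in ENTRY.findall(entries):
        usage[name] = round(float(value) * UNITS[unit])
    return address, usage


addr, usage = parse_host_line("IPv4  153 : 192.168.1.4: 5:DNS(4.3125 KiB) 9:NTP(576 B)")
print(addr, usage, sum(usage.values()))
# prints: 192.168.1.4 {'DNS': 4416, 'NTP': 576} 4992
```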

The deep packet inspection code isn't 100% accurate, but it does OK. There are bugs, and some of it is just heuristics. It is pretty reasonable though. The "Google", "YouTube" and similar protocol types are generally detected by IP range or host name, so it's not a protocol as such, but probably more useful than "HTTP" or "SSL" as you'd get from many of those sites otherwise. It does detect the underlying protocol in most cases. I currently just store the top level protocol, but I may switch to storing both.
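To make the "detected by IP range or host name" idea concrete, here is a minimal sketch of that kind of lookup. It is not how nDPI implements it; the rule tables, labels, and function name are all made up for illustration, and the prefixes are documentation-only ranges rather than any real service's addresses. It returns both the service label and the underlying protocol, which is the "store both" option mentioned above.

```python
import ipaddress

# Hypothetical rules. The networks are reserved documentation prefixes
# and the host suffix is an example domain, not real service data.
NETWORK_RULES = [
    (ipaddress.ip_network("203.0.113.0/24"), "ExampleVideo"),
    (ipaddress.ip_network("2001:db8::/32"), "ExampleSearch"),
]
HOST_SUFFIX_RULES = [
    (".video.example", "ExampleVideo"),
]


def classify(ip, hostname, wire_protocol):
    """Return (top_level_label, wire_protocol) for one flow."""
    if hostname:
        host = hostname.lower().rstrip(".")
        for suffix, label in HOST_SUFFIX_RULES:
            if host == suffix.lstrip(".") or host.endswith(suffix):
                return label, wire_protocol
    addr = ipaddress.ip_address(ip)
    for network, label in NETWORK_RULES:
        # Only compare addresses and networks of the same IP version.
        if addr.version == network.version and addr in network:
            return label, wire_protocol
    # No service rule matched: fall back to the detected protocol itself.
    return wire_protocol, wire_protocol
```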

Performance isn't ideal, but it's pretty good considering I can handle 100 Mbps on a single core Atom @ 1 GHz with only about 30% CPU use (compiled in debug mode, no less). Going up to gigabit line rate would kill my little firewall, so I'll actually be mirroring the firewall-LAN port on my smart switch, and feeding it to a VM on my server.

To get /something/ working I may just start off by making it a data source for Cacti, but I do eventually want a decent interface allowing me to dig into traffic on a host and protocol level, i.e. which hosts in a given month used the most data, which used the most of a given protocol, maybe check for hosts doing "weird" things, which countries (by GeoIP) show the most connections, throughput, or total usage, etc.
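The host and protocol questions listed above map fairly directly onto grouped sums. As a sketch only, assuming the totals end up in a table with one row per (month, host, protocol), a query helper could look like this. The table layout and function names are hypothetical, and SQLite is used purely because it ships with Python.

```python
import sqlite3


def open_store():
    """Create an in-memory store; the schema is a hypothetical layout."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE traffic (month TEXT, host TEXT, protocol TEXT, bytes INTEGER)"
    )
    return db


def add_sample(db, month, host, protocol, nbytes):
    db.execute("INSERT INTO traffic VALUES (?, ?, ?, ?)", (month, host, protocol, nbytes))


def top_hosts(db, month, protocol=None, limit=5):
    """Hosts with the most bytes in a month, optionally for one protocol."""
    sql = "SELECT host, SUM(bytes) AS total FROM traffic WHERE month = ?"
    params = [month]
    if protocol is not None:
        sql += " AND protocol = ?"
        params.append(protocol)
    # Secondary sort on host keeps the order stable when totals tie.
    sql += " GROUP BY host ORDER BY total DESC, host LIMIT ?"
    params.append(limit)
    return db.execute(sql, params).fetchall()
```

Calling `top_hosts(db, "2015-10")` answers "which hosts used the most data that month", and passing `protocol="BitTorrent"` narrows it to one protocol.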

TL;DR: I done did a cool thing. kthx. bai.
<Zapata Prime> I smell Stanley... And he smells good!!!

Thorin

Yeah, that does sound pretty damn cool, glad you're continuing to code in your spare time.

I gotta ask, though...  What's the impetus for this?
Prayin' for a 20!

gcc thorin.c -pedantic -o Thorin
compile successful

Tom

Quote from: Thorin on October 10, 2015, 09:04:01 PM
Yeah, that does sound pretty damn cool, glad you're continuing to code in your spare time.

I gotta ask, though...  What's the impetus for this?

Always wanted it? :D I initially started working on it over three years ago, but was either too busy or too tired/depressed to work on it at all. And the past little while I just felt like working on it. I intend to switch back to something a little more important soon ;D

Lazybones

Looks similar to the layer 7 stuff starting to appear in the Asus router firmware from Trend Micro, or the Snort OpenAppID stuff that projects like pfSense have started to integrate.

Our new Palo Alto firewalls at work do a really good job of this using their own methods.

Tom

There was l7 stuff for Linux a while back, but it was unmaintained for a long time :(  BSD/pfSense has had decent or OK layer 7 classification for a while afaik. Though those were mostly just for handling firewall rules, not actually capturing the data for stats.


Hmm, I'll have to look at that Snort stuff. Looks interesting. Looks like it has much finer granularity than nDPI. But then I bet it's a lot slower. Its dissectors are written in Lua, it seems.

Lazybones

Also if you really want to get granular you will want to capture the domain name in SSL/TLS handshakes as well as pull in info from IP lists that identify apps.

A lot of top sites encrypt by default.

Tom

Quote from: Lazybones on October 10, 2015, 10:22:02 PM
Also if you really want to get granular you will want to capture the domain name in SSL/TLS handshakes as well as pull in info from IP lists that identify apps.

A lot of top sites encrypt by default.

Yeah, nDPI actually looks at SSL. It doesn't grab the "application id" from it yet afaik, since it doesn't support SPDY (and that seems to be the only reliable way of detecting SPDY :o). It does however use the SSL/TLS certificate hostnames and IP addresses to try and match against some hard-coded IP ranges.

I wonder if it's possible to decrypt since I get to see both sides. Or if I'd have to actually MITM and use my own cert authority... don't really want to do that however.

Lazybones

You can get the domain / FQDN from the handshake. Sounds like your library already does.
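As a rough sketch of what pulling the name out of the handshake involves: the client sends the host name unencrypted in the server_name (SNI) extension of its ClientHello. The Python below is an illustration under simplifying assumptions, not what nDPI does internally: it expects the whole ClientHello in a single TLS record held in one buffer, ignores fragmentation and TCP reassembly, and the function name is hypothetical.

```python
import struct


def extract_sni(record):
    """Return the SNI host name from a TLS ClientHello record, or None."""
    try:
        # 0x16 = handshake record type, 0x01 = ClientHello message type.
        if record[0] != 0x16 or record[5] != 0x01:
            return None
        pos = 5 + 4                    # record header + handshake header
        pos += 2 + 32                  # client_version + random
        pos += 1 + record[pos]         # session_id (1-byte length prefix)
        pos += 2 + struct.unpack_from("!H", record, pos)[0]  # cipher_suites
        pos += 1 + record[pos]         # compression_methods
        (ext_total,) = struct.unpack_from("!H", record, pos)
        pos += 2
        end = min(pos + ext_total, len(record))
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", record, pos)
            pos += 4
            if ext_type == 0:          # server_name extension
                # Layout: list length (2), name type (1), name length (2), name.
                name_type = record[pos + 2]
                (name_len,) = struct.unpack_from("!H", record, pos + 3)
                name = record[pos + 5 : pos + 5 + name_len]
                if name_type != 0 or len(name) != name_len:
                    return None
                return name.decode("ascii")
            pos += ext_len
        return None
    except (IndexError, struct.error, UnicodeDecodeError):
        # Truncated or malformed input: treat as "no name found".
        return None
```

Feeding it the first client-to-server payload of a flow on port 443 would be the obvious use; anything that is not a complete ClientHello simply yields None.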

To decrypt you need to MITM, and for that you need your client to trust what you sign. Squid can do this. If you are lazy you could probably set up Squid to do that for you and write your project as an add-on to Squid, which would actually be really nice as a reporting tool for people who already run a Squid proxy.

Tom

It might be interesting, but Squid is mainly an HTTP proxy. This tool slurps up all packets and dissects them, all without copying them. I may play with that as well, mind you. I also want to look at an IDS and feed it via a mirrored port. Passing through my onboard NICs is annoying though... They share the same IOMMU domain and I don't think you can split a domain between VMs. :(  So I may have to get an extra dedicated card for that :(