Wednesday, November 18, 2015
Hash Value Tool (Or “Digital Fingerprint”) Increasingly Noted In Cases Involving Electronic Evidence
Over the past few years, an increasing number of cases have discussed the role of "hash values," the output of mathematical algorithms used to identify electronic images, records, files, or other evidence. Hash values (commonly referred to as "digital fingerprints") identify data with a high degree of accuracy, making it possible to confirm whether two records or files match or differ. See, e.g., United States v. Cartier, 543 F.3d 442, 444 (8th Cir. 2008) (No. 07-3222) ("Every digital image or file has a hash value, which is a string of numbers and letters that serves to identify the image or file.") (footnote omitted).
As we have previously noted, “hash” values are an important tool used to identify and authenticate digital evidence. See Using “Hash” Values In Handling Electronic Evidence; see also Hash Values Used To Confirm Seized Video Clips And Images; Federal Judicial Center, Managing Discovery of Electronic Information: A Pocket Guide for Judges, at 24 (2007) (“‘Hashing’ is used to guarantee the authenticity of an original data set and can be used as a digital equivalent of the Bates stamp used in paper document production.”) (quoted in Lorraine v. Markel American Ins. Co., 241 F.R.D. 534, 546-47 & n.23 (D. Md. 2007) ("Hash values can be inserted into original electronic documents when they are created to provide them with distinctive characteristics that will permit their authentication under Rule 901(b)(4).")).
A recent review of cases referring to the use of hash values highlights the growing acceptance of this tool on forensic issues involving electronic evidence. As summarized below, hash values are commonly referred to as "digital fingerprints" or "digital DNA" and have been described as more than 99 percent accurate in confirming that two files or records match.
Using Hash Values
A “hash value” is the output of an algorithm and can be used to confirm that two digital files or objects are either the same or different. As the Fourth Circuit recently summarized:
A "hash value" is an alphanumeric string that serves to identify an individual digital file as a kind of "digital fingerprint." Although it may be possible for two digital files to have hash values that "collide," or overlap, it is unlikely that the values of two dissimilar images will do so. United States v. Cartier, 543 F.3d 442, 446 (8th Cir. 2008) (No. 07-3222). In the present case, the district court found that files with the same hash value have a 99.99 percent probability of being identical.
United States v. Wellman, 663 F.3d 224, 226 n.2 (4th Cir. 2011) (No. 10-4689) (identifying suspected child pornography by hash values), cert. denied, 132 S.Ct. 1945, 182 L.Ed.2d 800 (2012); see also United States v. Farlow, 681 F.3d 15, 19 n.2 (1st Cir. 2012) (No. 11-1975) (defining hash value as “a short, unique set of numbers and letters produced by running the complex strings of data that make up a computer file through a mathematical algorithm”); United States v. Henderson, 595 F.3d 1198, 1199 n.2 (10th Cir. 2010) (No. 09-8015) (“A SHA value of a computer file is, so far as science can ascertain presently, unique. No two computer files with different content have ever had the same SHA value.”) (quoting United States v. Klynsma, No. CR 08-50145-RHB, 2009 WL 3147790, at *6 (D.S.D. Sept. 29, 2009)).
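The "digital fingerprint" behavior the courts describe can be seen in a few lines of code. The sketch below uses Python's standard hashlib module purely for illustration (none of these cases identifies the software investigators used): an exact copy of some data produces the same SHA-1 value, while changing a single byte produces an entirely different one.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 hash value of the given bytes as a hexadecimal string."""
    return hashlib.sha1(data).hexdigest()


original = b"Example file contents"
exact_copy = b"Example file contents"  # same bytes, e.g. a copy saved under a new name
altered = b"Example file contents."    # the same data with one byte added

# Identical content always produces the identical hash value...
print(sha1_hex(original) == sha1_hex(exact_copy))  # True
# ...while even a one-byte change produces an unrelated value.
print(sha1_hex(original) == sha1_hex(altered))     # False
```

Only the bytes are hashed, so a file's name plays no part in the result.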
Two types of hash values are commonly used:
Secure Hash Algorithm Version 1 (or SHA-1): As some cases have noted, “SHA-1 stands for Secure Hash Algorithm Version 1 — a digital fingerprint of a computer file. It is a 32-digit number that is calculated for a file and unique to it.” United States v. Glassgow, 682 F.3d 1107, 1110 n.2 (8th Cir. 2012) (No. 11-2611); see also United States v. Miknevich, 638 F.3d 178, 181 n.1 (3rd Cir. 2011) (No. 09-3059) (“A SHA1 (or SHA-1) value is a mathematical algorithm that stands for Secured Hash Algorithim used to compute a condensed representation of a message or data file.”).
Message-Digest Algorithm 5 (MD5): “An MD5 hash value is a unique alphanumeric representation of the data, a sort of ‘fingerprint’ or ‘digital DNA.’” United States v. Crist, 627 F. Supp. 2d 575, 578, 585 (M.D. Pa. 2008) (No. 07-cr-211). MD5 also generates a unique alphanumeric value for a particular file or object, but that value is shorter than the SHA-1 value (128 bits rather than 160).
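The difference in length is easy to check. In the sketch below (Python's hashlib, our choice of tool), an MD5 value is 128 bits, usually written as 32 hexadecimal characters, and a SHA-1 value is 160 bits, usually written as 40 hexadecimal characters. Written in Base32, as some peer-to-peer file-sharing networks do, the same SHA-1 value is exactly 32 characters long, which may explain why some opinions describe SHA-1 as a "32-digit" value.

```python
import base64
import hashlib

data = b"any file contents"

md5_digest = hashlib.md5(data).digest()    # raw MD5 value: 16 bytes
sha1_digest = hashlib.sha1(data).digest()  # raw SHA-1 value: 20 bytes

print(len(md5_digest) * 8, len(md5_digest.hex()))    # 128 32
print(len(sha1_digest) * 8, len(sha1_digest.hex()))  # 160 40

# The same 160-bit SHA-1 value encoded in Base32 (5 bits per character).
sha1_base32 = base64.b32encode(sha1_digest).decode("ascii")
print(len(sha1_base32))  # 32
```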
Digital Fingerprints and Digital DNA
Hash values have unique identification features. Recognizing this role, a number of cases refer to hash value determinations as “digital fingerprints,” including the following cases:
United States v. Chiaradio, 684 F.3d 265, 271 (1st Cir. 2012) (No. 11-1290) (referring to hash values as “essentially, the digital fingerprint” used to compare files)
United States v. Cunningham, 694 F.3d 372, 376 n.3 (3rd Cir. 2012) (No. 10-4021) (“Each hash value ‘is an alphanumeric string that serves to identify an individual digital file as a kind of “digital fingerprint.”’”) (quoting Wellman, 663 F.3d at 226 n.2)
United States v. Farlow, 681 F.3d 15, 19 (1st Cir. 2012) (No. 11-1975) (defendant suggesting how investigators could “have employed a limited search” by “using the image's ‘hash value’ — a sort of digital fingerprint tied not only to a specific file but also to that file's precise location on a computer”)
United States v. Richardson, 607 F.3d 357, 363 (4th Cir. 2010) (No. 09-4072) (describing how the AOL Image Detection and Filtering Program “recognizes and compares the digital ‘fingerprint’ (known as a ‘hash value’) of a given file attached to a subscriber's email with the digital ‘fingerprint’ of a file that AOL previously identified as containing an image depicting child pornography”)
See also United States v. Miknevich, 638 F.3d 178, 181 n.1 (3rd Cir. 2011) (No. 09-3059) (noting how a SHA1 mathematical algorithm “can act like a fingerprint”)
As another means of describing this identification role, some cases have also referred to hash values as a form of “digital DNA”:
United States v. Crist, 627 F. Supp. 2d 575, 578, 585 (M.D. Pa. 2008) (No. 07-cr-211) (“An MD5 hash value is a unique alphanumeric representation of the data, a sort of ‘fingerprint’ or ‘digital DNA.’”) (“By subjecting the entire computer to a hash value analysis — every file, internet history, picture, and ‘buddy list’ became available for Government review” and the “examination constitutes a search.”) (granting motion to suppress warrantless search of computer which ultimately had been provided to law enforcement after the defendant failed to pay his rent)
United States v. Beatty, No. 1:08–cr–51–SJM, 2009 WL 5220643, at *1 n.5 (W.D. Pa. 2009) (in denying motion to suppress evidence seized from the defendant's computer, noting agent's affidavit described the SHA1 “digital fingerprint” as “more unique to a data file than DNA is to the human body”), aff'd, 437 Fed.Appx. 185 (3rd Cir. 2011) (No. 10-3634)
United States v. Wellman, No. CRIM A 08CR00043, 2009 WL 37184 (S.D.W. Va. 2009) (noting investigator's description of “a hash value or algorithm” as “[a] digital fingerprint or a DNA of a file”), aff'd, 663 F.3d 224 (4th Cir. 2011), cert. denied, 132 S.Ct. 1945, 182 L.Ed.2d 800 (2012)
Degree Of Accuracy
Many of the cases have noted the high degree of accuracy of hash values. In fact, few other types of evidence produce matches as precise; hash values have even been described as more precise than a DNA match. State v. Mahan, 2011 Ohio 5154, n.2 (Court of Appeals, 8th Appellate Dist. Ohio 2011) (noting investigator testimony “that SHA1 values are accurate in identifying a file to the 160th degree, which is ‘better than DNA’”).
The following cases involved evidence suggesting the accuracy of a hash value match exceeds 99 percent:
United States v. Glassgow, 682 F.3d 1107, 1110 n.2 (8th Cir. 2012) (No. 11-2611) (noting “there was a 99.9999% probability that exhibit 1 contained the same video clips that Glassgow possessed”)
State v. Mahan, 2011 Ohio 5154, n.2 (Court of Appeals, 8th Appellate Dist. Ohio 2011) (“There is a certainty exceeding 99.99 percent that two or more files with the same SHA1 value are identical copies of the same file regardless of the file name.”)
United States v. Nelson, No. CR. 09-40130-01-KES (D.S.D. July 12, 2010) (“When two files have the same hash value, there is a 99.99 percent chance that they are the same file.”)
See also United States v. Cartier, 543 F.3d 442, 446 (8th Cir. 2008) (No. 07-3222) (in challenge to probable cause supporting search warrant, rejecting argument “that it is possible for two digital files to have hash values that collide or overlap”)
Theoretically, it is possible for two different files to have the same hash value (referred to as a collision). An accidental collision between two unrelated files, however, has not been observed in practice and is extremely unlikely, although researchers have shown that MD5 collisions can be deliberately engineered. For MD5, the chance that two given files share a hash value by accident is roughly 1 in 340 billion billion billion billion (2^128). See, e.g., Richard P. Salgado, “Fourth Amendment Search And The Power Of The Hash,” 119 HARV. L. REV. F. 38, 39 n.6 (2006) (“The range of values generated from commonly used hash algorithms is huge. For example, the prolific algorithm MD-5 can generate more than 340,000,000,000,000,000,000,000,000,000,000,000,000 (that’s 340 billion, billion, billion, billion) possible values. The widely used SHA-1 algorithm generates a range of values over four billion times larger than that. Thus, although there is a finite number of possible hash values and an infinite number of possible data inputs, the odds of a collision are infinitesimally small.”); see generally Data Validation Using The MD5 Hash (“There are actually 3.402 x 10^38 or 340 billion billion billion billion or a little more than 1/3 of a googol possibilities. When you consider that most people have never seen a million of anything the actual number becomes really difficult to conceptualize.”); HashCheck Shell Extension - FAQ (“For 128-bit checksums (MD4, MD5), the probability [of a collision] is an unfathomably small 1 in 340 billion billion billion billion, and for SHA-1, it is even smaller.”).
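The figures quoted above follow from simple arithmetic, sketched below in Python. The collision estimate assumes the hash behaves like a random function (a standard modeling assumption, not a property proven for MD5 or SHA-1) and does not address deliberately engineered collisions; the helper name accidental_collision_bound is ours.

```python
md5_values = 2 ** 128   # number of possible MD5 values
sha1_values = 2 ** 160  # number of possible SHA-1 values

print(f"{md5_values:.4e}")        # 3.4028e+38, about 340 billion billion billion billion
print(sha1_values // md5_values)  # 4294967296, "over four billion times larger"


def accidental_collision_bound(num_files: int, bits: int) -> float:
    """Upper estimate (union bound) of the chance that any two of num_files
    files share a hash value by accident: n*(n-1)/2 pairs, each with
    probability 1 / 2**bits, assuming the hash behaves like a random function."""
    pairs = num_files * (num_files - 1) / 2
    return pairs / 2 ** bits


# Even across a billion distinct files, the MD5 estimate is only about 1.5e-21.
print(accidental_collision_bound(10 ** 9, 128))
```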
Generally, courts have rejected challenges to the authentication or admissibility of evidence based on remote possibilities unless there is an articulable probability that the validity of the evidence should be doubted. See, e.g., Cartier, 543 F.3d at 446 (while theoretically “hash values could collide,” accepting government view “that no two dissimilar files will have the same hash value”); see also United States v. Safavian, 435 F.Supp.2d 36, 41 (D.D.C. 2006) (“The possibility of alteration does not and cannot be the basis for excluding e-mails as unidentified or unauthenticated as a matter of course, any more than it can be the rationale for excluding paper documents (and copies of those documents).… Absent specific evidence showing alteration, however, the Court will not exclude any embedded e-mails because of the mere possibility that it can be done.”), rev’d on other grounds, 528 F.3d 957 (D.C. Cir. 2008).
Identification Of Suspected Child Pornography Images
Once a hash value is obtained for a particular file, record, or image, it can be used to confirm or locate other matches. In this manner, hash values are commonly used to identify suspected child pornography images: a library of hash values for known child pornography images can be used to determine whether suspected images are being used or possessed. If a match in hash values between the known and suspected images is confirmed, law enforcement has used this information to support a search warrant. See, e.g., United States v. Brown, 701 F.3d 120, 122 (4th Cir. 2012) (No. 11-5048) (hash values of downloaded files used to obtain search warrant in child pornography investigation); Cunningham, 694 F.3d at 376 (hash values were used to identify child pornography images and used to show probable cause to seize the defendant’s computer); Chiaradio, 684 F.3d at 271 (hash values used in "enhanced peer-to-peer software" to “compare the hash value (essentially, the digital fingerprint) of an available file with the hash values of confirmed videos and images of child pornography”; information was used to obtain a search warrant to seize the defendant’s computer); Cartier, 543 F.3d at 446 (hash values were used to identify child pornography images and used to show probable cause to seize the defendant’s computer).
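In outline, matching against a library of known hash values works like the sketch below: hash every file under a folder and report those whose values appear in the library. It is a minimal illustration in Python under our own assumptions (SHA-1, a plain collection of hexadecimal strings, and the hypothetical helper names file_sha1 and find_known_files); it is not the software used in any of the cases above.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Tuple


def file_sha1(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos need not fit in memory."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_known_files(root: Path, known_hashes: Iterable[str]) -> List[Tuple[Path, str]]:
    """Return (path, hash) for every file under root whose SHA-1 is in the library."""
    known = {h.strip().lower() for h in known_hashes}  # normalize case and whitespace
    matches = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = file_sha1(path)
            if digest in known:
                matches.append((path, digest))
    return matches
```

Because only file contents are hashed, a renamed or relocated copy of a known file is still reported, consistent with the "regardless of the file name" testimony quoted earlier.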
Until a few years ago, few cases discussed the application and use of hash values. As this review shows, the acceptance and use of this tool for electronic evidence has become more common and widespread.
Hash Values Used To Confirm Seized Video Clips And Images
A hash value algorithm was used to show "a 99.9999% probability" of a match between known evidence (child pornography images) and seized video clips and images; in this manner, the hash value provided "a digital fingerprint of a computer file," in United States v. Glassgow, 682 F.3d 1107 (8th Cir. June 28, 2012) (No. 11-2611).
As we have previously noted, “hash” values are an important tool to identify and authenticate digital evidence. See generally Using “Hash” Values In Handling Electronic Evidence. An Eighth Circuit case demonstrates the use of hash values to confirm electronic evidence at trial.
In the case, the defendant was prosecuted for receipt of child pornography after an investigation led to the identification and seizure of his computer from his residence. Thumbnail images of child pornography were found on his computer. At trial, he challenged the admission of this evidence, arguing that the images "were not expandable for viewing and that the government’s exhibits were only 'similar' to the thumbnail pictures." Glassgow, 682 F.3d at 1109. The hash value used in the case was a "Secure Hash Algorithm Version 1" (SHA-1) value, which the court described as a 32-digit number that is "a digital fingerprint of a computer file" and "unique" to the particular file. Glassgow, 682 F.3d at 1110 n.2. After the jury convicted him, the defendant argued on appeal that admitting this evidence was error.
The Eighth Circuit affirmed, noting that expert testimony authenticated the images. Law enforcement had matched the images found on the defendant's computer to known images in a law enforcement database. As the circuit explained:
A government expert, however, verified that the images in exhibits 3 through 17 were the actual enlarged images from Glassgow’s computer. To the extent Glassgow is challenging the government’s exhibit 1 (a DVD compilation of three video clips from a law enforcement database), the SHA-1 values of these videos matched the SHA-1 values of the files offered for distribution from Glassgow’s computer. According to the expert, there was a 99.9999% probability that exhibit 1 contained the same video clips that Glassgow possessed. The admission of exhibit 1 (which was not published to the jury, only described to it) was not unfairly prejudicial. Cf. United States v. McCourt, 468 F.3d 1088, 1092-93 (8th Cir. 2006) (published videos were not found to be unfairly prejudicial).
Glassgow, 682 F.3d at 1110 (footnote omitted).
While the case arose in a child pornography prosecution, it demonstrates the reliability and use of hash values to confirm a match for seized digital evidence. A 99.9999 percent probability is certainly not required to authenticate evidence under FRE 901, which is generally considered not to impose a high hurdle. See, e.g., United States v. Gagliardi, 506 F.3d 140, 151 (2nd Cir. 2007) (noting that “[t]he bar for authentication of evidence is not particularly high”). As the case illustrates, the hash value determination can be an effective tool for the identification and authentication of evidence.
Investigating Child Exploitation Cases - Getting to Critical Internet Evidence Faster with IEF, by Jad Saliba and Jamie McQuaid, Magnet Forensics
Michael Petrelli: Good afternoon and good morning to everyone. My name is Michael Petrelli, with Magnet Forensics, and I’d like to welcome you to our webinar today, entitled Investigating Child Exploitation Cases.
Today we’re joined by Jad Saliba and Jamie McQuaid, also from Magnet Forensics, who will lead us in our discussion. In this webinar, we’ll take you through the steps from obtaining a search warrant, to recovering internet forensic artifacts from a suspect’s computer and mobile phone, to producing an understandable report that can be passed off or presented in court. Jad and Jamie will also answer your questions in a live Q&A session after the presentation. So please submit your questions into the Q&A box in the WebEx client during or after the presentation, and we’ll answer them in order.
This presentation is being recorded, and will be available for viewing at magnetforensics.com shortly after our session today. With that, I’ll turn it over to Jad to begin our discussion.
Jad Saliba: Thanks, Mike. Jamie and I will be taking you through the webinar today, so let’s just get right into it.
The case study that we made up for this webinar starts with an undercover officer conducting an online investigation. He locates a suspect, engages him in conversation online, and gets him to start chatting on Skype. The reason for that is that, as we’ll show later on, Skype stores IP address information for people you’ve chatted with, and it stores that on your local machine. So the investigator is using that means of communication to obtain the suspect’s IP address and further the investigation.
So some chatting occurs on Skype. Then the investigator runs IEF on their own computer, against the Skype files on the workstation that was used to conduct the chats, and locates IP addresses for the suspect. Through that, the investigator is able to obtain a court order, which may be called a production order in Canada or a subpoena in the United States and other countries. In any case, it’s a court order that allows us to obtain the name and address of the suspect from their ISP.
Now that we’re armed with that information, as well as other evidence that’s been gathered through this investigation, we’re able to obtain a search warrant for the suspect’s residence. The search warrant is executed at the residence, there’s a computer turned on, and we use IEF triage at this point to gather some further evidence. We want to confirm the Skype chats and some other information, as well as look for any illegal material or illicit images on that machine.
By running triage … and we’ll go through some of the evidence that’s found … in later slides, we’re able to find and corroborate the Skype conversations that occurred and the IP address information, and also identify some illicit images by using hashsets of known child exploitation images. With that information, the investigator is able to make an arrest at the scene. From there, the investigator seizes the computer and also locates an Android device in the residence, which is also seized.
So we’re going to show some evidence that was recovered from these devices. Some of it will be related to Google searches from web browser history; there are also some torrent files, and some Kik Messenger chat messages that provide more supporting evidence, as well as the images that were found and matched against hashsets of known child exploitation images.
Jamie McQuaid: Let’s start with the evidence we’ve collected for this case. The first thing we’re going to look at is the investigator’s computer, which contains the initial Skype conversation as well as the IP address for the suspect. Whether it’s an IEF report or output from any other tool you were using, this is going to be the first piece of evidence that you rely on. Based on that evidence, you can get your court order for the ISP, for the name and address of the suspect.
The next piece of evidence we’ll be discussing is the live analysis triage. Jad will do a demo of triage and show you how live analysis and some of those techniques can help while you’re executing a search warrant at a person’s home or place of residence. Then finally, we’ll discuss the computer and Android devices that were seized at the scene.
Just to preview the artifacts we’ll be discussing: first up is Skype; the chats and IP addresses are a big part of this. Then we’ll get into the pictures and the hashing techniques and categories that investigators commonly use. We’ll then show some Kik Messenger artifacts that were recovered from the Android device, as well as some Google searches. Finally, we’ll discuss some torrent files that we were able to carve out of unallocated space, and tie them back to the suspect’s activity.
Let’s jump into Skype. The main.db file and the chatsync folder are the primary sources of artifact evidence that you’re going to be looking for in a Skype analysis. You can see the location of the user’s Skype profile, where main.db is stored, on the slide. This is for a Windows 7 machine, but the locations are relatively similar whether you’re using another version of Windows or even an iPhone or Android device. The paths are different, but main.db and the chatsync folder are your main sources of evidence.
The chatsync folder contains some additional information that is also reliable, but it was mainly created to help synchronize one account across multiple devices. For example, if you’re running Skype on both a mobile device and a PC and a call comes in, once you answer it on the mobile device, it won’t continue to ring on the PC. Technical reasons for its existence aside, it still contains some valuable forensic information for us, specifically the IP addresses, which are stored in the .dat files in the chatsync folder and in shared.xml. That information again can be pulled out through IEF. Those IP addresses include both the internal, NAT’ed IP addresses and the external IP addresses for both users in the conversation; in our case scenario, that’s the suspect and the investigator. This will help correlate data with ISPs to determine names and physical addresses.
Jad: The really cool thing here is something you’re probably used to getting in your peer-to-peer investigations: when you’re sharing a file with a suspect, you get their IP address, and then, once you have the evidence you need, you get some sort of court order for the person’s name and physical address. But from chatting, that’s not something we’ve been used to getting as part of our investigative evidence up front.
So for the users being chatted with, as well as the local user, Skype is saving, like Jamie said, the private address. That could be useful if you are investigating someone who is doing this from their workplace: maybe they’re not doing anything illegal per se from work, but they’re chatting there, and then at home they’re doing some other illegal activity. If you had only the public IP address and it traced back to a workplace, you’d be in a tough situation trying to figure out exactly who that was. Combined with their private address, you’d be able to pinpoint a person and then continue your investigation from there. But the really key thing here is getting that public address. In the screenshot we’ve got from IEF (hopefully you can read it), Kristy Cooper is our undercover officer, and Reggie Dunlop is our suspect. This information is from Kristy’s computer, and we’re able to get Reggie’s public IP address so that we can do that production order or subpoena to get his actual name and address.
There’s a date and time associated with each entry, and the way you read this is: you’ve got a username at the far left, and everything else in that record is associated with that user. In the first record, we’ve got a local IP address for Reggie; the third column indicates that it’s a local address, followed by the Date/Time that address appeared for that user. You can combine that date and time with the IP address for your court order. Then, in the second row, we’ve got the public IP address, and that’s indicated as well.
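IEF makes the local/public determination itself; the sketch below only illustrates the underlying distinction, using Python's standard ipaddress module. The records, the handle "reggie.dunlop", and the helper name address_type are hypothetical, made up to mirror the username, address, and Date/Time columns described above.

```python
import ipaddress


def address_type(ip: str) -> str:
    """Label an address 'local' (private, e.g. behind NAT) or 'public'."""
    return "local" if ipaddress.ip_address(ip).is_private else "public"


# Hypothetical records: (username, IP address, date/time the address was seen)
records = [
    ("reggie.dunlop", "192.168.1.23", "2015-03-02 21:14:05"),
    ("reggie.dunlop", "99.230.14.7", "2015-03-02 21:14:05"),
]

for user, ip, seen in records:
    print(user, ip, address_type(ip), seen)

# Only a public address, paired with its date/time, identifies a subscriber to an ISP.
public_entries = [(ip, seen) for _, ip, seen in records if address_type(ip) == "public"]
```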
Moving on to the live analysis piece of the case study that we put together as part of this [indecipherable]: we’re executing the search warrant, and we’ve got the suspect’s PC that we want to do some initial triage work on. There are a couple of things we want to do. As you all know, we want to confirm that we’ve got the right place, that we haven’t kicked in the wrong door, and that we can get some evidence off the devices in that residence to validate all of our work. So we want to make sure that we find some of the Skype conversation that we conducted and identify any known illicit images using some hash analysis.
So in version 6.3 of IEF we added the ability to load in your own hashsets, which are just text files with one hash value per line, and then you’re able to assign a category to that hashset. I’ll demonstrate this in a minute.
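The hashset format described here is simple enough to sketch. The Python below is a hypothetical illustration of reading such files, not IEF's import code: each line that looks like an MD5 (32 hex characters) or SHA-1 (40 hex characters) value is stored, lower-cased, under the category chosen for that file. The function name load_hashset and the validation rule are our own assumptions.

```python
import re
from pathlib import Path
from typing import Dict, Optional

# Accept MD5 (32 hex characters) or SHA-1 (40 hex characters) values.
HASH_PATTERN = re.compile(r"(?:[0-9a-f]{32}|[0-9a-f]{40})")


def load_hashset(path: Path, category: str,
                 table: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Read one hash value per line and map each hash to the given category."""
    table = {} if table is None else table
    for line in path.read_text(encoding="utf-8").splitlines():
        value = line.strip().lower()
        if HASH_PATTERN.fullmatch(value):  # skips blank lines and anything malformed
            table[value] = category
    return table
```

Calling load_hashset once per file and passing the same table each time builds one combined lookup; a hash listed in two files keeps the category of the file loaded last.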
We’ve also added support for [Project Vic], a great initiative that’s being led by [indecipherable] and [DHS]. I don’t know if [Rich Brown] or [Jim Cole] are on the call right now, but they’ve done some great work here helping to consolidate a lot of these hash databases and improve the workflow, to ensure that we’re not missing anything when dealing with these types of cases. So if you’re on that project and you’ve got a hashset through it, we can also import those now. And for all of these you can assign alerts. If you want to know immediately when a picture is found that matches a hash value in either a [Project Vic] hashset or one that you’ve created and imported, you can get an audible alert, with a window popping up showing all the items found that match that criteria, or an email, which doesn’t really apply to the triage scenario but is something you can use in the lab.
So we’ll just jump into IEF here and quickly demo what I talked about. This is the main screen of IEF, for anyone who hasn’t used it before. From here, we can first set up our hashsets. We go to the Tools menu and down to the second-to-last item there, called Hash Sets. This is where we can set up all of the hashsets that we want to use. Anything that I load in here now is going to be saved, so I can do this once and preload a number of hashsets, and the next time I run IEF triage, all of this is already good to go; I don’t have to redo it every time I run IEF.
So if I just have a text file with some hash values, I’ll use this bottom part of the screen to import them, remove any that I want to remove down the road, or remove all. I’ve got a sample list here of Category 1s. These are just some made-up hash values that we … we used just pictures of bears for this case study and created our hash values from those. So it’s asking here what category I want to assign all these hashes to, and I’m going to call these Category 1s for the purpose of this example. If you had a number of other hashsets for Category 2, 3, or whatever you use in your region, you could import those as well and give them a different hash category.
So I’ve imported that now, and it’s in the program. If I want those alerts, I can just check a few of these boxes, and then, as the search is conducted, whenever it finds any pictures matching values from this file, it’ll alert me with an audible sound and another window showing the pictures that matched on that hash value.
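The alert behavior amounts to checking each item as soon as it is hashed, rather than waiting for the whole search to finish. A minimal sketch of that idea, with hypothetical names (scan_with_alerts, on_alert) and MD5 chosen only for illustration, might look like this; how IEF actually implements its alerts is not described in the webinar.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Set, Tuple


def scan_with_alerts(items: Iterable[Tuple[str, bytes]],
                     hashset: Dict[str, str],
                     alert_categories: Set[str],
                     on_alert: Callable[[str, str], None]) -> List[Tuple[str, str]]:
    """Hash each (name, content) item in turn and collect the matches.

    on_alert fires immediately for a match in any category whose alert is
    switched on, while the scan is still running."""
    matches = []
    for name, content in items:
        category = hashset.get(hashlib.md5(content).hexdigest())
        if category is None:
            continue  # not in any loaded hashset
        matches.append((name, category))
        if category in alert_categories:
            on_alert(name, category)
    return matches
```

Passing something like `lambda name, cat: print("ALERT:", name, cat)` as on_alert would print each alert the moment the matching item is reached; matches in categories without alerts are still collected for later review.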
For [Project Vic], it’s very similar: Import, select a file, and then, as you get more delta files from updates on [Project Vic], you can just import those and they’ll get added in. We’re going to add some features showing how many records are currently stored and the last time you updated. There are also some other helpful features here.
That’s basically it. We’ve got that all set up now, we’ve got it enabled here, at the top. If I uncheck that, everything stays, but it just won’t use any of these hashsets during the search, and then I can just turn it back on later on if I’d like to. So that again saves you time, but lets you be flexible in how you want to do your searches.
So in this scenario, we would probably be searching the C: drive [from the live] machine, since we want to search the operating system drive. The default is the Full Search. I can do a triage search here, which just searches the common areas and folder locations. Depending on how much time you have, you can do whatever works best for you here. Obviously, the more comprehensive the search you select, the less likely you are to miss anything. But if you’re under some time constraints, this is your fastest search right here. If you have a bit more time, you can do the Quick Search, or I would recommend going with a Custom Search, unselecting everything, and then just choosing ‘All Files and Folders’; that skips unallocated space but grabs all the live files and searches through all of them.
We’ll leave that on there for now; we’ve got our search added, so we go to the next screen. Again, those familiar with IEF will know this is where you select the artifacts to be searched. All the different artifacts that we support are listed, and by default everything is checked, but you can uncheck anything if you want to speed up the search, which may be the case in this sort of situation. So if you’re very specific about what you need to find during this preview, you could unselect everything and then select only certain groups. I can uncheck all at the bottom here, and then, say I’m really interested in web browser history and related evidence, I can double-click on this heading here to check the entire category, and it’s just going to search for that. And then maybe I’ll add pictures and video.
So you set that up however you like, and then the last screen just asks where you want to save the data. It defaults to where I’m running triage from right now, so the output goes right to the [indecipherable] that you’re running triage from. You can change the folder name, enter a case number, and enter an evidence number for the drive that we’re searching. Then click ‘Find Evidence’ and off we go.
So, just bringing up a completed case for this case study: we’ve got about a thousand pictures here that were recovered. At the top, I can select which of the identified hash categories I want to see. Right now it’s showing everything. If I deselect everything, all the pictures go away, and then I can select just a specific category.
So I’m going to go with Category 1, just to see which pictures matched on that hashset that I imported. These are our sample Category 1 pictures. If I click on one, scroll down, I can see all the metadata. This doesn’t happen to have any, but if there was a make or model, GPS information in the picture, we would display it there. And then we’ve got our hash values. So we’ve calculated MD5, SHA1, and a PhotoDNA hash here.
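Calculating more than one hash per picture need not mean reading each file more than once. The sketch below (Python's hashlib, our illustration rather than IEF's code; the name multi_hash is hypothetical) feeds each chunk of data to an MD5 and a SHA-1 hasher at the same time. PhotoDNA is different in kind: it is Microsoft's proprietary perceptual hash, designed to match visually similar images rather than identical bytes, and it is not available in hashlib, so it is left out.

```python
import hashlib
import io
from typing import BinaryIO, Dict


def multi_hash(stream: BinaryIO, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Compute MD5 and SHA-1 together while reading the data only once."""
    hashers = {"md5": hashlib.md5(), "sha1": hashlib.sha1()}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for h in hashers.values():
            h.update(chunk)  # every algorithm sees the same bytes
    return {name: h.hexdigest() for name, h in hashers.items()}


# Usage: any binary stream works, e.g. an open file or in-memory bytes.
print(multi_hash(io.BytesIO(b"abc")))
# {'md5': '900150983cd24fb0d6963f7d28e17f72', 'sha1': 'a9993e364706816aba3e25717850c26c9cd0d89d'}
```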
The Skype evidence that we want to look at I can jump into by clicking on ‘Chat Threading’. I can see here some chat messages between our suspect and our UC officer. Click on that, and we’ve got a nice, threaded view of all the messages, similar to how they look when you’re actually in Skype using the chat feature. So we’ve got all our messages there, and we can quickly confirm them. We can also go to Skype chat messages to look at them in the traditional view and get some more details, and that’s where we can also look at the IP addresses that we saw in the screenshot.
So we’ve got our usernames again, our IP addresses, the address type, and the Date/Time that they were noted. So we’ve confirmed all our information here, which is really great to be able to do that in potentially just a few minutes, by running triage on the suspect’s live machine.
As I mentioned, IEF supports hashing pictures on any device or any data that you throw at it. We can import multiple hashes, and we support the MD5, SHA1, and PhotoDNA algorithms, both for matching and for calculating those values for all the pictures. And we integrate fairly well with Project Vic right now.
Michael: Alright. Thanks, Jad. Based on all of the information found, the Skype conversation and the known illicit images, the investigator is able to make an arrest on the spot. The PC and mobile device are then seized for further analysis, so that we can take those devices back to the lab, dig a little deeper on both of them, and spend more time on them than you could at the suspect’s house.
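Hashset matching comes down to computing each picture’s digest and looking it up in a table of known values. The sketch below assumes a hypothetical two-column `hexdigest,category` text format; real hashsets such as Project Vic exports have their own formats, and PhotoDNA matching works differently, since it compares similarity rather than exact equality.

```python
import hashlib


def load_hashset(lines):
    """Parse 'hexdigest,category' lines into a {hexdigest: category} dict.

    The two-column layout is a hypothetical format for this sketch.
    Blank lines and lines starting with '#' are skipped.
    """
    hashset = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, _, category = line.partition(",")
        hashset[digest.strip().lower()] = category.strip() or "Uncategorized"
    return hashset


def categorize(data: bytes, hashset: dict):
    """Return the category whose MD5 or SHA1 matches `data`, or None."""
    for algorithm in ("md5", "sha1"):
        digest = hashlib.new(algorithm, data).hexdigest()
        if digest in hashset:
            return hashset[digest]
    return None
```

A hashset containing a single line gives the quick one-hash search discussed in the Q&A.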
The first artifact off the mobile device that we want to talk about is Kik Messenger. Like a lot of mobile chat programs, it uses an SQLite database, a format that’s very common on mobile devices, named kikdatabase.db. The location is listed there for the Android device, and you’re able to pull quite a few details out of it, including contacts, messages, timestamps, status, and any attachments. You can see this in the screenshot below, which is from our example. These databases aren’t encrypted, so they can be viewed with any SQLite viewer you choose. The challenge is always just pulling the database off the device; once it’s extracted, you can view it with anything you like. So let’s look at the actual evidence.
Jad: So in this case, what we did was reset the phone and then run a sector-level search on it, to show how much data is still available on the phone even after a reset. So these are all carved messages for Kik Messenger. You can see we’ve got the message type (whether it was sent or received), the partner (the remote user that the person on the phone was chatting with), the status (whether it’s been sent or read), and then the message body itself and the time. So there’s lots of great information to be found there. And jumping back, you can see the letters ‘a3k’ after the username: _a3k. That seems to be some sort of identifier, potentially of what kind of device the user was on; if you’re on a PC you’ll see something different. Just something worth noting.
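Because these databases are plain, unencrypted SQLite, an extracted copy can be examined with standard tooling. This minimal Python sketch lists the tables in a database using the standard `sqlite3` module, opened read-only through a `mode=ro` URI. The talk does not give the table names inside kikdatabase.db, so none are assumed here; paths containing `?` or `#` would need URI-escaping.

```python
import sqlite3


def list_tables(db_path: str):
    """List table names in an SQLite database, opened read-only."""
    # mode=ro asks SQLite not to write to the file; still work on a copy
    # of the extracted database rather than on the original evidence.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```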
Jamie: The next artifact that we found for this case is some Google search results. These are regular browser artifacts, but you can look at the URL of a Google search and pull out the search terms the suspect was looking for. Again, the browser data is stored in an SQLite database, called browser2.db, at the location below for the Android device. It’s very similar to any other browser artifact you would look for, but it comes specifically from Google.
Jad: As Jamie mentioned, you can find the search query right inside the URL, and we’ve pulled those out to make it easy for you to identify them without having to read these long URLs full of metadata and piece out the search terms yourself. So you can see the search term pulled out, the search engine identified, and a bit of the URL itself, and if you click on that item, you’ll see the full URL and be able to verify what IEF has pulled out of it. The tricky thing with Google searches is that someone can create a link that is actually a Google search: a person clicks on the link thinking it’s one thing, and it sends them to a Google search for something else, which then shows up in their history. So keep that in mind: being able to state definitively that the user typed in the search term requires a little extra work to validate that it wasn’t some strange website that created that web history on their computer.
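Pulling a search term out of a Google URL is mostly a matter of reading its query parameters. The sketch below assumes the term is carried in the `q` parameter, in either the query string or the fragment; that is an assumption about Google’s URL format, not a description of how IEF does it, and the hostname check is deliberately loose.

```python
from urllib.parse import parse_qs, urlparse


def google_search_terms(url: str):
    """Return the decoded 'q' parameter of a Google search URL, or None."""
    parsed = urlparse(url)
    # Loose check: any hostname with a 'google' label (google.com, google.ca, ...).
    if "google" not in (parsed.hostname or "").split("."):
        return None
    # Try the query string first, then the fragment (e.g. .../#q=...).
    for part in (parsed.query, parsed.fragment):
        terms = parse_qs(part).get("q")
        if terms:
            return terms[0]  # parse_qs already decodes '+' and %xx escapes
    return None
```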
Jamie: The next artifacts that we found on the suspect’s computer are some torrent files. Most people understand what a torrent file is: a file containing metadata about the files and folders that are shared across P2P networks. Torrents are used for both legitimate and illegitimate purposes. They’re identified by the .torrent extension, and they come with any sort of media or file sharing service on P2P networks. For our purposes here, they’re found by searching for the .torrent extension as well as for their headers, which are fairly distinctive, so you can carve them out of unallocated space or search for them in allocated space as well.
Jad: [indecipherable] again show you an example of some of the data you can find. You could see a bit of the data [in that raw view] on the last screen, but this is what IEF has parsed out of it. And again, we’re carving this out of unallocated space. This is just a torrent file that we doctored to have a specific name, and inside you can see the files included in the torrent, with their filenames and sizes. Obviously, in your actual investigations, there are quite a few torrent files that could be downloaded with very indicative names, both for the torrent itself and for the files it includes, which could be good supporting evidence. And we’ve got the created time there as well.
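Torrent metadata is stored in bencoding, the format defined in the BitTorrent specification (BEP 3). As an illustrative sketch, not IEF’s parser, the minimal Python decoder below reads the `info` dictionary and lists each file’s path and size, handling both single-file and multi-file torrents.

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value at data[i]; return (value, next_offset)."""
    c = data[i:i + 1]
    if c == b"i":  # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":  # list: l<values>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":  # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    if c.isdigit():  # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        start = colon + 1
        end = start + int(data[i:colon])
        if end > len(data):
            raise ValueError("truncated byte string")
        return data[start:end], end
    raise ValueError(f"invalid bencode at offset {i}")


def list_torrent_files(raw: bytes):
    """Return [(path, size_in_bytes), ...] from a .torrent's info dictionary."""
    meta, _ = bdecode(raw)
    info = meta[b"info"]
    name = info[b"name"].decode("utf-8", "replace")
    if b"files" in info:  # multi-file torrent: each path is a list of components
        return [
            ("/".join([name] + [p.decode("utf-8", "replace") for p in f[b"path"]]),
             f[b"length"])
            for f in info[b"files"]
        ]
    return [(name, info[b"length"])]  # single-file torrent
```

Because `bdecode` returns the offset where the value ended, it can be pointed at a candidate offset inside carved data and will ignore trailing bytes; truncated input raises `ValueError`.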
Another thing you can do within IEF: once you’ve located a number of files, you may have had some hash matches, and you may also find, through your own review, a number of files of an illegal nature that you now want to report or add to a database. You can do this in IEF with the Export feature, which we recently added. You can take a number of pictures, export them to a folder, and then either hash those pictures separately or submit them to an appropriate agency. You can run tools such as C4All or [NetClean] on that folder, do further analysis or categorization from there, and then submit your categorizations to the centralized hash database that you use, or to something like Project Vic. So that’s another thing you can do from within IEF when recovering pictures.
Michael: Okay, so let’s bring it all together with a quick demo of IEF Timeline. I find timeline analysis really helps show the entire case and visualize it, both for the investigator and for any stakeholders involved. For our case here, those stakeholders might be a judge or a jury; in a corporate case, they might be management, a legal team, or HR.
We can see here we’ve got Timeline up, and I’ve prepped it a little bit. I’ve selected pictures, torrent file fragments, Skype chat messages, and Kik Messenger messages. You can see at the top, under all results, that there was a lot of activity going on during those events, but if we spread it out and look at the specific artifacts we’re interested in, we get a pretty good timeline of how events occurred, starting with the pictures. Pulling those up, we can see the pictures of bears, which, as Jad mentioned, we’re using to represent the illicit images. A number of pictures were pulled down on the 6th of May, several days prior to our initial engagement.
So we’ve got the pictures, and next, on the 8th, a couple of days later, the torrent file fragments: again some questionable behavior, and definitely illegal. Then we’ve got the Skype chat conversations. If we pull in here, we can see the initial engagement for our case study, where Reggie Dunlop, our suspect, was speaking with the undercover officer/investigator; that conversation happens here, on the 14th. On the same day, we can also see an additional Kik Messenger conversation that the suspect had with a third party who wasn’t our investigator. So bringing all this together really helps show a timeline of how things occurred, and it shows that our suspect was engaging in this activity well before the investigator engaged him over Skype.
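The timeline view essentially merges events from several artifact types into one chronological list, optionally narrowed to a timeframe. A minimal Python sketch of that idea follows; the `Event` record and its field names are hypothetical, and timestamps are assumed to be timezone-aware UTC values so they compare correctly.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    # Hypothetical record shape for illustration.
    when: datetime  # timezone-aware, in UTC
    artifact: str   # e.g. "Pictures", "Kik Messenger"
    summary: str


def build_timeline(*sources, start=None, end=None):
    """Merge per-artifact event lists into one chronological list,
    optionally keeping only events inside [start, end]."""
    events = [event for source in sources for event in source]
    if start is not None:
        events = [e for e in events if e.when >= start]
    if end is not None:
        events = [e for e in events if e.when <= end]
    return sorted(events, key=lambda e: e.when)
```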
Jad: Yeah, and this can be great to either find activity on data you didn’t expect to find, or if you do have certain data you’re very interested in, you can just zero in on that timeframe and then see all the different activity in a chronological order that happened on that timeframe. In this case, this kind of represents the escalating behavior of someone that’s looking at child exploitation images, downloading some torrents relating to illicit images and videos, and then engaging in some conversations online that are inappropriate, illegal conversations, where they’re attempting to lure someone over the internet. So a lot of great information that you can find through timeline analysis, and sometimes find some activity or behavior that you weren’t expecting to see.
Just to summarize what we found: we went through an investigation where an undercover officer was engaging with someone online and got them on Skype so that we could get the suspect’s IP address. We got the IP address, got a court order to obtain their name and address, executed a search warrant at the address, found more evidence, confirmed what we already knew, and then took that back to the lab for further examination, confirming illicit pictures, some other conversations, Google searches relevant to the case, and torrent files.
And the techniques we used here: live system triage. If you’ve got a live system and you’re executing a search warrant, it’s great to be able to grab some of that live data and have information right away that you can use in bail court. You can also capture the live RAM to take back to the lab for further analysis, where you can recover a lot of data that wouldn’t otherwise be available if you simply shut down the machine and took the hard drive back to the lab. So that’s really key.
Hash analysis, which made it easy to quickly identify known child exploitation images; and then chat threading and timeline analysis, to help filter through the data, make sense of it, and present it in a meaningful way that’s understood by a judge and jury, or by the investigator you’re passing the data off to.
So that kind of summarizes what we had here in this case study that we made up here. Hopefully a lot of this information has been useful to you, and maybe it’s given you some ideas for your future investigations, or even ones that you’re currently working through now.
Michael: Great. Thanks a lot, Jad and Jamie. We’ve attempted to show everyone how IEF has helped find critical artifacts in this case and sped up the recovery process, ultimately saving the investigator’s time. So these are the benefits that are experienced by thousands of forensic professionals as they use IEF to recover and analyze hundreds of artifacts on computer, smartphones, and tablets. So if you haven’t already tried IEF, we’d like to offer you a free, 30-day trial, and encourage you to try it on new or existing cases. You can visit magnetforensics.com/trial to download your free copy today.
We’re now going to jump into our Q&A session. So if you have any questions, please submit them into the Q&A box in the WebEx client, and we’ll answer them in order. A few have come in already during the presentation, so we’ll put some of these to Jad and Jamie, and they’ll help us answer them.
One of them is here: Do you have hashlists or sets that you can make available to the audience?
Jad: Unfortunately we don’t. We’re working on building in some freely available hashsets that let us whitelist certain files, not pictures, but things like operating system files, so that we can skip over known good files that won’t contain any user data. We don’t have our own hashlists, and we haven’t been able to get access to any. What I would direct you towards is any regional labs in your area that maintain their own hashlists, and of course Project Vic: if you’re eligible to join that project, they’re building a really clean, well-organized set that’s been verified and re-verified. What they’re building there is a consolidated list, so that instead of having multiple silos of lists around the country, we can have one list that everyone uses and everyone contributes to, which helps make the identification process much more thorough and streamlined. So those are my suggestions for getting access to some hashlists.
Michael: Okay. Thanks, Jad. Next question here: Do you know if that contains remote port information for connections through carriers such as [indecipherable]?
Jad: I don’t believe the port information is stored. It’s something we can double-check, but we’d have to reverse-engineer all the data in these chatsync dat files, and beyond the other information like messages, all we’ve been able to find so far relating to IP addresses is a date and time and whether it’s a local or public IP address.
Jamie: Just to add to that, Skype sets up its ports on installation, so it knows what port it’s going to be listening on. It’s not a variable that changes very often. But yes, the IP addresses are mostly what’s there, in terms of valuable evidence anyway.
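Jad notes that the chatsync records mark an address as local or public. An examiner can sanity-check that label independently; the sketch below uses Python’s standard `ipaddress` module and treats private, loopback, and link-local addresses as local. This is our own classification rule, not Skype’s.

```python
import ipaddress


def classify_ip(address: str) -> str:
    """Label an IPv4/IPv6 address as 'local' or 'public' (our own rule)."""
    ip = ipaddress.ip_address(address)  # raises ValueError on bad input
    if ip.is_private or ip.is_loopback or ip.is_link_local:
        return "local"
    return "public"
```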
Michael: So on the Skype theme there, there’s a question: Is the Skype IP address there by default? Does the user have to send a file to capture this info?
Jad: The information comes through messages that were sent, so as long as you can engage in a few messages … the chatsync files are a bit of an anomaly. You will definitely create chatsync files over time, but if you had a really quick chat of one or two messages, you may not find any chatsync files in your folder. So if you can engage in at least, I would say, and this is just a guess, 10 to 15 messages, you should have some chatsync files that contain that information. There’s no setting to turn those IP addresses in the chatsync files on or off; it’s just built into Skype. That’s the nice thing: even if someone is aware of these IP addresses, there’s no way for them to turn them off.
Michael: Okay. I think in the same vein here, there’s a question: Can you just drag and drop the main db to get all of the IP information?
Jad: The main db, while it does contain a lot of very useful information, is not the file that stores the IP addresses. For every Skype user account on a hard drive, you’ll have a folder named after their Skype username; under that folder there’s another folder called ‘chatsync’, and under that, more subfolders with dat files. Those dat files, in the chatsync file format, are the ones that contain the IP addresses. So if that’s all you wanted to look for, you could point IEF at the chatsync folder and recover all the information from just those files.
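Given the folder layout described in this answer (a folder per Skype username, containing a `chatsync` folder with subfolders of `.dat` files), locating the relevant files is a simple directory walk. A Python sketch follows; `skype_root` is a hypothetical parameter for wherever the Skype profile folders live, which varies by operating system and Skype version, and the sketch does not parse the .dat format itself.

```python
from pathlib import Path


def find_chatsync_files(skype_root: str):
    """Yield (skype_username, dat_path) for each .dat file under <user>/chatsync/."""
    root = Path(skype_root)  # hypothetical location of the per-user folders
    for account_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        chatsync = account_dir / "chatsync"
        if chatsync.is_dir():
            # rglob descends into the nested subfolders that hold the .dat files.
            for dat in sorted(chatsync.rglob("*.dat")):
                yield account_dir.name, dat
```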
Michael: Okay. Thanks, Jad. Another question on IP addresses: How is the IP obtained if the user is behind Tor or another service?
Jad: If the user is on Tor, you are going to get the IP address of the exit node they are currently using. So as long as someone is using Tor correctly, with everything configured so that all the Skype traffic goes through the Tor relays, you will only get the IP address of the last Tor relay, what’s called the exit node. But they may have misconfigured it, or it may not be quite working properly, which is quite possible with something like Skype. There are Tor bundles that boot a preconfigured version of Firefox that’s set up to use Tor very safely, but I don’t believe there’s one for Skype, so it’s something users have had to set up themselves, and it’s very likely that they may not set it up correctly.
Michael: Okay. Next question: Is it possible to see how many times a torrent file has been uploaded by the user? There’s a big difference between downloading child pornography and distributing it, where this user lives.
Jad: The torrent files that we discussed today are just the files you would download to get access to the files contained in that torrent. There’s no upload information stored in them. Where you may find upload information is in the configuration files of the torrent client the user is running, something like µTorrent, BitTorrent, and so on. Those configuration files may show you information on what the user has shared out themselves. And if they’ve downloaded a torrent, they’ve likely shared at least a piece of it, because that’s how the torrent network works. That’s something we’ve got on the list to add to IEF, to give you more context around information like that. And I understand what you’re saying: proving that someone has uploaded child pornography is a very different thing from proving they’ve just downloaded it.
Michael: Okay. So we’ll move on to our next one here, dealing with web surfing: How would you determine that someone actually went to a web page, and that it wasn’t just a page that opened accidentally?
Jad: Depending on the browser used, there’s metadata with the history record that can tell you how the user got to that page: whether they clicked on a link, or were redirected, which would be similar to a popup. So, depending on the browser, you may be able to show that it was something they clicked on or even typed in themselves, versus a redirect. If the browser doesn’t store that kind of context, then it’s more difficult to say definitively that it wasn’t just a popup window. You’ll have to use timeline analysis, or sort the web history by date and time, and look at the context around that link: where were they just before it? Maybe visit the site and see what kind of activity is there, whether there are popups and so on, and essentially investigate that activity further yourself.
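As one concrete example of such metadata (the talk does not name a browser), Google Chrome records a `transition` value for each row of the `visits` table in its History database. Its low byte is the core transition type, and higher bits are qualifiers such as redirects. The decoder below is a sketch based on Chromium’s published page-transition constants; verify them against the browser version you are examining.

```python
# Core transition types (low byte), per Chromium's page_transition_types.h.
CORE_TYPES = {
    0: "LINK", 1: "TYPED", 2: "AUTO_BOOKMARK", 3: "AUTO_SUBFRAME",
    4: "MANUAL_SUBFRAME", 5: "GENERATED", 6: "AUTO_TOPLEVEL",
    7: "FORM_SUBMIT", 8: "RELOAD", 9: "KEYWORD", 10: "KEYWORD_GENERATED",
}
CORE_MASK = 0xFF
# A subset of the qualifier bits relevant to "how did they get here?".
QUALIFIERS = {
    0x01000000: "FORWARD_BACK",
    0x02000000: "FROM_ADDRESS_BAR",
    0x40000000: "CLIENT_REDIRECT",
    0x80000000: "SERVER_REDIRECT",
}


def describe_transition(value: int):
    """Split a Chrome visits.transition value into (core_type, [qualifiers])."""
    value &= 0xFFFFFFFF  # normalise values stored as signed 32-bit integers
    core = CORE_TYPES.get(value & CORE_MASK, "UNKNOWN")
    qualifiers = [name for bit, name in QUALIFIERS.items() if value & bit]
    return core, qualifiers
```

A `TYPED` core type with `FROM_ADDRESS_BAR` points toward user input, whereas a redirect qualifier supports the accidental-visit explanation.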
Michael: Okay. Now, we mentioned Project Vic a few times through the presentation. Are you able to clarify what that is for our audience?
Jad: Yeah. I’m certainly not an expert on all the details of the project, but my understanding is that it’s a project to consolidate all the different hash databases that exist within law enforcement for child exploitation images, get rid of any false positives and duplicates, and create one single hash database that everyone investigating child exploitation can use. The benefit is that there’s one place to get your hash database from, and one place to share your categorized images to, so that everyone can benefit from what everyone else is finding. And if you’re not used to using hash databases in your child exploitation investigations, what you can use one for is to pre-categorize a number of images.
So if you’ve recovered, just as an example, a million images from a hard drive, instead of having to go through every single image and manually determine whether it’s legal material or not, you can use these hash databases to pre-categorize a number of pictures. That may save you quite a bit of time and effort, and spare you from having to view them all again, with the trauma associated with that; you can then quickly finish off the uncategorized images yourself.
I believe that’s the main goal of Project Vic. I believe [kind of the term] No Child Left Behind is associated with it as well: making sure that all child victims are identified in these investigations, and that no child is left unfound because a picture was missed, or because the caseload was too high and you weren’t able to do as thorough an examination as you would have liked.
Michael: Okay, thanks. Good explanation, Jad. Can IEF detect hidden images such as steganography?
Jad: It depends on what kind of steganography is being used. We carve through files looking for images, even if a file is named something else or the data is in unallocated space. So if they’ve hidden a picture in a file that doesn’t appear to be a picture, but the picture itself is still intact inside that file, whatever steganography tool was used, we would still carve that file looking for pictures, recover it, and tell you that the picture was found within that file. So it really depends on how complex or advanced the steganography technique is, but if it simply places a picture inside another file, we should be able to recover it.
Michael: Okay, good. The next question is about the write-blocking capabilities of IEF. In the initial part of the presentation, where we were connecting to the suspect’s computer, were we inserting a dongle that evidence is downloaded to? And can we also comment on the write-blocking capabilities in that scenario?
Jad: Yeah. Triage is run off a dongle, so plugging it in does create some USB entries associated with that dongle. As far as write-blocking goes, we access files by going a layer underneath the file system and parsing the NTFS or FAT file system manually from within IEF. So we’re not using Windows processes or file APIs, the typical way a program would open a file and potentially change the last access time or even cause some data to be changed. We do it through an internal read-only method that doesn’t touch the file system APIs at all: on an NTFS file system, we use the MFT to locate the file and then read the raw sectors for that file, which avoids anything being changed, including metadata like access times.
Triage also has a feature for covert searches. This relates more to some of our military customers, who conduct operations where they search a computer belonging to a savvy suspect who may check it for activity later on, and they don’t want any traces of activity left behind. The feature, called stealth mode, removes the USB traces and prefetch files that are created by inserting the dongle and running IEF, so that there’s no indication IEF was run.
Now, in a law enforcement scenario, that’s probably not something you want to do. It’d probably be better just to explain in court that those artifacts were created by running IEF and that they don’t impact the data in any way. I think that’s easier to deal with in court than talking about deleting files or registry entries.
Michael: Thanks, Jad. Next question: Is there a way to tie pictures, sent and received, to Kik messages?
Jad: Yeah. We didn’t touch on it in this scenario, but there’s another database that contains Kik attachments. In our last release we added the ability to correlate those messages with a user and with dates and times, and to show the attachments, whether they’re pictures or something else, associated with the user’s [indecipherable] the attachments.
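Signature-based carving of the kind described here can be sketched by scanning raw bytes for JPEG start-of-image and end-of-image markers. This Python version is deliberately naive and is not IEF’s carver: a JPEG with an embedded EXIF thumbnail contains an earlier end marker, so real carvers parse the JPEG segment structure instead.

```python
SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the next marker's 0xFF
EOI = b"\xff\xd9"      # JPEG end-of-image marker


def carve_jpegs(data: bytes):
    """Return [(offset, jpeg_bytes), ...] for header/footer pairs in `data`."""
    results = []
    pos = data.find(SOI)
    while pos != -1:
        end = data.find(EOI, pos + len(SOI))
        if end == -1:
            break  # header without a footer: stop (a real carver might keep it)
        results.append((pos, data[pos:end + len(EOI)]))
        pos = data.find(SOI, end + len(EOI))
    return results
```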
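To illustrate what reading the file system “a layer underneath” involves, the sketch below parses the fixed header of an NTFS MFT FILE record from raw bytes with Python’s `struct` module. The field offsets follow the publicly documented NTFS record layout; update-sequence fixups and attribute parsing are omitted, and this is not IEF’s code.

```python
import struct

# Little-endian FILE record header, first 32 bytes:
# signature, update-seq offset, update-seq count, $LogFile LSN,
# sequence number, hard link count, first attribute offset, flags,
# used size, allocated size.
MFT_HEADER = struct.Struct("<4sHHQHHHHII")


def parse_mft_header(record: bytes) -> dict:
    """Decode the fixed header of one MFT record (fixups not applied)."""
    (sig, _usa_off, _usa_count, _lsn, seq, links,
     attr_off, flags, used, alloc) = MFT_HEADER.unpack_from(record, 0)
    if sig != b"FILE":
        raise ValueError("not an MFT FILE record")
    return {
        "sequence": seq,
        "hard_links": links,
        "first_attribute_offset": attr_off,
        "in_use": bool(flags & 0x01),        # record is allocated
        "is_directory": bool(flags & 0x02),  # record describes a directory
        "used_size": used,
        "allocated_size": alloc,
    }
```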
Michael: Okay. Next question around picture hashing: Can a quick search for one picture hash be done?
Jad: Yes. Yes, you could import a file with a single hash, give it a category or no category, and then just run a search just for pictures, and every picture would just be compared to that one, single hash, and only identified and given a category if it actually matched on that one hash value.
Michael: Okay, good. What do the skin tone values in IEF mean or represent?
Jad: For every picture that we recover, we calculate a skin tone percentage; it’s an option you can turn on or off before starting the search. A number of algorithms look at the colors in the picture, compare them to skin tones, and identify whether skin tone is present, and the result is given as a percentage. If there’s quite a bit of skin tone, you’ll have a higher percentage, so pictures of nudity will typically have a high skin tone percentage. There’s then a skin tone filter at the top of the report viewer that you can set to a minimum percentage.
So if you set that to 55%, you’re only going to see pictures with at least 55% skin tone calculated. That can help you quickly, again in a triage type situation, get to the pornography pictures, whether illegal or legal, and review them that way. Because of how skin tone detection works, you’re also going to get a number of pictures containing colors similar to common skin tones. So that’s just a caveat.
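IEF’s skin tone algorithms are not described in detail. As a stand-in, the sketch below uses a widely cited explicit RGB skin-colour rule from the computer-vision literature to compute a percentage, then applies a minimum-percentage filter like the one described. It is an assumption for illustration, and it will likewise flag non-skin objects with similar colours.

```python
def is_skin(r: int, g: int, b: int) -> bool:
    """Explicit RGB skin rule (uniform daylight variant); a stand-in heuristic."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_percentage(pixels) -> float:
    """Percentage of (r, g, b) pixels classified as skin."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    hits = sum(1 for p in pixels if is_skin(*p))
    return 100.0 * hits / len(pixels)


def filter_by_skin(scores: dict, minimum: float = 55.0):
    """Keep picture names whose skin percentage meets the minimum."""
    return [name for name, pct in scores.items() if pct >= minimum]
```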
Michael: Okay. On hashing again – will the hash change if the photo is cropped or resized?
Jad: Yes, it will. If you change one byte or one pixel in the picture, the MD5 or SHA1 hashes, which are mathematical hashes, will change. And that’s the great thing about PhotoDNA. We’ve recently added that new kind of fuzzy hashing, which Microsoft developed and has provided to people building software that assists investigators combating child exploitation. With a PhotoDNA hash, a number of operations are done on the picture, so that you’ll still get a match saying the pictures are similar even if you crop the picture, change it to black and white, change a piece of it, or resize it; most of these operations will still result in a PhotoDNA hash match. That was the problem a number of people in law enforcement raised with Microsoft, who were good enough to develop a solution for it. So hopefully PhotoDNA will be used more and more in these types of cases, to help combat pictures being resized or slightly modified to avoid detection.
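PhotoDNA itself is proprietary, so it cannot be shown here. To illustrate the difference between cryptographic and perceptual hashing, the sketch below swaps in a simple “average hash” over a small grayscale grid (assumed already downscaled, e.g. to 8x8). Uniformly brightening the image changes every byte, so the MD5 changes, while the average hash stays the same. The average hash is far less robust than PhotoDNA.

```python
import hashlib


def average_hash(gray) -> int:
    """Average hash: one bit per pixel, set when the pixel is above the mean.

    `gray` is a small grid of 0-255 grayscale values, assumed to be
    downscaled from the original picture beforehand.
    """
    flat = [v for row in gray for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest similar pictures."""
    return bin(a ^ b).count("1")


def md5_of_grid(gray) -> str:
    """Cryptographic hash of the same pixel bytes, for comparison."""
    return hashlib.md5(bytes(v for row in gray for v in row)).hexdigest()
```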
Michael: Okay, another time zone question here: how do time zones themselves and the differences affect the timeline?
Jad: I believe that in the timeline, we’ve tried to set everything to UTC by default. You can then set a time zone in IEF, which will apply a plus or minus offset to those dates and times and, if daylight saving time is involved, will also apply the daylight saving offset depending on what time of year each date falls in. You can set that if you’d rather see things in your local time zone, or you can leave things in UTC.
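The UTC-plus-offset approach described here can be sketched with Python’s standard `zoneinfo` module (Python 3.9+, which needs the system time zone database or the `tzdata` package). Using a named zone rather than a fixed offset lets daylight saving time be applied according to each date; `America/Toronto` is just an example zone.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_local(utc_dt: datetime, tz_name: str) -> datetime:
    """Convert a UTC timestamp to a named zone, applying DST for that date."""
    if utc_dt.tzinfo is None:
        # Treat naive timestamps as UTC, matching a UTC-by-default timeline.
        utc_dt = utc_dt.replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(ZoneInfo(tz_name))
```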
Michael: Okay. Next question: Does this work on Linux or Unix platforms?
Jad: Triage is Windows-only right now. We’re working on making it bootable, so that if you have a Linux or Mac machine, you can shut it down, or if you come across a machine that’s already shut down, including a Windows machine, you could boot it from the triage thumb drive and then run IEF on that machine in a forensic manner that mounts everything read-only, giving you additional reassurance that nothing’s being changed. That would also let us run on Linux and Mac machines: because we do all the file system parsing and interpretation manually and natively, we can handle Linux file systems like ext3 and ext4, and the HFS file systems on Macs. So we would still be able to interpret the file system and do the search, which needs to run on a Windows platform. Once we get that bootable option working, it’ll extend IEF’s ability to those other platforms.
Michael: In a similar vein: What file systems does this read? For example, FAT, NTFS.
Jad: Yeah: ext2, ext3, ext4, HFS+, HFSX, FAT, FAT32, NTFS, exFAT, and [indecipherable] file systems, which are found on Android devices. I’m probably missing some, but most of the common file systems are supported.
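Identifying which of these file systems a volume holds typically starts with well-known signatures at fixed offsets. The Python sketch below checks a few of them in the first few kilobytes of a volume (not of a partitioned disk image); it is a heuristic for illustration, not IEF’s detection logic, and it cannot tell ext2, ext3, and ext4 apart without reading superblock feature flags.

```python
def identify_filesystem(head: bytes) -> str:
    """Guess a volume's file system from its first few KB (e.g. 4096 bytes)."""
    if head[3:11] == b"NTFS    ":    # OEM ID in the NTFS boot sector
        return "NTFS"
    if head[3:11] == b"EXFAT   ":    # OEM ID in the exFAT boot sector
        return "exFAT"
    if head[82:87] == b"FAT32":      # FAT32 file system type label
        return "FAT32"
    if head[54:57] == b"FAT":        # FAT12/16 file system type label
        return "FAT12/16"
    if head[1080:1082] == b"\x53\xef":  # ext superblock magic 0xEF53 (LE) at 1024+56
        return "ext2/3/4"
    if head[1024:1026] == b"H+":     # HFS+ volume header signature
        return "HFS+"
    if head[1024:1026] == b"HX":     # HFSX volume header signature
        return "HFSX"
    return "unknown"
```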
Michael: Okay. A question on interoperability with other platforms: are we still working with Guidance Software so that all IEF results can be imported into EnCase for reporting? If so, is it as easy as a drag-and-drop, or do you import results one at a time?
Jad: Yeah, we’re still working with Guidance that way. We’ve got a couple of EnScripts that you can run. One is an import script: if you’ve already run IEF and created an IEF case, and you want to import those results into EnCase, you can run the script, point it at the IEF case, and it’ll pull all the results into EnCase. The other scripts, for versions 6 and 7 of EnCase, will launch IEF, run it against your images, and then bring the results back in, or export them to Excel files if you prefer. We also have a case processor module for EnCase 7, so if you’re using the pre-processing module ability, you can include IEF in that pre-processing and have it run against all of your items as well, bringing the results back into EnCase.
Michael: Okay. Thanks, Jad. The parsed search queries are a useful tool, but junk terms often show up among the search terms. Is there a good method of weeding through them?
Jad: Yeah, we’ve been looking at that, trying to find a way of removing the junk terms without getting rid of anything legitimate. That’s the difficulty: we don’t want to remove any positive hits while we’re trying to remove the false positives. Sometimes just sorting by search term can help, because things that were typed get grouped together. Sorting by date and time can also help, since you can see the searches that were done in a certain timeframe all together. But we’re looking at whether there are certain terms we can put in a list and always filter out. That’s something we’re going to try to do.
Michael: Okay. Very good. So we’ve come to the top of our hour here, and we’d like to thank Jad and Jamie again for the presentation and for taking us through the Q&A session here. We’d like to thank everyone online for attending and participating in the discussion today. If you do have any further questions, feel free to email them to Jad or Jamie at the addresses we have listed here. The session was recorded, and it will be available for viewing at magnetforensics.com in our Webinar page within the next 24 hours. We’ve also captured all of the questions that have been sent through. So we’ll be getting Jad and Jamie to answer those, and we’ll post those to the same site by the end of the week.
So thank you again, everyone. This concludes our presentation today.
Jad: Thanks, everyone.
End of Transcript