Tuesday, September 25, 2012

Lots of duplicate contacts in CRM 2011

Just discovered this problem: there are thousands of duplicate contact records in our CRM 2011 system. At first I assumed it was a data entry error, but there are literally thousands of them, so it must be systematic.

It turns out there's a setting in CRM 2011 that automatically creates a contact when you track an email using CRM for Outlook. The problem is we were missing the duplicate detection rules that would have prevented the duplicates from being created. I read somewhere that duplicate detection rules may cause email tracking to stop working, though, so I'm opting to have users turn off the auto-create setting instead.

1. Turn off the "Auto Create Contact" setting. In Outlook, go to File->Options->Email tab; at the bottom there's a checkbox called "Create". Uncheck it.

2. Create a Duplicate Detection Rule for Contacts. I created mine with the conditions "Account is an exact match, Full Name is an exact match, Email is an exact match".

3. Run the Duplicate Detection job on Contacts. This will show all potential duplicates based on the rule you just created.

4. In my scenario I have to merge records, because duplicates have been created for several months now, so users have probably used the duplicate contacts for cases, activities, etc.

Security Alert - Outlook - the name on the security certificate is invalid or does not match the name of the site

Whenever I open Outlook I get this Security Alert.

When I go to View Certificate, it's the cert issued by the public CA. The name at the top (the whited-out part) is the Exchange server's FQDN.

Here's a very good explanation of why this error occurs and how to fix it.

Monday, September 17, 2012

How to retrieve Dell service tag from command line

1. Start->run->cmd
2. wmic bios get serialnumber

The alphanumeric serial number returned is the Dell service tag.

Tuesday, September 4, 2012

Robocopy backup to a date-stamped folder

This batch script creates a directory with the date appended in YYYYMMDD format, for example C:\Backup 20120904\, then uses robocopy to copy all directories in the source to that destination folder.

:: Parse 'date /t' into YYYYMMDD. Note: the output of 'date /t' is locale-dependent;
:: this token order assumes DD/MM/YYYY. For US-style MM/DD/YYYY, use %%k%%i%%j instead.
for /F "tokens=2-4 delims=/ " %%i in ('date /t') do set yyyymmdd=%%k%%j%%i
set "myDir=C:\Backup %yyyymmdd%"
mkdir "%myDir%"
:: /E copies all subdirectories, including empty ones
robocopy "C:\Source" "%myDir%" /E

Stick this in a batch file and run it using task scheduler.

1. http://social.technet.microsoft.com/wiki/contents/articles/1073.robocopy-and-a-few-examples.aspx
2. http://superuser.com/questions/39377/script-to-create-folders-in-multiple-directories-using-yyyymmdd-date-as-the-fold

Monday, September 3, 2012

De-duping a huge number of files (theoretical solution)

My math-major friend sent me this link: http://hardware.slashdot.org/story/12/09/02/1223201/ask-slashdot-how-do-i-de-dupe-a-system-with-42-million-files

We discussed what the most efficient solution would be.

Problem: We have 4.2 million files, which take up 4.9TB of space. I want to get rid of the duplicate files.

Solution (theoretical):

Step 1: Check the file size of every file. If a file's size is unique, we know that file cannot be a duplicate.

We build an inverted index, map[filesize, list[filepath]], called fsIndex.

  1. for each directory
    1. for each file in the directory
      1. look up the file's size in fsIndex
      2. if the size already exists, append the filepath to that entry's list
      3. else, add the size with a new list containing the filepath to fsIndex
  2. for each entry in fsIndex
    1. if there is only 1 filepath for this size, delete the entry
So now fsIndex contains only potential duplicate files.

N = number of files
1. reading all N file sizes (metadata only, no file contents) = O(N)
2. iterating through the index = O(N)

therefore Step 1 is O(N)
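Step 1 could be sketched in Python like this (my own illustration; the function name and structure are not from the original discussion):

```python
import os
from collections import defaultdict

def build_size_index(root):
    """Group file paths by size; sizes come from metadata only, so no file contents are read."""
    fs_index = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                fs_index[os.path.getsize(path)].append(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    # keep only sizes shared by more than one file (the potential duplicates)
    return {size: paths for size, paths in fs_index.items() if len(paths) > 1}
```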

Step 2: Compute the CRC32 checksum of every potential dup file. Files with the same size and checksum are almost certainly duplicates (CRC32 is only 32 bits, so collisions are possible; a byte-for-byte comparison would confirm).

From step 1 we have an inverted index of all potential dup files, still called fsIndex.
We build a second inverted index, map[checksum, list[filepath]], called csIndex.

  1. for each filepath in fsIndex
    1. run CRC32 on the file, producing a checksum
    2. if the checksum exists in csIndex, append the filepath to that entry's list
    3. else, add the checksum with a new list containing the filepath to csIndex
  2. for each entry in csIndex
    1. if the entry's list has more than 1 element, for each filepath in the list after the 1st
      1. (do whatever)* to the file at that filepath, because it's a duplicate
*(do whatever) = delete it, or output the filepath somewhere so the user can manually decide which files to delete.

N = number of files
1. reading all N files from the hard disk = O(N)
2. iterating through the index = O(N)

therefore O(N)
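Step 2 could be sketched like this (again my own illustration; keying csIndex on (size, checksum) is a small addition so equal checksums of different-sized files don't collide):

```python
import zlib
from collections import defaultdict

def find_duplicates(fs_index, chunk_size=1 << 20):
    """fs_index maps filesize -> list of paths (the output of step 1).
    Returns groups of paths whose size and CRC32 checksum both match."""
    cs_index = defaultdict(list)
    for size, paths in fs_index.items():
        for path in paths:
            crc = 0
            with open(path, "rb") as f:  # each file's contents are read exactly once
                while chunk := f.read(chunk_size):
                    crc = zlib.crc32(chunk, crc)
            cs_index[(size, crc)].append(path)
    # only groups with more than one path are duplicates
    return [paths for paths in cs_index.values() if len(paths) > 1]
```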

Overall big-O
Step 1 is O(N) and Step 2 is O(N), so the overall complexity of this solution is O(N), where N is the number of files.

Note: each file's contents are read from disk at most once, which minimizes disk reads (the bottleneck).
Additionally, we could multi-thread the checksum computation in Step 2 part 1. This would maximize processor throughput.
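The multi-threading idea could be sketched with a thread pool (hypothetical helper names; threads mainly help when the storage can serve parallel reads, e.g. SSDs or a RAID array):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def crc32_of_file(path, chunk_size=1 << 20):
    """CRC32 of a file's contents, read in chunks to bound memory use."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc

def checksums_parallel(paths, workers=4):
    """Checksum many files concurrently; results are in the same order as paths."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(crc32_of_file, paths))
```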