Copernic 2001 Pro (Version 5.0)
Light Version from: http://wwww.copernic.com/
[Use it to find its bigger brother ;)]
W32Dasm 8.93 - Recommended
HexWorkshop - Essential Tool
Filemon - Essential Tool
C Compiler - Language for Tool Writing
I have been on a quest to find the query URLs and the structure of queries as part
of my search for data for my local search bot. My last essay was finished
and the target's data had been extracted. With a fresh set of data in my hands,
I sat down and started writing a converter to put the data into a common file format.
This is where this essay begins. I had decided on a basic subset of the data
to use, but thought I should check it against other sources (in other bots).
First on the pile was webferret, a search-bot about which
Laurent has written an essay that you will find
here.
As is my usual habit I did not let the software within wire distance of the
internet, so it did not get the updates, and the dataset provided as standard is
pretty poor - so I threw it in the bin.
Laurent had mentioned to me that I might find copernic interesting. Umm...
Could this be a good target? I had heard of it, but had until recently steered
clear of all these search-bot programs. This was because I know you do not get anything
for nothing, and the thing that makes them money is knowing your searches, and
being able to make you sit through advert after advert after advert...
So off to the web, do a search for copernic and read some reviews. Seems like 
another of these local search bots, where the main advantage is it knowing how 
to talk to the search engines and co-ordinate the replies and present them to 
the user in a nice simple way. This sounded interesting and it seemed to support 
a large number of search engines but no specific numbers were given. I went to
some lengths to avoid visiting any of the copernic sites, for reasons, which will
become apparent later.
So the target was picked; the next step was to go and find it on the web.
So off to the web, where I grabbed the Pro version. I did not even go near their
site, so if they are busy checking logs they will not find me ;)
The Pro version came with a key - nice!
Out came the clean PC. This machine was not connected to any network or the internet, 
after all we did not want any uncontrolled data to go out ;). Filemon was started 
and left running and then copernic was installed on the pc. After the installation 
the program was not run, and the installation process finished. The filemon log 
of installation was then saved for later reference. So now to clear the Filemon log 
and leave it running, to log files accessed by program.
The next step is to run the program and set it to point to the local proxy. Right - the first
thing it does is ask you for some registration details. When all the data has been entered and
the proxy set up, it
tries to connect to get an update. [This is very optimistic of the company - assuming that
all people who install and run it for the first time will be connected to the internet]
Right, so look at logs on proxy and there are a number of requests to "updates.copernic.com"
Now lets try a search, for 'searchlores' . At this point I know it is not going to get 
any results, as the proxy does not connect to the internet, just returns 404 for every 
request, as though routing was broken. So did the search. Look at proxy logs and in 
amongst the requests for search engine pages, there is one that stands out to 
"regcards.copernic.com".
Now follows an explanation of these requests, as they are quite interesting. They go 
to the copernic.com domain so they must contain some user data or be used to track 
users of this program in some way.
Firstly let's look at the update requests. First it does a:
HEAD http://updates.copernic.com/copernic2001upd/copernic2001plus.cui HTTP/1.1
This is the request sent:
HEAD http://updates.copernic.com/copernic2001upd/copernic2001plus.cui HTTP/1.1
Host: updates.copernic.com
Accept: */*
Connection: close
User-Agent: Copernic
Pragma: no-cache

Second it does a:
GET http://updates.copernic.com/copernic2001upd/copernic2001plus.cui HTTP/1.1
This is the request sent:
GET http://updates.copernic.com/copernic2001upd/copernic2001plus.cui HTTP/1.1
Host: updates.copernic.com
Accept: */*
Connection: close
User-Agent: Copernic
Pragma: no-cache

Why do a HEAD if, when it fails, you go on to do the GET anyway? Why not simply do a GET? This seems very pointless ;)
It then sends a further GET to www.copernic.com:
GET http://www.copernic.com/cgi-bin/nph-osnvs2.pl?ns=##########################&iu=%7B********-****-****-****-************%7D&lo=http://updates.copernic.com/copernic2001upd/copernic2001plus.cui&cl=0 HTTP/1.1
Host: www.copernic.com
Accept: */*
Connection: close
User-Agent: Copernic
Pragma: no-cache

The field marked with '*'s will be explained in the next request, as it is a common parameter which is passed in both requests. The field marked with '#'s also seems to be a number of some form to be sent to their server.
Now let's look at the regcard information:
POST http://regcards.copernic.com/cgi-bin/regcard HTTP/1.1
This is the request sent:
POST http://regcards.copernic.com/cgi-bin/regcard HTTP/1.1
Host: regcards.copernic.com
Accept: */*
Connection: close
User-Agent: Copernic
Content-Type: application/x-www-form-urlencoded
Content-Length: 129

%5Ejohndoe%40mort.somewhere%5EUnited%20States%5E12345%5E0%5E0%5EENGPRO%5E5001%5E%7B********-****-****-****-************%7D%5EFrom%20web%20site%5E%5E0%5EJohn%20Doe

Plain text of last line:
^johndoe@mort.somewhere^United States^12345^0^0^ENGPRO^5001^{********-****-****-****-************}^From web site^^0^John Doe
| Value | Description | 
| johndoe@mort.somewhere | Email Address | 
| United States | Country | 
| 12345 | Zip Code | 
| 0 | Unknown | 
| 0 | Unknown | 
| ENGPRO | Version of Software | 
| 5001 | Registration Card Version | 
| {********-****-****-****-************} | GUID | 
| From web site | Referrer for Product | 
| (empty) | Unknown | 
| 0 | Unknown | 
| John Doe | Username | 
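To check a captured body like the one above by hand, a small decoder is enough. What follows is only a sketch of how one might do it, not anything taken from the program: the names url_decode and split_fields are made up for this example, and it assumes the body is plain percent-encoding with '^' as the field separator, which is what the capture suggests.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode %XX escapes from src into dst (dst needs strlen(src)+1 bytes).
   Malformed escapes are copied through unchanged.
   Returns the decoded length. (Hypothetical helper for this example.) */
size_t url_decode(const char *src, char *dst)
{
    size_t n = 0;
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        int hi = -1, lo = -1;
        if (src[0] == '%') {
            hi = hex_value((unsigned char)src[1]);
            if (hi >= 0)
                lo = hex_value((unsigned char)src[2]);
        }
        if (hi >= 0 && lo >= 0) {
            dst[n++] = (char)(hi * 16 + lo);
            src += 3;
        } else {
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}

/* Split s in place on '^', keeping empty fields. Stores at most max
   pointers in fields[] and returns how many were stored.
   (Assumes '^' is the separator, as the capture suggests.) */
size_t split_fields(char *s, char **fields, size_t max)
{
    size_t count = 0;
    char *sep;
    if (max == 0) return 0;
    fields[count++] = s;
    while (count < max && (sep = strchr(s, '^')) != NULL) {
        *sep = '\0';
        s = sep + 1;
        fields[count++] = s;
    }
    return count;
}
```

Because the body starts with '^', the first field comes out empty; the email address is the second one.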
"http://regcards.copernic.com/cgi-bin/regcard"
"http://updates.copernic.com/copernic2001upd/"
"http://www.copernic.com/cgi-bin/nph-osnvs2.pl"
"www.copernic.com"
The first ones can be nullified by writing "http://127.0.0.1/" at the start of the strings. This
will then prevent all accesses to their servers. This is a good alternative to the hosts file, as
the program seems to bypass the hosts file if using a proxy and just sends the requests straight
to the proxy.
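The overwrite can be done by hand in a hex editor, but it is also easy to script. The sketch below is one possible way, working on a copy of the file already loaded into memory. The function name patch_prefix is invented for this example, and it assumes the strings sit in the file as plain byte sequences; how the executable really stores them was not checked here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Overwrite the start of every occurrence of `target` in buf with `patch`.
   The patch may not be longer than the target, so nothing beyond the
   original string is touched and the file size stays the same.
   Returns the number of occurrences patched.
   (Hypothetical helper; assumes the strings are stored as plain bytes.) */
size_t patch_prefix(unsigned char *buf, size_t len,
                    const char *target, const char *patch)
{
    size_t tlen = strlen(target);
    size_t plen = strlen(patch);
    size_t hits = 0;
    size_t i;

    assert(buf != NULL);
    if (tlen == 0 || plen > tlen || tlen > len)
        return 0;

    for (i = 0; i + tlen <= len; i++) {
        if (memcmp(buf + i, target, tlen) == 0) {
            memcpy(buf + i, patch, plen);   /* only the prefix changes */
            hits++;
            i += tlen - 1;                  /* skip past this string */
        }
    }
    return hits;
}
```

Only the first bytes of each string change, so the tail of the old URL is left behind after the new prefix; the request then goes to the local machine with a meaningless path.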
So next step is to close the program, save the filemon log and have a look around my system.
I had a browse through the install filemon log file and made a note of the location of files
added to my system. The first thing that hit me was a load of '.csf' files which had
the names of search engines, and a list of '.ssf' files which seemed to represent
categories.
The next thing is to look at the run filemon log, it seems to read the .ssf and .csf files
and then create a set of files, under the directory 'data' which seems to be a user profile
with the users name as the folder name. Ummm, so some kind of translation or copying going
on, but a lot fewer files get written than read.
So to open up the main executable in our favourite hex viewer and have a quick browse, but 
first to extract all the strings from the file. I had a browse through the strings and it 
looks like it was coded in Delphi. This was just a hunch, and I remembered having a copy of 
DFM-Explorer around, so I tried it on the file and sure enough out came all the resources, 
so it is Delphi for sure. So the task is now to find a Delphi decompiler. My thinking here 
was that even though it might not be needed, if it is then it might make the program code 
a bit easier to understand. Also it is better to check this option at the start rather than 
later. As a teacher once told me, "Always get all your tools ready before starting any task!"
The catch is : this is a delphi application, warning bloatware imminent. I had thought that
the executable was a bit on the large side for something so seemingly simple, and this explained
it. No extra DLL's or files, so the delphi libs must be statically linked. I remember when 
applications used to fit on a floppy, now the icon files will not ;(.
First step is to grab ye ole webbrowser and search for a delphi decompiler (I must admit shame
and say I had never used one before). Right the one that pops up the most in the list when
ranked is 'DeDe' by DaFixer!. Ok so lets grab it and let it rip.
A few sips of my drink later and it has finished downloading, so lets run it and see what
it comes up with. DeDe recognises the file and does its stuff, and yes it is delphi because
I now have the forms and pascal code nicely disassembled on my HD. So a quick browse through
them to get an idea of the structure. umm
I noticed that DeDe also supports exporting all its references to a W32dasm project. Since
one of the steps I was going to do was to disassemble the file, I ran Wdasm and generated 
a project file, then pointed DeDe to it and let it do its stuff. Hopefully when it finishes
it will leave a nice big file with the combined references, so that should make life easier 
later on. Being able to see the references to the Pascal and Delphi bits should make the code 
a bit easier to follow.
While that was running (it takes some time) my next step was to search all the .pas files 
for references to 'ssf' and 'csf' to find where it loaded the data files, I did not find 
any references of these strings in any of the .pas files. Ok time to load up the W32Dasm 
project and have a look in that file. OK PROBLEM! - the project is still being accessed
during the combining of references, so that option is out for an hour or so, as it seems
to take quite some time (35Mb File to process).
So lets have a look around, there are some DLL's in the directory, so lets check them out:
c4dll.dll is Database Engine Library (Sequiter CodeBase Components for Delphi)
xcdunz32.dll is a Zip Library [Xceed Zip Compression Library]
SSCE5253.dll is the Sentry Spelling-Checker Engine [Wintertree Software]
Zip Library - is this just there for the installation or unpacking updates, or might it
be used on the data files? Time to check, if the data files are zipped then they should
be fairly easy to unpack. That would make life very easy ;)
So lets look at the files that were generated when the program was run, the files in 
what looked like a profile directory. 
channel.ctb seems the most likely candidate, and matches (by some coincidence) roughly 
the size of all the .ssf and .csf files. (1,158,690 bytes)
All .ssf - category files (73,718 bytes). All .csf - engine files (1,131,657 bytes)
This seems a strange coincidence, as opening up this file shows it does have the engine
names and the category names (from filenames) but also contains a LOT of space characters, 
so given this is in a directory called after the user, this should be the users preferences 
for searches or something similar.
Back to the data files, as the only files looking good candidates are the '*.*sf' files 
which fit the bill perfectly. So opened one up in notepad and it looks unreadable.
So right, copied three .ssf and three .csf files of different sizes to a temporary 
directory to start looking at them. Opened the first one in a hex viewer and noticed 
that it is not plain text, ok so it was expected they would be packed or encrypted 
in some way, they would not leave their whole product out in the open. But one thing 
that did jump out was the pattern of the characters.
Here is an excerpt from one of the files: (Boxes are unprintable characters)
Sssx?y[SSsS3SrQSSSSSSSSsx;SSss=
SS'3rrrQPSsS3SrQsS3rpSSssx;[yzys3|
xySSss\_yX[yyx;yxSSss?[[ۜ
SSss=yzX|SSss;x3SSSSSssx
;xSSss}۸[ySSss=Xy?X|S
Sss=Xy?3X|SSs"SSs|xSSss?9X
[;y9Xs3xxy99xx{zyٛ;99xSSss;
;__ysSSSss;}xyӐSQP9[yx2Q?|Q
ӐSs2Qxٸ;QrRpSSss;Pyp?yy8ظ98{
'SSss;Pypy8ҙQy90=XP3p0}
xy0=XP3p0Q;x0=XP3p0s0=XP3p0'SSss;P
};;p
Notice the repeated 'SS','SSs' and 'SSss' sequences. Instinct at this point says 
that this is not a packed file as these repeats would have been eliminated 
by the compression process. There are other repeated sequences present in 
the encoded text.
This is the header common to the 1K category files: Auctions and Buyhardware
9D9D5373F41473F414DF78F8F93FDB79
F85BF213535373F05333F073F3515353
125353125353F414535373F31FF9B978
3BDBF91BF41453537373BE3DB8989BF2
11D3535313F0923372727251505373F0
5333F073
. . . . (more data)
F414
This is also the same in Buysoftware which is a 2k file, apart from one byte
9D9D5373F41473F414DF78F8F93FDB79
F85BF213535373F05333F073 72 [changed F3 to 72] 515353
125353125353F414535373F31FF9B978
3BDBF91BF41453537373BE3DB8989BF2
11D3535313F0923372727251505373F0
5333F073
. . . . (more data)
F414
This seems the only difference but is not the same in all 2k files...
in the copernic.csf file it is:
9D9D5373F41473F414DF78F8F93FDB79
F85BF213535373F05333F0 53 [changed 73 to 53] 72 [changed F3 to 72] 515353
125353125353F414535373F31FF9B978
3BDBF91BF41453537373BE3D
. . . . (more data)
F414
different after this..
So it looks like they are all encoded with the same method, and this is some kind of 
common header to the files. Also, all the files seem to end with 'F414'.
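Comparing headers by eye gets tedious with more than a handful of files. A rough sketch of the two checks used above (how far two files agree from the start, and whether a file ends with a given tail) could look like this; the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of leading bytes that two buffers have in common. */
size_t common_prefix_len(const unsigned char *a, size_t alen,
                         const unsigned char *b, size_t blen)
{
    size_t n = 0;
    assert(a != NULL && b != NULL);
    while (n < alen && n < blen && a[n] == b[n])
        n++;
    return n;
}

/* Non-zero if buf ends with the tlen bytes found at tail. */
int ends_with(const unsigned char *buf, size_t len,
              const unsigned char *tail, size_t tlen)
{
    assert(buf != NULL && tail != NULL);
    return tlen <= len && memcmp(buf + len - tlen, tail, tlen) == 0;
}
```

Fed the second 16-byte row of each dump above, common_prefix_len gives 12 for Auctions against Buysoftware and 11 for Auctions against copernic.csf, which matches the bytes marked as changed.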
This does not look like an xor'd pkzip, as the header is wrong. If this was a zip 
file with a zip header, you would expect more bytes to be different; if this was a 
zip file with the header removed then the data would not show the same repetitive
patterns at such regular intervals. This led me towards thinking they were just 
encrypted in some way. This was backed up by the observation that they come in all sizes 
from 926 bytes to 3,000 bytes (in all steps), so they are not a fixed structure. 
They do have a header and a footer which seem to be common. This could just be some 
text at the start of the file, or could designate something else. It seems to me like it would 
be a constant bit at the start of the decoded file rather than a packed header,
or else more of it would change. So it looks like they are just mildly
encrypted and are not packed - hopefully anyway ;)
The 'F414' sequence bothered me as soon as I saw it, the spacing throughout the file 
and also the positioning of it, together with the fact that it appeared in the header 
made me think that this could be '0d0a' or a newline in a text file. This fits with the 
decoded file being plain text. So made a little tool which copied the file and just 
changed those bytes over - the result was a file with what looked like reasonable line 
lengths for a text configuration file. So I was on the right track, or so it seemed.
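The little tool itself is not reproduced here, but something along these lines would do the job. It is a sketch only: replace_pair is an invented name, and it works on a buffer in memory rather than copying a file. Everything other than the matching pairs is left exactly as it was.

```c
#include <assert.h>
#include <stddef.h>

/* Replace every occurrence of the byte pair (a0,a1) in buf with (b0,b1),
   in place, and return how many pairs were replaced. All other bytes
   are left untouched. (Hypothetical helper for this example.) */
size_t replace_pair(unsigned char *buf, size_t len,
                    unsigned char a0, unsigned char a1,
                    unsigned char b0, unsigned char b1)
{
    size_t i = 0;
    size_t hits = 0;

    assert(buf != NULL || len == 0);
    while (i + 1 < len) {
        if (buf[i] == a0 && buf[i + 1] == a1) {
            buf[i] = b0;
            buf[i + 1] = b1;
            hits++;
            i += 2;          /* step over the pair just replaced */
        } else {
            i++;
        }
    }
    return hits;
}
```

Calling replace_pair(buf, len, 0xF4, 0x14, 0x0D, 0x0A) on an encoded file gives the line-split view: the text is still scrambled, but a text viewer now breaks it at every suspected newline.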
Here is a snippet of the above file: (with line splits inserted)
Ss
s
x?y[SSsS3SrQSSSSSS
SSsx;
SSss=SS'3rrrQPSsS3SrQsS3rp
SSssx;[yzys3|xy
SSss\_yX[yyx;yx
SSss?[[ۜ
SSss=yzX|
SSss;x3SSS
SSssx;x
SSss}۸[y
SSss=Xy?X|
SSss=Xy?3X|
SSss;Pypy8ҙQy90=XP3p0}xy0=XP3p0Q;x0=XP3p0s0=XP3p0'
SSss;P};;p
This seems to fit the structure of a configuration file, short line lengths. Later in the 
file are longer lines, about the size of a query URL, so this seems right ;) There is also 
a pattern to the characters at the start of the line, and notable is that the repeated 'SS' 
combination appears at the end of strings - this means (hopefully) that it is not a 
position dependent (or offset) substitution.
After a bit of thinking I was convinced that these files are protected by a substitution 
cipher, and more looking at the file content seemed to back this up as there are many 
repeating patterns, as you would expect to see in a file with URL's inside it. So the 
target was to find the translation function or table. I by this time had discounted a 
packed format and had also discarded a binary file, it is a plain text file - this may 
seem like a jump but if you had been sitting on my shoulder you would have seen it 
the same way.
So there are two methods they could use to achieve this, the first would be to use a 
lookup table to do the translation and the second would be to use a function to do the 
same thing. In order to confirm some options, another look at the running program was 
required, when viewed it seemed they did include all lower and uppercase chars and also 
European characters - this was important as it means they have to use all 8 bits of the 
character and cannot throw any away in the function, whereas if they had not included 
any European characters they might be able to throw a bit away somewhere in the function 
and this could affect the findings dramatically. It was also obvious that they used 
normal ASCII characters as the patterns would have been different if they had used 
some form of unicode or multi-byte character set. This gives us more ammunition 
for the coming hunt.
One thing I must add at this point is that there are many known attacks on substitution 
ciphers. They assume a language and work from tables of character 
occurrence probabilities. They are very effective, but I discarded them for this 
target because the contents of the configuration file were known not to match normal text: 
it would (presumably) be using repeated keywords and values which would either be 
meta tags and/or URLs. This meant that they might give some results but 
probably would not. So I discounted them to save time!
Getting Hands Dirty
DeDe has now finished, so we can start looking at the assembler for the file. First task
is to hunt down the references to any .ssf or .csf files. When looking through the file you 
will find a few references to this string. These were used as a starting point and breakpoints 
were set on them.
I shall take a wander here - bear with me! When I started looking at DeDe, I was intending 
to work from the disassembled files and track through the code in order to find the 
decryption routine which would restore the files to plaintext. Now my priorities had 
changed somewhat, what I was now after was a portion of the plaintext file and hopefully 
all of one of the files in memory so that it could be saved. The fact that the cipher 
seemed to be a substitution one from the data shown above means that although to find 
the decryption routine would be nice, to find a portion of the plaintext would be just 
as nice in helping find the result. If they have used a table then hopefully once we 
have a portion of the plaintext and what it maps to in the encrypted file, finding the 
table in memory would be very easy. This seems a nicer and quicker approach than 
reading through page after page of disassembled code trying to put it together. This 
point is made more by the fact that the app is in delphi, so a simple instruction 
could quite easily call many functions all over the place.
So, trying to resist the urge to go through the code and reassemble what happens (which 
is very hard), I start the code running in W32Dasm with breakpoints set on every 
instance of a string that ends in '.ssf' or '.csf'. It soon breaks on one of them.
At this point I set auto-api stop, and show parameters for local and system calls 
and set it running again. What I am hoping for is one of the calls to have a 
pointer to the plaintext in the call to it.
Here is the bit of code that loads 'Copernic.csf', which is thought to be the 
master configuration file.
* Possible StringData Ref from Code Obj ->"Copernic.csf"
                                  |
:52A00A BAB8A75200       mov edx, 52A7B8
:52A00F E8FCA0EDFF       call 404110
:52A014 8B55E0           mov edx, dword ptr [ebp-20]
:52A017 8B45FC           mov eax, dword ptr [ebp-04]
:52A01A 8B4020           mov eax, dword ptr [eax+20]
:52A01D 8B08             mov ecx, dword ptr [eax]
:52A01F FF5158           call [ecx+58]
:52A022 8B45FC           mov eax, dword ptr [ebp-04]
:52A025 8B4020           mov eax, dword ptr [eax+20]
                         // This following call seems to handle the
                         // file and contains a call which exposes the
                         // plaintext
:52A028 E8970AFAFF       call 4CAAC4   //  HANDLEFILE
:52A02D 85C0             test eax, eax
:52A02F 7425             je 52A056
:52A031 6A00             push 0
:52A033 6A00             push 0
:52A035 A1C4255B00       mov eax, dword ptr [5B25C4]
:52A03A 8B00             mov eax, dword ptr [eax]
:52A03C 8B4050           mov eax, dword ptr [eax+50]
:52A03F BA02000000       mov edx, 2
The code below is the start of the HANDLEFILE routine:
* Referenced by a CALL at Addresses:
|:4EB84D, :52A028, :599F7B, :59A81A   
:4CAAC4 55                      push ebp                      
.
... next part is further down the function.
.
:4CAAFA 8D55E8           lea edx, dword ptr [ebp-18]   
:4CAAFD 8B45FC           mov eax, dword ptr [ebp-04]   
:4CAB00 8B08             mov ecx, dword ptr [eax]
:4CAB02 FF511C           call [ecx+1C]                 
:4CAB05 8B45E8           mov eax, dword ptr [ebp-18]   
:4CAB08 BA01000000       mov edx, 1             
                         //  This function has the plain text for the
                         //  line from the file passed into and out of
                         //  it, so the decoding must happen before this!!!
:4CAB0D E892EDFFFF       call 4C98A4
                         //  [ebp-10] points to the start of text, both into
                         //  and out of this function
So we have found a function that is called with one of the parameters as the plaintext 
for the file currently being handled. This is what we were after, so remove all other 
breakpoints and set a new breakpoint on 0x004CAB0D and make sure we tick the display
parameters to local calls in W32Dasm. Right now every time we hit this function filemon
tells us which file we are reading and the parameter display gives us the location of 
the string.
After placing the breakpoint I grabbed a string of plaintext.
The start of the plaintext is:  "FF01" and a newline - 0x46 0x46 0x30 0x31 0x0d 0x0a
While looking at this, I noticed a bit of code further down the disassembly 
listing, which jumped out at me as some possible plaintext.
This is the code that seems to handle parsing the configuration files:
* Possible StringData Ref from Code Obj ->"DisplayName"
:599FA0 BA14A65900         mov edx, 59A614
:599FA5 8B45E4             mov eax, dword ptr [ebp-1C]
:599FA8 E8AB4DF2FF         call 4BED58
:599FAD 8D45C4             lea eax, dword ptr [ebp-3C]
:599FB0 33D2               xor edx, edx
:599FB2 E8B5B6E6FF         call 40566C
:599FB7 8D4DC4             lea ecx, dword ptr [ebp-3C]
This code is repeated with the following string references:
* Possible StringData Ref from Code Obj ->"Description"
* Possible StringData Ref from Code Obj ->"HomePage"
So this bit of code is parsing a file of some kind looking for the identifiers 
given in the string references, and so that means our file MUST contain some 
of the above strings, as they do not seem to be used in any other files.
Decoding files
So now we have a portion of the plaintext written down (or in a file) 
and this looks very good, and seems to confirm a lot of things. The string 
pointed to is shown below, and when looking for the first time you should 
also refer back to the previous text and see what bells ring ;)
A portion of the plaintext:
FF01
0015Register
0011_Conv="4002->3999 (01-03-09, 10:37:59)"
0011DisplayName="123India"
0011HomePage="http://www.altavista.in/"
The order is slightly changed from the order in the file (only a couple of 
entries swapped) but note the line lengths as these are a giveaway. So we now 
know for sure that we are on the right track - GOOD! Now you can call me stupid 
if you want, but '0011' looks a bit like 'SSss', and '001' would make sense 
of the 'SSs' occurrences as well.
So this data was saved to a file, and a file was created with the lines mixed and 
grouped in pairs of matching line length. Then a bit of code to read the lines in 
and generate a mapping table from the characters in an encoded line to the 
matching character in the decoded file. This table was then saved to a file as 
a 256 byte list. Obviously this did not include all characters from the table as 
the chances were that not all characters would be used in this one file, but 
the thought was that as I stated above it would either give enough of a clue to 
find the lookup table in memory, or a clue to the function. It was more 
appealing than running through lines and lines of code. So the map table was 
created and any holes were left with their original values, so that errors could
be spotted and added. Then this substitution lookup was loaded into the decoder and 
compiled ready for use. At this point I decided to view the encrypted values with 
the decrypted values in the form of the table, luckily there was a good spread in 
the table and luckily I had picked a file with European characters inside it so 
there were some of those represented in the table.
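The table builder is simple enough that a sketch may help. The version below is an assumption about how such a tool could be written, not the actual code used: SubstMap, map_init and map_learn are invented names. It starts every entry at its own value (so holes stay visible, as described above) and counts any pair that contradicts an earlier one, which would show that the cipher is not a plain substitution.

```c
#include <assert.h>
#include <stddef.h>

/* Partial substitution table built from known encoded/decoded pairs. */
typedef struct {
    unsigned char to[256];     /* to[e] = decoded value for encoded byte e */
    unsigned char known[256];  /* 1 once a pair has been seen for e        */
} SubstMap;

/* Start with every entry mapping to itself, so holes keep their
   original value. */
void map_init(SubstMap *m)
{
    int i;
    assert(m != NULL);
    for (i = 0; i < 256; i++) {
        m->to[i] = (unsigned char)i;
        m->known[i] = 0;
    }
}

/* Learn from one encoded line and its decoded twin of the same length.
   Returns the number of bytes that contradict an earlier pair; a
   non-zero result means the lines do not match up, or the cipher is
   not a simple substitution. */
size_t map_learn(SubstMap *m, const unsigned char *enc,
                 const unsigned char *dec, size_t len)
{
    size_t i;
    size_t conflicts = 0;

    assert(m != NULL);
    for (i = 0; i < len; i++) {
        unsigned char e = enc[i];
        if (m->known[e] && m->to[e] != dec[i]) {
            conflicts++;               /* keep the first mapping seen */
        } else {
            m->to[e] = dec[i];
            m->known[e] = 1;
        }
    }
    return conflicts;
}
```

As a first pair, the six header bytes 9D 9D 53 73 F4 14 can be matched against "FF01" followed by 0x0d 0x0a, which already fills in five entries.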
The original encoded file was then decoded using this partial table as a sort of 
proof-of-concept for the code and the idea. Sure enough the file was decrypted 
and shown in total plain text. So I had proved to myself that I was on the right 
track and I had not even bothered to hunt the disassembly file for the decode 
routine.
The next step was to check for a lookup table in any of the files, so I took a 
portion of the substitution table that contained proper plaintext values and did 
a search of all the files in the root folder for copernic. NOTHING! - so it seems 
they either do not have it in the files, they generate it or the data is encoded 
by a function. This was good news, because the last two options both mean that 
it is created by code rather than by a stored lookup table, which means there has to be 
a simple logic to it, as there are only so many ways to scramble 256 entries 
using code without losing any entries or values.
Now at this point I should really have dived into the dead listing and tried to 
find the routine, but I took a different approach. I instead turned my attention 
to the output of my lookup table creator, and the results it had given me. I was 
trying to look for a pattern within the mapping.
This is a partial dump of the lookup table and values, showing the relationship 
between the encoded and decoded characters: (all values are HEX)
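| Encoded | Decoded | 
| 10 | 2a | 
| 11 | 22 | 
| 12 | 3a | 
| 13 | 32 | 
| 14 | 0a | 

| Encoded | Decoded | Encoded | Decoded | 
| 18 | 6a | f8 | 6d | 
| 19 | 62 | f9 | 65 | 
| 1a | 7a | fa | 7d | 
| 1b | 72 | fb | 75 | 
| 1c | 4a | fc | 4d | 
| 1d | 42 | fd | 45 | 
| 1e | 5a | fe | 5d | 
| 1f | 52 | ff | 55 | 
| 38 | 6b | 58 | 68 | 
| 39 | 63 | 59 | 60 | 
| 3a | 7b | 5a | 78 | 
| 3b | 73 | 5b | 70 | 
| 3c | 4b | 5c | 48 | 
| 3d | 43 | 5d | 40 | 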
It did not take long for one to jump out at me, did you pay attention to the 
above table, did any bells go off? I left holes in the table on purpose so 
you had to look at it. Have you seen the pattern, it is a nice one I must
admit - if you just arrange the table with the characters showing instead 
of the hex, a pattern does jump out, but not as much as when viewing the 
hex bytes. Hopefully you should agree with me when I now say that the dead listing 
approach suddenly lost a LOT of its appeal for this target.
This is a regular pattern based substitution, done by a bit of code which 
is not very complex or large. I have already gone down the road of abandoning 
the dead listing, and it is now firmly in the bin. So to reverse this encoding 
we simply need to analyse the pattern.
It also appears as though the resulting value is made up from two separate 
nibbles (4bits) and they are bolted together, this is shown by the way they 
seem to change out of step with each other.
Pseudo code:
Variables:
IN_A = encoded_byte
IN_H = encoded_byte_high_nibble
IN_L = encoded_byte_low_nibble
OUT_H = decoded_byte_high_nibble
OUT_L = decoded_byte_low_nibble
to set up the code do the following:
IN_A = read_from_file();
IN_H = IN_A & 0xf0;
IN_L = IN_A & 0x0f;
before exiting:
OUT_A = OUT_H | OUT_L;
Taking the examples:
0x38 -> 0x6B and 0x39 -> 0x63
It seems like there are two values for the lower nibble, and these seem to 
be offset by 8, so no matter what the lower value is the higher one is that 
plus 8. (Look at the table above to confirm this) The use of this value seems 
to be dependent on the lower bit of IN_A. So the final step is to take 
the low bit of IN_A and if it is clear to add 0x08 to the output byte.
You can also see that the lower nibble of decoded char (OUT_L) is related to 
the upper nibble of encoded data (IN_H). And that the upper nibble of decoded 
char (OUT_H) is related to lower nibble of encoded char (IN_L).
Look at the 0x*8 and 0x*9 values: they all map to 0x6*, just like the 0x*A and 0x*B 
values map to 0x7*, and the 0x*E and 0x*F values map to 0x5*. Now look at 0xff: the 
lower value for the lower nibble is '5', so 0xF* -> 0x*5 and 0x*F -> 0x5*.
If you do more checking it will reassure you, what is of interest is that these 
mappings seem to be the same for both halves, which should make life a lot easier. 
So now that we have isolated the components, lets create a mapping for the nibbles, 
just taking the values from the previous table.
Original Nibble   Output Nibble
    0x0,0x1           0x2
    0x2,0x3           0x3
    0x4,0x5           0x0
    0x6,0x7           0x1
    0x8,0x9           0x6
    0xA,0xB           0x7
    0xC,0xD           0x4
    0xE,0xF           0x5
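The nibble table above has a regular shape of its own: each pair of input nibbles shares one output, and the outputs are the pair number with one bit flipped. As a side note (this is an observation about the table, not something read out of the executable), the same sixteen entries come out of one shift and one xor. The name nibble_map is invented for the sketch.

```c
#include <assert.h>

/* Same mapping as the nibble table above, computed instead of stored:
   drop the low bit of the nibble, then flip bit 1 of what is left. */
unsigned nibble_map(unsigned n)
{
    assert(n < 16);
    return (n >> 1) ^ 0x2u;
}
```

For example nibble_map(0x8) and nibble_map(0x9) both give 0x6, and nibble_map(0xF) gives 0x5, as in the table.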
So Putting this together gives us:
Variables:
IN_A = encoded_byte
IN_H = encoded_byte_high_nibble
IN_L = encoded_byte_low_nibble
OUT_H = decoded_byte_high_nibble
OUT_L = decoded_byte_low_nibble
LOOKUP = [2,2,3,3,0,0,1,1,6,6,7,7,4,4,5,5]
to set up the code do the following:
IN_A = read_from_file()
IN_H = (IN_A & 0xf0)>>4       // Get high nibble into low nibble
IN_L = IN_A & 0x0f            // Isolate low nibble
OUT_H = LOOKUP[IN_L]<<4       // To get into high nibble
OUT_L = LOOKUP[IN_H]          // this is low nibble
OUT_A = OUT_H | OUT_L         // merge the two
if ((IN_A & 0x01) == 0)       // This does the offset on
    OUT_A = OUT_A + 0x08      // the lower nibble
This can be simplified to the code below:
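/* Nibble substitution table taken from the mapping above */
static const unsigned char lookup[16]={2,2,3,3,0,0,1,1,6,6,7,7,4,4,5,5};

/* Decode one encoded byte: the nibbles are swapped and substituted,
   and 8 is added to the low nibble when the encoded byte is even */
int decode_character(int encoded)
{
  int decoded;
  encoded &= 0xff;                  /* guard against sign-extended chars */
  decoded = (lookup[encoded&0xf]<<4) + lookup[(encoded&0xf0)>>4];
  if ((encoded & 0x01) == 0)
    decoded += 8;
  return( decoded );
}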
I have not looked in the executable for this code or the bit that does the 
same job, as that does not matter. If you use the above function as 
a decoder for each character in all the '*.ssf' and '*.csf' files within
the program's directories it will convert them to the plaintext (unencoded) 
versions.
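To try this out without touching any files, the same per-byte rule can be run over the header bytes quoted earlier. This is a sketch with invented names (decode_byte, decode_buffer); the table and the even/odd rule are the ones from the essay, repeated here so the block stands alone.

```c
#include <assert.h>
#include <stddef.h>

/* Nibble table from the essay, repeated so this block stands alone. */
static const unsigned char nibble_lookup[16] =
    {2,2,3,3,0,0,1,1,6,6,7,7,4,4,5,5};

/* Decode one byte with the rule derived above: substitute and swap the
   nibbles, then add 8 to the low nibble if the encoded byte is even. */
static unsigned char decode_byte(unsigned char e)
{
    unsigned d = ((unsigned)nibble_lookup[e & 0x0F] << 4)
               | nibble_lookup[(e >> 4) & 0x0F];
    if ((e & 0x01) == 0)
        d += 8;          /* low nibble is at most 7 here, so no carry */
    return (unsigned char)d;
}

/* Decode a whole buffer in place. */
void decode_buffer(unsigned char *buf, size_t len)
{
    size_t i;
    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        buf[i] = decode_byte(buf[i]);
}
```

Run over the first twenty bytes of the common header (9D 9D 53 73 F4 14 73 F4 14 DF 78 F8 F9 3F DB 79 F8 5B F2 13) this gives "FF01", a newline, "1", a newline and "TimeStamp=2". One thing to keep in mind: as written, the rule never produces a value above 0x7F and ignores the lowest bit of the encoded high nibble. Every encoded byte in the dumps and the table above has that bit set, so these samples cannot show what happens to characters above 0x7F.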
So I had the files in plain text form and they were all text configuration 
files as I had thought. I counted (in the version I have) 754 search 
engines or URLs - that is quite a lot of data. This product 
has also got them grouped nicely, which takes care of the problem of 
how to organise them: it is already done.
So at this point I am pretty happy with how things have gone, I have a 
routine which decodes their input files and have converted them all to
plain text, so the data is now usable. And to think this has been 
achieved with only minimal time in front of code, only the period when 
scanning for the plain text. 
Scripting Language
When I started examining the decoded files, one of the first files 
I looked at was 'copernic.csf'. As it sits in the approot and is named the 
same as the application, it was a good candidate for a master configuration or 
some kind of global parameters file.
You should remember from earlier that most lines in the conf files seem to 
have a 4 digit number (0011) of varying value at the start of the line. The 
example given earlier did not show this as clearly as the following example 
hopefully will. This is an instruction for the internal scripting language 
to tell it how to handle the rest of the line.
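For anyone wanting to pull the decoded files apart, the first step is to separate that leading number from the rest of the line. A minimal sketch follows; split_opcode is an invented name, and treating the four characters as a decimal number is an assumption, since nothing here says how the program reads them.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* If line starts with four decimal digits, store their value in *code
   and return a pointer to the rest of the line; otherwise return NULL.
   (Reading the digits as a decimal number is an assumption.) */
const char *split_opcode(const char *line, int *code)
{
    int i;
    int value = 0;

    assert(line != NULL && code != NULL);
    for (i = 0; i < 4; i++) {
        /* a short line fails here at its terminating NUL */
        if (!isdigit((unsigned char)line[i]))
            return NULL;
        value = value * 10 + (line[i] - '0');
    }
    *code = value;
    return line + 4;
}
```

Lines such as "FF01", "1" or "TimeStamp=..." do not start with four digits and come back as NULL, so they can be handled separately.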
This is the decoded version of 'copernic.csf':
FF01
1
TimeStamp=2001-03-09 00:00:00
0015Register
0011ChannelSet="Ad"
0011ChannelSet3="Ad"
0011Version=2525
0011FileVersion=0
0011SoftwareVersions="eng;engplus;engpro;fra;fraplus;frapro"
0016
0015Init
0011UseCookies=True
1001
0011SearchQuerySeparator="+"
1003
0011Key=SearchQuery
0011RNDSEED=""
0018Length(RNDSEED)<>12
0011RNDSEED=String(Random(99999999)*Random(9999))
0019
0011T=Random(999999)
0011PromoT=Numeric(Substring(RNDSEED,8,1))
0011PromoTI=Numeric(Substring(RNDSEED,9,1))
0011Random100=Numeric(Substring(RNDSEED,10,2))
0011SourceFLYCAST=Replace("ENG|1|http://ad-adex3.flycast.com/server/_img/Copernic/software/$RANDOMNUMBER$|http://ad-adex3.flycast.com/server/click/Copernic/software/$RANDOMNUMBER$","$RANDOMNUMBER$",String(T))
0011Source247ENG=Replace(Replace("ENG|1|http://connect.247media.ads.link4ads.com/serv/2/Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$?$KEY$|http://connect.247media.ads.link4ads.com/click/2/Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011Source247FRA=Replace(Replace("FRA|1|http://connect.247media.ads.link4ads.com/serv/2/fr-Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$?$KEY$|http://connect.247media.ads.link4ads.com/click/2/fr-Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011SourceUFS="UFS|1|http://banner.unifiedweb.com/cgi-bin/getimage.exe/copernic?GROUP=copernic|http://banner.unifiedweb.com/cgi-bin/redirect.exe/copernic"
0011SourceVALUECLICK="VALUECLICK|1|http://kansas.valueclick.com/cycle?host=hs0136917&b=1&noscript=1|http://kansas.valueclick.com/redirect?host=hs0136917&b=1&v=0"
0011SourceVALUECLICKOLD="VALUECLICK|1|http://kansas.valueclick.com/cycle?host=hs0194203&size=468x60&b=indexpage&noscript=1|http://kansas.valueclick.com/redirect?host=hs0194203&size=468x60&b=indexpage&v=0"
0011SourceSERVERFRA4552=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/fra/recent/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/fra/recent/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011SourceSERVERENG4552=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/eng/recent/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/eng/recent/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011SourceSERVERFRA4551=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/fra/old/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/fra/old/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011SourceSERVERENG4551=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/eng/old/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/eng/old/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0012Find("ENGUFS",Edition)<>0
0011SourceUrl=Entry(3,SourceUFS,"|")
0011TargetUrl=Entry(4,SourceUFS,"|")
0013
0012(Find("PLUS",Edition)<>0)or(Find("PRO",Edition)<>0)
0012BuildNumber>4551
0011SourceUrl=Entry(3,SourceVALUECLICK,"|")
0011TargetUrl=Entry(4,SourceVALUECLICK,"|")
0013
0011SourceUrl=Entry(3,SourceVALUECLICKOLD,"|")
0011TargetUrl=Entry(4,SourceVALUECLICKOLD,"|")
0014
0013
0012BuildNumber>4551
0011SelfPromoPercent=0
0013
0012Substring(Edition,1,3)="FRA"
0011SelfPromoPercent=0
0013
0011SelfPromoPercent=10
0014
0014
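0012Random100<SelfPromoPercent
0011SourceUrl=Entry(3,SourceSERVERENG4551,"|")
0011TargetUrl=Entry(4,SourceSERVERENG4551,"|")
0013
0012BuildNumber>4551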
0012Substring(Edition,1,3)="FRA"
0011SourceUrl=Entry(3,SourceSERVERFRA4552,"|")
0011TargetUrl=Entry(4,SourceSERVERFRA4552,"|")
0013
0011SourceUrl=Entry(3,SourceSERVERENG4552,"|")
0011TargetUrl=Entry(4,SourceSERVERENG4552,"|")
0014
0013
0012Random100>54
0012Substring(Edition,1,3)="FRA"
0011SourceUrl=Entry(3,Source247FRA,"|")
0011TargetUrl=Entry(4,Source247FRA,"|")
0013
0011SourceUrl=Entry(3,Source247ENG,"|")
0011TargetUrl=Entry(4,Source247ENG,"|")
0014
0013
0011SourceUrl=Entry(3,SourceVALUECLICKOLD,"|")
0011TargetUrl=Entry(4,SourceVALUECLICKOLD,"|")
0014
0014
0014
0014
0014
0011RotationInterval=120000
0016
11A2
 
This is a table giving the function for each command string:
String  COMMAND  Description
0011    SET      SET variable=value
0012    IF       IF expression THEN
0013    ELSE     ELSE
0014    ENDIF    ENDIF
0015    FUNC     Function Definition Start
0016    ENDFUNC  End Function Def
0018    WHILE    WHILE expression DO
0019    WEND     End While Loop
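To show how little is needed to read these lines, here is a minimal sketch in C 
of a lookup for the command table above. The names opcode_name and 
opcode_argument are my own and do not come from the program. Codes that are 
not in the table (1001, 1003 and so on) are simply reported as unknown.

```c
#include <stddef.h>
#include <string.h>

/* Command codes taken from the table above; nothing else is assumed known. */
struct opcode { const char *code; const char *name; };

static const struct opcode OPCODES[] = {
    {"0011", "SET"},  {"0012", "IF"},      {"0013", "ELSE"},  {"0014", "ENDIF"},
    {"0015", "FUNC"}, {"0016", "ENDFUNC"}, {"0018", "WHILE"}, {"0019", "WEND"},
};

/* Return the mnemonic for the 4-character code at the start of a line,
   or NULL if the line is too short or the code is not in the table. */
const char *opcode_name(const char *line)
{
    size_t i;
    if (line == NULL || strlen(line) < 4)
        return NULL;
    for (i = 0; i < sizeof OPCODES / sizeof OPCODES[0]; i++)
        if (strncmp(line, OPCODES[i].code, 4) == 0)
            return OPCODES[i].name;
    return NULL;
}

/* Return the rest of the line after the 4-character code (may be empty). */
const char *opcode_argument(const char *line)
{
    return (line != NULL && strlen(line) >= 4) ? line + 4 : "";
}
```

For the line 0011Version=2525 this gives the command SET and the argument 
Version=2525, which is all a simple line-by-line translator needs.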
Also there are some functions:
Replace(String A,String B,String C)
This takes the string A, finds all occurrences of string B and replaces 
them with the string C. So Replace("ABCCCBA","CCC","YYY") would return "ABYYYBA"
Substring(String A,Number B,Number C)
This takes the string A and grabs C characters, starting at position B. So Substring("ENGPRO",1,3) 
would return "ENG"
Numeric(String A)
This returns the number represented by the string A. So Numeric("100") would return 100
Length(String A)
This returns the length of the String passed in. So Length("ENG") would return 3
Random(Number A)
This returns a random number up to the value of A. So Random(99999) could return any number up to 99999.
String(Number A)
This returns the string representation of the Number A. So String(100) would return "100"
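Find(String A,String B)
This searches for string A in string B. The script compares the result with 0 
(for example Find("PRO",Edition)<>0), so it appears to return a non-zero value 
when A is found and 0 when it is not. So Find("PRO","ENGPRO") would return a 
non-zero value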
Entry(Number A, String B, String C)
This returns one entry from a string which contains delimited values. A is the 
number of the data segment to return, B is the string which holds the data and 
C is the character used as the separator. The script uses it in calls such as 
Entry(3,Source247FRA,"|").
Using the example Entry(NUM,"AAA|BBB|CCC|DDD","|")
if NUM is set to 1 it would return "AAA", if NUM is 2 then "BBB", if NUM is 3 then "CCC".
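As a sketch of how Substring and Entry behave as described above, here are two 
small C functions. The names script_substring and script_entry are my own, 
and what happens for an out-of-range position or entry number (an empty string 
here) is my assumption, as I have not checked what the program does in that case.

```c
#include <stddef.h>
#include <string.h>

/* Substring(A,B,C): copy 'count' characters of src, starting at the 1-based
   position 'start', into out. Out-of-range start gives an empty string
   (an assumption, not verified against the program). */
char *script_substring(const char *src, size_t start, size_t count,
                       char *out, size_t out_size)
{
    size_t len = strlen(src);
    size_t n = 0;
    if (out_size == 0)
        return out;
    if (start >= 1 && start <= len) {
        n = len - (start - 1);
        if (n > count) n = count;
        if (n > out_size - 1) n = out_size - 1;
        memcpy(out, src + (start - 1), n);
    }
    out[n] = '\0';
    return out;
}

/* Entry(A,B,C): copy the 'index'-th (1-based) field of data, split on the
   separator character sep, into out. A missing field gives an empty string. */
char *script_entry(size_t index, const char *data, char sep,
                   char *out, size_t out_size)
{
    const char *field = data;
    const char *end;
    size_t n, i;
    if (out_size == 0)
        return out;
    out[0] = '\0';
    if (index == 0)
        return out;
    for (i = 1; i < index; i++) {      /* skip index-1 separators */
        field = strchr(field, sep);
        if (field == NULL)
            return out;
        field++;
    }
    end = strchr(field, sep);
    n = (end != NULL) ? (size_t)(end - field) : strlen(field);
    if (n > out_size - 1) n = out_size - 1;
    memcpy(out, field, n);
    out[n] = '\0';
    return out;
}
```

With these, script_entry(2,"AAA|BBB|CCC|DDD",'|',buf,sizeof buf) gives "BBB" 
and script_substring("ENGPRO",1,3,buf,sizeof buf) gives "ENG", matching the 
examples in the text.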
Using the above command table, if we translate the script into a more 
conventional notation we get the script below:
FF01
1
TimeStamp=2001-03-09 00:00:00
FUNC   Register
  SET    ChannelSet="Ad"
  SET    ChannelSet3="Ad"
  SET    Version=2525
  SET    FileVersion=0
  SET    SoftwareVersions="eng;engplus;engpro;fra;fraplus;frapro"
ENDFUNC
FUNC   Init
  SET    UseCookies=True
  1001
  SET    SearchQuerySeparator="+"
  1003
  SET    Key=SearchQuery
  SET    RNDSEED=""
  WHILE  Length(RNDSEED)<>12
    SET    RNDSEED=String(Random(99999999)*Random(9999))
  WEND
  SET    T=Random(999999)
  SET    PromoT=Numeric(Substring(RNDSEED,8,1))
  SET    PromoTI=Numeric(Substring(RNDSEED,9,1))
  SET    Random100=Numeric(Substring(RNDSEED,10,2))
  SET    SourceFLYCAST=Replace("ENG|1|http://ad-adex3.flycast.com/server/_img/Copernic/software/$RANDOMNUMBER$|http://ad-adex3.flycast.com/server/click/Copernic/software/$RANDOMNUMBER$","$RANDOMNUMBER$",String(T))
  SET    Source247ENG=Replace(Replace("ENG|1|http://connect.247media.ads.link4ads.com/serv/2/Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$?$KEY$|http://connect.247media.ads.link4ads.com/click/2/Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
  SET    Source247FRA=Replace(Replace("FRA|1|http://connect.247media.ads.link4ads.com/serv/2/fr-Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$?$KEY$|http://connect.247media.ads.link4ads.com/click/2/fr-Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
  SET    SourceUFS="UFS|1|http://banner.unifiedweb.com/cgi-bin/getimage.exe/copernic?GROUP=copernic|http://banner.unifiedweb.com/cgi-bin/redirect.exe/copernic"
  SET    SourceVALUECLICK="VALUECLICK|1|http://kansas.valueclick.com/cycle?host=hs0136917&b=1&noscript=1|http://kansas.valueclick.com/redirect?host=hs0136917&b=1&v=0"
  SET    SourceVALUECLICKOLD="VALUECLICK|1|http://kansas.valueclick.com/cycle?host=hs0194203&size=468x60&b=indexpage&noscript=1|http://kansas.valueclick.com/redirect?host=hs0194203&size=468x60&b=indexpage&v=0"
  SET    SourceSERVERFRA4552=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/fra/recent/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/fra/recent/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
  SET    SourceSERVERENG4552=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/eng/recent/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/eng/recent/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
  SET    SourceSERVERFRA4551=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/fra/old/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/fra/old/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
  SET    SourceSERVERENG4551=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/eng/old/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/eng/old/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
  IF     Find("ENGUFS",Edition)<>0        // if ENGUFS version
    SET    SourceUrl=Entry(3,SourceUFS,"|")
    SET    TargetUrl=Entry(4,SourceUFS,"|")
  ELSE
    IF     (Find("PLUS",Edition)<>0)or(Find("PRO",Edition)<>0)
                                                // PRO or PLUS
      IF     BuildNumber>4551                // BUILD > 4551
        SET    SourceUrl=Entry(3,SourceVALUECLICK,"|")
        SET    TargetUrl=Entry(4,SourceVALUECLICK,"|")
      ELSE                                      // BUILD <= 4551
        SET    SourceUrl=Entry(3,SourceVALUECLICKOLD,"|")
        SET    TargetUrl=Entry(4,SourceVALUECLICKOLD,"|")
      ENDIF
    ELSE
      IF     BuildNumber>4551                // BUILD > 4551
        SET    SelfPromoPercent=0               // clear addshow variable
      ELSE
        IF     Substring(Edition,1,3)="FRA"     // FRENCH
          SET    SelfPromoPercent=0             // clear addshow variable
        ELSE                                    // ENGLISH
          SET    SelfPromoPercent=10            // set addshow to 10%
        ENDIF
      ENDIF
      IF     Random100<SelfPromoPercent      // if random < addshow
        SET    SourceUrl=Entry(3,SourceSERVERENG4551,"|")
        SET    TargetUrl=Entry(4,SourceSERVERENG4551,"|")
      ELSE                                      // if random >= addshow
        IF     BuildNumber>4551              // BUILD > 4551
          IF     Substring(Edition,1,3)="FRA"   // FRENCH
            SET    SourceUrl=Entry(3,SourceSERVERFRA4552,"|")
            SET    TargetUrl=Entry(4,SourceSERVERFRA4552,"|")
          ELSE                                  // ENGLISH
            SET    SourceUrl=Entry(3,SourceSERVERENG4552,"|")
            SET    TargetUrl=Entry(4,SourceSERVERENG4552,"|")
          ENDIF
        ELSE                                    // BUILD <= 4551
          IF     Random100>54                // if random > 54
            IF     Substring(Edition,1,3)="FRA" // FRENCH
              SET    SourceUrl=Entry(3,Source247FRA,"|")
              SET    TargetUrl=Entry(4,Source247FRA,"|")
            ELSE                                // ENGLISH
              SET    SourceUrl=Entry(3,Source247ENG,"|")
              SET    TargetUrl=Entry(4,Source247ENG,"|")
            ENDIF
          ELSE                                  // random <= 54
            SET    SourceUrl=Entry(3,SourceVALUECLICKOLD,"|")
            SET    TargetUrl=Entry(4,SourceVALUECLICKOLD,"|")
          ENDIF
        ENDIF
      ENDIF
    ENDIF
  ENDIF
  SET    RotationInterval=120000
ENDFUNC
11A2
So this is a script which seems to control all the adverts, so surely a bit of 
creative writing is called for. As we already have a decoder we can simply 
reverse the process to encode the file after we have created the new one.
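To make the branching in the translated script easier to follow, here is the 
same decision tree restated as a C function. This is only a sketch: the name 
pick_ad_source is mine, I am assuming that Edition holds upper case text such 
as "ENGPRO" (as the comparisons in the script suggest) and that Find is a plain 
case-sensitive substring search. It returns the name of the Source variable 
whose 3rd and 4th entries end up in SourceUrl and TargetUrl.

```c
#include <string.h>

/* Restates the IF/ELSE tree of Init from the translated script. */
const char *pick_ad_source(const char *edition, int build_number, int random100)
{
    int is_french = strncmp(edition, "FRA", 3) == 0;  /* Substring(Edition,1,3)="FRA" */
    int self_promo_percent;

    if (strstr(edition, "ENGUFS") != NULL)
        return "SourceUFS";

    /* PLUS and PRO editions only ever use the ValueClick sources. */
    if (strstr(edition, "PLUS") != NULL || strstr(edition, "PRO") != NULL)
        return build_number > 4551 ? "SourceVALUECLICK" : "SourceVALUECLICKOLD";

    /* Free editions: only old non-French builds get the 10% self promotion. */
    self_promo_percent = (build_number > 4551 || is_french) ? 0 : 10;

    if (random100 < self_promo_percent)
        return "SourceSERVERENG4551";
    if (build_number > 4551)
        return is_french ? "SourceSERVERFRA4552" : "SourceSERVERENG4552";
    if (random100 > 54)
        return is_french ? "Source247FRA" : "Source247ENG";
    return "SourceVALUECLICKOLD";
}
```

Written this way it is easy to see that a free edition with a build number 
above 4551 always fetches its adverts from the bannerpush server.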
We can also figure out a couple of other things. The first is that the following 
segment is the header for each file. It does not seem to contain any of the 
script commands found so far, or even the characters for them. This segment 
seems to be present at the start of all the files:
FF01
1
TimeStamp=2001-03-09 00:00:00
The second is this entry at the end of the file, which seems to be a footer of 
some kind - at first glance it looks like it could be some form of CRC.
11A2
But what if you are told that the length of this file in hex is 0x11C4? 
Another example is a file with a footer of 03AC and a file length of 0x3CE.
So if we do 0x11C4 - 0x11A2 we get 0x22, and 0x3CE - 0x3AC = 0x22. This means 
that this entry is the length of the file minus 0x22 (34 decimal). So if we are 
to alter the config file (with the hope of replacing it) then we should put the 
correct value into this entry as well as encoding the file.
It should be noted that in experiments the file was not parsed and loaded unless 
this file length value was correct. Copernic probably uses it when parsing the 
input file to strip the header, in which case it gives the length of the data 
within the file. This value must be set correctly!
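The footer arithmetic above can be sketched in a few lines of C. The function 
name footer_for_length is my own, and I am assuming the footer is always written 
as four upper case hex digits, as in the two examples (11A2 and 03AC). I have 
not seen a file large enough to know what happens beyond 0xFFFF, so the sketch 
simply refuses that case.

```c
#include <stdio.h>

#define FOOTER_OFFSET 0x22L   /* file length minus footer value, from the two samples */

/* Write the footer for a file of file_length bytes into out as four
   upper case hex digits plus a terminating NUL.
   Returns 0 on success, -1 if the value does not fit in four digits. */
int footer_for_length(long file_length, char out[5])
{
    long value = file_length - FOOTER_OFFSET;
    if (value < 0 || value > 0xFFFFL)
        return -1;
    snprintf(out, 5, "%04lX", (unsigned long)value);
    return 0;
}
```

For a file length of 0x11C4 this produces 11A2, and for 0x3CE it produces 03AC, 
which matches both files examined above.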
Search Query Spying
It should be noted that all advert requests sent to the two servers 
"bannerpush.copernicserver.com" and "connect.247media.ads.link4ads.com" contain 
the user query variable from the script (Key=SearchQuery, substituted for $KEY$). 
This means that if your parameters cause adverts to be grabbed from either of these 
two locations then they are getting details of what you are searching for.
You can verify this for yourself by looking at the above script and finding 
the entries for these two servers.
Advert Removal
Even though the 'PRO' version has a tick box to turn off adverts, I 
assumed that the free version probably displays loads of adverts. 
Why would anyone with the pro version leave the tick box turned on? 
That really puzzles me, unless both versions share the same dialog 
and the free version simply has the box ticked and disabled so the 
user cannot change it - I will not verify this. But this gave me an 
idea: if all versions use the config files then we can make a new 
one for the free version, thus removing that part of the whole 
advert problem.
So the task was to create a new version of 'copernic.csf' with the 
references to the advert servers removed. Because I was not sure of the 
effect of returning empty strings, I chose instead to point the requests 
at the local machine. This should at least avoid remote requests and also 
save the user the bandwidth of fetching the advert images.
This is my version of the script:
FF01
1
TimeStamp=2001-03-09 00:00:00
0015Register
0011ChannelSet="Ad"
0011ChannelSet3="Ad"
0011Version=2525
0011FileVersion=0
0011SoftwareVersions="eng;engplus;engpro;fra;fraplus;frapro"
0016
0015Init
0011UseCookies=True
1001
0011SearchQuerySeparator="+"
1003
0011SelfPromoPercent=0
0011SourceUrl="http://127.0.0.1/"
0011TargetUrl="http://127.0.0.1/"
0011RotationInterval=120000
0016
11A2
We should not forget to change the size value at the end: set it to the 
length of the file minus 0x22, and write the encoded file to 'copernic.csf'.
Also 'updates.copernic.com', 'regcards.copernic.com' and 'www.copernic.com' 
should be added to your hosts file pointing at localhost, or to the banned 
list of your local proxy ;) This is to stop any updates or personal data 
transfer from happening. This should stop the software from any phone home 
tactics and hopefully should remove all adverts without having to touch any 
of the code. After all we are simply using the program's scripts against itself.
I have not tested this but it should work, and I see no reason why it would 
not have the desired effect!
Adding a Group
Looking at the decoded .ssf and .csf files you will see that they share the 
same scripting language with a few additions. So the thought was: as it 
parses all the files in the set directories and not specific ones, could 
a new file or files be added, and so add engines and groups to the copernic 
engine? This would mean that we are no longer tied to the ones they supply, 
and it would also confirm how it works.
Using one of the group files as an example, the following file was created:
FF01
1
TimeStamp=2001-03-15 00:00:00
0015Register
0011_Conv="4002->3999 (01-03-15, 10:58:42)"
0011DisplayName="Custom"
0011DisplayNames("FRA")="Custom French"
0011DisplayNames("DEU")="Custom German"
0011DisplayNames("ITA")="Custom Italian"
0011DisplayNames("ESP")="Custom Spanish"
0011DisplayNames("POR")="Custom Portugese"
0011Description="Custom Search Group"
0011Descriptions("FRA")="Custom Search Group"
0011Descriptions("DEU")="Custom Search Group"
0011Descriptions("ITA")="Custom Search Group"
0011Descriptions("ESP")="Custom Search Group"
0011Descriptions("POR")="Custom Search Group"
0011ResultsPerChannel=10
0011TotalResults=1000
0011Version=3000
0011FileVersion=1
0011AutoUpdate=True
0011SearchType="keywords"
0016
0015AfterDownload
0016
This file was saved as 'Custom.ssf' , encoded using the encode routine 
and placed in the 'Categories' directory. Now to run the application 
and see if the group is now in the lists. The puzzling thing was that 
the group did not appear in the drop down of groups, or the main tab 
on the left giving all the groups, but if we do a search and then in 
that screen browse the groups it is there at the bottom of the list. 
This might be because we have no search engines assigned to this 
group. When we find the group setting in the category dialog it shows 
no engines under the group. This is a good sign.
Note that the group appears only at the end of the list in the 
categories dialog until you have either done a search using that group 
or closed the program and reopened it, then it seems to be alpha sorted 
into the list.
Adding a Search Engine
So to create a search engine file, I will use searchlores' own Namazu 
engine as an example. The following file was created:
FF01
1
TimeStamp=2001-03-09 00:00:00
0015Register
0011_Conv="4002->3999 (01-03-09, 10:52:49)"
0011DisplayName="Namazu"
0011HomePage="http://www.searchlores.org/"
0011SupportNew=True
0011Category="Custom"
0011Version=3000
0011FileVersion=2
0011AutoUpdate=True
0011ChannelSet="Custom"
0011ChannelSet3="Custom"
0011SupportOr=True
0011SupportAnd=True
0011SupportQuotes=True
0016
0015Init
0011SourceUrl="http://www.searchlores.org/cgi-bin/search?query="
0011ResultsPerPage=20
100A("")
1004("searchlores.org")
0011Rules("Range").StartMarker="Search Results for"
0011Rules("Range").EndMarker=""
0011Rules("Address").Key=True
0011Rules("Title").StartMarker=">"
0011Rules("Title").EndMarker=""
0011Rules("Title").StartLine=0
0011Rules("Title").NbLines=1
0011Rules("Description").StartMarker=""
0011Rules("Description").EndMarker=""
0011Rules("Description").StartLine=0
0011Rules("Description").NbLines=1
0011SearchQuerySeparator="+"
1003
0016
0015BeforeDownload
1001
1002("query="+SearchQuery)
1002("result=normal")
1002("sort=score")
1002("max=20")
0016
0015AfterDownload
0016
This file was saved as 'Namazu.csf' , encoded using the encode routine 
and placed in the 'Categories\Engines' directory. Now to run the application 
and see if the group is now in the lists. 
Nope the group is not in the normal lists, but is still in the category 
dialog, and also if you click on a group to do a search it is in the 
dropdown box, and when viewing it you can see the Namazu engine within 
the group. So that worked quite well, still have to figure out how to get 
it in the quick groups dropdown and the left hand list in the main view.
But I can select the group and also the search engine, and the request does 
seem to go out (to the local proxy). So the engine configuration and group 
configuration will pick up any files you place in the app directories. This 
is really nice and opens up a lot of possible routes.
It should be noted that the above file for Namazu is not quite complete, as 
the results parsing part has been taken from another file and may not match, 
but the parameters passed in are correct. Examination of the engine configuration 
files is recommended as their scripting language allows some very nice things 
to be performed and is certainly powerful enough for the task required.
After a bit of looking round the menus in copernic (I had not used it before) 
I spotted Options in the Tools menu. In Options there is a button labelled 
'Category Bar' settings. Ok so let's click on it. We have all the other 
groups in one list as being part of the category bar (the groups shortcut 
menus) and Custom sitting alone in the other list (not included), 
so this seems simple. Select the group and add it to the other list using 
the supplied button, use up or down to put it where you want. Right, now exit 
from this dialog. LO and BEHOLD the groups list in the main view now 
contains the group 'Custom' and if we look inside Custom there is 'Namazu'. 
So adding groups and engines is now possible with copernic.
Conclusions
My aim was not to take the program apart too much, just to get to the data on 
the search engines, without spending hours looking at assembler code. 
But during this task I have found out many things about how 
this program does other things - some are good and some are bad. There are a lot 
of hardcoded bits, especially to do with language and syntax (lexicon), which 
cannot be changed by updates, or at least that is how it 
appears to be. I do not like at all the intrusive phone home features of this 
product - at least this product uses the proxy you give it for these requests 
and does not try to bypass it like some similar products.
I was very disappointed with the encryption on the data files, mind you the 
application was coded in Delphi. But seriously, you would have thought the 
developers would have put a bit more in. After all, if you are going to 
put some encryption in, at least make it worthwhile. 
The task was also made a bit easier by the fact that the filenames and directory 
structure of the configuration files told you exactly what group or engine 
each file related to and what to expect in each 
file. It seems like the author wants you to get the data out of 
the program, or at least not make our task too hard.
In hindsight (always a good thing), once it had been decided that the 
method of encryption was a substitution cipher, it would have been possible 
to do a known plaintext attack on the encoded files. Collecting the request 
URLs from the proxy server, the strings from the executable and the details 
in the groups files would have given enough data to recover the encoding 
method. This would have worked as well as the path I chose to follow, and 
might have taken a bit longer, but it would have had the same result without 
having to even touch a disassembler or debugger. I 
chose to grab the plaintext from the program, so a whole file of plaintext could 
be grabbed in one go and a translation table built easily, but a partial plaintext 
lookup generator program would have worked equally well. 
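As a rough sketch of what such a lookup generator could look like in C, assuming 
the encoding really is a fixed byte-for-byte substitution and that you have 
stretches of encoded data lined up with their known plaintext. The names 
learn_substitution and decode_with_table are hypothetical, this is not the 
method I actually used, and the arrays table and known must be zeroed by the 
caller before the first call.

```c
#include <stddef.h>

/* Learn decoding entries from aligned ciphertext/plaintext samples.
   table[c] receives the plaintext byte for ciphertext byte c, and
   known[c] is set to 1 once that ciphertext byte has been seen.
   Returns 0 on success, -1 if a sample contradicts an earlier one
   (samples misaligned, or not a simple substitution). */
int learn_substitution(const unsigned char *cipher, const unsigned char *plain,
                       size_t len, unsigned char table[256],
                       unsigned char known[256])
{
    size_t i;
    for (i = 0; i < len; i++) {
        unsigned char c = cipher[i];
        if (known[c] && table[c] != plain[i])
            return -1;
        table[c] = plain[i];
        known[c] = 1;
    }
    return 0;
}

/* Decode len bytes with a possibly partial table. Bytes that have not been
   learned yet are written as '?'. Returns the number of such bytes. */
size_t decode_with_table(const unsigned char *cipher, size_t len,
                         const unsigned char table[256],
                         const unsigned char known[256], unsigned char *out)
{
    size_t i, missing = 0;
    for (i = 0; i < len; i++) {
        if (known[cipher[i]]) {
            out[i] = table[cipher[i]];
        } else {
            out[i] = '?';
            missing++;
        }
    }
    return missing;
}
```

Each new piece of known plaintext fills in more of the table, and the count 
returned by decode_with_table tells you how much of a file is still unreadable.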
The scripting language they have included interested me most. It has some nice 
ideas in it, even though it seems to have its roots in a BASIC type language. 
Bot writers and OSLSE project fans should examine this and how 
it works to learn many things. It can provide many pointers and ideas to 
programmers of VSLs for bots and other such programs, as it can be very 
versatile and is simple in concept 
but offers expandability and flexibility. It also seems a lot more flexible 
than a simple macro type VSL, where you include commands in strings and 
then parse them out, as in webferret. This is not meant to say that one is 
better or worse than the other, but that both are interesting and that it 
would be easier to include the webferret idea in this than the other way
around. From looking at it, it would be 
very simple to parse and implement because of its defined structure and 
the flexibility of being text based and not some form of microcode. This 
also makes it very suitable for inclusion in a format such as XML, as an 
embedded script.
Final Thoughts
Firstly I would like to point out that you should try and learn about how your 
target works before trying to take it apart, reading the essay you should 
hopefully have seen how the clues picked up early on helped later in the 
process. While you are installing LOG what the program does. When you run the 
program for the first and subsequent times LOG what the program does. These log 
files will not cost you anything to make (apart from the time to start filemon 
and regmon) and will save you doing it later. Then when a question comes up you 
do not have to think - oh I must uninstall and reinstall to get a log of every 
change - not all may be removed or put back on - it depends on the program. So 
do it the first time. Pick your target and work it, right from the start.
After the script code I realise that I was trying to over complicate matters 
and produce some fancy parsing macro type thing for the parsing part of 
my bot, seeing this has brought me back to a simple but very expandable 
idea, which will be much easier to implement and expand as development 
requires. Sometimes it takes seeing another point of view to bring some 
clarity to your thoughts and put you back on the right track.
If you are going to write a paper on a subject you normally would research 
other works on the same subject first, surely the same should be done if 
you are working on some software. This might save you from reinventing the 
wheel as a square. I am not saying use their ideas exactly as they do, 
but you should observe and learn from them, then create a solution which 
brings all the parts most suited to your task together.
I would also like to point out that people tend to download and use software 
without really understanding what it does, or what data about them goes where. 
You should take care over what software you use and should understand the 
hidden data that it sends about you. A prime example is the entry in the 
advert request in this product which gives them what you are searching for, 
quite apart from the update and regcard information. Most products of this 
type seem to conduct this form of activity and the users should be made 
aware of this before using the products.
The use of adverts in products is actually robbing, yes robbing, the users 
of their precious bandwidth. While they are showing adverts you are losing 
bandwidth, and I believe that reducing the advert shown to a 1x1 image or 
simply hiding the advert is not a solution, as you are still using bandwidth. 
The only proper method of advert removal is to make sure the request never 
gets out, or at least not as far as your internet connection.
Disclaimer
I must point out that during the writing of this essay, at no point was Copernic
allowed to interact with the internet in any way shape or form. It has now 
been removed from the PC it was installed on and will not be returning.
A lot of information was gained from log files, and some reversing of course! ;).
Hope you enjoyed reading.
Copyright (c) 2001, WayOutThere
    
      
   

   