

Not sure who asked the original question, nor what OS you're on but...

If you're using Linux and the files are in fact text files (CSV) as indicated below, then I would think something to the effect of:

cat file1 file2 > file3 && sort file3 | uniq -u > uniqueRecords.txt

Might do part of the trick. I say might because it's been too many years since I hung up my Unix/Linux hat and I've forgotten most of the good stuff. Sorry I can't be more helpful.

The above translates into English as: merge file1 and file2 into file3, sort file3 and then keep only those lines that are NOT duplicated (i.e. unique) and store them in "uniqueRecords.txt". The -u option is what restricts the output to non-duplicated lines; plain uniq would instead keep one copy of every line. uniq also has an option, -d, to return the opposite, i.e. one copy of each line that IS duplicated.
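
To make the difference between the uniq modes concrete, here is a small sketch on made-up sample lines (the letters and variable names are only for illustration). It assumes a POSIX shell with sort and uniq available.

```shell
# Sample input (made up for illustration). uniq only compares adjacent
# lines, which is why the input is sorted first.
sorted=$(printf '%s\n' b a c a | sort)

all_distinct=$(printf '%s\n' "$sorted" | uniq)      # one copy of every line
only_singles=$(printf '%s\n' "$sorted" | uniq -u)   # lines that occur exactly once
only_repeats=$(printf '%s\n' "$sorted" | uniq -d)   # one copy of each repeated line

printf 'uniq:\n%s\n\nuniq -u:\n%s\n\nuniq -d:\n%s\n' \
    "$all_distinct" "$only_singles" "$only_repeats"
```

Here "a" appears twice, so it shows up under uniq and uniq -d but not under uniq -u.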

so then you'd do:
cat file3 | sort | uniq -d > duplicateRecords.txt

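Now eliminate any remaining duplicates from the duplicateRecords.txt file (uniq -d should already print each duplicated line only once, so this is just a safety net):
cat duplicateRecords.txt | uniq > almostDone.txt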

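Finally, merge the first set of unique records with the "unique duplicates" and sort the result. Note the single ">", which overwrites done.csv instead of appending to an existing file.
sort uniqueRecords.txt almostDone.txt > done.csv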

Here's the same thing without all the comments:
cat file1 file2 > file3 && sort file3 | uniq -u > uniqueRecords.txt
sort file3 | uniq -d > duplicateRecords.txt
uniq duplicateRecords.txt > almostDone.txt
sort uniqueRecords.txt almostDone.txt > done.csv

Not having a Unix/Linux system handy I can't test it, but you should see results like this if I got it right:
Assuming file1 is composed of:
x,y,z
a,b,c
d,e,f
and file2:
a,b,c
g,h,i
j,k,l

Results for each line above should be as follows:
1.
d,e,f
g,h,i
j,k,l
x,y,z

2.
a,b,c

3.
a,b,c

4.
a,b,c
d,e,f
g,h,i
j,k,l
x,y,z
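
For anyone who wants to check these results, the steps can be run against the sample data in a scratch directory. This is only a sketch, assuming a POSIX shell with mktemp, sort, uniq and cmp. It uses uniq -u in step 1 so that only non-duplicated lines are kept, and sorts the final merge. The sort -u line at the end is a shortcut that is not part of the original steps; shortcut.csv is a made-up file name.

```shell
# Work in a scratch directory so nothing existing gets overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The sample files from the message.
printf '%s\n' 'x,y,z' 'a,b,c' 'd,e,f' > file1
printf '%s\n' 'a,b,c' 'g,h,i' 'j,k,l' > file2

cat file1 file2 > file3                             # merge both files
sort file3 | uniq -u > uniqueRecords.txt            # 1. lines occurring once
sort file3 | uniq -d > duplicateRecords.txt         # 2. one copy of each repeated line
uniq duplicateRecords.txt > almostDone.txt          # 3. already duplicate-free
sort uniqueRecords.txt almostDone.txt > done.csv    # 4. merged, sorted result

# Assumed shortcut (not from the steps above): sort and de-duplicate in one pass.
sort -u file1 file2 > shortcut.csv

cat done.csv
cmp -s done.csv shortcut.csv && echo "sort -u gives the same result"
```

If only the final merged list is needed, the sort -u line on its own should be enough; the intermediate files are useful mainly when you also want to know which records were duplicated.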


Another possible tool to look into is AWK. If you know it, it could be even simpler than the above 4-line script. diff and/or one of its variants might be another potential tool to get the job done.
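
As a rough sketch of what that might look like (untested against the real data, and assuming a POSIX shell with awk, sort and comm): the awk one-liner prints each line only the first time it is seen, and comm, a comparison tool related to diff and swapped in here for it, lists the lines the two files have in common. The sample files and variable names are made up for illustration.

```shell
workdir=$(mktemp -d)
printf '%s\n' 'x,y,z' 'a,b,c' 'd,e,f' > "$workdir/file1"
printf '%s\n' 'a,b,c' 'g,h,i' 'j,k,l' > "$workdir/file2"

# awk: print a line only the first time it is seen. Input order is kept
# and no sorting is needed.
merged=$(awk '!seen[$0]++' "$workdir/file1" "$workdir/file2")

# comm -12: lines common to both files. comm needs sorted input.
sort "$workdir/file1" > "$workdir/file1.sorted"
sort "$workdir/file2" > "$workdir/file2.sorted"
shared=$(comm -12 "$workdir/file1.sorted" "$workdir/file2.sorted")

printf 'merged without duplicates:\n%s\n\nin both files:\n%s\n' "$merged" "$shared"
```

Unlike the sort-based approach, the awk version leaves the records in their original order, which may or may not matter for the spreadsheet.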


On 7/28/2012 12:42 PM, Lynne Stevens wrote:



Oh yeah, both files are CSV and are the way Thomas downloads them, but since he is not too computer literate he most likely does not know he can save them in different ways. No telling. I asked one time and he said that is how it gets downloaded.

On 07/28/2012 09:39 AM, Alexander Thurgood wrote:
On 28/07/12 14:17, Lynne Stevens wrote:

Hi Lynne,


How do I check for duplicates in a database using another database?






--
For unsubscribe instructions e-mail to: users+help@global.libreoffice.org
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted
