[SOLVED] read txt file into an array and make a second txt file

zimbot · 08-27-2015, 05:12 PM

Friends
I am struggling with a multi step process and am stuck on the following step.

I wish to take a text file and make a second (changed) txt file.
There is a pattern of "_2015" followed by 10 more digits and "_"
that I *think* I wish to build into an array,
then take the 1st array value and put it in the next-to-last array position.

I will try to explain.
I want to read a txt file

01.txt

file 'city_name_Z_Other_20150727121507_20150727121813_81443869.mp4'
file 'city_name_Z_Other_20150727121814_20150727122538_81453432.mp4'
file 'city_name_Z_Other_20150727122539_20150727122558_81472261.mp4'
# city_name_Z_Other_20150727121507_20150727122538_81453432.mp4
# city_name_Z_Other_20150727121814_20150727122558_81472261.mp4

I wish to read this 01.txt file and put all the time nums [ eg. 20150727121507 ] into an array
so this would be an array of

string array num
20150727121507 1
20150727121813 2
20150727121814 3
20150727122538 4
20150727122539 5
20150727122558 6
20150727121507 7
20150727122538 8
20150727121814 9
20150727122558 10 (there could be many more than 10)

and I wish to make a change in the next-to-last array position, always replacing it with the 1st
in this case I wish to switch

string array num
20150727121814 9
as
20150727121507 new 9

so the entire is
20150727121507 1 < this one
20150727121813 2
20150727121814 3
20150727122538 4
20150727122539 5
20150727122558 6
20150727121507 7
20150727122538 8
20150727121507 9 < now goes here
20150727122558 10
In this case the total num of array elements is 10 but this will vary
but I do always wish to put the 1st in the next to last occurrence.

and I wish to end up with a 02.txt file like so

file 'city_name_Z_Other_20150727121507_20150727121813_81443869.mp4'
file 'city_name_Z_Other_20150727121814_20150727122538_81453432.mp4'
file 'city_name_Z_Other_20150727122539_20150727122558_81472261.mp4'
# city_name_Z_Other_20150727121507_20150727122538_81453432.mp4
# city_name_Z_Other_20150727121507_20150727122558_81472261.mp4

I am thinking that the "_" might be a thing to split on
or "_2015nnnnnnnnnn_"
They always have the format of city_name_something_somethingElse__2015nnnnnnnnnn__2015nnnnnnnnnn_nnnn.mp4

so if the array was made from splitting on the "_"
it would be #5 replaces 3rd from last

Any assistance is appreciated. thanks much!

syg00 · 08-27-2015, 07:48 PM

Why build an array to keep all those values? You only need to keep the one value you are interested in.
Gets interesting determining when you have reached the last record. Awk has lots of flexibility - you can test for the first record, and it has a clause that is active only at end of input processing. The END{} would allow you to do the substitution, but you would have to defer the printing of each record to ensure you didn't get two of the last (before and after substitution). And yes, it will seamlessly split on the underscore.

That's how I'd do it.
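
A minimal sketch of the approach described above, assuming the input is the 01.txt from the first post and that the first timestamp is always field 5 when splitting on "_". The scratch directory and the variable names (first, prev) are made up for illustration. Each record is printed one step late, so the last one can be changed in END before it is printed:

```shell
#!/bin/bash
# Sketch only: recreate the sample 01.txt from the first post in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/01.txt" <<'EOF'
file 'city_name_Z_Other_20150727121507_20150727121813_81443869.mp4'
file 'city_name_Z_Other_20150727121814_20150727122538_81453432.mp4'
file 'city_name_Z_Other_20150727122539_20150727122558_81472261.mp4'
# city_name_Z_Other_20150727121507_20150727122538_81453432.mp4
# city_name_Z_Other_20150727121814_20150727122558_81472261.mp4
EOF

awk -F_ -v OFS=_ '
    NR == 1 { first = $5 }     # keep the first timestamp of the first record
    NR > 1  { print prev }     # print the previous record, one step behind
            { prev = $0 }      # hold the current record back for now
    END     { if (NR > 0) { $0 = prev; $5 = first; print } }   # patch the last record
' "$workdir/01.txt" > "$workdir/02.txt"

cat "$workdir/02.txt"
```

With the sample input, 02.txt should match 01.txt except that field 5 of the last line now holds the first line's timestamp.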

zimbot · 08-28-2015, 10:16 AM

I suspect that you are ...right and wise.
I confess that I am unsure of how to proceed.
And many times these problems-solutions are dependent on "how one sees it"

given my (maybe flawed??) way I see it....

I wonder if someone could assist me with just this :
how to
read a line*

file 'city_name_Z_Other_20150727121507_20150727121813_81443869.mp4'

split on underscore into an array of

1 file 'city
2 name
3 z
4 other
5 20150727121507
6 20150727121813
7 81443869.mp4'

so that I can then make a var called $time1 == that array element 5
aka 20150727121507

so that I can then make a var called $time2 == that array element 6
aka 20150727121813

Thanks
I have been reading about how to read a file and put words in an array, but not how to parse on "_"
or "2015-10numbers" ...something like that....

*
and I think I can read a line at a time like this

read each line

01.sh

#!/bin/bash
# Set the field separator to a newline
IFS="
"
# Loop through the file
for line in `cat cat2.txt`;do
echo "$line " >> text2.txt
echo "$line aaaa"
# so I can put Line entire in a var
# now i need to put line 'parts' into vars
done

Thanks for your time - I do appreciate it
This is one of the harder things I have tried to do - yippie
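
One possible way to do the split asked about in this post, sketched in plain bash. The names line, parts, time1 and time2 are only illustrative. It assumes bash (for read -a and the here-string) and that every line has the seven "_"-separated parts listed above:

```shell
#!/bin/bash
# Sample line from the post; in a real script this would come from the read loop.
line="file 'city_name_Z_Other_20150727121507_20150727121813_81443869.mp4'"

# Split on "_" into an array. IFS is changed only for this one read command.
IFS=_ read -r -a parts <<< "$line"

# Bash arrays count from 0, so "element 5" in the post is index 4.
time1=${parts[4]}
time2=${parts[5]}

echo "time1=$time1"
echo "time2=$time2"
```

Setting IFS on the same line as read keeps the change local to that command, so the rest of the script still sees the default separators.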

schneidz · 08-28-2015, 10:25 AM

not really sure what you are after but this is what i get:

Code:

[schneidz@hyper ~]$ cat zimbot.txt 
file 'city_name_Z_Other_20150727121507_20150727121813_81443869.mp4'
file 'city_name_Z_Other_20150727121814_20150727122538_81453432.mp4'
file 'city_name_Z_Other_20150727122539_20150727122558_81472261.mp4'
# city_name_Z_Other_20150727121507_20150727122538_81453432.mp4
# city_name_Z_Other_20150727121814_20150727122558_81472261.mp4
[schneidz@hyper ~]$ awk -F _ '{print $5}' zimbot.txt | sort | uniq -c | awk '{print $2 " " $1}'
20150727121507 2
20150727121814 2
20150727122539 1

edit:
scratch the above. i re-read it (still confused but understanding it more). i would hax thru something calling awk, cut, paste, head, tail, ...

edit:
if all you want to do is replace part of the n-1 line with part of the 0th line then this mite work:[untested]:

Code:

ts1=`awk -F _ 'NR==1 {print $5}' zimbot.txt`
line=`awk 'END {print NR}' zimbot.txt`
((line--))
awk -v line=$line -v ts1=$ts1 -F _ 'NR==line {print $1 "_" $2 "_" $3 "_" $4 "_" ts1 "_" $6 "_" $7}' zimbot.txt
# feeding it to cut mite be more elegant:
#linebegin=`awk -v line=$line 'NR==line {print $0}' zimbot.txt | cut -f -4 -d _`
#lineend=`awk -v line=$line 'NR==line {print $0}' zimbot.txt | cut -f 6- -d _`
#echo $linebegin\_$ts1\_$lineend

zimbot · 08-28-2015, 11:11 AM

I am looking at it a little diff

with awk
i see where
echo Guangzhou_Huimin_Z_Other_20150727121507_20150727121813_81443869.mp4 | awk -F '_' '{sub(/-.*$/, "", $3); print $5}'
returns
20150727121507 - ok that is timestamp1

echo Guangzhou_Huimin_Z_Other_20150727121507_20150727121813_81443869.mp4 | awk -F '_' '{sub(/-.*$/, "", $3); print $6}'
returns
20150727121813 - ok that is my timestamp2

how do i know who is "next to last?" with an array I get the array members tot number
if 8 , then next to last is 7.

hummm now with that .... reckon - i need to think some more
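
For the "who is next to last" question, a sketch of the array idea from the first post. It assumes bash 4 or newer (for mapfile), a grep that supports -o, and that the only 14-digit runs in the file are the timestamps. The array name stamps is made up for illustration:

```shell
#!/bin/bash
# Sketch only: the sample 01.txt from the first post, written to a temp file.
input=$(mktemp)
cat > "$input" <<'EOF'
file 'city_name_Z_Other_20150727121507_20150727121813_81443869.mp4'
file 'city_name_Z_Other_20150727121814_20150727122538_81453432.mp4'
file 'city_name_Z_Other_20150727122539_20150727122558_81472261.mp4'
# city_name_Z_Other_20150727121507_20150727122538_81453432.mp4
# city_name_Z_Other_20150727121814_20150727122558_81472261.mp4
EOF

# Collect every 14-digit timestamp, in file order, one per array element.
mapfile -t stamps < <(grep -oE '[0-9]{14}' "$input")

count=${#stamps[@]}
# Indexes start at 0: the first value is index 0, the next-to-last is count-2.
stamps[count-2]=${stamps[0]}

printf '%s\n' "${stamps[@]}"
```

This only rearranges the timestamps in memory. Writing them back into the file names is a separate step.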

syg00 · 08-28-2015, 09:15 PM

That doesn't correlate with your initial post. In both the first and last records, $5 is what you need to be looking at. Lose your fixation on "next to last" - it's irrelevant. I also don't understand what the sub() function is there for. Have a look at this

Code:

awk 'BEGIN{FS="_"}{if (NR==1) sve=$5} ; END{print "record 1 fifth field (saved): ",sve; print "last record fifth field: ",$5 ; print "entire last record: ",$0}' input.file

You need to add the guts, but that shows the saved value being available in the END{} clause. Adding print for records will show the issue I mentioned above.

allend · 08-28-2015, 10:26 PM

Quote:

They always have the format of city_name_something_somethingElse__2015nnnnnnnnnn__2015nnnnnnnnnn_nnnn.mp4

Awk is definitely the optimal way to go, but this could do what you want, provided there is no empty line at the end of the file.

Code:

sed \$s/_2015.*_2015/_$(head -1 zimbot.txt | cut -d "_" -f5)_2015/ zimbot.txt

zimbot · 09-01-2015, 09:45 AM

I have attempted to employ some of the wisdom given

here is where I am currently

1st - i will attempt to communicate my goal

the goal
read the file lst_small.txt
these 5 lines

Guangzhou_Yanyun_L_Bathroom_10_12_67594718.mp4
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4
Guangzhou_Yanyun_L_Bathroom_40_42_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_50_60_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_61_88_67607152.mp4

find what I call the matches. This is when the next IN is within range of the last OUT

I loop through a txt file ( that is the product of a dir ls )
I find via awk the IN number & out Number
for
Guangzhou_Yanyun_L_Bathroom_10_12_67594718.mp4
10 is the IN , 12 is the out
if the next file IN is eq or within 5 of the last file out
then that is a match
and I should produce a 1.txt that has

Guangzhou_Yanyun_L_Bathroom_10_12_67594718.mp4
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4

-

as I loop through the above list of files I should find 2 matches

these 2 should match and make a file 1.txt
Guangzhou_Yanyun_L_Bathroom_10_12_67594718.mp4
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4

because 13 is "within the range", meaning between 11 and 17

I would also compare
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4
Guangzhou_Yanyun_L_Bathroom_40_42_67607152.mp4
No match 40 is not between 29 - 35 no action taken

Guangzhou_Yanyun_L_Bathroom_40_42_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_50_60_67607152.mp4
no match no action taken 50 is not between 41 - 47

these 2 should match and make a file 2.txt
Guangzhou_Yanyun_L_Bathroom_50_60_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_61_88_67607152.mp4

because 61 is within the range, meaning between 59 - 65

IN and OUT of the 5 files
10 12

13 30 match yes- 13 is between 11 - 17
13 is compared to 11 (12-1) as mycompareIN_Low_1 and
13 is compared to 17 (12+5) as compareIN_High_5

40 42 40 will not match to 30

50 60 50 will not match to 42

61 88 match yes- 61 is between 59 - 65
61 is compared to 59 (60-1) as mycompareIN_Low_1 and
61 is compared to 65 (60+5) as compareIN_High_5

-----------------------------
when i run it
the echos are
this is the start 1
10
NO NO MATCH
------------
my2 newOut 12
my2 oldIN 10
2 2nd
13
my1 line Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4
mycompareIN_Low_1 8
mycompareIN_High_5 15
my1 newOut 30
my1 oldIN 10
2
MATCH
40
NO NO MATCH
------------
my2 newOut 42
my2 oldIN 40
3 2nd
50
NO NO MATCH
------------
my2 newOut 60
my2 oldIN 50
4 2nd
61
NO NO MATCH
------------
my2 newOut 88
my2 oldIN 61
5 2nd
---------------------------

so it looks like I *AM* detecting match or no match, but NOT writing the outF.txt files
as 1.txt & 2.txt
========================================

currently it only yields 1 file
2.txt
that contains but 1 line
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4

:::::::::::::::::::::::::::::::::::::::::::::::::

and here is my current script t5.sh ( yes , my 5th version )

#!/bin/bash
## t5.sh

## actual xample Guangzhou_Huimin_Z_Other_20150727121507_20150727121813_81443869.mp4
## line form City_FirstNam_LstInt_Room_TimeIn_Timeout_lastNm&extn
## awk p$1 _ p$2 _ p$3 _p$4 _p$5 _p$6 _p$7

############### this is the dir listing
#parseMe=list.txt
parseMe=lst_small.txt
OutPath=$HOME/22
outF=1
echo this is the start $outF
# Set the field separator to a newline
IFS="
"
# Loop through the file
#for line in `cat list.txt`;do
for line in `cat $parseMe`;do

echo $line | awk -F '_' '{sub(/-.*$/, "", $3); print $5}'
###################### MovSize=$(ls -lah ${f%.*}.mov | awk '{ print $5}')
newIN=$(echo $line | awk -F '_' '{sub(/-.*$/, "", $3); print $5}')
newOut=$(echo $line | awk -F '_' '{sub(/-.*$/, "", $3); print $6}')

### test if if newOut match to oldIN or outF=001
compareIN_High_5=$[oldIN+5]
compareIN_Low_1=$[oldIN-2]

## if [[ newIN -gt compareIN_Low_1 && newIN -lt compareIN_High_5 ]] | [ "$outF" = 1 ]; then
if [[ newIN -gt compareIN_Low_1 && newIN -lt compareIN_High_5 ]]; then

echo my1 line $line
echo mycompareIN_Low_1 $compareIN_Low_1
echo mycompareIN_High_5 $compareIN_High_5
echo my1 newOut $newOut
echo my1 oldIN $oldIN
echo $outF
oldIN=$newIN
oldOut=$newOut
echo MATCH
echo $line >> $OutPath/$outF.txt

else
## outF=$[outF+1]
echo NO NO MATCH
outF=$((outF+1))
oldIN=$newIN
oldOut=$newOut
echo ------------
echo my2 newOut $newOut
echo my2 oldIN $oldIN
echo $outF 2nd
fi
done

::::::::::::::::::::::: end of t5.sh

and as you may have noticed I hoped to start with a 1.txt ( currently it only does 2.txt )
with
## if [[ newIN -gt compareIN_Low_1 && newIN -lt compareIN_High_5 ]] | [ "$outF" = 1 ];
I hoped that if I added an "or" testing that $outF had not yet been incremented, it would populate a 1.txt,
but that only results in all lines going into a 1.txt

Thank you for your time and wisdom.
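
A sketch of the range test as the goal describes it. t5.sh above builds its limits from oldIN, while the description compares the next IN against the previous OUT (from OUT-1 to OUT+5). The function name is_match and the variable names are made up, and small integer IN/OUT values like the 5 sample lines are assumed:

```shell
#!/bin/bash
# is_match LAST_OUT NEXT_IN: succeeds when NEXT_IN lies in [LAST_OUT-1, LAST_OUT+5].
is_match() {
    local last_out=$1 next_in=$2
    (( next_in >= last_out - 1 && next_in <= last_out + 5 ))
}

# Sketch only: the 5 sample lines from lst_small.txt, written to a temp file.
list=$(mktemp)
cat > "$list" <<'EOF'
Guangzhou_Yanyun_L_Bathroom_10_12_67594718.mp4
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4
Guangzhou_Yanyun_L_Bathroom_40_42_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_50_60_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_61_88_67607152.mp4
EOF

prev_out=""
report=""
# Fields 5 and 6 are IN and OUT; the other fields are read but not used.
while IFS=_ read -r f1 f2 f3 f4 t_in t_out rest; do
    if [[ -n $prev_out ]] && is_match "$prev_out" "$t_in"; then
        report+="$t_in MATCH"$'\n'
    else
        report+="$t_in NO MATCH"$'\n'
    fi
    prev_out=$t_out    # the next line is compared against this OUT
done < "$list"

printf '%s' "$report"
```

With the 5 sample lines this should report a match for 13 and for 61 and no match for 10, 40 and 50, which agrees with the hand-worked table in the post.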

schneidz · 09-01-2015, 10:38 AM

maybe c would be better ?

heres my stab at it:

Code:

[schneidz@hyper ~]$ cat zimbot.ksh 
#!/bin/bash

l1=1
lx=`awk "END {print NR}" zimbot.txt`
file_num=1

while [ $l1 -lt $lx ]
do
 l2=`expr $l1 + 1`
 z1=`awk -F _ -v line=$l1 'NR==line {print $6}' zimbot.txt`
 z2=`awk -F _ -v line=$l2 'NR==line {print $5}' zimbot.txt`
 zx=`expr $z1 - $z2`
 if [ $zx -ge -5 ] && [ $zx -le 5 ]
 then
  sed -n "$l1,$l2"p zimbot.txt > $file_num.txt
  ((file_num++))
 fi
 ((l1++))
done

zimbot · 09-01-2015, 01:28 PM

I have just spent the last several minutes googling and trying to see *how* that works.
I sort of understand ...but ..not really , not yet
so few lines
so elegant
I am sort of amazed.
You obviously have some skill - thank you

there is 1 thing ...ideally we would find multiple line matches - more than 2 lines
these matches from the input text

if the below were zimbot.txt

Guangzhou_Yanyun_L_Bathroom_10_12_67594718.mp4
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4 < match 2 lines

Guangzhou_Yanyun_L_Bathroom_40_42_67607152.mp4 < no match

Guangzhou_Yanyun_L_Bathroom_50_60_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_61_88_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_89_90_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_92_98_67607152.mp4 < a match of 4

Guangzhou_Yanyun_L_Bathroom_110_112_67607152.mp4 < no match
Guangzhou_Yanyun_L_Bathroom_120_130_67607152.mp4 < no match

Guangzhou_Yanyun_L_Bathroom_131_140_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_141_150_67607152.mp4 < match 2 lines

Guangzhou_Yanyun_L_Bathroom_160_166_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_167_168_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_170_200_67607152.mp4 < a match of 3

so I have attempted to adapt your

#!/bin/bash

l1=1
lx=`awk "END {print NR}" list.txt` ## if the above 14 file.mp4 list were list.txt
file_num=1

while [ $l1 -lt $lx ]
do
l2=`expr $l1 + 1`
z1=`awk -F _ -v line=$l1 'NR==line {print $6}' list.txt`
z2=`awk -F _ -v line=$l2 'NR==line {print $5}' list.txt`
zx=`expr $z1 - $z2`
if [[ $zx -ge -5 ]] && [[ $zx -le 5 ]]
then
sed -n "$l1,$l2"p list.txt > $file_num.txt
((file_num++))
fi
((l1++))
done

I have been trying to adapt the above ( your good script ) to find these "multi" matches
but have not been successful

maybe it would be better in c ( or d or e )

it is sooooo close

zimbot · 09-01-2015, 04:43 PM

even if there were repeated lines
I could "cleanse" it by running it through uniq with no options.
( I actually learned about uniq from this threaded adventure )

and even if I had txt files with 1 line (that is, not a match), I could sift those out because they would be smaller than txt files with 2 or more lines

syg00 · 09-01-2015, 07:19 PM

Seeing as you are using awk, why not use it to do it all ?

Code:

 awk '{if (NR==1) {sve=$0 ; out=$6 ; next}  ; if (($5-out)<5) {print (printed ? "" : sve"\n") $0 ; printed++} else {printed = ""} ; sve=$0 ; out=$6}' FS="_" input.file

Assumes numbers are in sequence and always increasing, and no embedded blank lines.
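
A sketch that extends the same idea to write each run of matching lines to its own numbered file, which is what the earlier posts were after. It uses the range stated in the goal (next IN between previous OUT-1 and previous OUT+5) rather than the <5 test above, and it makes the same assumptions: increasing numbers and no blank lines. The function name write_run, the variable names and the scratch directory are made up for illustration:

```shell
#!/bin/bash
# Sketch only: the 14-line sample list from the earlier post, without the blank lines.
workdir=$(mktemp -d)
cat > "$workdir/list.txt" <<'EOF'
Guangzhou_Yanyun_L_Bathroom_10_12_67594718.mp4
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4
Guangzhou_Yanyun_L_Bathroom_40_42_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_50_60_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_61_88_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_89_90_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_92_98_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_110_112_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_120_130_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_131_140_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_141_150_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_160_166_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_167_168_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_170_200_67607152.mp4
EOF

awk -F_ -v dir="$workdir" '
    # Write the buffered run to the next numbered file if it holds 2 or more lines.
    function write_run(    file) {
        if (count > 1) {
            file = dir "/" (++n) ".txt"
            printf "%s", run > file
            close(file)
        }
        run = ""
        count = 0
    }
    # A new IN outside [previous OUT - 1, previous OUT + 5] ends the current run.
    NR > 1 && ($5 < out - 1 || $5 > out + 5) { write_run() }
    { run = run $0 "\n"; count++; out = $6 }
    END { write_run() }
' "$workdir/list.txt"

ls "$workdir"
```

One thing to watch: by the stated rule, the 120_130 line joins the run that follows it, because 131 is within range of 130, even though the earlier post marks that line as "no match". Under that rule the sample should give four files holding 2, 4, 3 and 3 lines.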

zimbot · 09-05-2015, 01:39 PM

I thank all for the wisdom and advice.
I am going to call it solved.
even though ... the full challenge ... of my goal still eludes me.
it has proven to be a hard one.
but I have learned much

thanks again