LinuxQuestions.org
Old 08-27-2015, 05:12 PM   #1
zimbot
Member
 
Registered: Nov 2005
Location: cincinnati , ohio . USA
Distribution: ubuntu , Opensuse , CentOS
Posts: 179

Rep: Reputation: 17
read txt file into an array and make a second txt file


Friends,
I am struggling with a multi-step process and am stuck on the following step.

I wish to take a text file and make a second (changed) text file.
There is a pattern of "_2015" followed by 10 more digits and "_"
that I *think* I wish to build into an array,
then take the 1st array value and put it in the next-to-last array position.

I will try to explain.
I want to read a txt file

01.txt

file 'city_name_Z_Other_20150727121507_20150727121813_81443869.mp4'
file 'city_name_Z_Other_20150727121814_20150727122538_81453432.mp4'
file 'city_name_Z_Other_20150727122539_20150727122558_81472261.mp4'
# city_name_Z_Other_20150727121507_20150727122538_81453432.mp4
# city_name_Z_Other_20150727121814_20150727122558_81472261.mp4

I wish to read this 01.txt file and put all the time nums [ eg. 20150727121507 ] into an array
so this would be an array of

string array num
20150727121507 1
20150727121813 2
20150727121814 3
20150727122538 4
20150727122539 5
20150727122558 6
20150727121507 7
20150727122538 8
20150727121814 9
20150727122558 10 (there could be many more than 10)

and I wish to make a change in the next-to-last array position, always replacing it with the 1st.
In this case I wish to switch

string array num
20150727121814 9
as
20150727121507 new 9

so the entire is
20150727121507 1 < this one
20150727121813 2
20150727121814 3
20150727122538 4
20150727122539 5
20150727122558 6
20150727121507 7
20150727122538 8
20150727121507 9 < now goes here
20150727122558 10
In this case the total num of array elements is 10 but this will vary
but I do always wish to put the 1st in the next to last occurrence.

and I wish to end up with a 02.txt file like so

file 'city_name_Z_Other_20150727121507_20150727121813_81443869.mp4'
file 'city_name_Z_Other_20150727121814_20150727122538_81453432.mp4'
file 'city_name_Z_Other_20150727122539_20150727122558_81472261.mp4'
# city_name_Z_Other_20150727121507_20150727122538_81453432.mp4
# city_name_Z_Other_20150727121507_20150727122558_81472261.mp4


I am thinking that the "_" might be a thing to split on
or "_2015nnnnnnnnnn_"
They always have the format of city_name_something_somethingElse_2015nnnnnnnnnn_2015nnnnnnnnnn_nnnn.mp4

so if the array was made from splitting on the "_"
it would be #5 replaces 3rd from last

Any assistance is appreciated. thanks much!
 
Old 08-27-2015, 07:48 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153

Rep: Reputation: 4125
Why build an array to keep all those values? You only need to keep the one value you are interested in.
It gets interesting determining when you have reached the last record. Awk has lots of flexibility: you can test for the first record, and it has a clause that is active only at end of input processing. The END{} clause would allow you to do the substitution, but you would have to defer the printing of each record to ensure you didn't get two copies of the last one (before and after substitution). And yes, it will seamlessly split on the underscore.

That's how I'd do it.
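
A minimal sketch of the approach described above, assuming the sample lines from post #1. The variable names (first, prev) and the scratch directory are illustrative only and do not come from the thread. Each record is printed one step late, so the last record can still be changed in END before it is printed.

```shell
#!/bin/bash
# Sample input copied from post #1 (01.txt / 02.txt are the names used there).
workdir=$(mktemp -d)
cat > "$workdir/01.txt" <<'EOF'
file 'city_name_Z_Other_20150727121507_20150727121813_81443869.mp4'
file 'city_name_Z_Other_20150727121814_20150727122538_81453432.mp4'
file 'city_name_Z_Other_20150727122539_20150727122558_81472261.mp4'
# city_name_Z_Other_20150727121507_20150727122538_81453432.mp4
# city_name_Z_Other_20150727121814_20150727122558_81472261.mp4
EOF

awk 'BEGIN { FS = OFS = "_" }
NR == 1 { first = $5 }      # remember the timestamp of the first record
NR > 1  { print prev }      # print the previous record, one step behind
        { prev = $0 }       # hold the current record back
END     { if (NR) { $0 = prev; $5 = first; print } }   # patch and print the last record
' "$workdir/01.txt" > "$workdir/02.txt"

cat "$workdir/02.txt"
```

With the sample input, 02.txt should match 01.txt except that the 5th field of the last line is replaced by the first line's timestamp.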
 
1 members found this post helpful.
Old 08-28-2015, 10:16 AM   #3
zimbot
Member
 
Registered: Nov 2005
Location: cincinnati , ohio . USA
Distribution: ubuntu , Opensuse , CentOS
Posts: 179

Original Poster
Rep: Reputation: 17
I suspect that you are ...right and wise.
I confess that I am unsure of how to proceed.
And many times these problems-solutions are dependent on "how one sees it"

given my ( maybe flawed ?? ) way i see it....

I wonder if someone could assist me with just this :
how to
read a line*

file 'city_name_Z_Other_20150727121507_20150727121813_81443869.mp4'

split on underscore into an array of

1 file 'city
2 name
3 z
4 other
5 20150727121507
6 20150727121813
7 81443869.mp4'

so that I can then make a var called $time1 == that array element 5
aka 20150727121507

so that I can then make a var called $time2 == that array element 6
aka 20150727121813


Thanks
I have been reading about how to read a file and put words in an array, but not how to parse on "_"
or "2015-10numbers" ...something like that....

*
and I think I can read a line at a time like this

read each line

01.sh

#!/bin/bash
# Set the field separator to a newline
IFS="
"
# Loop through the file
for line in `cat cat2.txt`;do
echo "$line " >> text2.txt
echo "$line aaaa"
# so I can put Line entire in a var
# now i need to put line 'parts' into vars
done


Thanks for your time - I do appreciate it
This is one of the harder things I have tried to do - yippie
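
A minimal sketch of the two steps asked about here (read a line, split it on "_"), assuming bash and the line format from post #1. The array name parts and the scratch directory are illustrative only. Note that bash arrays count from 0, so "element 5" in the list above is parts[4].

```shell
#!/bin/bash
# Sample input: two lines from post #1, written to a scratch copy of cat2.txt.
workdir=$(mktemp -d)
cat > "$workdir/cat2.txt" <<'EOF'
file 'city_name_Z_Other_20150727121507_20150727121813_81443869.mp4'
file 'city_name_Z_Other_20150727121814_20150727122538_81453432.mp4'
EOF

# Read the file one line at a time without word splitting or backslash mangling.
while IFS= read -r line; do
    # Split the line on "_" into a bash array (indexes start at 0).
    IFS=_ read -r -a parts <<< "$line"
    time1=${parts[4]}   # 5th piece, e.g. 20150727121507
    time2=${parts[5]}   # 6th piece, e.g. 20150727121813
    echo "time1=$time1 time2=$time2"
done < "$workdir/cat2.txt"
```

Because the loop reads from a redirect rather than a pipe, time1 and time2 still hold the values from the last line after the loop ends.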
 
Old 08-28-2015, 10:25 AM   #4
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
not really sure what you are after but this is what i get:
Code:
[schneidz@hyper ~]$ cat zimbot.txt 
file 'city_name_Z_Other_20150727121507_20150727121813_81443869.mp4'
file 'city_name_Z_Other_20150727121814_20150727122538_81453432.mp4'
file 'city_name_Z_Other_20150727122539_20150727122558_81472261.mp4'
# city_name_Z_Other_20150727121507_20150727122538_81453432.mp4
# city_name_Z_Other_20150727121814_20150727122558_81472261.mp4
[schneidz@hyper ~]$ awk -F _ '{print $5}' zimbot.txt | sort | uniq -c | awk '{print $2 " " $1}'
20150727121507 2
20150727121814 2
20150727122539 1
edit:
scratch the above. i re-read it (still confused but understanding it more). i would hax thru something calling awk, cut, paste, head, tail, ...

edit:
if all you want to do is replace part of the n-1 line with part of the 0th line then this mite work:[untested]:
Code:
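# timestamp (5th "_" field) of the first line
ts1=`awk -F _ 'NR==1 {print $5}' zimbot.txt`
# number of lines in the file, then step back to the next-to-last line
line=`awk 'END {print NR}' zimbot.txt`
((line--))
# print only that line, with its 5th field replaced by ts1
awk -v line="$line" -v ts1="$ts1" -F _ 'NR==line {print $1 "_" $2 "_" $3 "_" $4 "_" ts1 "_" $6 "_" $7}' zimbot.txt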
# feeding it to cut mite be more elegant:
#linebegin=`awk -v line=$line 'NR==line {print $0}' zimbot.txt | cut -f -4 -d _`
#lineend=`awk -v line=$line 'NR==line {print $0}' zimbot.txt | cut -f 6- -d _`
#echo $linebegin\_$ts1\_$lineend

Last edited by schneidz; 08-28-2015 at 10:54 AM.
 
1 members found this post helpful.
Old 08-28-2015, 11:11 AM   #5
zimbot
Member
 
Registered: Nov 2005
Location: cincinnati , ohio . USA
Distribution: ubuntu , Opensuse , CentOS
Posts: 179

Original Poster
Rep: Reputation: 17
the suggestion of awk

I am looking at it a little diff

with awk
i see where
echo Guangzhou_Huimin_Z_Other_20150727121507_20150727121813_81443869.mp4 | awk -F '_' '{sub(/-.*$/, "", $3); print $5}'
returns
20150727121507 - ok that is timestamp1

echo Guangzhou_Huimin_Z_Other_20150727121507_20150727121813_81443869.mp4 | awk -F '_' '{sub(/-.*$/, "", $3); print $6}'
returns
20150727121813 - ok that is my timestamp2

How do I know which one is "next to last"? With an array I get the total number of array members;
if it is 8, then next to last is 7.


hummm now with that .... reckon - i need to think some more
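
A minimal sketch of the array idea from post #1, assuming bash 4 or later (for mapfile) and a grep that supports -o (GNU and BSD grep do; it is not in POSIX). The names ts, count and next_to_last are illustrative only. The element count gives the next-to-last position directly.

```shell
#!/bin/bash
# Sample input copied from post #1.
workdir=$(mktemp -d)
cat > "$workdir/01.txt" <<'EOF'
file 'city_name_Z_Other_20150727121507_20150727121813_81443869.mp4'
file 'city_name_Z_Other_20150727121814_20150727122538_81453432.mp4'
file 'city_name_Z_Other_20150727122539_20150727122558_81472261.mp4'
# city_name_Z_Other_20150727121507_20150727122538_81453432.mp4
# city_name_Z_Other_20150727121814_20150727122558_81472261.mp4
EOF

# Collect every run of "2015" plus 10 digits, one per array element, in file order.
mapfile -t ts < <(grep -oE '2015[0-9]{10}' "$workdir/01.txt")

count=${#ts[@]}                  # total number of elements
next_to_last=$((count - 2))      # 0-based index of the next-to-last element
ts[$next_to_last]=${ts[0]}       # overwrite it with the first element

printf '%s\n' "${ts[@]}"
```

This only changes the array; writing the changed value back into the line is easier with the awk field assignment suggested elsewhere in the thread.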
 
Old 08-28-2015, 09:15 PM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153

Rep: Reputation: 4125
That doesn't correlate with your initial post. In both the first and last records, $5 is what you need to be looking at. Lose your fixation on "next to last" - it's irrelevant. I also don't understand what the sub() function is there for. Have a look at this
Code:
awk 'BEGIN{FS="_"}{if (NR==1) sve=$5} END{print "record 1 fifth field (saved): ",sve; print "last record fifth field: ",$5 ; print "entire last record: ",$0}' input.file
You need to add the guts, but that shows the saved value being available in the END{} clause. Adding print for records will show the issue I mentioned above.
 
Old 08-28-2015, 10:26 PM   #7
allend
LQ 5k Club
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware64-15.0
Posts: 6,383

Rep: Reputation: 2762
Quote:
They always have the format of city_name_something_somethingElse_2015nnnnnnnnnn_2015nnnnnnnnnn_nnnn.mp4
Awk is definitely the optimal way to go, but this could do what you want, provided there is no empty line at the end of the file.
Code:
sed "\$s/_2015.*_2015/_$(head -n 1 zimbot.txt | cut -d _ -f5)_2015/" zimbot.txt
 
Old 09-01-2015, 09:45 AM   #8
zimbot
Member
 
Registered: Nov 2005
Location: cincinnati , ohio . USA
Distribution: ubuntu , Opensuse , CentOS
Posts: 179

Original Poster
Rep: Reputation: 17
here is where I am currently

I have attempted to employ some of the wisdom given.

1st - I will attempt to communicate my goal.


the goal
read the file lst_small.txt
these 5 lines

Guangzhou_Yanyun_L_Bathroom_10_12_67594718.mp4
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4
Guangzhou_Yanyun_L_Bathroom_40_42_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_50_60_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_61_88_67607152.mp4

find what I call the matches: a match is when the next IN is within range of the last OUT



I loop through a txt file ( that is the product of a dir ls )
I find via awk the IN number & out Number
for
Guangzhou_Yanyun_L_Bathroom_10_12_67594718.mp4
10 is the IN , 12 is the out
if the next file IN is eq or within 5 of the last file out
then that is a match
and I should produce a 1.txt that has

Guangzhou_Yanyun_L_Bathroom_10_12_67594718.mp4
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4

-

as I loop through the above list of files I should find 2 matches

these 2 should match and make a file 1.txt
Guangzhou_Yanyun_L_Bathroom_10_12_67594718.mp4
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4

because 13 is "within the range", meaning between 11 and 17

I would also compare
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4
Guangzhou_Yanyun_L_Bathroom_40_42_67607152.mp4
No match 40 is not between 29 - 35 no action taken

Guangzhou_Yanyun_L_Bathroom_40_42_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_50_60_67607152.mp4
no match no action taken 50 is not between 41 - 47

these 2 should match and make a file 2.txt
Guangzhou_Yanyun_L_Bathroom_50_60_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_61_88_67607152.mp4

because 61 is within the range, meaning between 59 - 65

IN / OUT of the 5 files
10 12

13 30 match yes- 13 is between 11 - 17
13 is compared to 11 (12-1) as mycompareIN_Low_1 and
13 is compared to 17 (12+5) as compareIN_High_5


40 42 40 will not match to 30

50 60 50 will not match to 42

61 88 match yes- 61 is between 59 - 65
61 is compared to 59 (60-1) as mycompareIN_Low_1 and
61 is compared to 65 (60+5) as compareIN_High_5

-----------------------------
when i run it
the echos are
this is the start 1
10
NO NO MATCH
------------
my2 newOut 12
my2 oldIN 10
2 2nd
13
my1 line Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4
mycompareIN_Low_1 8
mycompareIN_High_5 15
my1 newOut 30
my1 oldIN 10
2
MATCH
40
NO NO MATCH
------------
my2 newOut 42
my2 oldIN 40
3 2nd
50
NO NO MATCH
------------
my2 newOut 60
my2 oldIN 50
4 2nd
61
NO NO MATCH
------------
my2 newOut 88
my2 oldIN 61
5 2nd
---------------------------

so it looks like I *AM* finding the matches (or not), but it is not making the $outF.txt files
as 1.txt & 2.txt
========================================

currently it only yields 1 file,
2.txt,
that contains but 1 line
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4


:::::::::::::::::::::::::::::::::::::::::::::::::

and here is my current script t5.sh ( yes , my 5th version )


#!/bin/bash
## t5.sh

## actual example Guangzhou_Huimin_Z_Other_20150727121507_20150727121813_81443869.mp4
## line form City_FirstNam_LstInt_Room_TimeIn_Timeout_lastNm&extn
## awk p$1 _ p$2 _ p$3 _p$4 _p$5 _p$6 _p$7

############### this is the dir listing
#parseMe=list.txt
parseMe=lst_small.txt
OutPath=$HOME/22
outF=1
echo this is the start $outF
# Set the field separator to a newline
IFS="
"
# Loop through the file
#for line in `cat list.txt`;do
for line in `cat $parseMe`;do

echo $line | awk -F '_' '{sub(/-.*$/, "", $3); print $5}'
###################### MovSize=$(ls -lah ${f%.*}.mov | awk '{ print $5}')
newIN=$(echo $line | awk -F '_' '{sub(/-.*$/, "", $3); print $5}')
newOut=$(echo $line | awk -F '_' '{sub(/-.*$/, "", $3); print $6}')

### test if if newOut match to oldIN or outF=001
compareIN_High_5=$[oldIN+5]
compareIN_Low_1=$[oldIN-2]

## if [[ newIN -gt compareIN_Low_1 && newIN -lt compareIN_High_5 ]] | [ "$outF" = 1 ]; then
if [[ newIN -gt compareIN_Low_1 && newIN -lt compareIN_High_5 ]]; then

echo my1 line $line
echo mycompareIN_Low_1 $compareIN_Low_1
echo mycompareIN_High_5 $compareIN_High_5
echo my1 newOut $newOut
echo my1 oldIN $oldIN
echo $outF
oldIN=$newIN
oldOut=$newOut
echo MATCH
echo $line >> $OutPath/$outF.txt

else
## outF=$[outF+1]
echo NO NO MATCH
outF=$((outF+1))
oldIN=$newIN
oldOut=$newOut
echo ------------
echo my2 newOut $newOut
echo my2 oldIN $oldIN
echo $outF 2nd
fi
done

::::::::::::::::::::::: end of t5.sh

and as you may have noticed I hoped to start with a 1.txt ( currently it only does 2.txt )
with
## if [[ newIN -gt compareIN_Low_1 && newIN -lt compareIN_High_5 ]] | [ "$outF" = 1 ];
I hoped that if I had an "or" testing that $outF had not yet been incremented, it would populate a 1.txt,
but that only results in all lines going into 1.txt

Thank you for your time and wisdom.
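
A minimal sketch of the range test described in this post, assuming the window is "previous OUT minus 1" to "previous OUT plus 5". The echoed values above (8 and 15) show that t5.sh builds its window from oldIN (10-2 and 10+5) instead of the previous OUT. Also, in the commented-out test a single | is a pipe; a logical "or" in bash is ||. The names prev_out, low, high and results are illustrative only.

```shell
#!/bin/bash
# The five sample names from this post.
lines=(
    "Guangzhou_Yanyun_L_Bathroom_10_12_67594718.mp4"
    "Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4"
    "Guangzhou_Yanyun_L_Bathroom_40_42_67607152.mp4"
    "Guangzhou_Yanyun_L_Bathroom_50_60_67607152.mp4"
    "Guangzhou_Yanyun_L_Bathroom_61_88_67607152.mp4"
)

results=()      # one yes/no verdict per adjacent pair
prev_out=""     # empty until the first line has been seen

for line in "${lines[@]}"; do
    IFS=_ read -r -a f <<< "$line"
    new_in=${f[4]}
    new_out=${f[5]}
    if [[ -n $prev_out ]]; then
        low=$((prev_out - 1))    # window is based on the previous OUT
        high=$((prev_out + 5))
        if (( new_in >= low && new_in <= high )); then
            verdict=yes
        else
            verdict=no
        fi
        results+=("$verdict")
        echo "IN $new_in against $low..$high: $verdict"
    fi
    prev_out=$new_out
done
```

The first line has nothing to be compared with, so only four comparisons are made for five lines.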
 
Old 09-01-2015, 10:38 AM   #9
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
maybe c would be better ?

heres my stab at it:
Code:
[schneidz@hyper ~]$ cat zimbot.ksh 
#!/bin/bash

l1=1
lx=`awk "END {print NR}" zimbot.txt`
file_num=1

while [ $l1 -lt $lx ]
do
 l2=`expr $l1 + 1`
 z1=`awk -F _ -v line=$l1 'NR==line {print $6}' zimbot.txt`
 z2=`awk -F _ -v line=$l2 'NR==line {print $5}' zimbot.txt`
 zx=`expr $z1 - $z2`
 if [ $zx -ge -5 ] && [ $zx -le 5 ]
 then
  sed -n "$l1,$l2"p zimbot.txt > $file_num.txt
  ((file_num++))
 fi
 ((l1++))
done
 
1 members found this post helpful.
Old 09-01-2015, 01:28 PM   #10
zimbot
Member
 
Registered: Nov 2005
Location: cincinnati , ohio . USA
Distribution: ubuntu , Opensuse , CentOS
Posts: 179

Original Poster
Rep: Reputation: 17
I am more than humbled

I have just spent the last several minutes googling and trying to see *how* that works.
I sort of understand ...but ..not really , not yet
so few lines
so elegant
I am sort of amazed.
You obviously have some skill - thank you

There is 1 thing ... ideally we would find multiple-line matches (more than 2 lines).
These are the matches from the input text:

if the below were zimbot.txt

Guangzhou_Yanyun_L_Bathroom_10_12_67594718.mp4
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4 < match 2 lines

Guangzhou_Yanyun_L_Bathroom_40_42_67607152.mp4 < no match

Guangzhou_Yanyun_L_Bathroom_50_60_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_61_88_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_89_90_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_92_98_67607152.mp4 < a match of 4

Guangzhou_Yanyun_L_Bathroom_110_112_67607152.mp4 < no match
Guangzhou_Yanyun_L_Bathroom_120_130_67607152.mp4 < no match

Guangzhou_Yanyun_L_Bathroom_131_140_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_141_150_67607152.mp4 < match 2 lines

Guangzhou_Yanyun_L_Bathroom_160_166_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_167_168_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_170_200_67607152.mp4 < a match of 3

so I have attempted to adapt your script:

#!/bin/bash

l1=1
lx=`awk "END {print NR}" list.txt` ## if the above 14 file.mp4 list were list.txt
file_num=1

while [ $l1 -lt $lx ]
do
l2=`expr $l1 + 1`
z1=`awk -F _ -v line=$l1 'NR==line {print $6}' list.txt`
z2=`awk -F _ -v line=$l2 'NR==line {print $5}' list.txt`
zx=`expr $z1 - $z2`
if [[ $zx -ge -5 ]] && [[ $zx -le 5 ]]
then
sed -n "$l1,$l2"p list.txt > $file_num.txt
((file_num++))
fi
((l1++))
done

I have been trying to adapt the above ( your good script ) to find these "multi" matches
but have not been successful

maybe it would be better in c ( or d or e )

it is sooooo close
 
Old 09-01-2015, 04:43 PM   #11
zimbot
Member
 
Registered: Nov 2005
Location: cincinnati , ohio . USA
Distribution: ubuntu , Opensuse , CentOS
Posts: 179

Original Poster
Rep: Reputation: 17
even if there were repeated lines
I could "cleanse" it by running it through uniq with no options.
( I actually learned about uniq from this threaded adventure )

and even if I had txt files with 1 line -- that is not a match... I could sift those out because they would be smaller than txt files with 2 or more lines
 
Old 09-01-2015, 07:19 PM   #12
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153

Rep: Reputation: 4125
Seeing as you are using awk, why not use it to do it all ?
Code:
 awk '{if (NR==1) {sve=$0 ; out=$6 ; next}  ; if (($5-out)<5) {print (printed ? "" : sve"\n") $0 ; printed++} else {printed = ""} ; sve=$0 ; out=$6}' FS="_" input.file
Assumes numbers are in sequence and always increasing, and no embedded blank lines.
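
A minimal sketch that carries the same idea through to numbered output files, under several assumptions: the window from post #8 (previous OUT minus 1 to previous OUT plus 5), runs of fewer than 2 lines are dropped, and files are named 1.txt, 2.txt, and so on. The function name write_group and the scratch directory are illustrative only. The sample is the first eight lines of the list in post #10. The values are compared as plain numbers, so real timestamps that cross a minute boundary would not give true second differences.

```shell
#!/bin/bash
workdir=$(mktemp -d)
cat > "$workdir/list.txt" <<'EOF'
Guangzhou_Yanyun_L_Bathroom_10_12_67594718.mp4
Guangzhou_Yanyun_L_Bathroom_13_30_67606173.mp4
Guangzhou_Yanyun_L_Bathroom_40_42_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_50_60_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_61_88_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_89_90_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_92_98_67607152.mp4
Guangzhou_Yanyun_L_Bathroom_110_112_67607152.mp4
EOF

awk -F_ -v dir="$workdir" '
# Write the collected run to the next numbered file if it has 2 or more lines.
function write_group(   i, f) {
    if (n >= 2) {
        f = dir "/" (++filenum) ".txt"
        for (i = 1; i <= n; i++) print group[i] > f
        close(f)
    }
    n = 0
}
NF < 7 { next }                                             # skip blank or short lines
n > 0 && ($5 < out - 1 || $5 > out + 5) { write_group() }   # chain broken: close the run
{ group[++n] = $0; out = $6 }                               # add this line to the current run
END { write_group() }                                       # close whatever run is left
' "$workdir/list.txt"

ls "$workdir"
```

For this sample the expected result is a 1.txt with two lines (10_12 and 13_30) and a 2.txt with four lines (50_60 through 92_98); the single lines 40_42 and 110_112 produce no file.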

Last edited by syg00; 09-01-2015 at 07:22 PM. Reason: removed unneeded variable
 
2 members found this post helpful.
Old 09-05-2015, 01:39 PM   #13
zimbot
Member
 
Registered: Nov 2005
Location: cincinnati , ohio . USA
Distribution: ubuntu , Opensuse , CentOS
Posts: 179

Original Poster
Rep: Reputation: 17
I thank all for the wisdom and advice.
I am going to call it solved.
even though ... the full challenge ... of my goal still eludes me.
it has proven to be a hard one.
but I have learned much

thanks again
 
  

