LinuxQuestions.org
SUSE / openSUSE: This forum is for the discussion of SUSE Linux.

08-02-2015, 05:39 PM   #1
wroom
Member
 
Registered: Dec 2009
Location: Sweden
Posts: 159

Rep: Reputation: 31
Please help! OpenSUSE 13.2 mount silently fails


I have been running OpenSUSE 13.1 for a while now, and everything works fine.

Now I have set up two new computers with OpenSUSE 13.2: one with a hardware RAID controller using the megaraid_sas driver (DELL H330), and the other with six SATA disks set up as btrfs RAID0.


1)
The one with the hardware RAID works well except for a peculiar failure: the mounts silently vanish after an unpredictable time, anywhere from minutes to hours.

There is not a single trace in any log of what happened. The filesystem just gets unmounted, silently, apart from the inevitable havoc caused by a filesystem disappearing in the middle of working with its data.

It does not seem to depend on which filesystem is mounted: ext4, btrfs, reiserfs, even ntfs-3g.

I found a workaround, but I don't know why it works. More on that later.


2)
The other computer was first set up as follows:

/dev/sda1 2GiB ext3 grub2 boot partition mounted as /boot
/dev/sd[bcdef]1 as five 2 GiB swap partitions

/dev/sd[abcdef]2 as ext4 raid10-f2 on md127 mounted as /

/dev/sd[abcdef]3 as ext4 raid0 on md0 mounted as /share and then exported as nfs

So far it worked like a charm.

After some consideration, I decided to use ZFS instead of a RAID0 ext4 on /dev/sd[abcdef]3 (with "mount point" /zvol0 instead of /share). This is because I need to handle lots of data with integrity checks, and I also wanted to exploit dedup, since I am going to merge three really large file trees whose contents are mostly similar.

I scrapped that idea after realizing that a dedup ZFS pool on OpenSUSE will "incinerate" 32 GB of memory and grind the computer to a complete stall simply because I deleted one small file from the pool. It didn't complain when I shuffled in a 4 TiB data tree before that.
The performance of the OpenSUSE implementation of ZFS wasn't as good as I had expected anyway. Putting files on the dedup pool is rather fast, and reading the data is sluggish but usable. Running 'rm', however, is akin to "harakiri": ritual suicide.


So I thought, why not try btrfs? Manual dedup is said to work now on 13.2, so I made a btrfs filesystem with -d raid0 on the partitions /dev/sd[abcdef]3, wrote up /etc/fstab and /etc/exports, and then mounted it on /share.

No go.
No errors from the mount command. Not even in the syslog. But it wasn't mounted.


Some sort of solution, or workaround:
After tinkering a bit, I realized that I could try mounting whatever filesystem I wanted on /share, but every attempt would be silently ignored.

But if I create a new mount point directory, say "/mountfocker", "/bielsebob" or any other nice directory name, and then mount my new filesystem there, it works.

But not on "/share". For some unknown reason I am no longer allowed to mount anything on that directory.

The same workaround works for both bugs: the mounts silently disappearing on the first computer, and the inability to mount at all on the other computer.

I haven't seen this bug on any other distro/version than OpenSUSE 13.2.
On OpenSUSE 13.1 the same setup works like a charm (well, except for the "df bug" introduced in OpenSUSE 13.1, where df for some strange reason would not list ALL the mounts).

These are fairly clean installs, and I am very experienced with Linux.

With no logged error to work from, I am having difficulty finding the cause of this peculiar mount bug.

The only thing I have figured out is that after a directory has been used successfully as a mount point for a while, some "mysterion in the system" decides that nothing should ever be mounted on that mount point again, without leaving the slightest clue why.
And no, I haven't activated any automounter of any kind.

Has anyone except me seen this bug? Does anyone know what's going on?
 
08-02-2015, 09:46 PM   #2
ferrari
LQ Guru
 
Registered: Sep 2003
Location: Auckland, NZ
Distribution: openSUSE Leap
Posts: 5,842

Rep: Reputation: 1148
Interesting - I haven't observed or read of this behaviour previously. Have you raised a bug report for this issue yet?
 
08-03-2015, 02:52 AM   #3
wroom
Member
 
Registered: Dec 2009
Location: Sweden
Posts: 159

Original Poster
Rep: Reputation: 31
No, I haven't filed any bug reports yet. The problem is that I really don't know whether the bug is specific to OpenSUSE 13.2, affects other distros too, or is specific to the kernel...

But I have used OpenSUSE 13.2 before without this bug appearing, so I suspect it came in with one of the past month's updates.

Output from uname -srvio for both computers is:
Code:
Linux 3.16.7-21-desktop #1 SMP PREEMPT Tue Apr 14 07:11:37 UTC 2015 (93c1539) x86_64 GNU/Linux
One of the OpenSUSE 13.1 installs that do not have this bug shows the following:
Code:
Linux 3.11.10-29-desktop #1 SMP PREEMPT Thu Mar 5 16:24:00 UTC 2015 (338c513) x86_64 GNU/Linux
If the bug is in the kernel, then we have a window between the working version 3.11.10-29 and the version 3.16.7-21 on the computers showing the bug.

And mount -V shows the following on the bug-affected computers:
Code:
mount from util-linux 2.25.1 (libmount 2.25.0: selinux, assert, debug)
And on the unaffected OpenSUSE 13.1 computer it shows:
Code:
mount from util-linux 2.23.2 (libmount 2.23.0: selinux, debug, assert)
The cause of the bug could be anywhere in the system.
But it seems to be limited to the versions mentioned above: OpenSUSE 13.2 updated this week, Linux kernel 3.16.7-21-desktop and mount from util-linux 2.25.1.

Google turns up many similar issues, but I haven't found anything that matches this exact bug.
If anyone reading this also has problems with mounts silently disappearing, or with the mount command silently failing to mount a filesystem that mounts fine on another, newly created directory, then at least we have something to go on.
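Since the symptom is a mount that reports success but leaves nothing mounted, one way to make it visible is to check the kernel's mount table right after mount returns. Below is a minimal POSIX sh sketch; `is_mounted` and `mount_and_verify` are hypothetical helper names, and the one-second pause is an arbitrary assumption meant to give any background unmount time to happen. util-linux's `findmnt` or `mountpoint` can do the same check by hand.

```shell
#!/bin/sh
# Hypothetical helpers: catch a mount that "succeeds" but leaves nothing mounted.

# Return 0 if DIR appears as a mount point in the kernel's mount table.
# Assumes DIR contains no whitespace (the table escapes spaces as \040).
is_mounted() {
    awk -v d="$1" '$2 == d { found = 1 } END { exit(found ? 0 : 1) }' /proc/self/mounts
}

# Mount DIR via its /etc/fstab entry, wait briefly, then confirm the mount stuck.
mount_and_verify() {
    local dir="$1"
    mount "$dir" || return 1
    sleep 1   # assumption: long enough for any immediate background unmount
    if ! is_mounted "$dir"; then
        echo "mount $dir returned success, but nothing is mounted there" >&2
        return 1
    fi
}
```

Usage, as root: `mount_and_verify /share`. It does not explain the failure, but it turns a silent one into a non-zero exit status and a message.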
 
08-03-2015, 03:53 AM   #4
ferrari
LQ Guru
 
Registered: Sep 2003
Location: Auckland, NZ
Distribution: openSUSE Leap
Posts: 5,842

Rep: Reputation: 1148
I had trouble following all the details concerning the mounts. I prefer commands and output to long verbal descriptions. Are you sure that you don't have failing storage hardware on the first machine you speak of? That can result in strange issues like this.
 
08-03-2015, 04:20 AM   #5
wroom
Member
 
Registered: Dec 2009
Location: Sweden
Posts: 159

Original Poster
Rep: Reputation: 31
Quote:
Originally Posted by ferrari
I had trouble following all the details concerning the mounts. I prefer commands and output to long verbal descriptions.
Point taken.
But since we don't know exactly where the problem is, or whether it affects anybody else, I wanted to hear if anybody else had had a similar experience.
With more input to the discussion we will be able to pinpoint where the bug is.
In the end it will (hopefully) come to a detailed description of the bug.

Quote:
Originally Posted by ferrari
Are you sure that you don't have failing storage hardware occurring with the first machine you speak of? That can result in strange issues like this occurring.
Thank you for mentioning that.
The unmount issues happened while I had deliberately installed two drives that I know have flaws SMART hasn't picked up yet.

I set them up as RAID0, RAID1 and JBODs, and then hammered them with data until something happened.
Reason? I wanted to test how the fairly new DELL H330 controller with the fairly new megaraid_sas driver handles drive errors.

That test was a success. The H330 controller isolates the fault. In RAID1 it first fails one of the drives out of the array, and then sets the second disk to read-only when that one fails too. In RAID0 it sets the whole array to read-only when one drive fails.

So far, so good for the DELL H330 controller. But at the same time I was having the silent unmount issues on the other "vdevs"/arrays on the same controller, which had good disks in them. Those arrays got silently unmounted, but they were not failed or isolated by the controller, just unmounted. And I didn't see any synchronicity in it.

The "test disks" are still attached to the controller as JBODs, but I'm not accessing them for now.
Maybe I can trigger unmounts by accessing those disks again? I must test that.

Could it be that this is one bug, and the second computer refusing to mount is a different bug?
 
10-19-2015, 01:06 PM   #6
wroom
Member
 
Registered: Dec 2009
Location: Sweden
Posts: 159

Original Poster
Rep: Reputation: 31
It has happened again. More info on this might help us find the bug.

This time it is a HP Proliant DL140 G2 that i am setting up as a backup node for managing a HP StorageWorks tape library.

I installed openSUSE 13.2 and applied all the updates.

"uname -srvio" gives the following info:
Code:
Linux 3.16.7-24-desktop #1 SMP PREEMPT Mon Aug 3 14:37:06 UTC 2015 (ec183cc) x86_64 GNU/Linux
The disk setup is two 640 GB WD Caviar Blue SATA disks:
/dev/sda1 = 4 GiB swap
/dev/sdb1 = 4 GiB swap
/dev/sda2,/dev/sdb2 = soft raid 1 md127, 48 GiB ext4 mounted as "/", (root fs).
/dev/sda3,/dev/sdb3 = "mkfs.btrfs -m raid1 -d raid0 /dev/sda3 /dev/sdb3"; A btrfs stripe mounted as "/share".
/share is then exported over NFS.
So far, so good.

I found that the disks, which should clock in at some 110 to 120 MB/s read rate, only managed around 100 MB/s of reads. The idea is that the tape drive should be able to use about 80 MB/s, and NFS should get somewhere between 80 and 110 MB/s from the striped disks. But it doesn't: the stripe is limited to some 100 MB/s of bandwidth.

So I tried setting up /share with software RAID0 and ext4 on top. But it performs similarly to, if not slightly worse than, the btrfs stripe.

I then found that the issue is a serious bottleneck in the "Intel 82801EB (ICH5) SATA Controller" in the south bridge.
To get enough disk speed i will need to add a PCI-X SATA controller.
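The post doesn't say how the ICH5 bottleneck was pinned down. One common check for a shared bottleneck is to read from both disks at once and compare the aggregate rate with what a single disk achieves. A minimal sketch, assuming GNU coreutils (`dd` with `status=none` and `iflag=direct`, `date +%s%N`); `parallel_read_mibs` is a hypothetical helper name, and reading raw block devices this way is read-only but needs root.

```shell
#!/bin/sh
# Hypothetical helper: read COUNT MiB from each given device or file concurrently
# and print the aggregate throughput in MiB/s (integer).
# Assumes GNU coreutils: dd's status=none/iflag=direct and date's %N.
parallel_read_mibs() {
    local count="$1" start end ns flag dev
    shift
    start=$(date +%s%N)
    for dev in "$@"; do
        # Use O_DIRECT on block devices so the page cache doesn't inflate the result.
        flag=""
        [ -b "$dev" ] && flag="iflag=direct"
        dd if="$dev" of=/dev/null bs=1M count="$count" $flag status=none &
    done
    wait                                  # wait for all background dd readers
    end=$(date +%s%N)
    ns=$((end - start))
    [ "$ns" -gt 0 ] || ns=1
    echo $((count * $# * 1000000000 / ns))
}
```

For example, compare `parallel_read_mibs 1024 /dev/sda` with `parallel_read_mibs 1024 /dev/sda /dev/sdb`. If the two-disk figure is barely higher than the one-disk figure, the disks are most likely competing for the same controller or bus.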

So I decided to stop and remove the software RAID, set up the btrfs stripe again, and then add a SATA controller in a PCI-X slot.

But the btrfs filesystem did not mount. No errors from "mount", and nothing in /var/log/messages except this:
Code:
2015-10-19T16:57:36.959036+02:00 dynXXX kernel: [20362.515178] BTRFS info (device sdb3): disk space caching is enabled
2015-10-19T16:57:36.959086+02:00 dynXXX kernel: [20362.515190] BTRFS: has skinny extents
Mount silently failed.

Changing the mount point from "/share" to "/mnt/share" makes it mount.
Changing back to "/share" makes the mount fail without any errors.

Please note that I switched from btrfs RAID0 to software RAID0 on these partitions, and back again, without rebooting.
(A test running on the tape library will finish in a few hours, and I don't want to interrupt it.)


The command history is roughly as follows:

# Setting up partitions /dev/sd[ab]3 with type 83 using fdisk.

# Running partprobe on both disks. (As if fdisk would not do the job).
partprobe /dev/sda
partprobe /dev/sdb

# Creating the btrfs stripe:
mkfs.btrfs -m raid1 -d raid0 /dev/sda3 /dev/sdb3

# Adding the mount point
mkdir /share

# Adding the following to /etc/fstab:
# (blkid gives the UUID for the newly created btrfs file system).
Code:
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx	/share	btrfs	noatime 0 0
# Mounting /share the first time.
mount /share

# Mounts ok. Then add the following to "/etc/exports":
Code:
/share	nnn.nnn.nnn.0/24(rw,no_root_squash,sync,no_subtree_check)
# Export it:
exportfs -ra

# This far everything works as expected.
# Now we want to take it down and reshape it as a soft RAID0 EXT4 share.

# Comment out the share in /etc/exports:
Code:
#/share	nnn.nnn.nnn.0/24(rw,no_root_squash,sync,no_subtree_check)
# Unexport it, then dismount it:
exportfs -ra
umount /share

# Hm? how to wipe the btrfs UUID? Maybe something like:
dd if=/dev/zero bs=32k count=32k of=/dev/sda3 & dd if=/dev/zero bs=32k count=32k of=/dev/sdb3 & wait && sync

# "btrfs fi show -d" shows no btrfs on the system.
# Yup! It's all gone.

# Use fdisk to change type of /dev/sd[ab]3 to fd.

# Running partprobe on both disks. (As if fdisk would not do the job).
partprobe /dev/sda
partprobe /dev/sdb

# Create soft RAID0:
mdadm --create /dev/md126 --auto=yes --level=0 -c 32 --raid-devices=2 /dev/sd[ab]3

# Make a filesystem:
mkfs.ext4 -E stride=32 /dev/md126

# Modify the mount point for '/share' in /etc/fstab:
Code:
/dev/md126	/share	ext4	noatime 0 0
# Mounting /share the second time. Now it is a soft striped ext4 fs.
mount /share

# Yes, it still works as expected.
# Then I did some performance tests and found the soft-striped ext4 to be slightly inferior to the btrfs stripe. It didn't solve the problem.
# So, unmount and stop the raid. Remove it and then create a btrfs stripe again on the same partitions.
# The idea being to add a SATA adapter card and just move the disks to that controller.

umount /share
mdadm --stop /dev/md126

# Check if the raid is really stopped:
cat /proc/mdstat

# Wipe the superblocks on the raid partitions
mdadm --zero-superblock --force /dev/sda3
mdadm --zero-superblock --force /dev/sdb3

sync # For religious reasons.

# Use fdisk to change type of /dev/sd[ab]3 to 83.

# Running partprobe on both disks. (As if fdisk would not do the job).
partprobe /dev/sda
partprobe /dev/sdb

# Create a btrfs stripe. (Use the '-f' option since we don't bother wiping the fraglets of an ext4 filesystem in the partitions.)

mkfs.btrfs -f -m raid1 -d raid0 /dev/sda3 /dev/sdb3

# Again, modify the mount point for '/share' in /etc/fstab:
# (blkid gives the UUID for the newly created btrfs file system).
Code:
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx	/share	btrfs	noatime 0 0
# Mounting /share the third time.
mount /share

No errors. No mount. Silent failure.

Moving the mount of the filesystem from '/share' to '/mnt/share' by editing '/etc/fstab'.
'mount /mnt/share' works. It mounts.
Trying to move back to the '/share' directory does not work. Mount will fail without any form of error message.

Still a mystery to me why it fails like this. If it didn't mount as expected, it should have thrown an error of some sort.
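A side note on the `mkfs.ext4 -E stride=32 /dev/md126` step in the command history: ext4's stride is counted in filesystem blocks, not KiB, while mdadm's `-c 32` means a 32 KiB chunk. Assuming the default 4 KiB ext4 block size, the matching values would be stride = 32 / 4 = 8 and stripe_width = stride × data disks = 8 × 2 = 16; stride=32 describes a 128 KiB chunk instead. This is unlikely to explain the ~100 MB/s ceiling, which the post attributes to the ICH5 controller. The hypothetical helper below only does that arithmetic; it assumes sizes in KiB and RAID0, where every member is a data disk.

```shell
#!/bin/sh
# Hypothetical helper: compute ext4 RAID alignment options for mkfs.ext4 -E.
# Args: chunk size in KiB, ext4 block size in KiB, number of data disks
# (for RAID0 that is every member disk).
ext4_raid_opts() {
    local chunk_kib="$1" block_kib="$2" data_disks="$3" stride
    stride=$((chunk_kib / block_kib))     # chunk size expressed in fs blocks
    echo "stride=$stride,stripe_width=$((stride * data_disks))"
}
```

For the two-disk array above, `mkfs.ext4 -E "$(ext4_raid_opts 32 4 2)" /dev/md126` would pass `-E stride=8,stripe_width=16`.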
 
10-19-2015, 05:42 PM   #7
wroom
Member
 
Registered: Dec 2009
Location: Sweden
Posts: 159

Original Poster
Rep: Reputation: 31
And several hours later "the villain" is exposed:
Code:
2015-10-20T00:01:31.740639+02:00 dynXXX systemd[1]: Dependency failed for /share.
What dependency?
Why am I not surprised that this has something to do with the can of worms named "systemd"?

Why does 'systemd' report a dependency failure for a mount point several hours later, while 'mount' didn't even bother to say that it failed to mount? Where in that spaghetti heap of systemd should I start looking for this "dependency"?

I tried to change mount point back to '/share' again, but mount to that directory still fails.
Mounting the same fs on '/mnt/share' still works every time.

Could it be that some minion of the systemd anthill is set to guard an imaginary mount, and forever defends that mount point against anything that might interfere with it? Now, how do you debug such a monster?

Maybe the bug will go away if i replace systemd with something more reliable, like a SYSV init?

Systemd is indeed a digital reproduction of the Tower of Babel. Not finished yet... It's gonna be great... Work in progress... What could ever go wrong...
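On the question of where to start looking for that "dependency": systemd keeps a mount unit for each mount point, named after the escaped path (/share becomes share.mount, /mnt/share becomes mnt-share.mount), and the "Dependency failed for /share" message refers to that unit. A minimal sketch for inspecting it; `mount_unit_for` and `inspect_mount_unit` are hypothetical helper names. `systemd-escape` may not exist on older systemd releases, so the function falls back to a simplified escaping that only handles slashes and dashes.

```shell
#!/bin/sh
# Hypothetical helpers for looking at the systemd mount unit behind a mount point.

# Print the mount unit name for an absolute path, e.g. /mnt/share -> mnt-share.mount.
mount_unit_for() {
    if command -v systemd-escape >/dev/null 2>&1; then
        systemd-escape -p --suffix=mount "$1"
        return
    fi
    # Simplified fallback: trim slashes, escape "-" as \x2d, turn "/" into "-".
    # Other special characters are NOT handled here.
    printf '%s\n' "$1" | sed -e 's|^/*||' -e 's|/*$||' -e 's|-|\\x2d|g' \
        -e 's|/|-|g' -e 's|^$|-|' -e 's|$|.mount|'
}

# Show what systemd thinks the unit should mount, its dependencies, and its log.
inspect_mount_unit() {
    local unit
    unit=$(mount_unit_for "$1")
    systemctl --no-pager status "$unit"
    systemctl show -p What,Where,Requires,BindsTo,After "$unit"
    systemctl --no-pager list-dependencies "$unit"
    journalctl --no-pager -u "$unit"
}
```

Running `inspect_mount_unit /share` as root shows which device the unit expects (What=) and which units it depends on. If What= still names a device or UUID that no longer exists, that would be consistent with the "Dependency failed" message, but treat it as a hypothesis to verify rather than a diagnosis.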

 
1 member found this post helpful.
10-19-2015, 06:51 PM   #8
ferrari
LQ Guru
 
Registered: Sep 2003
Location: Auckland, NZ
Distribution: openSUSE Leap
Posts: 5,842

Rep: Reputation: 1148
I can't offer a lot here, and I don't pretend to understand how systemd may be impacting here with the failed mount. I can only suggest reviewing the following documentation (specifically relating to /etc/fstab entries and dependencies)

http://www.freedesktop.org/software/...emd.mount.html
http://www.freedesktop.org/software/...generator.html

It may be that a bug report is required.
 
10-19-2015, 07:28 PM   #9
wroom
Member
 
Registered: Dec 2009
Location: Sweden
Posts: 159

Original Poster
Rep: Reputation: 31
Quote:
Originally Posted by ferrari
I can't offer a lot here, and I don't pretend to understand how systemd may be impacting here with the failed mount. I can only suggest reviewing the following documentation (specifically relating to /etc/fstab entries and dependencies)

http://www.freedesktop.org/software/...emd.mount.html
http://www.freedesktop.org/software/...generator.html
Thank you!

Quote:
Originally Posted by ferrari
It may be that a bug report is required.
Concerning systemd, I think it might be difficult to pin down the exact error.
Maybe it's easier to switch to some distro that is not yet infected with systemd? Any suggestions?

On the other hand, 'mount' itself should report an error, since the mount actually fails. But it doesn't.
Could it be two bugs? But the root cause seems to be systemd.
 
10-19-2015, 08:03 PM   #10
Emerson
LQ Sage
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~amd64
Posts: 7,665

Rep: Reputation: Disabled
Want to try Devuan? And of course, my only Linux for more than a decade, Gentoo.
 
10-19-2015, 09:02 PM   #11
wroom
Member
 
Registered: Dec 2009
Location: Sweden
Posts: 159

Original Poster
Rep: Reputation: 31
Quote:
Originally Posted by Emerson
Want to try Devuan? And of course, my only Linux for more than a decade, Gentoo.
You read my mind! I was just reading up on those two. I will try Devuan.
And don't forget FreeBSD! It is somewhat overshadowed by Linux. But if things progress in the current direction for Linux, it might be better for folks to start using FreeBSD instead.
I will certainly try Devuan.

We have come to a crossroads. Which way is best to go? Leave systemd, nepomuk, akonadi, baloo et cetera to rot, and stop supporting those who work to tear down Linux? I really don't understand why so much effort is put into things that are detrimental to the Linux community.

I propose that systemd should be fully rewritten in Cobol. It would be a perfect match.

(Sorry for going off-topic.)


Now I could finally reboot the machine, since the tape robot had finished testing and labeling tapes.
After rebooting, i can mount on '/share' as well as on '/mnt/share'. So the issue does not survive a reboot.
 
10-19-2015, 09:24 PM   #12
Emerson
LQ Sage
 
Registered: Nov 2004
Location: Saint Amant, Acadiana
Distribution: Gentoo ~amd64
Posts: 7,665

Rep: Reputation: Disabled
KDE works well without poetterd in Gentoo, at least for now. There even is a Gnome fork without systemd dependency. FreeBSD is great, I started using it before I found Gentoo ... and then in about a year or two all my desktops and laptops got moved to Gentoo from FreeBSD. Not servers, though.
 
12-11-2015, 02:04 PM   #13
Frank Broda
LQ Newbie
 
Registered: Dec 2015
Posts: 1

Rep: Reputation: Disabled
systemctl daemon-reload required

Today the same problem appeared on one of my machines: I had merged one volume group (vgXX) into another (vgXX2). Everything went well, but I was not able to mount the volumes whose volume group name had changed on their original mount points. I had updated /etc/fstab accordingly, and mounting them somewhere else worked fine. There were no error messages; instead, the syslog reported successful mounts. This thread pointed me to systemd as the key. Without the need to reboot the machine

systemctl daemon-reload

fixed all my problems. Now all volumes can be mounted again where they belong.
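This fits the systemd documentation linked earlier in the thread: systemd-fstab-generator turns /etc/fstab into mount units at boot and again on every `systemctl daemon-reload`, so after an fstab edit the running manager can keep acting on the old entry until a reload. That would also explain why the problem went away after a reboot. A minimal sketch of a check, under stated assumptions: the helper names are hypothetical, only UUID= and simple LABEL= specs are translated into /dev/disk/by-* paths, and it is meant to be run while the filesystem is not mounted, because while mounted systemd may report the kernel's device name instead.

```shell
#!/bin/sh
# Hypothetical helpers: spot an fstab edit that systemd has not picked up yet.

# Print the source (first field) of the fstab entry for mount point DIR.
fstab_source() {
    awk -v d="$1" '$1 !~ /^#/ && $2 == d { print $1; exit }' "${2:-/etc/fstab}"
}

# Turn UUID=/LABEL= specs into /dev/disk/by-* paths (simple labels only).
spec_to_node() {
    case "$1" in
        UUID=*)  printf '/dev/disk/by-uuid/%s\n' "${1#UUID=}" ;;
        LABEL=*) printf '/dev/disk/by-label/%s\n' "${1#LABEL=}" ;;
        *)       printf '%s\n' "$1" ;;
    esac
}

# Compare fstab with the What= of the loaded mount unit; suggest a reload on mismatch.
check_fstab_vs_unit() {
    local dir="$1" unit want have
    unit=$(systemd-escape -p --suffix=mount "$dir" 2>/dev/null) ||
        unit="$(printf '%s' "${dir#/}" | tr / -).mount"   # simplified fallback
    want=$(spec_to_node "$(fstab_source "$dir")")
    have=$(systemctl show -p What "$unit")
    have="${have#What=}"
    if [ "$want" != "$have" ]; then
        echo "$unit: systemd has What=$have, fstab says $want" >&2
        echo "try: systemctl daemon-reload && mount $dir" >&2
        return 1
    fi
}
```

Usage, as root and with the filesystem unmounted: `check_fstab_vs_unit /share`. Running `systemctl daemon-reload` after every fstab edit, as Frank did, avoids the mismatch in the first place.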
 
1 member found this post helpful.
12-13-2015, 01:48 PM   #14
wroom
Member
 
Registered: Dec 2009
Location: Sweden
Posts: 159

Original Poster
Rep: Reputation: 31
Quote:
Originally Posted by Frank Broda
Without the need to reboot the machine

systemctl daemon-reload

fixed all my problems. Now all volumes can be mounted again where they belong.
I'll try that the next time this happens.


I can add another mounting issue that also seems to have to do with systemd:
I had one btrfs volume set up on OpenSUSE 13.2 with three disks: metadata in RAID1 and data in RAID0.
This worked for several weeks as a work area, until I had a power brownout in the rack.
After fixing the power issue and booting everything up again, this btrfs volume would not mount.
At the time of the power failure the btrfs volume had been mounted readonly for some 20+ hours, so nothing was written to the disks coincident with the power failure.
It turned out to be corrupt, but only when mounted on this specific computer. I could move the disks to another computer and mount the volume there, but after moving it back it showed up as corrupt again.
I haven't done a neat write-up of this issue, but the only trace of what's going on is that the logs say one of the disks is busy when the system checks the fs during boot. It is also interesting that between boots the "busy disk" state moves randomly between the three disks.
So I traced the fault to the initial checks done by systemd, and concluded that I am not interested in spending weeks debugging btrfs (which is really, really weak on recovery tools) on a systemd-infested system. So I just wiped the disks and put the work data on a ZFS pool in another system to continue working.

I guess the situation with systemd will never get any better, so it's best to move to systems that are not systemd-infested (or that at least only have an early version of systemd, from before it got cancer and started taking over everything in the OS).

It is also worth emphasizing that btrfs as a filesystem works rather well, but the tools to check and repair a btrfs filesystem when it eventually fails are in a bleak "alpha.0.0.1a" state. The tools are not safe to run without first making image backups, the documentation is lacking, and btrfs error reporting reads like a homebrew coder's "to-do note to the coder".
 
  



Tags
mount, opensuse




All times are GMT -5.
