Opened 7 days ago

Last modified 6 days ago

#900 new defect

cdfs assertion abort in libfs during system installation

Reported by: Jiri Svoboda Owned by: Jakub Jermář
Priority: major Milestone: 0.16.1
Component: helenos/lib/fs Version: mainline
Keywords: Cc:
Blocker for: Depends on:
See also:

Description

As of revision 5df257077a2eb3d939122d95f8fcecad4b07ff62 I get a cdfs assertion abort in libfs while installing HelenOS to disk.

I am using a standard amd64 build and ew.py. The problem always occurs while installer is copying /inst/boot/grub/i386-pc/acpi.mod. It can be reproduced when running the installer both in desktop mode and in console mode.

The problem does not happen when running with -bigmem (4G of RAM). However, when running without -bigmem (128MB of RAM), there is around 28 MB free RAM and the installer consumes very little memory, based on top output.

When cdfs goes down, it seems to take vfs down with it.

sysinst] note: Copying /inst/boot/grub/COPYING.
[sysinst] note: Copying /inst/boot/grub/README.
[sysinst] note: Copying /inst/boot/grub/REVISION.
[sysinst] note: Copying /inst/boot/grub/grub.cfg.
[sysinst] note: Creating /inst/boot/grub/i386-pc.
[sysinst] note: Copied 5 files, 216.9 kB; current file: 100% done.
[sysinst] note: Copying /inst/boot/grub/i386-pc/acpi.mod.
[/srv/fs/cdfs(13)] Assertion failed (cur != NULL) in task 13, file "../uspace/lib/fs/libfs.c", line 563.
[/srv/fs/cdfs(13)] 0x0000000000c35c60: 0x0000000070171ac3()
[/srv/fs/cdfs(13)] 0x0000000000c35c90: 0x0000000070160b2a()
[/srv/fs/cdfs(13)] 0x0000000000c35f50: 0x00000000004054de()
[/srv/fs/cdfs(13)] 0x0000000000c35fe0: 0x0000000070163d3d()
[/srv/fs/cdfs(13)] 0x0000000000c35ff0: 0x0000000070177316()
[/srv/fs/cdfs(13)] -- end of stack trace --
[/srv/taskmon(47)] taskmon: Task 13 fault in thread 0xffffffff8231e000.
[/srv/taskmon(47)] taskmon: Executing /app/taskdump -t 13
[/app/taskdump(79)] Task Dump Utility
[/app/taskdump(79)] Dumping task '/srv/fs/cdfs' (task ID 13).
[/app/taskdump(79)] Loaded symbol table from /srv/fs/cdfs
[/app/taskdump(79)] Threads:
[/app/taskdump(79)]  [1] hash: 0xffffffff8231e000
[/app/taskdump(79)] Thread 0xffffffff8231e000: PC = 0x0000000070175a8e (_end+1876352334). FP = 0x000000000040ef70
[/app/taskdump(79)]   0x000000000040ef70: 0x0000000070175a8e (_end+1876352334)
[/app/taskdump(79)]   0x000000000040efe0: 0x000000007017647a (_end+1876354874)
[/app/taskdump(79)]   0x000000000040eff0: 0x0000000070177316 (_end+1876358614)
[/app/taskdump(79)] Address space areas:
[/app/taskdump(79)]  [1] flags: R-XC base: 0x0000000000400000 size: 28672
[/app/taskdump(79)]  [2] flags: RW-C base: 0x0000000000407000 size: 4096
[/app/taskdump(79)]  [3] flags: RW-C base: 0x0000000000408000 size: 4096
[/app/taskdump(79)]  [4] flags: RW-C base: 0x000000000040a000 size: 4096
[/app/taskdump(79)]  [5] flags: RW-C base: 0x000000000040c000 size: 4096
[/app/taskdump(79)]  [6] flags: RW-C base: 0x000000000040e000 size: 4096
[/app/taskdump(79)]  [7] flags: R--C base: 0x0000000000410000 size: 4096
[/app/taskdump(79)]  [8] flags: RW-C base: 0x0000000000411000 size: 8192
[/app/taskdump(79)]  [9] flags: RW-C base: 0x0000000000414000 size: 1048576
[/app/taskdump(79)]  [10] flags: R--C base: 0x0000000000515000 size: 65536
[/app/taskdump(79)]  [11] flags: RW-C base: 0x0000000000525000 size: 4096
[/app/taskdump(79)]  [12] flags: RW-C base: 0x0000000000527000 size: 1048576
[/app/taskdump(79)]  [13] flags: RW-C base: 0x0000000000628000 size: 1048576
[/app/taskdump(79)]  [14] flags: RW-C base: 0x0000000000729000 size: 4096
[/app/taskdump(79)]  [15] flags: RW-C base: 0x000000000072b000 size: 1048576
[/app/taskdump(79)]  [16] flags: RW-C base: 0x000000000082c000 size: 16384
[/app/taskdump(79)]  [17] flags: RW-C base: 0x0000000000831000 size: 1048576
[/app/taskdump(79)]  [18] flags: RW-C base: 0x0000000000932000 size: 1048576
[/app/taskdump(79)]  [19] flags: RW-C base: 0x0000000000a33000 size: 4096
[/app/taskdump(79)]  [20] flags: RW-C base: 0x0000000000a35000 size: 1048576
[/app/taskdump(79)]  [21] flags: RW-C base: 0x0000000000b36000 size: 1048576
[/app/taskdump(79)]  [22] flags: RW-C base: 0x0000000000c37000 size: 126976
[/app/taskdump(79)]  [23] flags: R-XC base: 0x0000000070000000 size: 98304
[/app/taskdump(79)]  [24] flags: RW-C base: 0x0000000070018000 size: 126976
[/app/taskdump(79)]  [25] flags: RW-C base: 0x0000000070037000 size: 4096
[/app/taskdump(79)]  [26] flags: RW-C base: 0x0000000070038000 size: 4096
[/app/taskdump(79)]  [27] flags: RW-C base: 0x000000007003a000 size: 4096
[/app/taskdump(79)]  [28] flags: RW-C base: 0x000000007003c000 size: 4096
[/app/taskdump(79)]  [29] flags: RW-C base: 0x000000007003e000 size: 4096
[/app/taskdump(79)]  [30] flags: R--C base: 0x0000000070040000 size: 4096
[/app/taskdump(79)]  [31] flags: RW-C base: 0x0000000070041000 size: 4096
[/app/taskdump(79)]  [32] flags: RW-C base: 0x0000000070043000 size: 1048576
[/app/taskdump(79)]  [33] flags: R-XC base: 0x0000000070144000 size: 311296
[/app/taskdump(79)]  [34] flags: RW-C base: 0x0000000070190000 size: 131072
[/app/taskdump(79)]  [35] flags: RW-C base: 0x00007ffffff00000 size: 1048576
[/app/taskdump(79)] Failed dumping fibrils.
[sysinst] note: Copied 6 files, 227.5 kB; current file: 100% done.
Task init:vfs (6) killed due to an exception at program counter 0x000000000040dad3.
cs =0x0000000000000023	rip=0x000000000040dad3	rfl=0x0000000000010202	err=0x0000000000000004
ss =0x000000000000001b
rax=0x0000000000000002	rbx=0x0000000000000001	rcx=0x0000000000000026	rdx=000000000000000000
rsi=0x000000000043ef60	rdi=0x0000000000000001	rbp=0x000000000043ef40	rsp=0x000000000043ef20
r8 =000000000000000000	r9 =000000000000000000	r10=000000000000000000	r11=0x0000000000000216
r12=000000000000000000	r13=0x000000000043ef60	r14=0x0000000000407a80	r15=0x0000000000001000
0x000000000043ef40:                mpsc_send()+19
0x000000000043efe0:   async_data_write_start()+1181
0x000000000043eff0:              fibril_exit()+98
Kill message: Page fault: 0x0000000000000001.
[/srv/taskmon(47)] taskmon: Task 6 fault in thread 0xffffffff81e47000.
[/srv/taskmon(47)] taskmon: Executing /app/taskdump -t 6

The failed assertion is here:

void libfs_lookup(libfs_ops_t *ops, fs_handle_t fs_handle, ipc_call_t *req)
{
        unsigned first = ipc_get_arg1(req);
        unsigned len = ipc_get_arg2(req);
        service_id_t service_id = ipc_get_arg3(req);
        fs_index_t index = ipc_get_arg4(req);
        int lflag = ipc_get_arg5(req);

        // TODO: Validate flags.

        unsigned next = first;
        unsigned last = first + len;

        char component[NAME_MAX + 1];
        errno_t rc;

        fs_node_t *par = NULL;
        fs_node_t *cur = NULL;
        fs_node_t *tmp = NULL;
        unsigned clen = 0;

        rc = ops->node_get(&cur, service_id, index);
        if (rc != EOK) {
                async_answer_0(req, rc);
                goto out;
        }
        
        assert(cur != NULL); <<<<<<<<<<<<< FAILED

Change History (4)

comment:1 by Jiri Svoboda, 7 days ago

Milestone: 0.16.1

comment:2 by Jiri Svoboda, 7 days ago

  1. If a file system server does not recognize a service_id/index combination, some filesystems return an error code at least in some cases (udf, mfs), while others return EOK and a NULL pointer (tmpfs, cdfs). Here we don't expect to get EOK+NULL. So what should the correct behavior be?
  1. It looks like an invalid service_id/index combination is passed. Why?

comment:3 by Jiri Svoboda, 7 days ago

Component: helenos/unspecifiedhelenos/lib/fs
Owner: set to Jakub Jermář

comment:4 by Jiri Svoboda, 6 days ago

I added KLOG_PRINTFs to cdfs whenever I insert or remove from the node hash table, as well as in cdfs_node_get(). In the end service ID was correct, the index was 0 (root node).

The root node was removed from the cache by cdfs_remove_closed(). It was removed because node→opened was false. cdfs_close() does not do anything to the root node. That means the problem is that the root node was never opened.

sysinst copies /vol/HelenOS-CD/boot to under /inst. It means it opens /vol/HelenOS-CD/boot, but not /vol/HelenOS-CD specifically. Not sure how this works out exactly in vfs/libfs, but it seems that we can go to /boot without opening the root node.

If we cd /vol/HelenOS-CD , cd / before the installation, the problem does not occur (the root node is set to open).

cdfs_remove_closed() is called from cleanup_cache() which is called from cdfs_close().

The simple fix is to never evict the root node from the cache. I am not sure, though, if that is the correct fix.

What bothers me:

  • I cannot explain why the problem is not reproduced with -bigmem
  • is it okay that the FS's root dir is traversed without the directory being open?
  • could this happen on other type of FS?
  • how should the NULL/error returns from node_get be returned by filesystems / handled by libfs?
Note: See TracTickets for help on using tickets.