commit 2dc2565902d3c24108c4b7101e91957fd068a242
Author: Greg Kroah-Hartman
Date:   Fri Nov 21 09:23:44 2014 -0800

    Linux 3.14.25

commit ee78ce5d442f0ca0f41fec1ff0749394d0b2d4d4
Author: Vlastimil Babka
Date:   Wed Jun 4 16:07:22 2014 -0700

    mm/page_alloc: prevent MIGRATE_RESERVE pages from being misplaced

    commit 5bcc9f86ef09a933255ee66bd899d4601785dad5 upstream.

    For the MIGRATE_RESERVE pages, it is useful when they do not get
    misplaced on free_list of other migratetype, otherwise they might get
    allocated prematurely and e.g. fragment the MIGRATE_RESERVE pageblocks.
    While this cannot be avoided completely when allocating new
    MIGRATE_RESERVE pageblocks in min_free_kbytes sysctl handler, we should
    prevent the misplacement where possible.

    Currently, it is possible for the misplacement to happen when a
    MIGRATE_RESERVE page is allocated on pcplist through rmqueue_bulk() as
    a fallback for other desired migratetype, and then later freed back
    through free_pcppages_bulk() without being actually used. This happens
    because free_pcppages_bulk() uses get_freepage_migratetype() to choose
    the free_list, and rmqueue_bulk() calls set_freepage_migratetype() with
    the *desired* migratetype and not the page's original MIGRATE_RESERVE
    migratetype.

    This patch fixes the problem by moving the call to
    set_freepage_migratetype() from rmqueue_bulk() down to
    __rmqueue_smallest() and __rmqueue_fallback() where the actual page's
    migratetype (i.e. the free_list the page is taken from) is used. Note
    that this migratetype might be different from the pageblock's
    migratetype due to freepage stealing decisions. This is OK, as page
    stealing never uses MIGRATE_RESERVE as a fallback, and also takes care
    to leave all MIGRATE_CMA pages on the correct freelist.

    Therefore, as an additional benefit, the call to
    get_pageblock_migratetype() from rmqueue_bulk() when CMA is enabled,
    can be removed completely. This relies on the fact that MIGRATE_CMA
    pageblocks are created only during system init, and the above.
    The related is_migrate_isolate() check is also unnecessary, as memory
    isolation has other ways to move pages between freelists, and drain
    pcp lists containing pages that should be isolated. The
    buffered_rmqueue() can also benefit from calling
    get_freepage_migratetype() instead of get_pageblock_migratetype().

    Signed-off-by: Vlastimil Babka
    Reported-by: Yong-Taek Lee
    Reported-by: Bartlomiej Zolnierkiewicz
    Suggested-by: Joonsoo Kim
    Acked-by: Joonsoo Kim
    Suggested-by: Mel Gorman
    Acked-by: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Marek Szyprowski
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Michal Nazarewicz
    Cc: "Wang, Yalin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit 24fa05302731cbf672602b355b185127ede58018
Author: Mel Gorman
Date:   Wed Jun 4 16:10:49 2014 -0700

    mm: vmscan: use proportional scanning during direct reclaim and full scan at DEF_PRIORITY

    commit 1a501907bbea8e6ebb0b16cf6db9e9cbf1d2c813 upstream.

    Commit "mm: vmscan: obey proportional scanning requirements for kswapd"
    ensured that file/anon lists were scanned proportionally for reclaim
    from kswapd but ignored it for direct reclaim. The intent was to
    minimise direct reclaim latency but Yuanhan Liu pointed out that it
    substitutes one long stall for many small stalls and distorts aging for
    normal workloads like streaming readers/writers. Hugh Dickins pointed
    out that a side-effect of the same commit was that when one LRU list
    dropped to zero, the entirety of the other list was shrunk, leading to
    excessive reclaim in memcgs. This patch scans the file/anon lists
    proportionally for direct reclaim to age pages similarly whether
    reclaimed by kswapd or direct reclaim but takes care to abort reclaim
    if one LRU drops to zero after reclaiming the requested number of
    pages.

    Based on ext4 and using the Intel VM scalability test

                                            3.15.0-rc5          3.15.0-rc5
                                              shrinker          proportion
    Unit lru-file-readonce elapsed          5.3500 (  0.00%)    5.4200 ( -1.31%)
    Unit lru-file-readonce time_range       0.2700 (  0.00%)    0.1400 ( 48.15%)
    Unit lru-file-readonce time_stddv       0.1148 (  0.00%)    0.0536 ( 53.33%)
    Unit lru-file-readtwice elapsed         8.1700 (  0.00%)    8.1700 (  0.00%)
    Unit lru-file-readtwice time_range      0.4300 (  0.00%)    0.2300 ( 46.51%)
    Unit lru-file-readtwice time_stddv      0.1650 (  0.00%)    0.0971 ( 41.16%)

    The test cases are running multiple dd instances reading sparse files.
    The results are within the noise for the small test machine. The
    impact of the patch is more noticeable from the vmstats

                                3.15.0-rc5  3.15.0-rc5
                                  shrinker  proportion
    Minor Faults                     35154       36784
    Major Faults                       611        1305
    Swap Ins                           394        1651
    Swap Outs                         4394        5891
    Allocation stalls               118616       44781
    Direct pages scanned           4935171     4602313
    Kswapd pages scanned          15921292    16258483
    Kswapd pages reclaimed        15913301    16248305
    Direct pages reclaimed         4933368     4601133
    Kswapd efficiency                  99%         99%
    Kswapd velocity             670088.047  682555.961
    Direct efficiency                  99%         99%
    Direct velocity             207709.217  193212.133
    Percentage direct scans            23%         22%
    Page writes by reclaim        4858.000    6232.000
    Page writes file                   464         341
    Page writes anon                  4394        5891

    Note that there are fewer allocation stalls even though the amount of
    direct reclaim scanning is very approximately the same.

    Signed-off-by: Mel Gorman
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Tim Chen
    Cc: Dave Chinner
    Tested-by: Yuanhan Liu
    Cc: Bob Liu
    Cc: Jan Kara
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit 14261448c60e30c31df90164ebe0667123a64792
Author: Tim Chen
Date:   Wed Jun 4 16:10:47 2014 -0700

    fs/superblock: avoid locking counting inodes and dentries before reclaiming them

    commit d23da150a37c9fe3cc83dbaf71b3e37fd434ed52 upstream.

    We remove the call to grab_super_passive in call to super_cache_count.
    This becomes a scalability bottleneck as multiple threads are trying to
    do memory reclamation, e.g. when we are doing large amount of file read
    and page cache is under pressure. The cached objects quickly got
    reclaimed down to 0 and we are aborting the cache_scan() reclaim. But
    counting creates a log jam acquiring the sb_lock.

    We are holding the shrinker_rwsem which ensures the safety of call to
    list_lru_count_node() and s_op->nr_cached_objects. The shrinker is
    unregistered now before ->kill_sb() so the operation is safe when we
    are doing unmount.

    The impact will depend heavily on the machine and the workload but for
    a small machine using postmark tuned to use 4xRAM size the results were

                                   3.15.0-rc5          3.15.0-rc5
                                      vanilla       shrinker-v1r1
    Ops/sec Transactions        21.00 (  0.00%)     24.00 ( 14.29%)
    Ops/sec FilesCreate         39.00 (  0.00%)     44.00 ( 12.82%)
    Ops/sec CreateTransact      10.00 (  0.00%)     12.00 ( 20.00%)
    Ops/sec FilesDeleted      6202.00 (  0.00%)   6202.00 (  0.00%)
    Ops/sec DeleteTransact      11.00 (  0.00%)     12.00 (  9.09%)
    Ops/sec DataRead/MB         25.97 (  0.00%)     29.10 ( 12.05%)
    Ops/sec DataWrite/MB        49.99 (  0.00%)     56.02 ( 12.06%)

    ffsb running in a configuration that is meant to simulate a mail server
    showed

                                   3.15.0-rc5          3.15.0-rc5
                                      vanilla       shrinker-v1r1
    Ops/sec readall           9402.63 (  0.00%)   9567.97 (  1.76%)
    Ops/sec create            4695.45 (  0.00%)   4735.00 (  0.84%)
    Ops/sec delete             173.72 (  0.00%)    179.83 (  3.52%)
    Ops/sec Transactions     14271.80 (  0.00%)  14482.81 (  1.48%)
    Ops/sec Read                37.00 (  0.00%)     37.60 (  1.62%)
    Ops/sec Write               18.20 (  0.00%)     18.30 (  0.55%)

    Signed-off-by: Tim Chen
    Signed-off-by: Mel Gorman
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Dave Chinner
    Tested-by: Yuanhan Liu
    Cc: Bob Liu
    Cc: Jan Kara
    Acked-by: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit e6bed540241ac74b6c30688ba9411b1f7b8d96fa
Author: Dave Chinner
Date:   Wed Jun 4 16:10:46 2014 -0700

    fs/superblock: unregister sb shrinker before ->kill_sb()

    commit 28f2cd4f6da24a1aa06c226618ed5ad69e13df64 upstream.

    This series is aimed at regressions noticed during reclaim activity.
    The first two patches are shrinker patches that were posted ages ago
    but never merged for reasons that are unclear to me. I'm posting them
    again to see if there was a reason they were dropped or if they just
    got lost. Dave? Time? The last patch adjusts proportional reclaim.
    Yuanhan Liu, can you retest the vm scalability test cases on a larger
    machine? Hugh, does this work for you on the memcg test cases?

    Based on ext4, I get the following results but unfortunately my larger
    test machines are all unavailable so this is based on a relatively
    small machine.

    postmark
                                   3.15.0-rc5          3.15.0-rc5
                                      vanilla     proportion-v1r4
    Ops/sec Transactions        21.00 (  0.00%)     25.00 ( 19.05%)
    Ops/sec FilesCreate         39.00 (  0.00%)     45.00 ( 15.38%)
    Ops/sec CreateTransact      10.00 (  0.00%)     12.00 ( 20.00%)
    Ops/sec FilesDeleted      6202.00 (  0.00%)   6202.00 (  0.00%)
    Ops/sec DeleteTransact      11.00 (  0.00%)     12.00 (  9.09%)
    Ops/sec DataRead/MB         25.97 (  0.00%)     30.02 ( 15.59%)
    Ops/sec DataWrite/MB        49.99 (  0.00%)     57.78 ( 15.58%)

    ffsb (mail server simulator)
                                   3.15.0-rc5          3.15.0-rc5
                                      vanilla     proportion-v1r4
    Ops/sec readall           9402.63 (  0.00%)   9805.74 (  4.29%)
    Ops/sec create            4695.45 (  0.00%)   4781.39 (  1.83%)
    Ops/sec delete             173.72 (  0.00%)    177.23 (  2.02%)
    Ops/sec Transactions     14271.80 (  0.00%)  14764.37 (  3.45%)
    Ops/sec Read                37.00 (  0.00%)     38.50 (  4.05%)
    Ops/sec Write               18.20 (  0.00%)     18.50 (  1.65%)

    dd of a large file
                                   3.15.0-rc5           3.15.0-rc5
                                      vanilla      proportion-v1r4
    WallTime DownloadTar        75.00 (   0.00%)     61.00 (  18.67%)
    WallTime DD                423.00 (   0.00%)    401.00 (   5.20%)
    WallTime Delete              2.00 (   0.00%)      5.00 (-150.00%)

    stutter (times mmap latency during large amounts of IO)
                                   3.15.0-rc5             3.15.0-rc5
                                      vanilla        proportion-v1r4
    Unit >5ms Delays       80252.0000 (  0.00%)   81523.0000 ( -1.58%)
    Unit Mmap min              8.2118 (  0.00%)       8.3206 ( -1.33%)
    Unit Mmap mean            17.4614 (  0.00%)      17.2868 (  1.00%)
    Unit Mmap stddev          24.9059 (  0.00%)      34.6771 (-39.23%)
    Unit Mmap max           2811.6433 (  0.00%)    2645.1398 (  5.92%)
    Unit Mmap 90%             20.5098 (  0.00%)      18.3105 ( 10.72%)
    Unit Mmap 93%             22.9180 (  0.00%)      20.1751 ( 11.97%)
    Unit Mmap 95%             25.2114 (  0.00%)      22.4988 ( 10.76%)
    Unit Mmap 99%             46.1430 (  0.00%)      43.5952 (  5.52%)
    Unit Ideal Tput           85.2623 (  0.00%)      78.8906 (  7.47%)
    Unit Tput min             44.0666 (  0.00%)      43.9609 (  0.24%)
    Unit Tput mean            45.5646 (  0.00%)      45.2009 (  0.80%)
    Unit Tput stddev           0.9318 (  0.00%)       1.1084 (-18.95%)
    Unit Tput max             46.7375 (  0.00%)      46.7539 ( -0.04%)

    This patch (of 3):

    We would like to unregister the sb shrinker before ->kill_sb(). This
    will allow cached objects to be counted without a call to
    grab_super_passive() to update the ref count on sb. We want to avoid
    locking during memory reclamation, especially when we are skipping the
    memory reclaim because we are out of cached objects.

    This is safe because grab_super_passive does a try-lock on the
    sb->s_umount now, and so if we are in the unmount process, it won't
    ever block. That means what used to be a deadlock and races we were
    avoiding by using grab_super_passive() is now:

        shrinker                            umount

        down_read(shrinker_rwsem)
                                            down_write(sb->s_umount)
                                            shrinker_unregister
                                              down_write(shrinker_rwsem)
          grab_super_passive(sb)
            down_read_trylock(sb->s_umount)
          ....
          up_read(shrinker_rwsem)
                                              up_write(shrinker_rwsem)
                                            ->kill_sb()
                                            ....

    So it is safe to deregister the shrinker before ->kill_sb().

    Signed-off-by: Tim Chen
    Signed-off-by: Mel Gorman
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Dave Chinner
    Tested-by: Yuanhan Liu
    Cc: Bob Liu
    Cc: Jan Kara
    Acked-by: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit ddb5f1a61f4336a808a9d5a32dd02052d5110946
Author: Hugh Dickins
Date:   Sat Jul 26 12:58:23 2014 -0700

    mm: fix direct reclaim writeback regression

    commit 8bdd638091605dc66d92c57c4b80eb87fffc15f7 upstream.

    Shortly before 3.16-rc1, Dave Jones reported:

      WARNING: CPU: 3 PID: 19721 at fs/xfs/xfs_aops.c:971
               xfs_vm_writepage+0x5ce/0x630 [xfs]()
      CPU: 3 PID: 19721 Comm: trinity-c61 Not tainted 3.15.0+ #3
      Call Trace:
        xfs_vm_writepage+0x5ce/0x630 [xfs]
        shrink_page_list+0x8f9/0xb90
        shrink_inactive_list+0x253/0x510
        shrink_lruvec+0x563/0x6c0
        shrink_zone+0x3b/0x100
        shrink_zones+0x1f1/0x3c0
        try_to_free_pages+0x164/0x380
        __alloc_pages_nodemask+0x822/0xc90
        alloc_pages_vma+0xaf/0x1c0
        handle_mm_fault+0xa31/0xc50
      etc.

      970   if (WARN_ON_ONCE((current->flags & (PF_MEMALLOC|PF_KSWAPD)) ==
      971                    PF_MEMALLOC))

    I did not respond at the time, because a glance at the PageDirty block
    in shrink_page_list() quickly shows that this is impossible: we don't
    do writeback on file pages (other than tmpfs) from direct reclaim
    nowadays. Dave was hallucinating, but it would have been disrespectful
    to say so.

    However, my own /var/log/messages now shows similar complaints

      WARNING: CPU: 1 PID: 28814 at fs/ext4/inode.c:1881 ext4_writepage+0xa7/0x38b()
      WARNING: CPU: 0 PID: 27347 at fs/ext4/inode.c:1764 ext4_writepage+0xa7/0x38b()

    from stressing some mmotm trees during July.

    Could a dirty xfs or ext4 file page somehow get marked PageSwapBacked,
    so fail shrink_page_list()'s page_is_file_cache() test, and so proceed
    to mapping->a_ops->writepage()?

    Yes, 3.16-rc1's commit 68711a746345 ("mm, migration: add destination
    page freeing callback") has provided such a way to compaction: if
    migrating a SwapBacked page fails, its newpage may be put back on the
    list for later use with PageSwapBacked still set, and nothing will
    clear it.

    Whether that can do anything worse than issue WARN_ON_ONCEs, and get
    some statistics wrong, is unclear: easier to fix than to think through
    the consequences.

    Fixing it here, before the put_new_page(), addresses the bug directly,
    but is probably the worst place to fix it.
    Page migration is doing too many parts of the job on too many levels:
    fixing it in move_to_new_page() to complement its SetPageSwapBacked
    would be preferable, except why is it (and newpage->mapping and
    newpage->index) done there, rather than down in
    migrate_page_move_mapping(), once we are sure of success? Not a cleanup
    to get into right now, especially not with memcg cleanups coming in
    3.17.

    Reported-by: Dave Jones
    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit 5450bba9e32b69a5431112324fc0877923192433
Author: Shaohua Li
Date:   Tue Apr 8 15:58:09 2014 +0800

    x86/mm: In the PTE swapout page reclaim case clear the accessed bit instead of flushing the TLB

    commit b13b1d2d8692b437203de7a404c6b809d2cc4d99 upstream.

    We use the accessed bit to age a page at page reclaim time, and
    currently we also flush the TLB when doing so.

    But in some workloads TLB flush overhead is very heavy. In my simple
    multithreaded app with a lot of swap to several pcie SSDs, removing the
    tlb flush gives about 20% ~ 30% swapout speedup.

    Fortunately just removing the TLB flush is a valid optimization: on x86
    CPUs, clearing the accessed bit without a TLB flush doesn't cause data
    corruption.

    It could cause incorrect page aging and the (mistaken) reclaim of hot
    pages, but the chance of that should be relatively low.

    So as a performance optimization don't flush the TLB when clearing the
    accessed bit, it will eventually be flushed by a context switch or a VM
    operation anyway.

    [ In the rare event of it not getting flushed for a long time the delay
      shouldn't really matter because there's no real memory pressure for
      swapout to react to.
    ]

    Suggested-by: Linus Torvalds
    Signed-off-by: Shaohua Li
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Acked-by: Hugh Dickins
    Acked-by: Johannes Weiner
    Cc: linux-mm@kvack.org
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140408075809.GA1764@kernel.org
    [ Rewrote the changelog and the code comments. ]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit 4201cb7e8776faf89d587e78a3ff11a4c9516082
Author: Vlastimil Babka
Date:   Wed Jun 4 16:10:41 2014 -0700

    mm, compaction: properly signal and act upon lock and need_sched() contention

    commit be9765722e6b7ece8263cbab857490332339bd6f upstream.

    Compaction uses compact_checklock_irqsave() function to periodically
    check for lock contention and need_resched() to either abort async
    compaction, or to free the lock, schedule and retake the lock. When
    aborting, cc->contended is set to signal the contended state to the
    caller. Two problems have been identified in this mechanism.

    First, compaction also calls directly cond_resched() in both scanners
    when no lock is yet taken. This call neither aborts async compaction
    nor sets cc->contended appropriately. This patch introduces a new
    compact_should_abort() function to achieve both. In
    isolate_freepages(), the check frequency is reduced to once per
    SWAP_CLUSTER_MAX pageblocks to match what the migration scanner does in
    the preliminary page checks. In case a pageblock is found suitable for
    calling isolate_freepages_block(), the checks within there are done on
    higher frequency.

    Second, isolate_freepages() does not check if isolate_freepages_block()
    aborted due to contention, and advances to the next pageblock. This
    violates the principle of aborting on contention, and might result in
    pageblocks not being scanned completely, since the scanning cursor is
    advanced. This problem has been noticed in the code by Joonsoo Kim when
    reviewing related patches. This patch makes isolate_freepages_block()
    check the cc->contended flag and abort.

    In case isolate_freepages() has already isolated some pages before
    aborting due to contention, page migration will proceed, which is OK
    since we do not want to waste the work that has been done, and page
    migration has its own checks for contention. However, we do not want
    another isolation attempt by either of the scanners, so cc->contended
    flag check is added also to compaction_alloc() and compact_finished()
    to make sure compaction is aborted right after the migration.

    The outcome of the patch should be reduced lock contention by async
    compaction and lower latencies for higher-order allocations where
    direct compaction is involved.

    [akpm@linux-foundation.org: fix typo in comment]
    Reported-by: Joonsoo Kim
    Signed-off-by: Vlastimil Babka
    Reviewed-by: Naoya Horiguchi
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Michal Nazarewicz
    Cc: Christoph Lameter
    Cc: Rik van Riel
    Acked-by: Michal Nazarewicz
    Tested-by: Shawn Guo
    Tested-by: Kevin Hilman
    Tested-by: Stephen Warren
    Tested-by: Fabio Estevam
    Cc: David Rientjes
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit fb81c5ee2692f75aa149330f54b56a8cbf2cd902
Author: Vlastimil Babka
Date:   Wed Jun 4 16:08:34 2014 -0700

    mm/compaction: avoid rescanning pageblocks in isolate_freepages

    commit e9ade569910a82614ff5f2c2cea2b65a8d785da4 upstream.

    The compaction free scanner in isolate_freepages() currently remembers
    PFN of the highest pageblock where it successfully isolates, to be used
    as the starting pageblock for the next invocation. The rationale behind
    this is that page migration might return free pages to the allocator
    when migration fails and we don't want to skip them if the compaction
    continues.

    Since migration now returns free pages back to compaction code where
    they can be reused, this is no longer a concern.

    This patch changes isolate_freepages() so that the PFN for restarting
    is updated with each pageblock where isolation is attempted. Using
    stress-highalloc from mmtests, this resulted in 10% reduction of the
    pages scanned by the free scanner.

    Note that the somewhat similar functionality that records highest
    successful pageblock in zone->compact_cached_free_pfn, remains
    unchanged. This cache is used when the whole compaction is restarted,
    not for multiple invocations of the free scanner during single
    compaction.

    Signed-off-by: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: Bartlomiej Zolnierkiewicz
    Acked-by: Michal Nazarewicz
    Reviewed-by: Naoya Horiguchi
    Cc: Christoph Lameter
    Cc: Rik van Riel
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit 41c9323cf11df2465a3064591b669d90111b6f9e
Author: Vlastimil Babka
Date:   Wed Jun 4 16:08:32 2014 -0700

    mm/compaction: do not count migratepages when unnecessary

    commit f8c9301fa5a2a8b873c67f2a3d8230d5c13f61b7 upstream.

    During compaction, update_nr_listpages() has been used to count
    remaining non-migrated and free pages after a call to migrate_pages().
    The freepages counting has become unnecessary, and it turns out that
    migratepages counting is also unnecessary in most cases.

    The only situation when it's needed to count cc->migratepages is when
    migrate_pages() returns with a negative error code. Otherwise, the
    non-negative return value is the number of pages that were not
    migrated, which is exactly the count of remaining pages in the
    cc->migratepages list.

    Furthermore, any non-zero count is only interesting for the tracepoint
    of mm_compaction_migratepages events, because after that all remaining
    unmigrated pages are put back and their count is set to 0.
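
The return-value rule in the paragraphs above can be sketched as a small userspace model. This is only an illustration under stated assumptions: `struct page_node`, `count_list()` and `remaining_after_migration()` are hypothetical names invented here, not kernel code.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry on a list of pages to migrate. */
struct page_node {
    struct page_node *next;
};

/* Walk the whole list; this is the "manual counting" the text avoids. */
static size_t count_list(const struct page_node *head)
{
    size_t n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}

/*
 * Model of the rule described above: a non-negative return value
 * already equals the number of pages left on the list, so the list
 * is only walked when a negative error code was returned.
 */
static size_t remaining_after_migration(int ret, const struct page_node *head)
{
    if (ret >= 0)
        return (size_t)ret;
    return count_list(head);
}
```

In this model the list walk is paid only on the error path, which mirrors the saving the message describes.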

    This patch therefore removes update_nr_listpages() completely, and
    changes the tracepoint definition so that the manual counting is done
    only when the tracepoint is enabled, and only when migrate_pages()
    returns a negative error code.

    Furthermore, migrate_pages() and the tracepoints won't be called when
    there's nothing to migrate. This potentially avoids some wasted cycles
    and reduces the volume of uninteresting mm_compaction_migratepages
    events where "nr_migrated=0 nr_failed=0". In the stress-highalloc
    mmtest, this was about 75% of the events. The
    mm_compaction_isolate_migratepages event is better for determining that
    nothing was isolated for migration, and this one was just duplicating
    the info.

    Signed-off-by: Vlastimil Babka
    Reviewed-by: Naoya Horiguchi
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: Bartlomiej Zolnierkiewicz
    Acked-by: Michal Nazarewicz
    Cc: Christoph Lameter
    Cc: Rik van Riel
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit 1c99371f2bd5a31e66b8199c3e66629043d98a6c
Author: David Rientjes
Date:   Wed Jun 4 16:08:31 2014 -0700

    mm, compaction: terminate async compaction when rescheduling

    commit aeef4b83806f49a0c454b7d4578671b71045bee2 upstream.

    Async compaction terminates prematurely when need_resched(), see
    compact_checklock_irqsave(). This can never trigger, however, if the
    cond_resched() in isolate_migratepages_range() always takes care of the
    scheduling.

    If the cond_resched() actually triggers, then terminate this pageblock
    scan for async compaction as well.

    Signed-off-by: David Rientjes
    Acked-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Mel Gorman
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit 102a623045f715b79f9e4ad697c3f413506d6378
Author: David Rientjes
Date:   Wed Jun 4 16:08:28 2014 -0700

    mm, compaction: embed migration mode in compact_control

    commit e0b9daeb453e602a95ea43853dc12d385558ce1f upstream.

    We're going to want to manipulate the migration mode for compaction in
    the page allocator, and currently compact_control's sync field is only
    a bool.

    Currently, we only do MIGRATE_ASYNC or MIGRATE_SYNC_LIGHT compaction
    depending on the value of this bool. Convert the bool to enum
    migrate_mode and pass the migration mode in directly. Later, we'll want
    to avoid MIGRATE_SYNC_LIGHT for thp allocations in the pagefault path
    to avoid unnecessary latency.

    This also alters compaction triggered from sysfs, either for the entire
    system or for a node, to force MIGRATE_SYNC.

    [akpm@linux-foundation.org: fix build]
    [iamjoonsoo.kim@lge.com: use MIGRATE_SYNC in alloc_contig_range()]
    Signed-off-by: David Rientjes
    Suggested-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Greg Thelen
    Cc: Naoya Horiguchi
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit 3793816b671250366b9334288c760e60136bfe4e
Author: David Rientjes
Date:   Wed Jun 4 16:08:27 2014 -0700

    mm, compaction: add per-zone migration pfn cache for async compaction

    commit 35979ef3393110ff3c12c6b94552208d3bdf1a36 upstream.

    Each zone has a cached migration scanner pfn for memory compaction so
    that subsequent calls to memory compaction can start where the previous
    call left off.

    Currently, the compaction migration scanner only updates the per-zone
    cached pfn when pageblocks were not skipped for async compaction.
    This creates a dependency on calling sync compaction to keep subsequent
    calls to async compaction from scanning an enormous amount of
    non-MOVABLE pageblocks each time it is called. On large machines, this
    could be potentially very expensive.

    This patch adds a per-zone cached migration scanner pfn only for async
    compaction. It is updated every time a pageblock has been scanned in
    its entirety and when no pages from it were successfully isolated. The
    cached migration scanner pfn for sync compaction is updated only when
    called for sync compaction.

    Signed-off-by: David Rientjes
    Acked-by: Vlastimil Babka
    Reviewed-by: Naoya Horiguchi
    Cc: Greg Thelen
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit 20f0d30fb0b4bfc8ae288b1ae1184ab27ccebc39
Author: David Rientjes
Date:   Wed Jun 4 16:08:26 2014 -0700

    mm, compaction: return failed migration target pages back to freelist

    commit d53aea3d46d64e95da9952887969f7533b9ab25e upstream.

    Greg reported that he found isolated free pages were returned back to
    the VM rather than the compaction freelist. This will cause holes
    behind the free scanner and cause it to reallocate additional memory if
    necessary later.

    He detected the problem at runtime seeing that ext4 metadata pages (esp
    the ones read by "sbi->s_group_desc[i] = sb_bread(sb, block)") were
    constantly visited by compaction calls of migrate_pages(). These pages
    had a non-zero b_count which caused fallback_migrate_page() ->
    try_to_release_page() -> try_to_free_buffers() to fail.

    Memory compaction works by having a "freeing scanner" scan from one end
    of a zone which isolates pages as migration targets while another
    "migrating scanner" scans from the other end of the same zone which
    isolates pages for migration.

    When page migration fails for an isolated page, the target page is
    returned to the system rather than the freelist built by the freeing
    scanner.
    This may require the freeing scanner to continue scanning memory after
    suitable migration targets have already been returned to the system
    needlessly.

    This patch returns destination pages to the freeing scanner freelist
    when page migration fails. This prevents unnecessary work done by the
    freeing scanner but also encourages memory to be as compacted as
    possible at the end of the zone.

    Signed-off-by: David Rientjes
    Reported-by: Greg Thelen
    Acked-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Reviewed-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit a527e8d4f7ae0e1d6a3c267f31e13b1d2a198508
Author: David Rientjes
Date:   Wed Jun 4 16:08:25 2014 -0700

    mm, migration: add destination page freeing callback

    commit 68711a746345c44ae00c64d8dbac6a9ce13ac54a upstream.

    Memory migration uses a callback defined by the caller to determine how
    to allocate destination pages. When migration fails for a source page,
    however, it frees the destination page back to the system.

    This patch adds a memory migration callback defined by the caller to
    determine how to free destination pages. If a caller, such as memory
    compaction, builds its own freelist for migration targets, this can
    reuse already freed memory instead of scanning additional memory.

    If the caller provides a function to handle freeing of destination
    pages, it is called when page migration fails. If the caller passes
    NULL then freeing back to the system will be handled as usual. This
    patch introduces no functional change.
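
The allocate/free callback pairing described above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel interface: `alloc_dst_fn`, `free_dst_fn`, `struct freelist`, `migrate_one()` and `system_free()` are hypothetical names, and the `can_migrate` flag simply stands in for whether migration of the source page succeeds.

```c
#include <stddef.h>

struct page {
    int id;
    struct page *next;
};

/* Hypothetical callback types modelled on the description above. */
typedef struct page *alloc_dst_fn(void *private_data);
typedef void free_dst_fn(struct page *page, void *private_data);

/* A caller-owned freelist of migration targets (as compaction keeps). */
struct freelist {
    struct page *head;
    size_t count;
};

/* Allocation callback: pop a destination page from the caller's list. */
static struct page *freelist_alloc(void *private_data)
{
    struct freelist *fl = private_data;
    struct page *page = fl->head;

    if (page != NULL) {
        fl->head = page->next;
        fl->count--;
    }
    return page;
}

/* Freeing callback: push an unused destination page back on the list. */
static void freelist_free(struct page *page, void *private_data)
{
    struct freelist *fl = private_data;

    page->next = fl->head;
    fl->head = page;
    fl->count++;
}

/* Stand-in for "freeing back to the system": just count the pages. */
static size_t returned_to_system;

static void system_free(struct page *page)
{
    (void)page;
    returned_to_system++;
}

/*
 * Model of one migration attempt. Returns 0 on success, 1 when the
 * migration failed and the destination page was released, and -1 when
 * no destination page could be allocated.
 */
static int migrate_one(int can_migrate, alloc_dst_fn *get_new,
                       free_dst_fn *put_new, void *private_data)
{
    struct page *dst = get_new(private_data);

    if (dst == NULL)
        return -1;
    if (can_migrate)
        return 0;               /* destination page is now in use */
    if (put_new != NULL)
        put_new(dst, private_data);
    else
        system_free(dst);       /* NULL callback: the usual behaviour */
    return 1;
}
```

Passing `NULL` for the freeing callback keeps the old behaviour in this model, which matches the "no functional change" statement: callers only see a difference once they supply their own freeing function.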

    Signed-off-by: David Rientjes
    Reviewed-by: Naoya Horiguchi
    Acked-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Greg Thelen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit 5721949c486705f6a5ba4fc6d2b73f3bb8456149
Author: Vlastimil Babka
Date:   Wed Jun 4 16:07:26 2014 -0700

    mm/compaction: cleanup isolate_freepages()

    commit c96b9e508f3d06ddb601dcc9792d62c044ab359e upstream.

    isolate_freepages() is currently somewhat hard to follow: the
    'high_pfn' variable looks like it is related to the 'low_pfn' variable,
    but in fact it is not.

    This patch renames the 'high_pfn' variable to a hopefully less
    confusing name, and slightly changes its handling without a functional
    change. A comment made obsolete by recent changes is also updated.

    [akpm@linux-foundation.org: comment fixes, per Minchan]
    [iamjoonsoo.kim@lge.com: cleanups]
    Signed-off-by: Vlastimil Babka
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Michal Nazarewicz
    Cc: Naoya Horiguchi
    Cc: Christoph Lameter
    Cc: Rik van Riel
    Cc: Dongjun Shin
    Cc: Sunghwan Yun
    Signed-off-by: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit 46504e575d59fca88515e62fbe0a21f205b4cfdb
Author: Heesub Shin
Date:   Wed Jun 4 16:07:24 2014 -0700

    mm/compaction: clean up unused code lines

    commit 13fb44e4b0414d7e718433a49e6430d5b76bd46e upstream.

    Remove code lines currently not in use or never called.

    Signed-off-by: Heesub Shin
    Acked-by: Vlastimil Babka
    Cc: Dongjun Shin
    Cc: Sunghwan Yun
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Joonsoo Kim
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Michal Nazarewicz
    Cc: Naoya Horiguchi
    Cc: Christoph Lameter
    Cc: Rik van Riel
    Cc: Dongjun Shin
    Cc: Sunghwan Yun
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit aa64050a24605bb1b25ba06653cf13590fbdb568
Author: Fabian Frederick
Date:   Mon Apr 7 15:37:55 2014 -0700

    mm/readahead.c: inline ra_submit

    commit 29f175d125f0f3a9503af8a5596f93d714cceb08 upstream.

    Commit f9acc8c7b35a ("readahead: sanify file_ra_state names") left
    ra_submit with a single function call.

    Move ra_submit to internal.h and inline it to save some stack. Thanks
    to Andrew Morton for commenting different versions.

    Signed-off-by: Fabian Frederick
    Suggested-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit 9fb77c771373c078f93807f077c29ebafe720a25
Author: Al Viro
Date:   Sun Feb 2 22:10:25 2014 -0500

    callers of iov_copy_from_user_atomic() don't need pagefault_disable()

    commit 9e8c2af96e0d2d5fe298dd796fb6bc16e888a48d upstream.

    ... it does that itself (via kmap_atomic())

    Signed-off-by: Al Viro
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit 034c4b3e832b22ec83e7bd409cf1ad3efba18f45
Author: Sasha Levin
Date:   Thu Apr 3 14:48:18 2014 -0700

    mm: remove read_cache_page_async()

    commit 67f9fd91f93c582b7de2ab9325b6e179db77e4d5 upstream.

    This patch removes read_cache_page_async() which wasn't really needed
    anywhere and simplifies the code around it a bit.

    read_cache_page_async() is useful when we want to read a page into the
    cache without waiting for it to complete. This happens when the
    appropriate callback 'filler' doesn't complete its read operation and
    releases the page lock immediately, and instead queues a different
    completion routine to do that.
    This never actually happened anywhere in the code.

    read_cache_page_async() had 3 different callers:

    - read_cache_page() which is the sync version, it would just wait for
      the requested read to complete using wait_on_page_read().

    - JFFS2 would call it from jffs2_gc_fetch_page(), but the filler
      function it supplied doesn't do any async reads, and would complete
      before the filler function returns - making it actually a sync read.

    - CRAMFS would call it using the read_mapping_page_async() wrapper,
      with a similar story to JFFS2 - the filler function doesn't do
      anything that resembles async reads and would always complete before
      the filler function returns.

    To sum it up, the code in mm/filemap.c never took advantage of having
    read_cache_page_async(). While there are filler callbacks that do async
    reads (such as the block one), we always called them through
    read_cache_page().

    This patch adds a mandatory wait for read to complete when adding a new
    page to the cache, and removes read_cache_page_async() and its
    wrappers.

    Signed-off-by: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Mel Gorman
    Signed-off-by: Greg Kroah-Hartman

commit 30fe6d33fe04411ed4e8cf16615cd1ea98c1f88e
Author: Johannes Weiner
Date:   Thu May 22 11:54:17 2014 -0700

    mm: madvise: fix MADV_WILLNEED on shmem swapouts

    commit 55231e5c898c5c03c14194001e349f40f59bd300 upstream.

    MADV_WILLNEED currently does not read swapped out shmem pages back in.

    Commit 0cd6144aadd2 ("mm + fs: prepare for non-page entries in page
    cache radix trees") made find_get_page() filter exceptional radix tree
    entries but failed to convert all find_get_page() callers that WANT
    exceptional entries over to find_get_entry(). One of them is shmem swap
    readahead in madvise, which now skips over any swap-out records.

    Convert it to find_get_entry().
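
The difference between the filtering lookup and the raw lookup that the madvise fix relies on can be sketched in userspace as below. Everything here is an assumption made for illustration: `lookup_page()` and `lookup_entry()` are hypothetical stand-ins for the two lookup flavours, the fixed-size slot array stands in for the radix tree, and `EXCEPTIONAL_TAG` is an illustrative low-bit tag rather than a value taken from kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

#define NSLOTS 8
#define EXCEPTIONAL_TAG ((uintptr_t)2)  /* illustrative tag bit */

struct page {
    int id;
};

/* Hypothetical stand-in for a mapping's tree of slots. */
struct mapping {
    void *slots[NSLOTS];
};

/* Encode a swap slot number as a tagged, non-pointer entry. */
static void *make_swap_entry(uintptr_t swap_slot)
{
    return (void *)((swap_slot << 2) | EXCEPTIONAL_TAG);
}

/* Non-page entries are recognised by the tag bit. */
static int is_exceptional(const void *entry)
{
    return ((uintptr_t)entry & EXCEPTIONAL_TAG) != 0;
}

/* Raw lookup: returns whatever is in the slot, page or not. */
static void *lookup_entry(const struct mapping *m, size_t index)
{
    return index < NSLOTS ? m->slots[index] : NULL;
}

/* Filtering lookup: non-page entries look like holes to the caller. */
static struct page *lookup_page(const struct mapping *m, size_t index)
{
    void *entry = lookup_entry(m, index);

    return is_exceptional(entry) ? NULL : (struct page *)entry;
}
```

In this model a caller that wants to act on swap records has to use the raw lookup; with the filtering lookup the swapped-out slot is indistinguishable from an empty one, which is the behaviour the message describes for the madvise readahead path.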
Fixes: 0cd6144aadd2 ("mm + fs: prepare for non-page entries in page cache radix trees")
Signed-off-by: Johannes Weiner
Reported-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Greg Kroah-Hartman

commit 414af56f42cc15271fed99d0acc0f96a3686e458
Author: Johannes Weiner
Date: Thu Apr 3 14:47:46 2014 -0700

mm + fs: prepare for non-page entries in page cache radix trees

commit 0cd6144aadd2afd19d1aca880153530c52957604 upstream.

shmem mappings already contain exceptional entries where swap slot information is remembered.

To be able to store eviction information for regular page cache, prepare every site dealing with the radix trees directly to handle entries other than pages.

The common lookup functions will filter out non-page entries and return NULL for page cache holes, just as before. But provide a raw version of the API which returns non-page entries as well, and switch shmem over to use it.

Signed-off-by: Johannes Weiner
Reviewed-by: Rik van Riel
Reviewed-by: Minchan Kim
Cc: Andrea Arcangeli
Cc: Bob Liu
Cc: Christoph Hellwig
Cc: Dave Chinner
Cc: Greg Thelen
Cc: Hugh Dickins
Cc: Jan Kara
Cc: KOSAKI Motohiro
Cc: Luigi Semenzato
Cc: Mel Gorman
Cc: Metin Doslu
Cc: Michel Lespinasse
Cc: Ozgun Erdogan
Cc: Peter Zijlstra
Cc: Roman Gushchin
Cc: Ryan Mallon
Cc: Tejun Heo
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Greg Kroah-Hartman

commit d141bb0e3f48552f3a72a7996e264e12320174ae
Author: Johannes Weiner
Date: Thu Apr 3 14:47:44 2014 -0700

mm: filemap: move radix tree hole searching here

commit e7b563bb2a6f4d974208da46200784b9c5b5a47e upstream.

The radix tree hole searching code is only used for page cache, for example the readahead code trying to get a picture of the area surrounding a fault.

It sufficed to rely on the radix tree definition of holes, which is "empty tree slot".
But this is about to change, though, as shadow page descriptors will be stored in the page cache after the actual pages get evicted from memory. Move the functions over to mm/filemap.c and make them native page cache operations, where they can later be adapted to handle the new definition of "page cache hole". Signed-off-by: Johannes Weiner Reviewed-by: Rik van Riel Reviewed-by: Minchan Kim Acked-by: Mel Gorman Cc: Andrea Arcangeli Cc: Bob Liu Cc: Christoph Hellwig Cc: Dave Chinner Cc: Greg Thelen Cc: Hugh Dickins Cc: Jan Kara Cc: KOSAKI Motohiro Cc: Luigi Semenzato Cc: Metin Doslu Cc: Michel Lespinasse Cc: Ozgun Erdogan Cc: Peter Zijlstra Cc: Roman Gushchin Cc: Ryan Mallon Cc: Tejun Heo Cc: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Mel Gorman Signed-off-by: Greg Kroah-Hartman commit c2667299f23105ae24c019699b46022a1701ff0e Author: Johannes Weiner Date: Thu Apr 3 14:47:41 2014 -0700 mm: shmem: save one radix tree lookup when truncating swapped pages commit 6dbaf22ce1f1dfba33313198eb5bd989ae76dd87 upstream. Page cache radix tree slots are usually stabilized by the page lock, but shmem's swap cookies have no such thing. Because the overall truncation loop is lockless, the swap entry is currently confirmed by a tree lookup and then deleted by another tree lookup under the same tree lock region. Use radix_tree_delete_item() instead, which does the verification and deletion with only one lookup. This also allows removing the delete-only special case from shmem_radix_tree_replace(). 
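The "verify and delete in one step" idea can be illustrated with a small user-space sketch. This is not the kernel's radix tree code: `struct toy_tree`, `toy_delete_item()` and `toy_demo()` are hypothetical names, and a flat array stands in for the tree. Treating a NULL expectation as "delete whatever is there" is a choice made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 8

/* Hypothetical stand-in for a radix tree: a flat array of slots. */
struct toy_tree {
	void *slot[TOY_SLOTS];
};

/*
 * Delete the entry at index only if it still equals expected.
 * A NULL expected deletes whatever is stored (sketch-only choice).
 * Returns the removed entry, or NULL if the index is out of range,
 * the slot is empty, or the slot holds a different entry.
 */
void *toy_delete_item(struct toy_tree *tree, size_t index, void *expected)
{
	void *entry;

	if (index >= TOY_SLOTS)
		return NULL;

	entry = tree->slot[index];
	if (entry == NULL || (expected != NULL && entry != expected))
		return NULL;

	tree->slot[index] = NULL;
	return entry;
}

/* Small self-check showing the intended behaviour. */
void toy_demo(void)
{
	struct toy_tree tree = { { NULL } };
	int old_entry, new_entry;

	tree.slot[3] = &old_entry;

	/* A stale expectation leaves the slot untouched. */
	assert(toy_delete_item(&tree, 3, &new_entry) == NULL);
	assert(tree.slot[3] == &old_entry);

	/* The matching expectation removes the entry in one call. */
	assert(toy_delete_item(&tree, 3, &old_entry) == &old_entry);
	assert(tree.slot[3] == NULL);
}
```

A caller that found an entry during a lockless walk would pass that entry as `expected` while holding the lock, so a slot that changed in the meantime is simply left alone.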
Signed-off-by: Johannes Weiner
Reviewed-by: Minchan Kim
Reviewed-by: Rik van Riel
Acked-by: Mel Gorman
Cc: Andrea Arcangeli
Cc: Bob Liu
Cc: Christoph Hellwig
Cc: Dave Chinner
Cc: Greg Thelen
Cc: Hugh Dickins
Cc: Jan Kara
Cc: KOSAKI Motohiro
Cc: Luigi Semenzato
Cc: Metin Doslu
Cc: Michel Lespinasse
Cc: Ozgun Erdogan
Cc: Peter Zijlstra
Cc: Roman Gushchin
Cc: Ryan Mallon
Cc: Tejun Heo
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Greg Kroah-Hartman

commit d35a6232f850723d46c3d9271a6b6217af58731b
Author: Johannes Weiner
Date: Thu Apr 3 14:47:39 2014 -0700

lib: radix-tree: add radix_tree_delete_item()

commit 53c59f262d747ea82e7414774c59a489501186a0 upstream.

Provide a function that does not just delete an entry at a given index, but also allows passing in an expected item. Delete only if that item is still located at the specified index.

This is handy when lockless tree traversals want to delete entries as well because they don't have to do a second, locked lookup to verify the slot has not changed under them before deleting the entry.

Signed-off-by: Johannes Weiner
Reviewed-by: Minchan Kim
Reviewed-by: Rik van Riel
Acked-by: Mel Gorman
Cc: Andrea Arcangeli
Cc: Bob Liu
Cc: Christoph Hellwig
Cc: Dave Chinner
Cc: Greg Thelen
Cc: Hugh Dickins
Cc: Jan Kara
Cc: KOSAKI Motohiro
Cc: Luigi Semenzato
Cc: Metin Doslu
Cc: Michel Lespinasse
Cc: Ozgun Erdogan
Cc: Peter Zijlstra
Cc: Roman Gushchin
Cc: Ryan Mallon
Cc: Tejun Heo
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Mel Gorman
Signed-off-by: Greg Kroah-Hartman

commit 197b3975f4b079f004b02786bfa985ec276f6112
Author: Quentin Casasnovas
Date: Wed Nov 12 11:19:23 2014 +0100

regmap: fix kernel hang on regmap_bulk_write with zero val_count.

Fixes commit 2f06fa04cf35da5c24481da3ac84a2900d0b99c3 which was an incorrect backported version of commit d6b41cb06044a7d895db82bdd54f6e4219970510 upstream.
If val_count is zero we return -EINVAL with map->lock_arg locked, which will deadlock the kernel next time we try to acquire this lock.

This was introduced by f5942dd ("regmap: fix possible ZERO_SIZE_PTR pointer dereferencing error.") which improperly back-ported d6b41cb0.

This issue was found during review of Ubuntu Trusty 3.13.0-40.68 kernel to prepare Ksplice rebootless updates.

Fixes: f5942dd ("regmap: fix possible ZERO_SIZE_PTR pointer dereferencing error.")
Signed-off-by: Quentin Casasnovas
Signed-off-by: Greg Kroah-Hartman

commit 84efcb2025064ad5650a476b8beab2fb2c27cc9f
Author: Emmanuel Grumbach
Date: Tue Sep 23 23:02:41 2014 +0300

iwlwifi: configure the LTR

commit 9180ac50716a097a407c6d7e7e4589754a922260 upstream.

The LTR is the handshake between the device and the root complex about the latency allowed when the bus exits power save. This configuration was missing and this led to high latency in the link power up. The end user could experience high latency in the network because of this.

Signed-off-by: Emmanuel Grumbach
Signed-off-by: Greg Kroah-Hartman

commit e36b6ac9e011205eb7ad3af329dbd27a21bacd50
Author: Daniel Borkmann
Date: Thu Oct 9 22:55:31 2014 +0200

net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks

commit 9de7922bc709eee2f609cd01d98aaedc4cf5ea74 upstream.

Commit 6f4c618ddb0 ("SCTP : Add paramters validity check for ASCONF chunk") added basic verification of ASCONF chunks, however, it is still possible to remotely crash a server by sending a specially crafted ASCONF chunk, even up to pre 2.6.12 kernels:

skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768 head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950 end:0x440 dev:
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
 [] skb_put+0x5c/0x70
 [] sctp_addto_chunk+0x63/0xd0 [sctp]
 [] sctp_process_asconf+0x1af/0x540 [sctp]
 [] ? _read_unlock_bh+0x15/0x20
 [] sctp_sf_do_asconf+0x168/0x240 [sctp]
 [] sctp_do_sm+0x71/0x1210 [sctp]
 [] ? fib_rules_lookup+0xad/0xf0
 [] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
 [] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
 [] sctp_inq_push+0x56/0x80 [sctp]
 [] sctp_rcv+0x982/0xa10 [sctp]
 [] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
 [] ? nf_iterate+0x69/0xb0
 [] ? ip_local_deliver_finish+0x0/0x2d0
 [] ? nf_hook_slow+0x76/0x120
 [] ? ip_local_deliver_finish+0x0/0x2d0
 [] ip_local_deliver_finish+0xdd/0x2d0
 [] ip_local_deliver+0x98/0xa0
 [] ip_rcv_finish+0x12d/0x440
 [] ip_rcv+0x275/0x350
 [] __netif_receive_skb+0x4ab/0x750
 [] netif_receive_skb+0x58/0x60

This can be triggered e.g., through a simple scripted nmap connection scan injecting the chunk after the handshake, for example, ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ------------------ ASCONF; UNKNOWN ------------------>

... where ASCONF chunk of length 280 contains 2 parameters ...

  1) Add IP address parameter (param length: 16)
  2) Add/del IP address parameter (param length: 255)

... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the Address Parameter in the ASCONF chunk is even missing, too. This is just an example and similarly-crafted ASCONF chunks could be used just as well.

The ASCONF chunk passes through sctp_verify_asconf() as all parameters passed sanity checks, and after walking, we ended up successfully at the chunk end boundary, and thus may invoke sctp_process_asconf(). Parameter walking is done with WORD_ROUND() to take padding into account.
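As a rough illustration of why padded walking matters, the sketch below rounds each parameter length up to a multiple of 4 before advancing. It is a user-space approximation under stated assumptions, not the SCTP implementation: `word_round()`, `read_be16()`, `count_params()` and `walk_demo()` are hypothetical helper names, and the 4-byte header layout (16-bit type, 16-bit length) is assumed from the generic TLV format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a length up to the next multiple of 4 (the padding rule). */
size_t word_round(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Read a 16-bit big-endian value from buf. */
uint16_t read_be16(const uint8_t *buf)
{
	return (uint16_t)((buf[0] << 8) | buf[1]);
}

/*
 * Count the parameters in a TLV buffer, advancing by the padded length.
 * Returns -1 if a header is truncated, a length is smaller than the
 * 4-byte header, or a parameter claims more bytes than remain.
 */
int count_params(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	while (off < len) {
		size_t plen;

		if (len - off < 4)
			return -1;
		plen = read_be16(buf + off + 2);
		if (plen < 4 || plen > len - off)
			return -1;
		off += word_round(plen);	/* skip value plus padding */
		n++;
	}
	return n;
}

/* Small self-check with a 6-byte parameter padded to 8 bytes. */
void walk_demo(void)
{
	const uint8_t buf[] = {
		0x00, 0x01, 0x00, 0x06, 0xaa, 0xbb, 0x00, 0x00, /* len 6 + pad */
		0x00, 0x02, 0x00, 0x04                          /* len 4 */
	};

	assert(word_round(255) == 256);
	assert(word_round(16) == 16);
	assert(count_params(buf, sizeof(buf)) == 2);
}
```

Advancing by the raw length 6 instead of the padded length 8 would put the next header in the middle of the padding, which is the kind of misalignment a parameter of length 255 produces.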
In sctp_process_asconf()'s TLV processing, we may fail in sctp_process_asconf_param() e.g., due to removal of the IP address that is also the source address of the packet containing the ASCONF chunk, and thus we need to add all TLVs after the failure to our ASCONF response to remote via helper function sctp_add_asconf_response(), which basically invokes a sctp_addto_chunk() adding the error parameters to the given skb.

When walking to the next parameter this time, we proceed with ...

  length = ntohs(asconf_param->param_hdr.length);
  asconf_param = (void *)asconf_param + length;

... instead of the WORD_ROUND()'ed length, thus resulting here in an off-by-one that leads to reading the follow-up garbage parameter length of 12336, and thus throwing an skb_over_panic for the reply when trying to sctp_addto_chunk() next time, which implicitly calls the skb_put() with that length.

Fix it by using sctp_walk_params() [ which is also used in INIT parameter processing ] macro in the verification *and* in ASCONF processing: it will make sure we don't spill over, that we walk parameters WORD_ROUND()'ed. Moreover, we're being more defensive and guard against unknown parameter types and missized addresses.

Joint work with Vlad Yasevich.

Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
Signed-off-by: Daniel Borkmann
Signed-off-by: Vlad Yasevich
Acked-by: Neil Horman
Signed-off-by: David S. Miller
Cc: Josh Boyer
Signed-off-by: Greg Kroah-Hartman

commit 59ea8663e3a7fc3a0c2841e310b83f7aaec1c017
Author: Daniel Borkmann
Date: Thu Oct 9 22:55:32 2014 +0200

net: sctp: fix panic on duplicate ASCONF chunks

commit b69040d8e39f20d5215a03502a8e8b4c6ab78395 upstream.

When receiving a e.g. semi-good formed connection scan in the form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---------------- ASCONF_a; ASCONF_b ----------------->

... where ASCONF_a equals ASCONF_b chunk (at least both serials need to be equal), we panic an SCTP server!

The problem is that good-formed ASCONF chunks that we reply with ASCONF_ACK chunks are cached per serial. Thus, when we receive the same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do not need to process it again on the server side (that was the idea, also proposed in the RFC). Instead, we know it was cached and we just resend the cached chunk. So far, so good.

Where things get nasty is in SCTP's side effect interpreter, that is, sctp_cmd_interpreter():

While incoming ASCONF_a (chunk = event_arg) is being marked !end_of_packet and !singleton, and we have an association context, we do not flush the outqueue the first time after processing the ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it queued up, although we set local_cork to 1. Commit 2e3216cd54b1 changed the precedence, so that as long as we get bundled incoming chunks, we try possible bundling on the outgoing queue as well. Before this commit, we would just flush the output queue.

Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we continue to process the same ASCONF_b chunk from the packet. As we have cached the previous ASCONF_ACK, we find it, grab it and do another SCTP_CMD_REPLY command on it. So, effectively, we rip the chunk->list pointers and requeue the same ASCONF_ACK chunk another time. Since we process ASCONF_b, it's correctly marked with end_of_packet and we enforce an uncork, and thus flush, thus crashing the kernel.

Fix it by testing if the ASCONF_ACK is currently pending and if that is the case, do not requeue it.
When flushing the output queue we may relink the chunk for preparing an outgoing packet, but eventually unlink it when it's copied into the skb right before transmission.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann
Signed-off-by: Vlad Yasevich
Signed-off-by: David S. Miller
Cc: Josh Boyer
Signed-off-by: Greg Kroah-Hartman

commit 75680aa393f12465fc10642d2d55be49a333d828
Author: Daniel Borkmann
Date: Thu Oct 9 22:55:33 2014 +0200

net: sctp: fix remote memory pressure from excessive queueing

commit 26b87c7881006311828bb0ab271a551a62dcceb4 upstream.

This scenario is not limited to ASCONF, just taken as one example triggering the issue. When receiving ASCONF probes in the form of ...

  -------------- INIT[ASCONF; ASCONF_ACK] ------------->
  <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------
  ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------>
  [...]
  ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------>

... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed ASCONFs and have increasing serial numbers, we process such ASCONF chunk(s) marked with !end_of_packet and !singleton, since we have not yet reached the SCTP packet end. SCTP does only do verification on a chunk by chunk basis, as an SCTP packet is nothing more than just a container of a stream of chunks which it eats up one by one.

We could run into the case that we receive a packet with a malformed tail, above marked as trailing JUNK. All previous chunks are here goodformed, so the stack will eat up all previous chunks up to this point.
In case JUNK does not fit into a chunk header and there are no more other chunks in the input queue, or in case JUNK contains a garbage chunk header, but the encoded chunk length would exceed the skb tail, or we came here from an entirely different scenario and the chunk has pdiscard=1 mark (without having had a flush point), it will happen, that we will excessively queue up the association's output queue (a correct final chunk may then turn it into a response flood when flushing the queue ;)): I ran a simple script with incremental ASCONF serial numbers and could see the server side consuming excessive amount of RAM [before/after: up to 2GB and more]. The issue at heart is that the chunk train basically ends with !end_of_packet and !singleton markers and since commit 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet") therefore preventing an output queue flush point in sctp_do_sm() -> sctp_cmd_interpreter() on the input chunk (chunk = event_arg) even though local_cork is set, but its precedence has changed since then. In the normal case, the last chunk with end_of_packet=1 would trigger the queue flush to accommodate possible outgoing bundling. In the input queue, sctp_inq_pop() seems to do the right thing in terms of discarding invalid chunks. So, above JUNK will not enter the state machine and instead be released and exit the sctp_assoc_bh_rcv() chunk processing loop. It's simply the flush point being missing at loop exit. Adding a try-flush approach on the output queue might not work as the underlying infrastructure might be long gone at this point due to the side-effect interpreter run. One possibility, albeit a bit of a kludge, would be to defer invalid chunk freeing into the state machine in order to possibly trigger packet discards and thus indirectly a queue flush on error. 
It would surely be better to discard chunks as in the current, perhaps better controlled environment, but going back and forth, it's simply architecturally not possible. I tried various trailing JUNK attack cases and it seems to look good now.

Joint work with Vlad Yasevich.

Fixes: 2e3216cd54b1 ("sctp: Follow security requirement of responding with 1 packet")
Signed-off-by: Daniel Borkmann
Signed-off-by: Vlad Yasevich
Signed-off-by: David S. Miller
Cc: Josh Boyer
Signed-off-by: Greg Kroah-Hartman

commit d8af79d3cb4a181d3265b1419e63828d2487b3df
Author: Nadav Amit
Date: Wed Sep 17 02:50:50 2014 +0300

KVM: x86: Don't report guest userspace emulation error to userspace

commit a2b9e6c1a35afcc0973acb72e591c714e78885ff upstream.

Commit fc3a9157d314 ("KVM: X86: Don't report L2 emulation failures to user-space") disabled the reporting of L2 (nested guest) emulation failures to userspace due to a race condition between a vmexit and the instruction emulator. The same rationale also applies to userspace applications that are permitted by the guest OS to access MMIO area or perform PIO.

This patch extends the current behavior - of injecting a #UD instead of reporting it to userspace - also for guest userspace code.

Signed-off-by: Nadav Amit
Signed-off-by: Paolo Bonzini
Signed-off-by: Greg Kroah-Hartman

commit 8e751287c36c060dbf4cffb6c3875821bd64495b
Author: Vince Weaver
Date: Mon Jul 14 15:33:25 2014 -0400

perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge

commit 1996388e9f4e3444db8273bc08d25164d2967c21 upstream.

This was discussed back in February:

  https://lkml.org/lkml/2014/2/18/956

But I never saw a patch come out of it.

On IvyBridge we share the SandyBridge cache event tables, but the dTLB-load-miss event is not compatible.
Patch it up after the fact to the proper DTLB_LOAD_MISSES.DEMAND_LD_MISS_CAUSES_A_WALK Signed-off-by: Vince Weaver Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Linus Torvalds Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1407141528200.17214@vincent-weaver-1.umelst.maine.edu Signed-off-by: Ingo Molnar Cc: Hou Pengyang Signed-off-by: Greg Kroah-Hartman commit e252f74ecd88f9c4d7a1b14cfcb5bb93c517c118 Author: Pawel Moll Date: Fri Jun 13 16:03:32 2014 +0100 perf: Handle compat ioctl commit b3f207855f57b9c8f43a547a801340bb5cbc59e5 upstream. When running a 32-bit userspace on a 64-bit kernel (eg. i386 application on x86_64 kernel or 32-bit arm userspace on arm64 kernel) some of the perf ioctls must be treated with special care, as they have a pointer size encoded in the command. For example, PERF_EVENT_IOC_ID in 32-bit world will be encoded as 0x80042407, but 64-bit kernel will expect 0x80082407. In result the ioctl will fail returning -ENOTTY. This patch solves the problem by adding code fixing up the size as compat_ioctl file operation. Reported-by: Drew Richardson Signed-off-by: Pawel Moll Signed-off-by: Peter Zijlstra Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Link: http://lkml.kernel.org/r/1402671812-9078-1-git-send-email-pawel.moll@arm.com Signed-off-by: Ingo Molnar Signed-off-by: David Ahern Signed-off-by: Greg Kroah-Hartman commit d07dd9bce7b90980393cecd372b08600e0f345de Author: Pali Rohár Date: Mon Sep 29 15:10:51 2014 +0200 dell-wmi: Fix access out of memory commit a666b6ffbc9b6705a3ced704f52c3fe9ea8bf959 upstream. Without this patch, dell-wmi is trying to access elements of dynamically allocated array without checking the array size. This can lead to memory corruption or a kernel panic. This patch adds the missing checks for array size. 
Signed-off-by: Pali Rohár
Signed-off-by: Darren Hart
Signed-off-by: Greg Kroah-Hartman

commit 42d49f4525661181162386eea643275a5de10b59
Author: Pranith Kumar
Date: Tue Aug 12 13:07:47 2014 -0400

rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads

commit 2aa792e6faf1a00f5accf1f69e87e11a390ba2cd upstream.

The rcu_gp_kthread_wake() function checks for three conditions before waking up grace period kthreads:

* Is the thread we are trying to wake up the current thread?
* Are the gp_flags zero? (all threads wait on non-zero gp_flags condition)
* Is there no thread created for this flavour, hence nothing to wake up?

If any one of these conditions is true, we do not call wake_up(). It was found that there are quite a few avoidable wake ups both during idle time and under stress induced by rcutorture.

Idle:

Total:66000, unnecessary:66000, case1:61827, case2:66000, case3:0
Total:68000, unnecessary:68000, case1:63696, case2:68000, case3:0

rcutorture:

Total:254000, unnecessary:254000, case1:199913, case2:254000, case3:0
Total:256000, unnecessary:256000, case1:201784, case2:256000, case3:0

Here case{1-3} are the cases listed above. We can avoid these wake ups by using rcu_gp_kthread_wake() to conditionally wake up the grace period kthreads.

There is a comment about an implied barrier supplied by the wake_up() logic. This barrier is necessary for the awakened thread to see the updated ->gp_flags. This flag is always being updated with the root node lock held. Also, the awakened thread tries to acquire the root node lock before reading ->gp_flags because of which there is proper ordering.

Hence this commit tries to avoid calling wake_up() whenever we can by using rcu_gp_kthread_wake() function.

Signed-off-by: Pranith Kumar
CC: Mathieu Desnoyers
Signed-off-by: Paul E. McKenney
Cc: Kamal Mostafa
Signed-off-by: Greg Kroah-Hartman

commit 35cbd149f07864e5d68c8368549a03609fddd1a3
Author: Paul E. McKenney
Date: Tue Mar 11 13:02:16 2014 -0700

rcu: Make callers awaken grace-period kthread

commit 48a7639ce80cf279834d0d44865e49ecd714f37d upstream.

The rcu_start_gp_advanced() function currently uses irq_work_queue() to defer wakeups of the RCU grace-period kthread. This deferring is necessary to avoid RCU-scheduler deadlocks involving the rcu_node structure's lock, meaning that RCU cannot call any of the scheduler's wake-up functions while holding one of these locks.

Unfortunately, the second and subsequent calls to irq_work_queue() are ignored, and the first call will be ignored (aside from queuing the work item) if the scheduler-clock tick is turned off. This is OK for many uses, especially those where irq_work_queue() is called from an interrupt or softirq handler, because in those cases the scheduler-clock-tick state will be re-evaluated, which will turn the scheduler-clock tick back on. On the next tick, any deferred work will then be processed.

However, this strategy does not always work for RCU, which can be invoked at process level from idle CPUs. In this case, the tick might never be turned back on, indefinitely deferring a grace-period start request. Note that the RCU CPU stall detector cannot see this condition, because there is no RCU grace period in progress. Therefore, we can (and do!) see long tens-of-seconds stalls in grace-period handling. In theory, we could see a full grace-period hang, but rcutorture testing to date has seen only the tens-of-seconds stalls. Event tracing demonstrates that irq_work_queue() is being called repeatedly to no effect during these stalls: The "newreq" event appears repeatedly from a task that is not one of the grace-period kthreads.

In theory, irq_work_queue() might be fixed to avoid this sort of issue, but RCU's requirements are unusual and it is quite straightforward to pass wake-up responsibility up through RCU's call chain, so that the wakeup happens when the offending locks are released.
This commit therefore makes this change. The rcu_start_gp_advanced(), rcu_start_future_gp(), rcu_accelerate_cbs(), rcu_advance_cbs(), __note_gp_changes(), and rcu_start_gp() functions now return a boolean which indicates when a wake-up is needed. A new rcu_gp_kthread_wake() does the wakeup when it is necessary and safe to do so: No self-wakes, no wake-ups if the ->gp_flags field indicates there is no need (as in someone else did the wake-up before we got around to it), and no wake-ups before the grace-period kthread has been created. Signed-off-by: Paul E. McKenney Cc: Peter Zijlstra Cc: Steven Rostedt Cc: Frederic Weisbecker Reviewed-by: Josh Triplett [ Pranith: backport to 3.13-stable: just rcu_gp_kthread_wake(), prereq for 2aa792e "rcu: Use rcu_gp_kthread_wake() to wake up grace period kthreads" ] Signed-off-by: Pranith Kumar Signed-off-by: Kamal Mostafa Signed-off-by: Greg Kroah-Hartman commit 646ab9b65015fcc96e1fa6ba6425ab8f1d97de56 Author: Steven Whitehouse Date: Mon Mar 31 17:48:27 2014 +0100 GFS2: Fix address space from page function commit 1b2ad41214c9bf6e8befa000f0522629194bf540 upstream. Now that rgrps use the address space which is part of the super block, we need to update gfs2_mapping2sbd() to take account of that. The only way to do that easily is to use a different set of address_space_operations for rgrps. Reported-by: Abhi Das Tested-by: Abhi Das Signed-off-by: Steven Whitehouse Signed-off-by: Greg Kroah-Hartman commit 592339d9a7509a252324dade00b73e5024a572d7 Author: Ben Dooks Date: Fri Nov 8 18:29:25 2013 +0000 ARM: probes: fix instruction fetch order with commit 888be25402021a425da3e85e2d5a954d7509286e upstream. If we are running BE8, the data and instruction endianness do not match, so use to correctly translate memory accesses into ARM instructions. 
Acked-by: Jon Medhurst Signed-off-by: Ben Dooks [taras.kondratiuk@linaro.org: fixed Thumb instruction fetch order] Signed-off-by: Taras Kondratiuk [wangnan: backport to 3.10 and 3.14: - adjust context - backport all changes on arch/arm/kernel/probes.c to arch/arm/kernel/kprobes-common.c since we don't have commit c18377c303787ded44b7decd7dee694db0f205e9. - After the above adjustments, becomes same to Taras Kondratiuk's original patch: http://lists.linaro.org/pipermail/linaro-kernel/2014-January/010346.html ] Signed-off-by: Wang Nan Signed-off-by: Greg Kroah-Hartman commit da478d3c5b48cc1fa5c0b327e71c2fc962f48630 Author: Pablo Neira Date: Tue Jul 29 18:12:15 2014 +0200 netfilter: xt_bpf: add mising opaque struct sk_filter definition commit e10038a8ec06ac819b7552bb67aaa6d2d6f850c1 upstream. This structure is not exposed to userspace, so fix this by defining struct sk_filter; so we skip the casting in kernelspace. This is safe since userspace has no way to lurk with that internal pointer. Fixes: e6f30c7 ("netfilter: x_tables: add xt_bpf match") Signed-off-by: Pablo Neira Ayuso Acked-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 8d445bdcdb7bd750f1a8f490f0d220c736633498 Author: Arturo Borrero Date: Sun Oct 26 12:22:40 2014 +0100 netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops() commit 7965ee93719921ea5978f331da653dfa2d7b99f5 upstream. The code looks for an already loaded target, and the correct list to search is nft_target_list, not nft_match_list. Signed-off-by: Arturo Borrero Gonzalez Signed-off-by: Pablo Neira Ayuso Signed-off-by: Greg Kroah-Hartman commit 2cfb188282653659cbeed6c4918a92347eabd01e Author: Houcheng Lin Date: Thu Oct 23 10:36:08 2014 +0200 netfilter: nf_log: release skbuff on nlmsg put failure commit b51d3fa364885a2c1e1668f88776c67c95291820 upstream. The kernel should reserve enough room in the skb so that the DONE message can always be appended. However, in case of e.g. 
new attribute erroneously not being size-accounted for, __nfulnl_send() will still try to put next nlmsg into this full skbuf, causing the skb to be stuck forever and blocking delivery of further messages.

Fix issue by releasing skb immediately after nlmsg_put error and WARN() so we can track down the cause of such size mismatch.

[ fw@strlen.de: add tailroom/len info to WARN ]

Signed-off-by: Houcheng Lin
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Greg Kroah-Hartman

commit 74525d5efb7c5d99f1c307ad4ffec7ecbd952acb
Author: Florian Westphal
Date: Thu Oct 23 10:36:07 2014 +0200

netfilter: nfnetlink_log: fix maximum packet length logged to userspace

commit c1e7dc91eed0ed1a51c9b814d648db18bf8fc6e9 upstream.

Don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work. The nla length includes the size of the nla struct, so anything larger results in u16 integer overflow.

This patch is similar to 9cefbbc9c8f9abe (netfilter: nfnetlink_queue: cleanup copy_range usage).

Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Greg Kroah-Hartman

commit b1fef6b81871a396f3b8702077333e769673c87b
Author: Florian Westphal
Date: Thu Oct 23 10:36:06 2014 +0200

netfilter: nf_log: account for size of NLMSG_DONE attribute

commit 9dfa1dfe4d5e5e66a991321ab08afe69759d797a upstream.

We currently neither account for the nlattr size, nor do we consider the size of the trailing NLMSG_DONE when allocating nlmsg skb.

This can cause nflog to stop working, as __nfulnl_send() re-tries sending forever if it failed to append NLMSG_DONE (which will never work if buffer is not large enough).

Reported-by: Houcheng Lin
Signed-off-by: Florian Westphal
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Greg Kroah-Hartman

commit c74c508e0ee513fbaf7516e8db89e9050a2c0e1e
Author: Dan Carpenter
Date: Tue Oct 21 11:28:12 2014 +0300

netfilter: ipset: off by one in ip_set_nfnl_get_byindex()

commit 0f9f5e1b83abd2b37c67658e02a6fc9001831fa5 upstream.
The ->ip_set_list[] array is initialized in ip_set_net_init() and it has ->ip_set_max elements so this check should be >= instead of > otherwise we are off by one.

Signed-off-by: Dan Carpenter
Acked-by: Jozsef Kadlecsik
Signed-off-by: Pablo Neira Ayuso
Signed-off-by: Greg Kroah-Hartman

commit fdf538ce50f08e3b19ed297705a01c59e593c457
Author: Andrey Vagin
Date: Mon Oct 13 15:54:10 2014 -0700

ipc: always handle a new value of auto_msgmni

commit 1195d94e006b23c6292e78857e154872e33b6d7e upstream.

proc_dointvec_minmax() returns zero if a new value has been set. So we don't need to check that all characters have been handled.

Below you can find two examples. In the first, the new value has not been handled properly.

$ strace ./a.out
open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
write(3, "0\n\0", 3) = 2
close(3) = 0
exit_group(0)
$ cat /sys/kernel/debug/tracing/trace

$ strace ./a.out
open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
write(3, "0\n", 2) = 2
close(3) = 0

$ cat /sys/kernel/debug/tracing/trace
a.out-697 [000] .... 3280.998235: unregister_ipcns_notifier <-proc_ipcauto_dointvec_minmax

Fixes: 9eefe520c814 ("ipc: do not use a negative value to re-enable msgmni automatic recomputin")
Signed-off-by: Andrey Vagin
Cc: Mathias Krause
Cc: Manfred Spraul
Cc: Joe Perches
Cc: Davidlohr Bueso
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
Signed-off-by: Greg Kroah-Hartman

commit ff1d3b7890f9d58aa057eadad1bb0c9e368091fe
Author: Devesh Sharma
Date: Fri Sep 26 20:45:32 2014 +0530

IB/core: Clear AH attr variable to prevent garbage data

commit 8b0f93d9490653a7b9fc91f3570089132faed1c0 upstream.

During create-ah from userspace, uverbs is sending garbage data in attr.dmac and attr.vlan_id. This patch sets attr.dmac and attr.vlan_id to zero.
Fixes: dd5f03beb4f7 ("IB/core: Ethernet L2 attributes in verbs/cm structures") Signed-off-by: Devesh Sharma Signed-off-by: Roland Dreier Signed-off-by: Greg Kroah-Hartman commit ce1d89b64cb210b29bd4802549d831577f005816 Author: Bjorn Helgaas Date: Mon Oct 13 18:59:09 2014 -0600 clocksource: Remove "weak" from clocksource_default_clock() declaration commit 96a2adbc6f501996418da9f7afe39bf0e4d006a9 upstream. kernel/time/jiffies.c provides a default clocksource_default_clock() definition explicitly marked "weak". arch/s390 provides its own definition intended to override the default, but the "weak" attribute on the declaration applied to the s390 definition as well, so the linker chose one based on link order (see 10629d711ed7 ("PCI: Remove __weak annotation from pcibios_get_phb_of_node decl")). Remove the "weak" attribute from the clocksource_default_clock() declaration so we always prefer a non-weak definition over the weak one, independent of link order. Fixes: f1b82746c1e9 ("clocksource: Cleanup clocksource selection") Signed-off-by: Bjorn Helgaas Acked-by: John Stultz Acked-by: Ingo Molnar CC: Daniel Lezcano CC: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman commit 82da7a705d246e9d6a7fa308734a646f31e72b11 Author: Bjorn Helgaas Date: Mon Oct 13 19:00:25 2014 -0600 kgdb: Remove "weak" from kgdb_arch_pc() declaration commit 107bcc6d566cb40184068d888637f9aefe6252dd upstream. kernel/debug/debug_core.c provides a default kgdb_arch_pc() definition explicitly marked "weak". Several architectures provide their own definitions intended to override the default, but the "weak" attribute on the declaration applied to the arch definitions as well, so the linker chose one based on link order (see 10629d711ed7 ("PCI: Remove __weak annotation from pcibios_get_phb_of_node decl")). Remove the "weak" attribute from the declaration so we always prefer a non-weak definition over the weak one, independent of link order. 
Fixes: 688b744d8bc8 ("kgdb: fix signedness mixmatches, add statics, add declaration to header") Tested-by: Vineet Gupta # for ARC build Signed-off-by: Bjorn Helgaas Reviewed-by: Harvey Harrison Signed-off-by: Greg Kroah-Hartman commit 7a74695ecc5bdf710d8d079bac474aa0ede6ba34 Author: Bjorn Helgaas Date: Mon Oct 13 18:59:41 2014 -0600 vmcore: Remove "weak" from function declarations commit 5ab03ac5aaa1f032e071f1b3dc433b7839359c03 upstream. For the following functions: elfcorehdr_alloc() elfcorehdr_free() elfcorehdr_read() elfcorehdr_read_notes() remap_oldmem_pfn_range() fs/proc/vmcore.c provides default definitions explicitly marked "weak". arch/s390 provides its own definitions intended to override the default ones, but the "weak" attribute on the declarations applied to the s390 definitions as well, so the linker chose one based on link order (see 10629d711ed7 ("PCI: Remove __weak annotation from pcibios_get_phb_of_node decl")). Remove the "weak" attribute from the declarations so we always prefer a non-weak definition over the weak one, independent of link order. Fixes: be8a8d069e50 ("vmcore: introduce ELF header in new memory feature") Fixes: 9cb218131de1 ("vmcore: introduce remap_oldmem_pfn_range()") Signed-off-by: Bjorn Helgaas Acked-by: Andrew Morton Acked-by: Vivek Goyal CC: Michael Holzheu Signed-off-by: Greg Kroah-Hartman commit 8ec1a6d3a273f2336dbc8a606c0f1089047782d1 Author: Bjorn Helgaas Date: Mon Oct 13 19:00:47 2014 -0600 memory-hotplug: Remove "weak" from memory_block_size_bytes() declaration commit e0a8400c6923a163265d52798cdd4c33f3f8ab5a upstream. drivers/base/memory.c provides a default memory_block_size_bytes() definition explicitly marked "weak". Several architectures provide their own definitions intended to override the default, but the "weak" attribute on the declaration applied to the arch definitions as well, so the linker chose one based on link order (see 10629d711ed7 ("PCI: Remove __weak annotation from pcibios_get_phb_of_node decl")). 
Remove the "weak" attribute from the declaration so we always prefer a non-weak definition over the weak one, independent of link order. Fixes: 41f107266b19 ("drivers: base: Add prototype declaration to the header file") Signed-off-by: Bjorn Helgaas Acked-by: Andrew Morton CC: Rashika Kheria CC: Nathan Fontenot CC: Anton Blanchard CC: Heiko Carstens CC: Yinghai Lu Signed-off-by: Greg Kroah-Hartman commit c8e0fd4818f29aaafafb01f0bacf376b86e82830 Author: Dan Carpenter Date: Fri Sep 5 09:09:28 2014 -0300 media: ttusb-dec: buffer overflow in ioctl commit f2e323ec96077642d397bb1c355def536d489d16 upstream. We need to add a limit check here so we don't overflow the buffer. Signed-off-by: Dan Carpenter Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman commit 215894c980ccbc13d3ca8333cf329f305e6014d8 Author: Trond Myklebust Date: Wed Nov 12 14:44:49 2014 -0500 NFSv4.1: nfs41_clear_delegation_stateid shouldn't trust NFS_DELEGATED_STATE commit 0c116cadd94b16b30b1dd90d38b2784d9b39b01a upstream. This patch removes the assumption made previously, that we only need to check the delegation stateid when it matches the stateid on a cached open. If we believe that we hold a delegation for this file, then we must assume that its stateid may have been revoked or expired too. If we don't test it then our state recovery process may end up caching open/lock state in a situation where it should not. We therefore rename the function nfs41_clear_delegation_stateid as nfs41_check_delegation_stateid, and change it to always run through the delegation stateid test and recovery process as outlined in RFC5661. 
http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit cc7fa4c0e0e98244d3b16b08524d7536da3334a8 Author: Trond Myklebust Date: Mon Nov 10 18:43:56 2014 -0500 NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return commit 869f9dfa4d6d57b79e0afc3af14772c2a023eeb1 upstream. Any attempt to call nfs_remove_bad_delegation() while a delegation is being returned is currently a no-op. This means that we can end up looping forever in nfs_end_delegation_return() if something causes the delegation to be revoked. This patch adds a mechanism whereby the state recovery code can communicate to the delegation return code that the delegation is no longer valid and that it should not be used when reclaiming state. It also changes the return value for nfs4_handle_delegation_recall_error() to ensure that nfs_end_delegation_return() does not reattempt the lock reclaim before state recovery is done. http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit ba5b9d07bd3b322ffb60c2bf464532213764f45c Author: Jan Kara Date: Thu Oct 23 14:02:47 2014 +0200 nfs: Fix use of uninitialized variable in nfs_getattr() commit 16caf5b6101d03335b386e77e9e14136f989be87 upstream. Variable 'err' may be uninitialized when nfs_getattr() uses it to check whether it should call generic_fillattr() or not. That can result in spurious error returns. Initialize 'err' properly. Signed-off-by: Jan Kara Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 5d59a6f54c7afc72b79d7e4fde05ada706b5e6bc Author: Trond Myklebust Date: Fri Oct 17 23:02:52 2014 +0300 NFS: Don't try to reclaim delegation open state if recovery failed commit f8ebf7a8ca35dde321f0cd385fee6f1950609367 upstream. If state recovery failed, then we should not attempt to reclaim delegated state.
http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit b8bc6004711c8615ec377fba64077c4fc696589a Author: Trond Myklebust Date: Fri Oct 17 15:10:25 2014 +0300 NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired commit 4dfd4f7af0afd201706ad186352ca423b0f17d4b upstream. NFSv4.0 does not have TEST_STATEID/FREE_STATEID functionality, so unlike NFSv4.1, the recovery procedure when stateids have expired or have been revoked requires us to just forget the delegation. http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 4e2e6c8457fabdd38b9e9b3b83d46a3e5eeb0bfc Author: NeilBrown Date: Wed Oct 29 08:49:50 2014 +1100 md: Always set RECOVERY_NEEDED when clearing RECOVERY_FROZEN commit 45eaf45dfa4850df16bc2e8e7903d89021137f40 upstream. md_check_recovery will skip any recovery and also clear MD_RECOVERY_NEEDED if MD_RECOVERY_FROZEN is set. So when we clear _FROZEN, we must set _NEEDED and ensure that md_check_recovery gets run. Otherwise we could miss out on something that is needed. In particular, this can make it impossible to remove a failed device from an array if the 'recovery-needed' processing didn't happen. Suitable for stable kernels since 3.13. Reported-and-tested-by: Joe Lawrence Fixes: 30b8feb730f9b9b3c5de02580897da03f59b6b16 Signed-off-by: NeilBrown Signed-off-by: Greg Kroah-Hartman commit 57c340a8ca1133e9771a533aecd117414a499a4b Author: Junjie Mao Date: Fri Oct 31 21:40:38 2014 +0800 x86, kaslr: Prevent .bss from overlaping initrd commit e6023367d779060fddc9a52d1f474085b2b36298 upstream. When choosing a random address, the current implementation does not take into account the reserved space for .bss and .brk sections. Thus the relocated kernel may overlap other components in memory.
Here is an example of the overlap from a x86_64 kernel in qemu (the ranges of physical addresses are presented):

    Physical Address

    0x0fe00000                  --+--------------------+  <-- randomized base
                               /  |  relocated kernel  |
                   vmlinux.bin    | (from vmlinux.bin) |
    0x1336d000    (an ELF file)   +--------------------+--
                               \  |                    |  \
    0x1376d870                  --+--------------------+   |
                                  |    relocs table    |   |
    0x13c1c2a8                    +--------------------+   .bss and .brk
                                  |                    |   |
    0x13ce6000                    +--------------------+   |
                                  |                    |  /
    0x13f77000                    |       initrd       |--
                                  |                    |
    0x13fef374                    +--------------------+

The initrd image will then be overwritten by the memset during early initialization: [ 1.655204] Unpacking initramfs... [ 1.662831] Initramfs unpacking failed: junk in compressed archive This patch prevents the above situation by requiring a larger space when looking for a random kernel base, so that existing logic can effectively avoid the overlap. [kees: switched to perl to avoid hex translation pain in mawk vs gawk] [kees: calculated overlap without relocs table] Fixes: 82fa9637a2 ("x86, kaslr: Select random position from e820 maps") Reported-by: Fengguang Wu Signed-off-by: Junjie Mao Signed-off-by: Kees Cook Cc: Josh Triplett Cc: Matt Fleming Cc: Ard Biesheuvel Cc: Vivek Goyal Cc: Andi Kleen Link: http://lkml.kernel.org/r/1414762838-13067-1-git-send-email-eternal.n08@gmail.com Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 60f8e109c344d9fce0f8cc81bf0891cb25612ce8 Author: Borislav Petkov Date: Wed Nov 5 17:42:42 2014 +0100 x86, microcode, AMD: Fix ucode patch stashing on 32-bit commit c0a717f23dccdb6e3b03471bc846fdc636f2b353 upstream. Save the patch while we're running on the BSP instead of later, before the initrd has been jettisoned. More importantly, on 32-bit we need to access the physical address instead of the virtual. This way we actually do find it on the APs instead of having to go through the initrd each time.
Tested-by: Richard Hendershot Fixes: 5335ba5cf475 ("x86, microcode, AMD: Fix early ucode loading") Signed-off-by: Borislav Petkov Signed-off-by: Greg Kroah-Hartman commit af1017e6645da7d3c30f1e6b3c59ceb6219bf0b2 Author: Borislav Petkov Date: Fri Oct 31 23:23:43 2014 +0100 x86, microcode, AMD: Fix early ucode loading on 32-bit commit 4750a0d112cbfcc744929f1530ffe3193436766c upstream. Konrad triggered the following splat below in a 32-bit guest on an AMD box. As it turns out, in save_microcode_in_initrd_amd() we're using the *physical* address of the container *after* we have enabled paging and thus we #PF in load_microcode_amd() when trying to access the microcode container in the ramdisk range. Because the ramdisk is exactly there: [ 0.000000] RAMDISK: [mem 0x35e04000-0x36ef9fff] and we fault at 0x35e04304. And since this guest doesn't relocate the ramdisk, we don't do the computation which will give us the correct virtual address and we end up with the PA. So, we should actually be using virtual addresses on 32-bit too by the time we're freeing the initrd. Do that then! Unpacking initramfs... BUG: unable to handle kernel paging request at 35d4e304 IP: [] load_microcode_amd+0x25/0x4a0 *pde = 00000000 Oops: 0000 [#1] SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.1-302.fc21.i686 #1 Hardware name: Xen HVM domU, BIOS 4.4.1 10/01/2014 task: f5098000 ti: f50d0000 task.ti: f50d0000 EIP: 0060:[] EFLAGS: 00010246 CPU: 0 EIP is at load_microcode_amd+0x25/0x4a0 EAX: 00000000 EBX: f6e9ec4c ECX: 00001ec4 EDX: 00000000 ESI: f5d4e000 EDI: 35d4e2fc EBP: f50d1ed0 ESP: f50d1e94 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 35d4e304 CR3: 00e33000 CR4: 000406d0 Stack: 00000000 00000000 f50d1ebc f50d1ec4 f5d4e000 c0d7735a f50d1ed0 15a3d17f f50d1ec4 00600f20 00001ec4 bfb83203 f6e9ec4c f5d4e000 c0d7735a f50d1ed8 c0d80861 f50d1ee0 c0d80429 f50d1ef0 c0d889a9 f5d4e000 c0000000 f50d1f04 Call Trace: ? unpack_to_rootfs ? 
unpack_to_rootfs save_microcode_in_initrd_amd save_microcode_in_initrd free_initrd_mem populate_rootfs ? unpack_to_rootfs do_one_initcall ? unpack_to_rootfs ? repair_env_string ? proc_mkdir kernel_init_freeable kernel_init ret_from_kernel_thread ? rest_init Reported-and-tested-by: Konrad Rzeszutek Wilk References: https://bugzilla.redhat.com/show_bug.cgi?id=1158204 Fixes: 75a1ba5b2c52 ("x86, microcode, AMD: Unify valid container checks") Signed-off-by: Borislav Petkov Link: http://lkml.kernel.org/r/20141101100100.GA4462@pd.tnic Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman commit 699c202790c66f5ff7fff2ac598f4b106c5d2f5e Author: Krzysztof Kozlowski Date: Wed Oct 15 16:25:10 2014 +0200 power: bq2415x_charger: Fix memory leak on DTS parsing error commit 21e863b233553998737e1b506c823a00bf012e00 upstream. Memory allocated for 'name' was leaking if required binding properties were not present. The memory for 'name' was allocated early at probe with kasprintf(). It was freed in error paths executed before and after parsing DTS but not in that error path. Fix the error path for parsing device tree properties. Signed-off-by: Krzysztof Kozlowski Fixes: faffd234cf85 ("bq2415x_charger: Add DT support") Signed-off-by: Sebastian Reichel Signed-off-by: Greg Kroah-Hartman commit 169aa821d1a3694d488c39f748b903e5084095bf Author: Krzysztof Kozlowski Date: Wed Oct 15 16:25:09 2014 +0200 power: bq2415x_charger: Properly handle ENODEV from power_supply_get_by_phandle commit 0eaf437aa14949d2230aeab7364f4ab47901304a upstream. The power_supply_get_by_phandle() on error returns ENODEV or NULL. The driver later expects obtained pointer to power supply to be valid or NULL. If it is not NULL then it dereferences it in bq2415x_notifier_call() which would lead to dereferencing ENODEV-value pointer. Properly handle the power_supply_get_by_phandle() error case by replacing error value with NULL. This indicates that usb charger detection won't be used. 
Also fix the memory leak of 'name' if power_supply_get_by_phandle() fails with NULL and probe should defer. Signed-off-by: Krzysztof Kozlowski Fixes: faffd234cf85 ("bq2415x_charger: Add DT support") [small fix regarding the missing ti,usb-charger-detection info message] Signed-off-by: Sebastian Reichel Signed-off-by: Greg Kroah-Hartman commit 1f863a274f70ce2f4bd467e4d090a396dfdbd5e5 Author: Krzysztof Kozlowski Date: Mon Oct 13 15:34:31 2014 +0200 power: charger-manager: Fix accessing invalidated power supply after charger unbind commit cdaf3e15385d3232b52287e50692506f8fd01a09 upstream. In probe, the charger manager obtained references to the power supplies of all chargers with power_supply_get_by_name() for later use. However, if such a charger driver was removed, the reference would point to the old power supply (from the driver which was removed). This led to accessing invalid memory which could be observed with: $ echo "max77693-charger" > /sys/bus/platform/drivers/max77693-charger/unbind $ grep . /sys/devices/virtual/power_supply/battery/charger.0/* $ grep .
/sys/devices/virtual/power_supply/battery/* [ 15.339817] Unable to handle kernel paging request at virtual address 0001c12c [ 15.346187] pgd = edd08000 [ 15.348814] [0001c12c] *pgd=6dce2831, *pte=00000000, *ppte=00000000 [ 15.355075] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM [ 15.360967] Modules linked in: [ 15.364010] CPU: 2 PID: 1388 Comm: grep Not tainted 3.17.0-next-20141007-00027-ga95e761db1b0 #245 [ 15.372859] task: ee03ad00 ti: edcf6000 task.ti: edcf6000 [ 15.378241] PC is at 0x1c12c [ 15.381113] LR is at is_ext_pwr_online+0x30/0x6c [ 15.385706] pc : [<0001c12c>] lr : [] psr: a0000013 [ 15.385706] sp : edcf7e88 ip : 00000000 fp : 00000000 [ 15.397161] r10: eeb02c08 r9 : c04b1f84 r8 : eeb02c00 [ 15.402369] r7 : edc69a10 r6 : eea6ac10 r5 : eea6ac10 r4 : 00000004 [ 15.408878] r3 : 0001c12c r2 : edcf7e8c r1 : 00000004 r0 : ee914418 [ 15.415390] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 15.422506] Control: 10c5387d Table: 6dd0804a DAC: 00000015 [ 15.428236] Process grep (pid: 1388, stack limit = 0xedcf6240) [ 15.434050] Stack: (0xedcf7e88 to 0xedcf8000) [ 15.438395] 7e80: ee03ad00 00000000 edcf7f80 eea6aca8 edcf7ec4 c033b7b0 [ 15.446554] 7ea0: 00000001 ee1cc3f0 00000004 c06e1e44 eebdc000 c06e1e44 eeb02c00 c0337144 [ 15.454713] 7ec0: ee2dac68 c005cffc ee1cc3c0 c06e1e44 00000fff 00001000 eebdc000 c0278ca8 [ 15.462872] 7ee0: c0278c8c ee1cc3c0 eeb7ce00 c014422c edcf7f20 00008000 ee1cc3c0 ee9a48c0 [ 15.471030] 7f00: 00000001 00000001 edcf7f80 c0142d94 c0142d70 c01060f4 00021000 ee1cc3f0 [ 15.479190] 7f20: 00000000 00000000 c06a2150 eebdc000 2e7ec000 ee9a48c0 00008000 00021000 [ 15.487349] 7f40: edcf7f80 00008000 edcf6000 00021000 00021000 c00e39a4 00000000 ee9a48c0 [ 15.495508] 7f60: 00004000 00000000 00000000 ee9a48c0 ee9a48c0 00008000 00021000 c00e3aa0 [ 15.503668] 7f80: 00000000 00000000 0001f2e0 0001f2e0 00021000 00001000 00000003 c000f364 [ 15.511826] 7fa0: 00000000 c000f1a0 0001f2e0 00021000 00000003 00021000 00008000 00000000 [ 
15.519986] 7fc0: 0001f2e0 00021000 00001000 00000003 00000001 000205e8 00000000 00021000 [ 15.528145] 7fe0: 00008000 bebbe910 0000a7ad b6edc49c 60000010 00000003 aaaaaaaa aaaaaaaa [ 15.536320] [] (is_ext_pwr_online) from [] (charger_get_property+0x170/0x314) [ 15.545164] [] (charger_get_property) from [] (power_supply_show_property+0x48/0x20c) [ 15.554719] [] (power_supply_show_property) from [] (dev_attr_show+0x1c/0x48) [ 15.563577] [] (dev_attr_show) from [] (sysfs_kf_seq_show+0x84/0x104) [ 15.571725] [] (sysfs_kf_seq_show) from [] (kernfs_seq_show+0x24/0x28) [ 15.579973] [] (kernfs_seq_show) from [] (seq_read+0x1b0/0x484) [ 15.587614] [] (seq_read) from [] (vfs_read+0x88/0x144) [ 15.594552] [] (vfs_read) from [] (SyS_read+0x40/0x8c) [ 15.601417] [] (SyS_read) from [] (ret_fast_syscall+0x0/0x48) [ 15.608877] Code: bad PC value [ 15.611991] ---[ end trace a88fcc95208db283 ]--- The charger-manager should get reference to charger power supply on each use of get_property callback. Signed-off-by: Krzysztof Kozlowski Fixes: 3bb3dbbd56ea ("power_supply: Add initial Charger-Manager driver") Signed-off-by: Sebastian Reichel Signed-off-by: Greg Kroah-Hartman commit 122d385ba70565ff11d3dc0383d07819f2e9a612 Author: Krzysztof Kozlowski Date: Mon Oct 13 15:34:30 2014 +0200 power: charger-manager: Fix accessing invalidated power supply after fuel gauge unbind commit bdbe81445407644492b9ac69a24d35e3202d773b upstream. The charger manager obtained reference to fuel gauge power supply in probe with power_supply_get_by_name() for later usage. However if fuel gauge driver was removed and re-added then this reference would point to old power supply (from driver which was removed). 
This lead to accessing old (and probably invalid) memory which could be observed with: $ echo "12-0036" > /sys/bus/i2c/drivers/max17042/unbind $ echo "12-0036" > /sys/bus/i2c/drivers/max17042/bind $ cat /sys/devices/virtual/power_supply/battery/capacity [ 240.480084] INFO: task cat:1393 blocked for more than 120 seconds. [ 240.484799] Not tainted 3.17.0-next-20141007-00028-ge60b6dd79570 #203 [ 240.491782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 240.499589] cat D c0469530 0 1393 1 0x00000000 [ 240.505947] [] (__schedule) from [] (schedule_preempt_disabled+0x14/0x20) [ 240.514449] [] (schedule_preempt_disabled) from [] (mutex_lock_nested+0x1bc/0x458) [ 240.523736] [] (mutex_lock_nested) from [] (regmap_read+0x30/0x60) [ 240.531647] [] (regmap_read) from [] (max17042_get_property+0x2e8/0x350) [ 240.540055] [] (max17042_get_property) from [] (charger_get_property+0x264/0x348) [ 240.549252] [] (charger_get_property) from [] (power_supply_show_property+0x48/0x1e0) [ 240.558808] [] (power_supply_show_property) from [] (dev_attr_show+0x1c/0x48) [ 240.567664] [] (dev_attr_show) from [] (sysfs_kf_seq_show+0x84/0x104) [ 240.575814] [] (sysfs_kf_seq_show) from [] (kernfs_seq_show+0x24/0x28) [ 240.584061] [] (kernfs_seq_show) from [] (seq_read+0x1b0/0x484) [ 240.591702] [] (seq_read) from [] (vfs_read+0x88/0x144) [ 240.598640] [] (vfs_read) from [] (SyS_read+0x40/0x8c) [ 240.605507] [] (SyS_read) from [] (ret_fast_syscall+0x0/0x48) [ 240.612952] 4 locks held by cat/1393: [ 240.616589] #0: (&p->lock){+.+.+.}, at: [] seq_read+0x30/0x484 [ 240.623414] #1: (&of->mutex){+.+.+.}, at: [] kernfs_seq_start+0x1c/0x8c [ 240.631086] #2: (s_active#31){++++.+}, at: [] kernfs_seq_start+0x24/0x8c [ 240.638777] #3: (&map->mutex){+.+...}, at: [] regmap_read+0x30/0x60 The charger-manager should get reference to fuel gauge power supply on each use of get_property callback. 
The thermal zone 'tzd' field of the power supply should not be used for the same reason. Additionally, this change also solves the issue with nested thermal_zone_get_temp() calls and the related false-positive lockdep deadlock report for the thermal zone's mutex [1]. When the fuel gauge is used as the source of temperature, the charger manager forwards its get_temp calls to the fuel gauge thermal zone. So different mutexes are actually used (one for the charger manager thermal zone and a second for the fuel gauge thermal zone) but for lockdep this is one class of mutex. The recursion is removed by retrieving the temperature through the power supply's get_property(). In case an external thermal zone is used (the 'cm-thermal-zone' property is present in DTS) the recursion does not exist. The charger manager simply exports the POWER_SUPPLY_PROP_TEMP_AMBIENT property (instead of POWER_SUPPLY_PROP_TEMP), thus no thermal zone is created for this power supply. [1] https://lkml.org/lkml/2014/10/6/309 Signed-off-by: Krzysztof Kozlowski Fixes: 3bb3dbbd56ea ("power_supply: Add initial Charger-Manager driver") Signed-off-by: Sebastian Reichel Signed-off-by: Greg Kroah-Hartman commit fe888a904af6d107fb5e0903531f5ffbe69fa7ce Author: Pali Rohár Date: Sat Nov 8 23:36:09 2014 -0800 Input: alps - ignore bad data on Dell Latitudes E6440 and E7440 commit a7ef82aee91f26da79b981b9f5bca43b8817d3e4 upstream. Sometimes on Dell Latitude laptops the psmouse/alps driver receives invalid ALPS protocol V3 packets with bit7 set in the last byte. It is most easily reproduced on a Dell Latitude E6440 or E7440 with the lid closed, by pushing on the cover above the touchpad. If bit7 in the last packet byte is set, then it is not a valid ALPS packet. I was told that ALPS devices never send these packets. It is not known yet what sends those packets; it could be the Dell EC, a bug in the BIOS, or a bug in the touchpad firmware...
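The validity rule stated in this commit message can be sketched as a small standalone check. This is an illustration only: demo_last_byte_invalid() is a hypothetical name, and the code is not taken from the alps driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the last byte of a complete packet has bit 7 set,
 * which the commit message describes as never valid for an ALPS
 * protocol V3 packet. An empty buffer is treated as "nothing to
 * reject" (an assumption of this sketch). */
bool demo_last_byte_invalid(const unsigned char *pkt, size_t len)
{
	if (len == 0)
		return false;
	return (pkt[len - 1] & 0x80) != 0;
}
```

A caller would run such a check only once a full packet has been collected, since the rule concerns the final byte.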
With this patch the alps driver does not process those invalid packets, but instead of reporting PSMOUSE_BAD_DATA, getting into the out-of-sync state, getting back in sync with the next byte and spamming dmesg, we return PSMOUSE_FULL_PACKET. If the driver is truly out of sync, we'll fail the checks on the next byte and report PSMOUSE_BAD_DATA then. Signed-off-by: Pali Rohár Tested-by: Pali Rohár Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman commit c34120aafab9094d4bdc1f9e6838a13c6ec54159 Author: Pali Rohár Date: Sat Nov 8 12:58:57 2014 -0800 Input: alps - allow up to 2 invalid packets without resetting device commit 9d720b34c0a432639252f63012e18b0507f5b432 upstream. On some Dell Latitude laptops the ALPS device or the Dell EC sends one invalid byte in a 6-byte ALPS packet. In this case the psmouse driver enters the out-of-sync state. It looks like all other bytes in the packets are valid and the device is otherwise working properly. So there is no need to do a full device reset; we just need to wait for a byte which matches the condition for the first byte (start of packet). Because ALPS packets are bigger (6 or 8 bytes), the default limit is too small. This patch increases the number of invalid bytes which the psmouse driver can drop before doing a full reset to the size of 2 ALPS packets. Resetting ALPS devices takes some time, and while a reset is in progress on some Dell laptops the touchpad, trackstick and also keyboard do not respond. So it is better to do it only if really necessary. Signed-off-by: Pali Rohár Tested-by: Pali Rohár Reviewed-by: Hans de Goede Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman commit a5c137ad9f78cc501c94a793db9147faaff2d7b0 Author: Pali Rohár Date: Sat Nov 8 12:45:23 2014 -0800 Input: alps - ignore potential bare packets when device is out of sync commit 4ab8f7f320f91f279c3f06a9795cfea5c972888a upstream. The 5th and 6th bytes of the ALPS trackstick V3 protocol match the condition for the first byte of a 3-byte PS/2 packet.
When the driver enters the out-of-sync state and the ALPS trackstick is sending data, the driver matches the 5th, 6th and next 1st bytes as PS/2. This basically means that if the user is using the trackstick while the driver is out of sync, the driver will never resync. Processing these bytes as 3-byte PS/2 data causes a total mess (random cursor movements, random clicks) and makes the trackstick unusable until the psmouse driver decides to do a full device reset. A lot of users reported problems with ALPS devices on Dell Latitude E6440, E6540 and E7440 laptops. The ALPS device or Dell EC for an unknown reason sends some invalid ALPS PS/2 bytes which cause the driver to go out of sync. It looks like the i8042 and psmouse/alps drivers always receive groups of 6-byte packets, so there are no missing bytes and no bytes were inserted between valid ones. This patch does not fix the root of the problem with ALPS devices found in Dell Latitude laptops, but it prevents some (invalid) subsequence of 6-byte ALPS packets from being processed as 3-byte PS/2 packets when the driver is out of sync. So with this patch the trackstick input device does not report bogus data when the driver is out of sync, and the trackstick should be usable on those machines. Signed-off-by: Pali Rohár Tested-by: Pali Rohár Reviewed-by: Hans de Goede Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman commit 1ffb8c57149836d0b82ef416b546a6480462a24d Author: Takashi Iwai Date: Thu Nov 6 09:27:11 2014 -0800 Input: synaptics - add min/max quirk for Lenovo T440s commit e4742b1e786ca386e88e6cfb2801e14e15e365cd upstream. The new Lenovo T440s laptop has a different PnP ID "LEN0039", and it needs a similar min/max quirk to make its clickpad work.
BugLink: https://bugzilla.opensuse.org/show_bug.cgi?id=903748 Reported-and-tested-by: Joschi Brauchle Signed-off-by: Takashi Iwai Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman commit 18804873848025d6a1bf3a8fb8f36414890f5af7 Author: Heinz Mauelshagen Date: Fri Oct 17 13:38:50 2014 +0200 dm raid: ensure superblock's size matches device's logical block size commit 40d43c4b4cac4c2647bf07110d7b07d35f399a84 upstream. The dm-raid superblock (struct dm_raid_superblock) is padded to 512 bytes and that size is being used to read it in from the metadata device into one preallocated page. Reading or writing this on a 512-byte sector device works fine but on a 4096-byte sector device this fails. Set the dm-raid superblock's size to the logical block size of the metadata device, because IO at that size is guaranteed to work. Also add a size check to avoid silent partial metadata loss in case the superblock should ever grow past the logical block size or PAGE_SIZE. [includes pointer math fix from Dan Carpenter] Reported-by: "Liuhua Wang" Signed-off-by: Heinz Mauelshagen Signed-off-by: Dan Carpenter Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 084a4fc24d535504a81822778adca1581bd7d9f0 Author: Joe Thornber Date: Mon Nov 10 15:03:24 2014 +0000 dm btree: fix a recursion depth bug in btree walking code commit 9b460d3699324d570a4d4161c3741431887f102f upstream. The walk code was using a 'ro_spine' to hold its locked btree nodes. But this data structure is designed for the rolling lock scheme, and as such automatically unlocks blocks that are two steps up the call chain. This is not suitable for the simple recursive walk algorithm, which retraces its steps. This code is only used by the persistent array code, which in turn is only used by dm-cache. In order to trigger it you need to have a mapping tree that is more than 2 levels deep, which equates to 8-16 million cache blocks.
For instance a 4T ssd with a very small block size of 32k only just triggers this bug. The fix just places the locked blocks on the stack, and stops using the ro_spine altogether. Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit fbdfc9b6eb3ba8df25804496f6921b70ec71e98e Author: Mikulas Patocka Date: Thu Oct 16 14:45:20 2014 -0400 dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacks commit 9d28eb12447ee08bb5d1e8bb3195cf20e1ecd1c0 upstream. The shrinker uses gfp flags to indicate what kind of operation can the driver wait for. If __GFP_IO flag is present, the driver can wait for block I/O operations, if __GFP_FS flag is present, the driver can wait on operations involving the filesystem. dm-bufio tested for __GFP_IO. However, dm-bufio can run on a loop block device that makes calls into the filesystem. If __GFP_IO is present and __GFP_FS isn't, dm-bufio could still block on filesystem operations if it runs on a loop block device. The change from __GFP_IO to __GFP_FS supposedly fixes one observed (though unreproducible) deadlock involving dm-bufio and loop device. Signed-off-by: Mikulas Patocka Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 3e790581cca78d4437cf6f5b40eced41dd1a494d Author: Jan Kara Date: Thu Oct 30 20:43:38 2014 +0100 block: Fix computation of merged request priority commit ece9c72accdc45c3a9484dacb1125ce572647288 upstream. Priority of a merged request is computed by ioprio_best(). If one of the requests has undefined priority (IOPRIO_CLASS_NONE) and another request has priority from IOPRIO_CLASS_BE, the function will return the undefined priority which is wrong. Fix the function to properly return priority of a request with the defined priority. 
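The merge rule described above can be sketched in userspace as follows. This is a hedged model, not the kernel's ioprio_best(): all DEMO_* names, the bit layout, and the class numbering are assumptions made for the illustration. The point shown is that an undefined (NONE) priority never wins over a defined one.

```c
/* Hypothetical model of an I/O priority value: the class sits in the
 * upper bits and the per-class level in the lower bits. */
#define DEMO_CLASS_SHIFT 13
#define DEMO_PRIO_VALUE(cls, lvl) \
	((unsigned short)(((cls) << DEMO_CLASS_SHIFT) | (lvl)))
#define DEMO_PRIO_CLASS(prio) ((prio) >> DEMO_CLASS_SHIFT)

/* Assumed ordering: a smaller class number is more urgent, and NONE
 * means "no priority was set". */
enum {
	DEMO_CLASS_NONE = 0,
	DEMO_CLASS_RT = 1,
	DEMO_CLASS_BE = 2,
	DEMO_CLASS_IDLE = 3,
};

/* Pick the priority for a request merged from two others. */
unsigned short demo_prio_best(unsigned short a, unsigned short b)
{
	unsigned short aclass = DEMO_PRIO_CLASS(a);
	unsigned short bclass = DEMO_PRIO_CLASS(b);

	/* An undefined priority carries no information: use the other. */
	if (aclass == DEMO_CLASS_NONE)
		return b;
	if (bclass == DEMO_CLASS_NONE)
		return a;

	/* Same class: the smaller level is the more urgent one. */
	if (aclass == bclass)
		return a < b ? a : b;

	/* Different classes: the smaller class number wins. */
	return aclass < bclass ? a : b;
}
```

With this rule, merging a request of class NONE with a best-effort request yields the best-effort priority in either argument order.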
Fixes: d58cdfb89ce0c6bd5f81ae931a984ef298dbda20 Signed-off-by: Jan Kara Reviewed-by: Jeff Moyer Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman commit 84b2986c349a7acf1601e6634860f0d32ef053de Author: Helge Deller Date: Mon Nov 10 21:46:18 2014 +0100 parisc: Use compat layer for msgctl, shmat, shmctl and semtimedop syscalls commit 2fe749f50b0bec07650ef135b29b1f55bf543869 upstream. Switch over the msgctl, shmat, shmctl and semtimedop syscalls to use the compat layer. The problem was found with the debian procenv package, which called shmctl(0, SHM_INFO, &info); in which the shmctl syscall then overwrote parts of the surrounding areas on the stack on which the info variable was stored and thus led to a segfault later on. Additionally fix the definition of struct shminfo64 to use unsigned longs like the other architectures. This has no impact on userspace since we only have a 32bit userspace up to now. Signed-off-by: Helge Deller Cc: John David Anglin Signed-off-by: Greg Kroah-Hartman commit 1c753164d26bfffa90d6b852d0904020776457cf Author: Christoph Hellwig Date: Mon Nov 3 19:36:40 2014 +0100 scsi: only re-lock door after EH on devices that were reset commit 48379270fe6808cf4612ee094adc8da2b7a83baa upstream. Setups that use the blk-mq I/O path can lock up if a host with a single device that has its door locked enters EH. Make sure to only send the command to re-lock the door to devices that actually were reset and thus might have lost their state. Otherwise the EH code might get blocked on blk_get_request as all requests for non-reset devices might be in use. Signed-off-by: Christoph Hellwig Reported-by: Meelis Roos Tested-by: Meelis Roos Reviewed-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman commit a1dd586647f122eec80dee85d3f6cb6685c8c7e8 Author: William Cohen Date: Tue Nov 11 09:41:27 2014 -0500 Correct the race condition in aarch64_insn_patch_text_sync() commit 899d5933b2dd2720f2b20b01eaa07871aa6ad096 upstream.
When experimenting with patches to provide kprobes support for aarch64, smp machines would hang when inserting breakpoints into kernel code. The hangs were caused by a race condition in the code called by aarch64_insn_patch_text_sync(). The first processor in the aarch64_insn_patch_text_cb() function would patch the code while other processors were still entering the function and incrementing the cpu_count field. This resulted in some processors never observing the exit condition and therefore never exiting the function. Thus, processors in the system hung. The first processor to enter the patching function performs the patching and signals that the patching is complete with an increment of the cpu_count field. When all the processors have incremented the cpu_count field the cpu_count will be num_cpus_online()+1 and they will return to normal execution. Fixes: ae16480785de arm64: introduce interfaces to hotpatch kernel and module code Signed-off-by: William Cohen Acked-by: Will Deacon Signed-off-by: Catalin Marinas Signed-off-by: Greg Kroah-Hartman commit 25e26c3e8b6c7921a8d44017b61d989dbe4e8649 Author: Peng Tao Date: Wed Nov 5 22:36:50 2014 +0800 nfs: fix pnfs direct write memory leak commit 8c393f9a721c30a030049a680e1bf896669bb279 upstream. For pNFS direct writes, layout driver may dynamically allocate ds_cinfo.buckets. So we need to take care to free them when freeing dreq. Ideally this needs to be done inside layout driver where ds_cinfo.buckets are allocated. But buckets are attached to dreq and reused across LD IO iterations. So I feel it's OK to free them in the generic layer. Signed-off-by: Peng Tao Signed-off-by: Trond Myklebust Signed-off-by: Greg Kroah-Hartman commit 8274355d41957d06158db1111cb410d9574c5f9a Author: Simon Horman Date: Mon Oct 27 09:14:30 2014 +0900 ata: sata_rcar: Disable DIPM mode for r8a7790 ES1 commit aa1cf25887099bba68f1f3879c0d394e08b8779f upstream.
Unlike other SATA R-Car r8a7790 controllers the r8a7790 ES1 SATA R-Car controller needs to be run with DIPM disabled. Signed-off-by: Simon Horman Signed-off-by: Tejun Heo Signed-off-by: Greg Kroah-Hartman commit 55507aed77796f7094d68264220b7aa6bd3a45be Author: Stefan Richter Date: Tue Nov 11 17:16:44 2014 +0100 firewire: cdev: prevent kernel stack leaking into ioctl arguments commit eaca2d8e75e90a70a63a6695c9f61932609db212 upstream. Found by the UC-KLEE tool: A user could supply less input to firewire-cdev ioctls than write- or write/read-type ioctl handlers expect. The handlers used data from uninitialized kernel stack then. This could partially leak back to the user if the kernel subsequently generated fw_cdev_event_'s (to be read from the firewire-cdev fd) which notably would contain the _u64 closure field which many of the ioctl argument structures contain. The fact that the handlers would act on random garbage input is a lesser issue since all handlers must check their input anyway. The fix simply always null-initializes the entire ioctl argument buffer regardless of the actual length of expected user input. That is, a runtime overhead of memset(..., 40) is added to each firewire-cdev ioctl() call. [Comment from Clemens Ladisch: This part of the stack is most likely to be already in the cache.] Remarks: - There was never any leak from kernel stack to the ioctl output buffer itself. IOW, it was not possible to read kernel stack by a read-type or write/read-type ioctl alone; the leak could at most happen in combination with read()ing subsequent event data.
- The actual expected minimum user input of each ioctl from include/uapi/linux/firewire-cdev.h is, in bytes:

    [0x00] = 32, [0x05] =  4, [0x0a] = 16, [0x0f] = 20, [0x14] = 16,
    [0x01] = 36, [0x06] = 20, [0x0b] =  4, [0x10] = 20, [0x15] = 20,
    [0x02] = 20, [0x07] =  4, [0x0c] =  0, [0x11] =  0, [0x16] =  8,
    [0x03] =  4, [0x08] = 24, [0x0d] = 20, [0x12] = 36, [0x17] = 12,
    [0x04] = 20, [0x09] = 24, [0x0e] =  4, [0x13] = 40, [0x18] =  4.

Reported-by: David Ramos Signed-off-by: Stefan Richter Signed-off-by: Greg Kroah-Hartman commit c6f8075d3934e493980fe83f8a746d74b98f5e51 Author: Kyle McMartin Date: Wed Nov 12 21:07:44 2014 +0000 arm64: __clear_user: handle exceptions on strb commit 97fc15436b36ee3956efad83e22a557991f7d19d upstream. ARM64 currently doesn't fix up faults on the single-byte (strb) case of __clear_user... which means that we can cause a nasty kernel panic as an ordinary user with any multiple PAGE_SIZE+1 read from /dev/zero. i.e.:

    dd if=/dev/zero of=foo ibs=1 count=1 (or ibs=65537, etc.)

This is a pretty obscure bug in the general case since we'll only __do_kernel_fault (since there's no extable entry for pc) if the mmap_sem is contended. However, with CONFIG_DEBUG_VM enabled, we'll always fault.

    if (!down_read_trylock(&mm->mmap_sem)) {
            if (!user_mode(regs) && !search_exception_tables(regs->pc))
                    goto no_context;
    retry:
            down_read(&mm->mmap_sem);
    } else {
            /*
             * The above down_read_trylock() might have succeeded in which
             * case, we'll have missed the might_sleep() from down_read().
             */
            might_sleep();
            if (!user_mode(regs) && !search_exception_tables(regs->pc))
                    goto no_context;
    }

Fix that by adding an extable entry for the strb instruction, since it touches user memory, similar to the other stores in __clear_user.
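To see why sizes such as 1 or 65537 reach the single-byte store, the length split can be modelled in a few lines of C. This is only an illustrative sketch: it assumes the routine clears memory with a run of 8-byte stores followed by at most one 4-byte, one 2-byte and one 1-byte store, and the names store_plan and plan_clear are hypothetical, not kernel symbols.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of how a clear of n bytes is split into stores. */
struct store_plan {
    size_t dwords; /* number of 8-byte stores */
    bool word;     /* one trailing 4-byte store */
    bool half;     /* one trailing 2-byte store */
    bool byte;     /* one trailing 1-byte store (the strb case) */
};

/* Assumed 8/4/2/1 cascade: the low three bits of n select the tail stores. */
static struct store_plan plan_clear(size_t n)
{
    struct store_plan p;

    p.dwords = n / 8;
    p.word = (n & 4) != 0;
    p.half = (n & 2) != 0;
    p.byte = (n & 1) != 0;
    return p;
}
```

Under that assumption, plan_clear(65537).byte is true while plan_clear(65536).byte is false, so only odd lengths exercise the store that previously had no fixup entry.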
Signed-off-by: Kyle McMartin Reported-by: Miloš Prchlík Signed-off-by: Catalin Marinas Signed-off-by: Greg Kroah-Hartman commit 5f64b0f2cbb9c15d0a1495d8fa3a102538a4a688 Author: Joe Thornber Date: Fri Oct 10 09:41:09 2014 +0100 dm thin: grab a virtual cell before looking up the mapping commit c822ed967cba38505713d59ed40a114386ef6c01 upstream. Avoids normal IO racing with discard. Signed-off-by: Joe Thornber Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman commit 656b20b4c61c629293f628cb9345cd6a9c3775cf Author: Roger Quadros Date: Mon Nov 3 12:09:52 2014 +0200 pinctrl: dra: dt-bindings: Fix output pull up/down commit 73b3a6657a88ef5348a0d69c9a8107d6f01ae862 upstream. For PIN_OUTPUT_PULLUP and PIN_OUTPUT_PULLDOWN we must not set the PULL_DIS bit which disables the PULLs. PULL_ENA is a 0 and using it in an OR operation is a NOP, so don't use it in the PIN_OUTPUT_PULLUP/DOWN macros. Fixes: 23d9cec07c58 ("pinctrl: dra: dt-bindings: Fix pull enable/disable") Signed-off-by: Roger Quadros Acked-by: Nishanth Menon Signed-off-by: Tony Lindgren Signed-off-by: Greg Kroah-Hartman commit 8be3d9977773789a8582ad18a5f8cd998e3c8511 Author: Will Deacon Date: Tue Nov 4 11:40:46 2014 +0100 ARM: 8191/1: decompressor: ensure I-side picks up relocated code commit 238962ac71910d6c20162ea5230685fead1836a4 upstream. To speed up decompression, the decompressor sets up a flat, cacheable mapping of memory. However, when there is insufficient space to hold the page tables for this mapping, we don't bother to enable the caches and subsequently skip all the cache maintenance hooks. Skipping the cache maintenance before jumping to the relocated code allows the processor to predict the branch and populate the I-cache with stale data before the relocation loop has completed (since a bootloader may have SCTLR.I set, which permits normal, cacheable instruction fetches regardless of SCTLR.M). 
This patch moves the cache maintenance check into the maintenance routines themselves, allowing the v6/v7 versions to invalidate the I-cache regardless of the MMU state. Reported-by: Marc Carino Tested-by: Julien Grall Signed-off-by: Will Deacon Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman commit 962febd8f207bc0908b898ad5ed3594e2341f6e7 Author: Nathan Lynch Date: Mon Nov 10 23:46:27 2014 +0100 ARM: 8198/1: make kuser helpers depend on MMU commit 08b964ff3c51b10aaf2e6ba639f40054c09f0f7a upstream. The kuser helpers page is not set up on non-MMU systems, so it does not make sense to allow CONFIG_KUSER_HELPERS to be enabled when CONFIG_MMU=n. Allowing it to be set on !MMU results in an oops in set_tls (used in execve and the arm_syscall trap handler): Unhandled exception: IPSR = 00000005 LR = fffffff1 CPU: 0 PID: 1 Comm: swapper Not tainted 3.18.0-rc1-00041-ga30465a #216 task: 8b838000 ti: 8b82a000 task.ti: 8b82a000 PC is at flush_thread+0x32/0x40 LR is at flush_thread+0x21/0x40 pc : [<8f00157a>] lr : [<8f001569>] psr: 4100000b sp : 8b82be20 ip : 00000000 fp : 8b83c000 r10: 00000001 r9 : 88018c84 r8 : 8bb85000 r7 : 8b838000 r6 : 00000000 r5 : 8bb77400 r4 : 8b82a000 r3 : ffff0ff0 r2 : 8b82a000 r1 : 00000000 r0 : 88020354 xPSR: 4100000b CPU: 0 PID: 1 Comm: swapper Not tainted 3.18.0-rc1-00041-ga30465a #216 [<8f002bc1>] (unwind_backtrace) from [<8f002033>] (show_stack+0xb/0xc) [<8f002033>] (show_stack) from [<8f00265b>] (__invalid_entry+0x4b/0x4c) As best I can tell this issue existed for the set_tls ARM syscall before commit fbfb872f5f41 "ARM: 8148/1: flush TLS and thumbee register state during exec" consolidated the TLS manipulation code into the set_tls helper function, but now that we're using it to flush register state during execve, !MMU users encounter the oops at the first exec. Prevent CONFIG_MMU=n configurations from enabling CONFIG_KUSER_HELPERS. 
Fixes: fbfb872f5f41 (ARM: 8148/1: flush TLS and thumbee register state during exec) Signed-off-by: Nathan Lynch Reported-by: Stefan Agner Acked-by: Uwe Kleine-König Signed-off-by: Russell King Signed-off-by: Greg Kroah-Hartman commit fab22a1bde15890e22aa07e86ad66ee5c82ce909 Author: Alex Deucher Date: Wed Nov 5 17:14:32 2014 -0500 drm/radeon: add missing crtc unlock when setting up the MC commit f0d7bfb9407fccb6499ec01c33afe43512a439a2 upstream. Need to unlock the crtc after updating the blanking state. Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit 4ac6e9d1a7f6e936401012e0677d7421dcbe7120 Author: Alex Deucher Date: Mon Nov 3 09:57:46 2014 -0500 drm/radeon: make sure mode init is complete in bandwidth_update commit 8efe82ca908400785253c8f0dfcf301e6bd93488 upstream. The power management code calls into the display code for certain things. If certain power management sysfs attributes are called before the driver has finished initializing all of the hardware we can run into problems with uninitialized modesetting state. Add a check to make sure modesetting init has completed to the bandwidth update callbacks to fix this. Can be triggered by the tlp and laptop start up scripts depending on the timing. bugs: https://bugzilla.kernel.org/show_bug.cgi?id=83611 https://bugs.freedesktop.org/show_bug.cgi?id=85771 Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit 2b3470a2ff478d4cf6fc210d7754051ce7c52d7f Author: Jammy Zhou Date: Mon Nov 3 08:58:20 2014 -0500 drm/radeon: set correct CE ram size for CIK commit dc4edad6530a9b7b66c3d905e2bc06021a05dcad upstream. CE ram size is 32k/0k/0k for GFX/CS0/CS1 with CIK Ported from amdgpu driver. Signed-off-by: Jammy Zhou Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman commit 15d4b0476f867ab16f38ceea09a7276cdf6fbf6d Author: Johannes Berg Date: Mon Nov 3 13:57:46 2014 +0100 mac80211: fix use-after-free in defragmentation commit b8fff407a180286aa683d543d878d98d9fc57b13 upstream. 
Upon receiving the last fragment, all but the first fragment are freed, but the multicast check for statistics at the end of the function refers to the current skb (the last fragment) causing a use-after-free bug. Since multicast frames cannot be fragmented and we check for this early in the function, just modify that check to also do the accounting to fix the issue. Reported-by: Yosef Khyal Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit 2097903674ccf4d573f90ec3e31ac1ae659ddac4 Author: Luciano Coelho Date: Tue Oct 28 13:33:05 2014 +0200 mac80211: schedule the actual switch of the station before CSA count 0 commit ff1e417c7c239b7abfe70aa90460a77eaafc7f83 upstream. Due to the time it takes to process the beacon that started the CSA process, we may be late for the switch if we try to reach exactly beacon 0. To avoid that, use count - 1 when calculating the switch time. Reported-by: Jouni Malinen Signed-off-by: Luciano Coelho Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit b0255bc3277ddd8f1e3334ca1cda5067e8612490 Author: Luciano Coelho Date: Tue Oct 28 13:33:04 2014 +0200 mac80211: use secondary channel offset IE also beacons during CSA commit 84469a45a1bedec9918e94ab2f78c5dc0739e4a7 upstream. If we are switching from an HT40+ to an HT40- channel (or vice-versa), we need the secondary channel offset IE to specify what is the post-CSA offset to be used. This applies both to beacons and to probe responses. In ieee80211_parse_ch_switch_ie() we were ignoring this IE from beacons and using the *current* HT information IE instead. This was causing us to use the same offset as before the switch. Fix that by using the secondary channel offset IE also for beacons and don't ever use the pre-switch offset. Additionally, remove the "beacon" argument from ieee80211_parse_ch_switch_ie(), since it's not needed anymore. 
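The effect of picking the wrong offset can be sketched in C. This is a simplified illustration, not the mac80211 code: the function names are hypothetical, the offset values (0 = none, 1 = above, 3 = below) follow the IEEE 802.11 secondary channel offset field, and falling back to no secondary channel when the IE is absent is an assumption of this sketch.

```c
#include <stdint.h>

/* Secondary channel offset field values as defined by IEEE 802.11. */
enum sec_chan_offset {
    SEC_CHAN_NONE  = 0, /* 20 MHz, no secondary channel */
    SEC_CHAN_ABOVE = 1, /* HT40+ */
    SEC_CHAN_BELOW = 3, /* HT40- */
};

/*
 * Hypothetical helper: choose the post-switch offset from the IE only.
 * The pre-switch offset is deliberately not a parameter, so it can never
 * leak into the result. With no IE, this sketch assumes 20 MHz.
 */
static enum sec_chan_offset post_csa_offset(int have_ie, uint8_t ie_value)
{
    if (!have_ie)
        return SEC_CHAN_NONE;
    if (ie_value == SEC_CHAN_ABOVE)
        return SEC_CHAN_ABOVE;
    if (ie_value == SEC_CHAN_BELOW)
        return SEC_CHAN_BELOW;
    return SEC_CHAN_NONE;
}

/* Centre of the 40 MHz channel sits 10 MHz towards the secondary channel. */
static int center_freq_mhz(int primary_mhz, enum sec_chan_offset off)
{
    switch (off) {
    case SEC_CHAN_ABOVE:
        return primary_mhz + 10;
    case SEC_CHAN_BELOW:
        return primary_mhz - 10;
    default:
        return primary_mhz;
    }
}
```

For example, a switch to primary 5200 MHz with the IE saying "below" gives a centre of 5190 MHz, whereas reusing a pre-switch "above" offset would give 5210 MHz.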
Reported-by: Jouni Malinen Signed-off-by: Luciano Coelho Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit b77034fb4913a44b957fa3c12d49f2adf33fdc32 Author: Johannes Berg Date: Tue Oct 21 20:56:42 2014 +0200 mac80211: properly flush delayed scan work on interface removal commit 46238845bd609a5c0fbe076e1b82b4c5b33360b2 upstream. When an interface is deleted, an ongoing hardware scan is canceled and the driver must abort the scan, at the very least reporting completion while the interface is removed. However, if it scheduled the work that might only run after everything is said and done, which leads to cfg80211 warning that the scan isn't reported as finished yet; this is no fault of the driver, it already did, but mac80211 hasn't processed it. To fix this situation, flush the delayed work when the interface being removed is the one that was executing the scan. Reported-by: Sujith Manoharan Tested-by: Sujith Manoharan Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit 39630ccbb373121742c1982ee8c138314fa80b9b Author: Junjie Mao Date: Tue Oct 28 09:31:47 2014 +0800 mac80211_hwsim: release driver when ieee80211_register_hw fails commit 805dbe17d1c832ad341f14fae8cedf41b67ca6fa upstream. 
The driver is not released when ieee80211_register_hw fails in mac80211_hwsim_create_radio, leading to the access to the unregistered (and possibly freed) device in platform_driver_unregister: [ 0.447547] mac80211_hwsim: ieee80211_register_hw failed (-2) [ 0.448292] ------------[ cut here ]------------ [ 0.448854] WARNING: CPU: 0 PID: 1 at ../include/linux/kref.h:47 kobject_get+0x33/0x50() [ 0.449839] CPU: 0 PID: 1 Comm: swapper Not tainted 3.17.0-00001-gdd46990-dirty #2 [ 0.450813] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 0.451512] 00000000 00000000 78025e38 7967c6c6 78025e68 7905e09b 7988b480 00000000 [ 0.452579] 00000001 79887d62 0000002f 79170bb3 79170bb3 78397008 79ac9d74 00000001 [ 0.453614] 78025e78 7905e15d 00000009 00000000 78025e84 79170bb3 78397000 78025e8c [ 0.454632] Call Trace: [ 0.454921] [<7967c6c6>] dump_stack+0x16/0x18 [ 0.455453] [<7905e09b>] warn_slowpath_common+0x6b/0x90 [ 0.456067] [<79170bb3>] ? kobject_get+0x33/0x50 [ 0.456612] [<79170bb3>] ? kobject_get+0x33/0x50 [ 0.457155] [<7905e15d>] warn_slowpath_null+0x1d/0x20 [ 0.457748] [<79170bb3>] kobject_get+0x33/0x50 [ 0.458274] [<7925824f>] get_device+0xf/0x20 [ 0.458779] [<7925b5cd>] driver_detach+0x3d/0xa0 [ 0.459331] [<7925a3ff>] bus_remove_driver+0x8f/0xb0 [ 0.459927] [<7925bf80>] ? class_unregister+0x40/0x80 [ 0.460660] [<7925bad7>] driver_unregister+0x47/0x50 [ 0.461248] [<7925c033>] ? class_destroy+0x13/0x20 [ 0.461824] [<7925d07b>] platform_driver_unregister+0xb/0x10 [ 0.462507] [<79b51ba0>] init_mac80211_hwsim+0x3e8/0x3f9 [ 0.463161] [<79b30c58>] do_one_initcall+0x106/0x1a9 [ 0.463758] [<79b517b8>] ? if_spi_init_module+0xac/0xac [ 0.464393] [<79b517b8>] ? if_spi_init_module+0xac/0xac [ 0.465001] [<79071935>] ? parse_args+0x2f5/0x480 [ 0.465569] [<7906b41e>] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 0.466345] [<79b30dd9>] kernel_init_freeable+0xde/0x17d [ 0.466972] [<79b304d6>] ? 
do_early_param+0x7a/0x7a [ 0.467546] [<79677b1b>] kernel_init+0xb/0xe0 [ 0.468072] [<79075f42>] ? schedule_tail+0x12/0x40 [ 0.468658] [<79686580>] ret_from_kernel_thread+0x20/0x30 [ 0.469303] [<79677b10>] ? rest_init+0xc0/0xc0 [ 0.469829] ---[ end trace ad8ac403ff8aef5c ]--- [ 0.470509] ------------[ cut here ]------------ [ 0.471047] WARNING: CPU: 0 PID: 1 at ../kernel/locking/lockdep.c:3161 __lock_acquire.isra.22+0x7aa/0xb00() [ 0.472163] DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS) [ 0.472774] CPU: 0 PID: 1 Comm: swapper Tainted: G W 3.17.0-00001-gdd46990-dirty #2 [ 0.473815] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 0.474492] 78025de0 78025de0 78025da0 7967c6c6 78025dd0 7905e09b 79888931 78025dfc [ 0.475515] 00000001 79888a93 00000c59 7907f33a 7907f33a 78028000 fffe9d09 00000000 [ 0.476519] 78025de8 7905e10e 00000009 78025de0 79888931 78025dfc 78025e24 7907f33a [ 0.477523] Call Trace: [ 0.477821] [<7967c6c6>] dump_stack+0x16/0x18 [ 0.478352] [<7905e09b>] warn_slowpath_common+0x6b/0x90 [ 0.478976] [<7907f33a>] ? __lock_acquire.isra.22+0x7aa/0xb00 [ 0.479658] [<7907f33a>] ? __lock_acquire.isra.22+0x7aa/0xb00 [ 0.480417] [<7905e10e>] warn_slowpath_fmt+0x2e/0x30 [ 0.480479] [<7907f33a>] __lock_acquire.isra.22+0x7aa/0xb00 [ 0.480479] [<79078aa5>] ? sched_clock_cpu+0xb5/0xf0 [ 0.480479] [<7907fd06>] lock_acquire+0x56/0x70 [ 0.480479] [<7925b5e8>] ? driver_detach+0x58/0xa0 [ 0.480479] [<79682d11>] mutex_lock_nested+0x61/0x2a0 [ 0.480479] [<7925b5e8>] ? driver_detach+0x58/0xa0 [ 0.480479] [<7925b5e8>] ? driver_detach+0x58/0xa0 [ 0.480479] [<7925b5e8>] driver_detach+0x58/0xa0 [ 0.480479] [<7925a3ff>] bus_remove_driver+0x8f/0xb0 [ 0.480479] [<7925bf80>] ? class_unregister+0x40/0x80 [ 0.480479] [<7925bad7>] driver_unregister+0x47/0x50 [ 0.480479] [<7925c033>] ? 
class_destroy+0x13/0x20 [ 0.480479] [<7925d07b>] platform_driver_unregister+0xb/0x10 [ 0.480479] [<79b51ba0>] init_mac80211_hwsim+0x3e8/0x3f9 [ 0.480479] [<79b30c58>] do_one_initcall+0x106/0x1a9 [ 0.480479] [<79b517b8>] ? if_spi_init_module+0xac/0xac [ 0.480479] [<79b517b8>] ? if_spi_init_module+0xac/0xac [ 0.480479] [<79071935>] ? parse_args+0x2f5/0x480 [ 0.480479] [<7906b41e>] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 0.480479] [<79b30dd9>] kernel_init_freeable+0xde/0x17d [ 0.480479] [<79b304d6>] ? do_early_param+0x7a/0x7a [ 0.480479] [<79677b1b>] kernel_init+0xb/0xe0 [ 0.480479] [<79075f42>] ? schedule_tail+0x12/0x40 [ 0.480479] [<79686580>] ret_from_kernel_thread+0x20/0x30 [ 0.480479] [<79677b10>] ? rest_init+0xc0/0xc0 [ 0.480479] ---[ end trace ad8ac403ff8aef5d ]--- [ 0.495478] BUG: unable to handle kernel paging request at 00200200 [ 0.496257] IP: [<79682de5>] mutex_lock_nested+0x135/0x2a0 [ 0.496923] *pde = 00000000 [ 0.497290] Oops: 0002 [#1] [ 0.497653] CPU: 0 PID: 1 Comm: swapper Tainted: G W 3.17.0-00001-gdd46990-dirty #2 [ 0.498659] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 0.499321] task: 78028000 ti: 78024000 task.ti: 78024000 [ 0.499955] EIP: 0060:[<79682de5>] EFLAGS: 00010097 CPU: 0 [ 0.500620] EIP is at mutex_lock_nested+0x135/0x2a0 [ 0.501145] EAX: 00200200 EBX: 78397434 ECX: 78397460 EDX: 78025e70 [ 0.501816] ESI: 00000246 EDI: 78028000 EBP: 78025e8c ESP: 78025e54 [ 0.502497] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 [ 0.503076] CR0: 8005003b CR2: 00200200 CR3: 01b9d000 CR4: 00000690 [ 0.503773] Stack: [ 0.503998] 00000000 00000001 00000000 7925b5e8 78397460 7925b5e8 78397474 78397460 [ 0.504944] 00200200 11111111 78025e70 78397000 79ac9d74 00000001 78025ea0 7925b5e8 [ 0.505451] 79ac9d74 fffffffe 00000001 78025ebc 7925a3ff 7a251398 78025ec8 7925bf80 [ 0.505451] Call Trace: [ 0.505451] [<7925b5e8>] ? driver_detach+0x58/0xa0 [ 0.505451] [<7925b5e8>] ? 
driver_detach+0x58/0xa0 [ 0.505451] [<7925b5e8>] driver_detach+0x58/0xa0 [ 0.505451] [<7925a3ff>] bus_remove_driver+0x8f/0xb0 [ 0.505451] [<7925bf80>] ? class_unregister+0x40/0x80 [ 0.505451] [<7925bad7>] driver_unregister+0x47/0x50 [ 0.505451] [<7925c033>] ? class_destroy+0x13/0x20 [ 0.505451] [<7925d07b>] platform_driver_unregister+0xb/0x10 [ 0.505451] [<79b51ba0>] init_mac80211_hwsim+0x3e8/0x3f9 [ 0.505451] [<79b30c58>] do_one_initcall+0x106/0x1a9 [ 0.505451] [<79b517b8>] ? if_spi_init_module+0xac/0xac [ 0.505451] [<79b517b8>] ? if_spi_init_module+0xac/0xac [ 0.505451] [<79071935>] ? parse_args+0x2f5/0x480 [ 0.505451] [<7906b41e>] ? __usermodehelper_set_disable_depth+0x3e/0x50 [ 0.505451] [<79b30dd9>] kernel_init_freeable+0xde/0x17d [ 0.505451] [<79b304d6>] ? do_early_param+0x7a/0x7a [ 0.505451] [<79677b1b>] kernel_init+0xb/0xe0 [ 0.505451] [<79075f42>] ? schedule_tail+0x12/0x40 [ 0.505451] [<79686580>] ret_from_kernel_thread+0x20/0x30 [ 0.505451] [<79677b10>] ? rest_init+0xc0/0xc0 [ 0.505451] Code: 89 d8 e8 cf 9b 9f ff 8b 4f 04 8d 55 e4 89 d8 e8 72 9d 9f ff 8d 43 2c 89 c1 89 45 d8 8b 43 30 8d 55 e4 89 53 30 89 4d e4 89 45 e8 <89> 10 8b 55 dc 8b 45 e0 89 7d ec e8 db af 9f ff eb 11 90 31 c0 [ 0.505451] EIP: [<79682de5>] mutex_lock_nested+0x135/0x2a0 SS:ESP 0068:78025e54 [ 0.505451] CR2: 0000000000200200 [ 0.505451] ---[ end trace ad8ac403ff8aef5e ]--- [ 0.505451] Kernel panic - not syncing: Fatal exception Fixes: 9ea927748ced ("mac80211_hwsim: Register and bind to driver") Reported-by: Fengguang Wu Signed-off-by: Junjie Mao Signed-off-by: Johannes Berg Signed-off-by: Greg Kroah-Hartman commit b3101caa21f0bbfe600118d157e039610effd711 Author: Herbert Xu Date: Mon Nov 3 14:01:25 2014 +0800 macvtap: Fix csum_start when VLAN tags are present commit 3ce9b20f1971690b8b3b620e735ec99431573b39 upstream. When VLAN is in use in macvtap_put_user, we end up setting csum_start to the wrong place. 
The result is that whoever ends up doing the checksum setting will corrupt the packet instead of writing the checksum to the expected location; usually this means writing the checksum with an offset of -4. This patch fixes this by adjusting csum_start when VLAN tags are detected. Fixes: f09e2249c4f5 ("macvtap: restore vlan header on user read") Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman Signed-off-by: David S. Miller commit 5ab7aeed775750145fad815e8cfc76fc14a7627c Author: Ilya Dryomov Date: Thu Oct 23 00:25:22 2014 +0400 libceph: do not crash on large auth tickets commit aaef31703a0cf6a733e651885bfb49edc3ac6774 upstream. Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth tickets will have their buffers vmalloc'ed, which leads to the following crash in crypto: [ 28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0 [ 28.686032] IP: [] scatterwalk_pagedone+0x22/0x80 [ 28.686032] PGD 0 [ 28.688088] Oops: 0000 [#1] PREEMPT SMP [ 28.688088] Modules linked in: [ 28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305 [ 28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 28.688088] Workqueue: ceph-msgr con_work [ 28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000 [ 28.688088] RIP: 0010:[] [] scatterwalk_pagedone+0x22/0x80 [ 28.688088] RSP: 0018:ffff8800d903f688 EFLAGS: 00010286 [ 28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0 [ 28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750 [ 28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880 [ 28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010 [ 28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000 [ 28.688088] FS: 00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000 [ 28.688088] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 28.688088] CR2: ffffeb04000032c0 CR3:
00000000da3f3000 CR4: 00000000000006b0 [ 28.688088] Stack: [ 28.688088] ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32 [ 28.688088] ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020 [ 28.688088] ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010 [ 28.688088] Call Trace: [ 28.688088] [] scatterwalk_done+0x38/0x40 [ 28.688088] [] scatterwalk_done+0x38/0x40 [ 28.688088] [] blkcipher_walk_done+0x182/0x220 [ 28.688088] [] crypto_cbc_encrypt+0x15f/0x180 [ 28.688088] [] ? crypto_aes_set_key+0x30/0x30 [ 28.688088] [] ceph_aes_encrypt2+0x29c/0x2e0 [ 28.688088] [] ceph_encrypt2+0x93/0xb0 [ 28.688088] [] ceph_x_encrypt+0x4a/0x60 [ 28.688088] [] ? ceph_buffer_new+0x5d/0xf0 [ 28.688088] [] ceph_x_build_authorizer.isra.6+0x297/0x360 [ 28.688088] [] ? kmem_cache_alloc_trace+0x11b/0x1c0 [ 28.688088] [] ? ceph_auth_create_authorizer+0x36/0x80 [ 28.688088] [] ceph_x_create_authorizer+0x63/0xd0 [ 28.688088] [] ceph_auth_create_authorizer+0x54/0x80 [ 28.688088] [] get_authorizer+0x80/0xd0 [ 28.688088] [] prepare_write_connect+0x18b/0x2b0 [ 28.688088] [] try_read+0x1e59/0x1f10 This is because we set up crypto scatterlists as if all buffers were kmalloc'ed. Fix it. Signed-off-by: Ilya Dryomov Reviewed-by: Sage Weil Signed-off-by: Greg Kroah-Hartman commit b120357fe04dd1783773d5a502d05612bb6eb5a1 Author: Max Filippov Date: Mon Oct 6 21:01:17 2014 +0400 xtensa: re-wire umount syscall to sys_oldumount commit 2651cc6974d47fc43bef1cd8cd26966e4f5ba306 upstream. Userspace actually passes single parameter (path name) to the umount syscall, so new umount just fails. Fix it by requesting old umount syscall implementation and re-wiring umount to it. Signed-off-by: Max Filippov Signed-off-by: Greg Kroah-Hartman commit 1d9e783748ddef36fa1ae6284da09a23904b5ef5 Author: Takashi Iwai Date: Tue Nov 11 15:45:57 2014 +0100 ALSA: usb-audio: Fix memory leak in FTU quirk commit 1a290581ded60e87276741f8ca97b161d2b226fc upstream. 
M-audio FastTrack Ultra quirk doesn't release the kzalloc'ed memory. This patch adds the private_free callback to release it properly. Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman commit ee693dc06e6d4ca3922edb08e66214ececf8aee9 Author: Tejun Heo Date: Mon Oct 27 10:22:56 2014 -0400 ahci: disable MSI instead of NCQ on Samsung pci-e SSDs on macbooks commit 66a7cbc303f4d28f201529b06061944d51ab530c upstream. Samsung pci-e SSDs on macbooks failed miserably on NCQ commands, so 67809f85d31e ("ahci: disable NCQ on Samsung pci-e SSDs on macbooks") disabled NCQ on them. It turns out that NCQ is fine as long as MSI is not used, so let's turn off MSI and leave NCQ on. Signed-off-by: Tejun Heo Link: https://bugzilla.kernel.org/show_bug.cgi?id=60731 Tested-by: Tested-by: Imre Kaloz Fixes: 67809f85d31e ("ahci: disable NCQ on Samsung pci-e SSDs on macbooks") Signed-off-by: Greg Kroah-Hartman commit 5706021ea8f12b3d7c5400e7cf20d34692fddeb9 Author: James Ralston Date: Mon Oct 13 15:16:38 2014 -0700 ahci: Add Device IDs for Intel Sunrise Point PCH commit 690000b930456a98663567d35dd5c54b688d1e3f upstream. This patch adds the AHCI-mode SATA Device IDs for the Intel Sunrise Point PCH. Signed-off-by: James Ralston Signed-off-by: Tejun Heo Signed-off-by: Greg Kroah-Hartman commit a5d002baef69d3e6f1ba772b6a33a7964764c1b1 Author: Miklos Szeredi Date: Tue Nov 4 11:27:12 2014 +0100 audit: keep inode pinned commit 799b601451b21ebe7af0e6e8f6e2ccd4683c5064 upstream. Audit rules disappear when an inode they watch is evicted from the cache. This is likely not what we want. The guilty commit is "fsnotify: allow marks to not pin inodes in core", which didn't take into account that audit_tree adds watches with a zero mask. Adding any mask should fix this. 
Fixes: 90b1e7a57880 ("fsnotify: allow marks to not pin inodes in core") Signed-off-by: Miklos Szeredi Signed-off-by: Paul Moore Signed-off-by: Greg Kroah-Hartman commit 1035897060b3e20bf0e166734ba90f9057e4fd29 Author: Richard Guy Briggs Date: Thu Oct 30 11:22:53 2014 -0400 audit: AUDIT_FEATURE_CHANGE message format missing delimiting space commit 897f1acbb6702ddaa953e8d8436eee3b12016c7e upstream. Add a space between subj= and feature= fields to make them parsable. Signed-off-by: Richard Guy Briggs Signed-off-by: Paul Moore Signed-off-by: Greg Kroah-Hartman commit 374f060993fc6ca7336080abf46dc9b464280470 Author: Richard Guy Briggs Date: Sun Aug 24 20:37:52 2014 -0400 audit: correct AUDIT_GET_FEATURE return message type commit 9ef91514774a140e468f99d73d7593521e6d25dc upstream. When an AUDIT_GET_FEATURE message is sent from userspace to the kernel, it should reply with a message tagged as an AUDIT_GET_FEATURE type with a struct audit_feature. The current reply is a message tagged as an AUDIT_GET type with a struct audit_feature. This appears to have been a cut-and-paste-eo in commit b0fed40. Reported-by: Steve Grubb Signed-off-by: Richard Guy Briggs Signed-off-by: Greg Kroah-Hartman commit 406ad132276e8207e1700744d767aa80121b87ea Author: Andy Lutomirski Date: Fri Sep 5 15:13:52 2014 -0700 x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit commit 81f49a8fd7088cfcb588d182eeede862c0e3303e upstream. is_compat_task() is the wrong check for audit arch; the check should be is_ia32_task(): x32 syscalls should be AUDIT_ARCH_X86_64, not AUDIT_ARCH_I386. CONFIG_AUDITSYSCALL is currently incompatible with x32, so this has no visible effect. Signed-off-by: Andy Lutomirski Link: http://lkml.kernel.org/r/a0138ed8c709882aec06e4acc30bfa9b623b8717.1409954077.git.luto@amacapital.net Signed-off-by: H. 
Peter Anvin Signed-off-by: Greg Kroah-Hartman commit 6b037909d30a064f382abbaf789b82a1b141ad0f Author: Herbert Xu Date: Mon Nov 3 04:30:13 2014 +0800 tun: Fix csum_start with VLAN acceleration commit a8f9bfdf982e2b1fb9f094e4de9ab08c57f3d2fd upstream. When VLAN acceleration is in use on the xmit path, we end up setting csum_start to the wrong place. The result is that whoever ends up doing the checksum setting will corrupt the packet instead of writing the checksum to the expected location; usually this means writing the checksum with an offset of -4. This patch fixes this by adjusting csum_start when VLAN acceleration is detected. Fixes: 6680ec68eff4 ("tuntap: hardware vlan tx support") Signed-off-by: Herbert Xu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit fe658f87001374b48fb63c33948ce440215ba23c Author: Greg Kurz Date: Fri Oct 31 07:50:11 2014 +0100 hwrng: pseries - port to new read API and fix stack corruption commit 24c65bc7037e7d0f362c0df70d17dd72ee64b8b9 upstream. The add_early_randomness() function in drivers/char/hw_random/core.c passes a 16-byte buffer to pseries_rng_data_read(). Unfortunately, plpar_hcall() returns four 64-bit values and trashes 16 bytes on the stack. This bug has been lying around for a long time. It got unveiled by: commit d3cc7996473a7bdd33256029988ea690754e4e2a Author: Amit Shah Date: Thu Jul 10 15:42:34 2014 +0530 hwrng: fetch randomness only after device init It may trigger an oops while loading or unloading the pseries-rng module for both PowerVM and PowerKVM guests. This patch does two things: - pass an intermediate well-sized buffer to plpar_hcall(). This is acceptable since we're not on a hot path. - move to the new read API so that we know the return buffer size for sure.
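The two points above can be illustrated with a small userspace sketch in C. It is not the driver code: fake_hcall() is a hypothetical stand-in for a call that always writes four 64-bit values, rng_read() is a hypothetical name, and treating only the first returned value as random payload is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HCALL_RET_WORDS 4 /* the modelled call always writes four 64-bit values */

/* Hypothetical stand-in for the hypervisor call: fills every return slot. */
static int fake_hcall(uint64_t ret[HCALL_RET_WORDS])
{
    for (int i = 0; i < HCALL_RET_WORDS; i++)
        ret[i] = 0x0123456789abcdefULL + (uint64_t)i;
    return 0;
}

/*
 * Read into the caller's buffer through an intermediate buffer that is
 * large enough for everything the call writes. At most 8 bytes, and never
 * more than max, are copied out. Returns the number of bytes copied, or -1.
 */
static int rng_read(void *data, size_t max)
{
    uint64_t buffer[HCALL_RET_WORDS];
    size_t size = max < sizeof(uint64_t) ? max : sizeof(uint64_t);

    if (fake_hcall(buffer) != 0)
        return -1;
    memcpy(data, buffer, size);
    return (int)size;
}
```

Because the call only ever writes into the local 32-byte array, the caller's buffer is touched for exactly the number of bytes reported back, whatever size the caller passed in.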
    Signed-off-by: Greg Kurz
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

commit c9ccfcc2d0e33b3390574b41facc54cba59dbf98
Author: Cristian Stoica
Date:   Thu Aug 14 13:51:56 2014 +0300

    crypto: caam - remove duplicated sg copy functions

    commit 307fd543f3d23f8f56850eca1b27b1be2fe71017 upstream.

    Replace equivalent (and partially incorrect) scatter-gather functions
    with ones from crypto-API.

    The replacement is motivated by page-faults in sg_copy_part triggered
    by successive calls to crypto_hash_update. The following fault appears
    after calling crypto_ahash_update twice, first with 13 and then
    with 285 bytes:

    Unable to handle kernel paging request for data at address 0x00000008
    Faulting instruction address: 0xf9bf9a8c
    Oops: Kernel access of bad area, sig: 11 [#1]
    SMP NR_CPUS=8 CoreNet Generic
    Modules linked in: tcrypt(+) caamhash caam_jr caam tls
    CPU: 6 PID: 1497 Comm: cryptomgr_test Not tainted 3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2 #75
    task: e9308530 ti: e700e000 task.ti: e700e000
    NIP: f9bf9a8c LR: f9bfcf28 CTR: c0019ea0
    REGS: e700fb80 TRAP: 0300 Not tainted (3.12.19-rt30-QorIQ-SDK-V1.6+g9fda9f2)
    MSR: 00029002 CR: 44f92024 XER: 20000000
    DEAR: 00000008, ESR: 00000000

    GPR00: f9bfcf28 e700fc30 e9308530 e70b1e55 00000000 ffffffdd e70b1e54 0bebf888
    GPR08: 902c7ef5 c0e771e2 00000002 00000888 c0019ea0 00000000 00000000 c07a4154
    GPR16: c08d0000 e91a8f9c 00000001 e98fb400 00000100 e9c83028 e70b1e08 e70b1d48
    GPR24: e992ce10 e70b1dc8 f9bfe4f4 e70b1e55 ffffffdd e70b1ce0 00000000 00000000
    NIP [f9bf9a8c] sg_copy+0x1c/0x100 [caamhash]
    LR [f9bfcf28] ahash_update_no_ctx+0x628/0x660 [caamhash]
    Call Trace:
    [e700fc30] [f9bf9c50] sg_copy_part+0xe0/0x160 [caamhash] (unreliable)
    [e700fc50] [f9bfcf28] ahash_update_no_ctx+0x628/0x660 [caamhash]
    [e700fcb0] [f954e19c] crypto_tls_genicv+0x13c/0x300 [tls]
    [e700fd10] [f954e65c] crypto_tls_encrypt+0x5c/0x260 [tls]
    [e700fd40] [c02250ec] __test_aead.constprop.9+0x2bc/0xb70
    [e700fe40] [c02259f0] alg_test_aead+0x50/0xc0
    [e700fe60] [c02241e4] alg_test+0x114/0x2e0
    [e700fee0] [c022276c] cryptomgr_test+0x4c/0x60
    [e700fef0] [c004f658] kthread+0x98/0xa0
    [e700ff40] [c000fd04] ret_from_kernel_thread+0x5c/0x64

    Signed-off-by: Herbert Xu
    Cc: Cristian Stoica
    Signed-off-by: Greg Kroah-Hartman

commit 77204ef5865a366573b4ee87c74daf6361039b96
Author: Cristian Stoica
Date:   Thu Oct 30 14:40:22 2014 +0200

    crypto: caam - fix missing dma unmap on error path

    commit 738459e3f88538f2ece263424dafe5d91799e46b upstream.

    If dma mapping for dma_addr_out fails, the descriptor memory is freed
    but the previous dma mapping for dma_addr_in remains.
    This patch resolves the missing dma unmap and groups resource
    allocations at function start.

    Signed-off-by: Cristian Stoica
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

commit 340eb9b208105e5c86b936d7f8b8d8e865f1bc6f
Author: Weijie Yang
Date:   Thu Nov 13 15:19:05 2014 -0800

    zram: avoid kunmap_atomic() of a NULL pointer

    commit c406515239376fc93a30d5d03192182160cbd3fb upstream.

    zram could kunmap_atomic() a NULL pointer in a rare situation: a zram
    page becomes a full-zeroed page after a partial write io. The current
    code doesn't handle this case and performs kunmap_atomic() on a NULL
    pointer, which panics the kernel.

    This patch fixes this issue.

    Signed-off-by: Weijie Yang
    Cc: Sergey Senozhatsky
    Cc: Dan Streetman
    Cc: Nitin Gupta
    Cc: Weijie Yang
    Acked-by: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit 882b3ba4a885d3bc458e09ad951b44047b3b27a7
Author: Andreas Larsson
Date:   Wed Nov 5 15:52:08 2014 +0100

    sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks

    [ Upstream commit 1a17fdc4f4ed06b63fac1937470378a5441a663a ]

    Atomicity between xchg and cmpxchg cannot be guaranteed when xchg is
    implemented with a swap and cmpxchg is implemented with locks.
    Without this, e.g. mcs_spin_lock and mcs_spin_unlock are broken.

    Signed-off-by: Andreas Larsson
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit ca9811bf3fc654c277e17061452e97b0144c8740
Author: David S. Miller
Date:   Fri Nov 7 09:50:48 2014 -0800

    sparc64: Do irq_{enter,exit}() around generic_smp_call_function*().

    [ Upstream commit ab5c780913bca0a5763ca05dd5c2cb5cb08ccb26 ]

    Otherwise rcu_irq_{enter,exit}() do not happen and we get dumps like:

    ====================
    [ 188.275021] ===============================
    [ 188.309351] [ INFO: suspicious RCU usage. ]
    [ 188.343737] 3.18.0-rc3-00068-g20f3963-dirty #54 Not tainted
    [ 188.394786] -------------------------------
    [ 188.429170] include/linux/rcupdate.h:883 rcu_read_lock() used illegally while idle!
    [ 188.505235] other info that might help us debug this:
    [ 188.554230] RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
    [ 188.637587] RCU used illegally from extended quiescent state!
    [ 188.690684] 3 locks held by swapper/7/0:
    [ 188.721932] #0: (&x->wait#11){......}, at: [<0000000000495de8>] complete+0x8/0x60
    [ 188.797994] #1: (&p->pi_lock){-.-.-.}, at: [<000000000048510c>] try_to_wake_up+0xc/0x400
    [ 188.881343] #2: (rcu_read_lock){......}, at: [<000000000048a910>] select_task_rq_fair+0x90/0xb40
    [ 188.973043]stack backtrace:
    [ 188.993879] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.18.0-rc3-00068-g20f3963-dirty #54
    [ 189.076187] Call Trace:
    [ 189.089719] [0000000000499360] lockdep_rcu_suspicious+0xe0/0x100
    [ 189.147035] [000000000048a99c] select_task_rq_fair+0x11c/0xb40
    [ 189.202253] [00000000004852d8] try_to_wake_up+0x1d8/0x400
    [ 189.252258] [000000000048554c] default_wake_function+0xc/0x20
    [ 189.306435] [0000000000495554] __wake_up_common+0x34/0x80
    [ 189.356448] [00000000004955b4] __wake_up_locked+0x14/0x40
    [ 189.406456] [0000000000495e08] complete+0x28/0x60
    [ 189.448142] [0000000000636e28] blk_end_sync_rq+0x8/0x20
    [ 189.496057] [0000000000639898] __blk_mq_end_request+0x18/0x60
    [ 189.550249] [00000000006ee014] scsi_end_request+0x94/0x180
    [ 189.601286] [00000000006ee334] scsi_io_completion+0x1d4/0x600
    [ 189.655463] [00000000006e51c4] scsi_finish_command+0xc4/0xe0
    [ 189.708598] [00000000006ed958] scsi_softirq_done+0x118/0x140
    [ 189.761735] [00000000006398ec] __blk_mq_complete_request_remote+0xc/0x20
    [ 189.827383] [00000000004c75d0] generic_smp_call_function_single_interrupt+0x150/0x1c0
    [ 189.906581] [000000000043e514] smp_call_function_single_client+0x14/0x40
    ====================

    Based almost entirely upon a patch by Paul E. McKenney.

    Reported-by: Meelis Roos
    Tested-by: Meelis Roos
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit b51182e60458c700df0ad4b9591efb8954099804
Author: David S. Miller
Date:   Sat Nov 1 00:33:58 2014 -0400

    sparc64: Fix crashes in schizo_pcierr_intr_other().

    [ Upstream commit 7da89a2a3776442a57e918ca0b8678d1b16a7072 ]

    Meelis Roos reports crashes during bootup on a V480 that look like
    this:

    ====================
    [ 61.300577] PCI: Scanning PBM /pci@9,600000
    [ 61.304867] schizo f009b070: PCI host bridge to bus 0003:00
    [ 61.310385] pci_bus 0003:00: root bus resource [io 0x7ffe9000000-0x7ffe9ffffff] (bus address [0x0000-0xffffff])
    [ 61.320515] pci_bus 0003:00: root bus resource [mem 0x7fb00000000-0x7fbffffffff] (bus address [0x00000000-0xffffffff])
    [ 61.331173] pci_bus 0003:00: root bus resource [bus 00]
    [ 61.385344] Unable to handle kernel NULL pointer dereference
    [ 61.390970] tsk->{mm,active_mm}->context = 0000000000000000
    [ 61.396515] tsk->{mm,active_mm}->pgd = fff000b000002000
    [ 61.401716]               \|/ ____ \|/
    [ 61.401716]               "@'/ .. \`@"
    [ 61.401716]               /_| \__/ |_\
    [ 61.401716]                  \__U_/
    [ 61.416362] swapper/0(0): Oops [#1]
    [ 61.419837] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc1-00422-g2cc9188-dirty #24
    [ 61.427975] task: fff000b0fd8e9c40 ti: fff000b0fd928000 task.ti: fff000b0fd928000
    [ 61.435426] TSTATE: 0000004480e01602 TPC: 00000000004455e4 TNPC: 00000000004455e8 Y: 00000000 Not tainted
    [ 61.445230] TPC:
    [ 61.449897] g0: 0000000000000000 g1: 0000000000000000 g2: 0000000000a10f78 g3: 000000000000000a
    [ 61.458563] g4: fff000b0fd8e9c40 g5: fff000b0fdd82000 g6: fff000b0fd928000 g7: 000000000000000a
    [ 61.467229] o0: 000000000000003d o1: 0000000000000000 o2: 0000000000000006 o3: fff000b0ffa5fc7e
    [ 61.475894] o4: 0000000000060000 o5: c000000000000000 sp: fff000b0ffa5f3c1 ret_pc: 00000000004455cc
    [ 61.484909] RPC:
    [ 61.489500] l0: fff000b0fd8e9c40 l1: 0000000000a20800 l2: 0000000000000000 l3: 000000000119a430
    [ 61.498164] l4: 0000000001742400 l5: 00000000011cfbe0 l6: 00000000011319c0 l7: fff000b0fd8ea348
    [ 61.506830] i0: 0000000000000000 i1: fff000b0fdb34000 i2: 0000000320000000 i3: 0000000000000000
    [ 61.515497] i4: 00060002010b003f i5: 0000040004e02000 i6: fff000b0ffa5f481 i7: 00000000004a9920
    [ 61.524175] I7:
    [ 61.529099] Call Trace:
    [ 61.531531] [00000000004a9920] handle_irq_event_percpu+0x40/0x140
    [ 61.537681] [00000000004a9a58] handle_irq_event+0x38/0x80
    [ 61.543145] [00000000004ac77c] handle_fasteoi_irq+0xbc/0x200
    [ 61.548860] [00000000004a9084] generic_handle_irq+0x24/0x40
    [ 61.554500] [000000000042be0c] handler_irq+0xac/0x100
    ====================

    The problem is that pbm->pci_bus->self is NULL.

    This code is trying to go through the standard PCI config space
    interfaces to read the PCI controller's PCI_STATUS register.

    This doesn't work, because we more often than not do not enumerate
    the PCI controller as a bonafide PCI device during the OF device
    node scan. Therefore bus->self remains NULL.
    Existing common code for PSYCHO and PSYCHO-like PCI controllers
    handles this properly, by doing the config space access directly.

    Do the same here, pbm->pci_ops->{read,write}().

    Reported-by: Meelis Roos
    Tested-by: Meelis Roos
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 3698c493c93203f3728bca65a594dd364b6f9e35
Author: Dwight Engen
Date:   Thu Oct 30 15:55:35 2014 -0400

    sunvdc: don't call VD_OP_GET_VTOC

    [ Upstream commit 85b0c6e62c48bb9179fd5b3e954f362fb346cbd5 ]

    The VD_OP_GET_VTOC operation will succeed only if the vdisk backend
    has a VTOC label, otherwise it will fail. In particular, it will
    return error 48 (ENOTSUP) if the disk has an EFI label. VTOC disk
    labels are already handled by directly reading the disk in
    block/partitions/sun.c (enabled by CONFIG_SUN_PARTITION which
    defaults to y on SPARC). Since port->label is unused in the driver,
    remove the call and the field.

    Signed-off-by: Dwight Engen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 89f94e268ce7b8d127485dbff57824045d1c4cd5
Author: Dwight Engen
Date:   Fri Sep 19 09:43:02 2014 -0400

    vio: fix reuse of vio_dring slot

    [ Upstream commit d0aedcd4f14a22e23b313f42b7e6e6ebfc0fbc31 ]

    vio_dring_avail() will allow use of every dring entry, but when the
    last entry is allocated then dr->prod == dr->cons which is
    indistinguishable from the ring empty condition. This causes the next
    allocation to reuse an entry. When this happens in sunvdc, the server
    side vds driver begins nack'ing the messages and ends up resetting
    the ldc channel. This problem does not affect sunvnet since it checks
    for < 2.

    The fix here is to just never allocate the very last dring slot so
    that full and empty are not the same condition. The request start
    path was changed to check for the ring being full a bit earlier, and
    to stop the blk_queue if there is no space left. The blk_queue will
    be restarted once the ring is only half full again.
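
The full-versus-empty ambiguity described above, and the "never hand out the last slot" fix, can be sketched in plain C. This is an illustration only, not the sunvdc or vio code: `struct ring`, `RING_SIZE` and the `ring_*()` helpers are hypothetical names, and the sketch assumes a power-of-two ring indexed by wrapping producer and consumer counters.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical size; must be a power of two */

/* Hypothetical ring state, loosely modelled on a prod/cons index pair. */
struct ring {
	uint32_t prod;	/* next slot the producer will fill */
	uint32_t cons;	/* next slot the consumer will release */
};

/* Number of slots currently in use. */
static uint32_t ring_used(const struct ring *r)
{
	return (r->prod - r->cons) & (RING_SIZE - 1);
}

/*
 * Number of slots that may still be allocated. One slot is always held
 * back, so prod == cons can only ever mean "empty", never "full".
 */
static uint32_t ring_avail(const struct ring *r)
{
	return RING_SIZE - ring_used(r) - 1;
}

/* Claim one slot; returns false when the ring counts as full. */
static bool ring_produce(struct ring *r)
{
	if (ring_avail(r) == 0)
		return false;
	r->prod = (r->prod + 1) & (RING_SIZE - 1);
	return true;
}

/* Release the oldest slot, if any is in use. */
static void ring_consume(struct ring *r)
{
	if (ring_used(r) > 0)
		r->cons = (r->cons + 1) & (RING_SIZE - 1);
}
```

With the reserved slot, a ring of `RING_SIZE` entries accepts at most `RING_SIZE - 1` outstanding allocations, which is the cost of keeping the two conditions distinguishable without an extra flag or counter.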
    The number of ring entries was increased to 512 which matches the
    sunvnet and Solaris vdc drivers, and greatly reduces the frequency of
    hitting the ring full condition and the associated blk_queue
    stop/starting. The checks in sunvnet were adjusted to account for
    vio_dring_avail() returning 1 less.

    Orabug: 19441666
    OraBZ: 14983

    Signed-off-by: Dwight Engen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 42f107f9c7644f201f0d2d56ae106814f51ed3bf
Author: Dwight Engen
Date:   Fri Sep 19 09:42:53 2014 -0400

    sunvdc: limit each sg segment to a page

    [ Upstream commit 5eed69ffd248c9f68f56c710caf07db134aef28b ]

    ldc_map_sg() could fail its check that the number of pages referred
    to by the sg scatterlist was <= the number of cookies.

    This fixes the issue by doing a similar thing to the xen-blkfront
    driver, ensuring that the scatterlist will only ever contain a
    segment count <= port->ring_cookies, and each segment will be page
    aligned, and <= page size. This ensures that the scatterlist is
    always mappable.

    Orabug: 19347817
    OraBZ: 15945

    Signed-off-by: Dwight Engen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit b3c23e4a27492ae9321c6a719b9b3b6d7090d416
Author: Allen Pais
Date:   Fri Sep 19 09:42:26 2014 -0400

    sunvdc: compute vdisk geometry from capacity

    [ Upstream commit de5b73f08468b4fc5e2f6d1505f650262622f78b ]

    The LDom diskserver doesn't return reliable geometry data. In
    addition, the types for all fields in the vio_disk_geom are u16,
    which were being truncated in the cast into the u8's of the Linux
    struct hd_geometry.

    Modify vdc_getgeo() to compute the geometry from the disk's capacity
    in a manner consistent with xen-blkfront::blkif_getgeo().

    Signed-off-by: Dwight Engen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit caa0a8a8bb14fd9a6a9d2544eab141f3a6469ba3
Author: Allen Pais
Date:   Fri Sep 19 09:42:14 2014 -0400

    sunvdc: add cdrom and v1.1 protocol support

    [ Upstream commit 9bce21828d54a95143f1b74619705c2dd8e88b92 ]

    Interpret the media type from v1.1 protocol to support CDROM/DVD.

    For v1.0 protocol, a disk's size continues to be calculated from the
    geometry returned by the vdisk server. The geometry returned by the
    server can be less than the actual number of sectors available in the
    backing image/device due to the rounding in the division used to
    compute the geometry in the vdisk server.

    In v1.1 protocol a disk's actual size in sectors is returned during
    the handshake. Use this size when v1.1 protocol is negotiated. Since
    this size will always be larger than the former geometry computed
    size, disks created under v1.0 will be forwards compatible to v1.1,
    but not vice versa.

    Signed-off-by: Dwight Engen
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit fa4f20e262fec43854d6bec0a6673829febbce1a
Author: Enric Balletbo i Serra
Date:   Thu Nov 13 09:14:34 2014 +0100

    smsc911x: power-up phydev before doing a software reset.

    [ Upstream commit ccf899a27c08038db91765ff12bb0380dcd85887 ]

    With commit be9dad1f9f26604fb ("net: phy: suspend phydev when going
    to HALTED"), the PHY device will be put in a low-power mode using
    BMCR_PDOWN if the interface is set down. The smsc911x driver does
    a software_reset when opening the device driver (ndo_open). In such
    case, the PHY must be powered-up before access to any register and
    before calling the software_reset function. Otherwise, as the PHY is
    powered down the software reset fails and the interface can not be
    enabled again.

    This patch fixes this scenario that is easy to reproduce setting down
    the network interface and setting up again.

    $ ifconfig eth0 down
    $ ifconfig eth0 up
    ifconfig: SIOCSIFFLAGS: Input/output error

    Signed-off-by: Enric Balletbo i Serra
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit f58ee885658bd83f6580751a2b8a8e3ad6cd6665
Author: Daniel Borkmann
Date:   Mon Nov 10 18:00:09 2014 +0100

    net: sctp: fix memory leak in auth key management

    [ Upstream commit 4184b2a79a7612a9272ce20d639934584a1f3786 ]

    A very minimal and simple user space application allocating an SCTP
    socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
    the socket again will leak the memory containing the authentication
    key from user space:

    unreferenced object 0xffff8800837047c0 (size 16):
      comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
      hex dump (first 16 bytes):
        01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 ................
      backtrace:
        [] kmemleak_alloc+0x4e/0xb0
        [] __kmalloc+0xe8/0x270
        [] sctp_auth_create_key+0x23/0x50 [sctp]
        [] sctp_auth_set_key+0xa1/0x140 [sctp]
        [] sctp_setsockopt+0xd03/0x1180 [sctp]
        [] sock_common_setsockopt+0x14/0x20
        [] SyS_setsockopt+0x71/0xd0
        [] system_call_fastpath+0x12/0x17
        [] 0xffffffffffffffff

    This is bad because of two things, we can bring down a machine from
    user space when auth_enable=1, but also we would leave security
    sensitive keying material in memory without clearing it after use.
    The issue is that sctp_auth_create_key() already sets the refcount
    to 1, but after allocation sctp_auth_set_key() does an additional
    refcount on it, and thus leaving it around when we free the socket.

    Fixes: 65b07e5d0d0 ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
    Signed-off-by: Daniel Borkmann
    Cc: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 358905266ed83d4a9e693ae7ff86c1595220ec60
Author: Daniel Borkmann
Date:   Mon Nov 10 17:54:26 2014 +0100

    net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet

    [ Upstream commit e40607cbe270a9e8360907cb1e62ddf0736e4864 ]

    An SCTP server doing ASCONF will panic on malformed INIT
    ping-of-death in the form of:

      ------------ INIT[PARAM: SET_PRIMARY_IP] ------------>

    While the INIT chunk parameter verification dissects through many
    things in order to detect malformed input, it misses to actually
    check parameters inside of parameters. E.g. RFC5061, section 4.2.4
    proposes a 'set primary IP address' parameter in ASCONF, which has
    as a subparameter an address parameter.

    So an attacker may send a parameter type other than
    SCTP_PARAM_IPV4_ADDRESS or SCTP_PARAM_IPV6_ADDRESS, param_type2af()
    will subsequently return 0 and thus sctp_get_af_specific() returns
    NULL, too, which we then happily dereference unconditionally through
    af->from_addr_param().

    The trace for the log:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
    IP: [] sctp_process_init+0x492/0x990 [sctp]
    PGD 0
    Oops: 0000 [#1] SMP
    [...]
    Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs
    RIP: 0010:[] [] sctp_process_init+0x492/0x990 [sctp]
    [...]
    Call Trace:
    [] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp]
    [] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp]
    [] sctp_do_sm+0x71/0x1210 [sctp]
    [] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp]
    [] sctp_endpoint_bh_rcv+0x116/0x230 [sctp]
    [] sctp_inq_push+0x56/0x80 [sctp]
    [] sctp_rcv+0x982/0xa10 [sctp]
    [] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
    [] ? nf_iterate+0x69/0xb0
    [] ? ip_local_deliver_finish+0x0/0x2d0
    [] ? nf_hook_slow+0x76/0x120
    [] ? ip_local_deliver_finish+0x0/0x2d0
    [...]

    A minimal way to address this is to check for NULL as we do on all
    other such occasions where we know sctp_get_af_specific() could
    possibly return with NULL.
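
The shape of that check can be sketched in userspace C. This is a simplified illustration, not the SCTP code: `struct af_ops`, `get_af_specific()`, `process_set_primary()` and the `PARAM_*` values are hypothetical stand-ins for the lookup that returns NULL on an unexpected parameter type and for the caller that must test the result before dereferencing it.

```c
#include <stddef.h>

/* Hypothetical parameter types; not the kernel's definitions. */
enum param_type { PARAM_IPV4 = 5, PARAM_IPV6 = 6 };

/* Hypothetical per-address-family operations table. */
struct af_ops {
	int (*from_addr_param)(int param_type);
};

static int from_v4(int param_type) { (void)param_type; return 4; }
static int from_v6(int param_type) { (void)param_type; return 6; }

static const struct af_ops v4_ops = { from_v4 };
static const struct af_ops v6_ops = { from_v6 };

/* Returns NULL for any parameter type that is not an address type. */
static const struct af_ops *get_af_specific(int param_type)
{
	switch (param_type) {
	case PARAM_IPV4:
		return &v4_ops;
	case PARAM_IPV6:
		return &v6_ops;
	default:
		return NULL;
	}
}

/* Returns a negative value instead of crashing on malformed input. */
static int process_set_primary(int param_type)
{
	const struct af_ops *af = get_af_specific(param_type);

	if (af == NULL)		/* the check the unfixed path lacked */
		return -1;
	return af->from_addr_param(param_type);
}
```

The point of the sketch is only the ordering: the lookup result is tested before the function pointer is used, so a nested parameter of an unexpected type is rejected rather than dereferenced.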
    Fixes: d6de3097592b ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
    Signed-off-by: Daniel Borkmann
    Cc: Vlad Yasevich
    Acked-by: Neil Horman
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit b101a7aedb0708d461105c95ef5f9d7dedb1a2f0
Author: Marcelo Leitner
Date:   Thu Nov 13 14:43:08 2014 -0200

    vxlan: Do not reuse sockets for a different address family

    [ Upstream commit 19ca9fc1445b76b60d34148f7ff837b055f5dcf3 ]

    Currently, we only match against local port number in order to reuse
    socket. But if this new vxlan wants an IPv6 socket and an IPv4 one is
    bound to that port, vxlan will reuse an IPv4 socket as IPv6 and a
    panic will follow. The following steps reproduce it:

       # ip link add vxlan6 type vxlan id 42 group 229.10.10.10 \
           srcport 5000 6000 dev eth0
       # ip link add vxlan7 type vxlan id 43 group ff0e::110 \
           srcport 5000 6000 dev eth0
       # ip link set vxlan6 up
       # ip link set vxlan7 up

    [ 4.187481] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
    ...
    [ 4.188076] Call Trace:
    [ 4.188085] [] ? ipv6_sock_mc_join+0x3a/0x630
    [ 4.188098] [] vxlan_igmp_join+0x66/0xd0 [vxlan]
    [ 4.188113] [] process_one_work+0x220/0x710
    [ 4.188125] [] ? process_one_work+0x1b4/0x710
    [ 4.188138] [] worker_thread+0x11b/0x3a0
    [ 4.188149] [] ? process_one_work+0x710/0x710

    So address family must also match in order to reuse a socket.

    Reported-by: Jean-Tsung Hsiao
    Signed-off-by: Marcelo Ricardo Leitner
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 4dd750b420b5753611ccde564e9fe5ebe18bcf7d
Author: Steffen Klassert
Date:   Mon Nov 3 09:19:30 2014 +0100

    gre6: Move the setting of dev->iflink into the ndo_init functions.

    [ Upstream commit f03eb128e3f4276f46442d14f3b8f864f3775821 ]

    Otherwise it gets overwritten by register_netdev().

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 6dfa82b47afc05b4d8885c09191d93c1e531d463
Author: Steffen Klassert
Date:   Mon Nov 3 09:19:29 2014 +0100

    sit: Use ipip6_tunnel_init as the ndo_init function.

    [ Upstream commit ebe084aafb7e93adf210e80043c9f69adf56820d ]

    ipip6_tunnel_init() sets the dev->iflink via a call to
    ipip6_tunnel_bind_dev(). After that, register_netdevice()
    sets dev->iflink = -1. So we lose the iflink configuration
    for ipv6 tunnels. Fix this by using ipip6_tunnel_init() as the
    ndo_init function. Then ipip6_tunnel_init() is called after
    dev->iflink is set to -1 from register_netdevice().

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 466f1a5f1e5f4901f01a1f6a658f8847570c07a0
Author: Steffen Klassert
Date:   Mon Nov 3 09:19:28 2014 +0100

    vti6: Use vti6_dev_init as the ndo_init function.

    [ Upstream commit 16a0231bf7dc3fb37e9b1f1cb1a277dc220b5c5e ]

    vti6_dev_init() sets the dev->iflink via a call to
    vti6_link_config(). After that, register_netdevice()
    sets dev->iflink = -1. So we lose the iflink configuration
    for vti6 tunnels. Fix this by using vti6_dev_init() as the
    ndo_init function. Then vti6_dev_init() is called after
    dev->iflink is set to -1 from register_netdevice().

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit ba4ac5f9be89fe217f1f79262a90091d2508e6eb
Author: Steffen Klassert
Date:   Mon Nov 3 09:19:27 2014 +0100

    ip6_tunnel: Use ip6_tnl_dev_init as the ndo_init function.

    [ Upstream commit 6c6151daaf2d8dc2046d9926539feed5f66bf74e ]

    ip6_tnl_dev_init() sets the dev->iflink via a call to
    ip6_tnl_link_config(). After that, register_netdevice()
    sets dev->iflink = -1. So we lose the iflink configuration
    for ipv6 tunnels. Fix this by using ip6_tnl_dev_init() as the
    ndo_init function. Then ip6_tnl_dev_init() is called after
    dev->iflink is set to -1 from register_netdevice().

    Signed-off-by: Steffen Klassert
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

commit 41e881107a3d22b53c3b1c3116b1c2afd4561f80
Author: Greg Kroah-Hartman
Date:   Fri Nov 14 10:25:11 2014 -0800

    Revert "drivers/net: Disable UFO through virtio"

    This reverts commit 2b52d6c6beda6308ba95024a1eba1dfc9515ba32 which
    was commit 3d0ad09412ffe00c9afa201d01effdb6023d09b4 upstream.

    Ben writes:

        Please drop this patch for 3.14 and 3.17. It causes problems
        for migration of VMs and we're probably going to revert part
        of this.

        The following patch ("drivers/net, ipv6: Select IPv6 fragment
        idents for virtio UFO packets") might no longer apply, in
        which case you can drop that as well until we have this
        sorted out upstream.

    Cc: Ben Hutchings
    Cc: David S. Miller
    Signed-off-by: Greg Kroah-Hartman