Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x86_64: mm: support numa migration pte entry #379

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 38 additions & 7 deletions libdrgn/arch_x86_64.c
Original file line number Diff line number Diff line change
Expand Up @@ -574,18 +574,49 @@ linux_kernel_pgtable_iterator_init_x86_64(struct drgn_program *prog,
memset(it->index, 0xff, sizeof(it->index));
}

// pte flags
static const uint64_t _PAGE_PRESENT = 1 << 0;
static const uint64_t _PAGE_PROTNONE = 1 << 8;

// TODO: PTE_PFN_MASK may be different because of CONFIG_DYNAMIC_PHYSICAL_MASK
static const uint64_t PTE_PFN_MASK = UINT64_C(0xffffffffff000);
static const int PAGE_SHIFT = 12;

static inline bool __pte_needs_invert(uint64_t val)
{
return val && !(val & _PAGE_PRESENT);
}

static inline uint64_t protnone_mask(uint64_t val)
{
return __pte_needs_invert(val) ? ~0ull : 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In theory, ~0ull could be a different size than uint64_t, so I'd feel better about:

Suggested change
return __pte_needs_invert(val) ? ~0ull : 0;
return __pte_needs_invert(val) ? -1 : 0;

}

static inline uint64_t pte_flags(uint64_t pte)
{
return pte & ((1 << PAGE_SHIFT) - 1);
}
Comment on lines +585 to +598
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to fold these small helpers into their callers.


static inline int pte_present(uint64_t pte)
{
return pte_flags(pte) & (_PAGE_PRESENT | _PAGE_PROTNONE);
}

static inline uint64_t pte_phys(uint64_t pte)
{
pte ^= protnone_mask(pte);
return pte & PTE_PFN_MASK;
}

static struct drgn_error *
linux_kernel_pgtable_iterator_next_x86_64(struct drgn_program *prog,
struct pgtable_iterator *_it,
uint64_t *virt_addr_ret,
uint64_t *phys_addr_ret)
{
static const int PAGE_SHIFT = 12;
static const int PGTABLE_SHIFT = 9;
static const int PGTABLE_MASK = (1 << PGTABLE_SHIFT) - 1;
static const uint64_t PRESENT = 0x1;
static const uint64_t PSE = 0x80; // a.k.a. huge page
static const uint64_t ADDRESS_MASK = UINT64_C(0xffffffffff000);
struct drgn_error *err;
bool bswap = drgn_platform_bswap(&prog->platform);
struct pgtable_iterator_x86_64 *it =
Expand Down Expand Up @@ -623,14 +654,14 @@ linux_kernel_pgtable_iterator_next_x86_64(struct drgn_program *prog,
uint64_t entry = it->table[level][it->index[level]++];
if (bswap)
entry = bswap_64(entry);
table = entry & ADDRESS_MASK;
if (!(entry & PRESENT) || (entry & PSE) || level == 0) {
table = entry & PTE_PFN_MASK;
if (!(entry & _PAGE_PRESENT) || (entry & PSE) || level == 0) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can huge pages also have numa migration entries? pmd_present() in the kernel checks _PAGE_PROTNONE, so maybe 2MB huge pages can? If so, this check probably needs to be slightly different (and depend on the level).

uint64_t mask = (UINT64_C(1) <<
(PAGE_SHIFT +
PGTABLE_SHIFT * level)) - 1;
*virt_addr_ret = virt_addr & ~mask;
if (entry & PRESENT)
*phys_addr_ret = table & ~mask;
if (pte_present(entry))
*phys_addr_ret = pte_phys(entry);
Comment on lines +663 to +664
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also get entries for higher levels (for huge pages or unmapped levels), so the pte_* names are not completely accurate. The functions themselves probably also need to depend on the level as noted above.

else
*phys_addr_ret = UINT64_MAX;
it->it.virt_addr = (virt_addr | mask) + 1;
Expand Down