[ Content | Sidebar ]

Final Fantasy VII

March 8th, 2021

The last few months I’ve been playing through the PC version of Final Fantasy VII. I first played this on the original PlayStation way back around year 9 in secondary school and it was the first time I’d played any kind of Japanese RPG.

Emerald Weapon


One of my friends was obsessed with the game and the lore around it. He had the golden Chocobo, all the materia and summons, and defeated both the Emerald and Ruby Weapon “superbosses”. I think I just played the main story through to the end and didn’t bother much with the optional content.

Press [OK] to slap…


I love the pre-rendered backgrounds, reminiscent of PC point-and-click adventure games of the same era. However they look a bit pixelated when scaled up on a large monitor (apparently they lost the original 3D scenes). The character models look great on a modern system though.

A regular battle


The story is great too and has tons of memorable characters (albeit mostly the antagonists). There’s a lot more depth to the story than I realised as a teenager. Perhaps my understanding at that time was hampered by the dodgy English translation, but it has a certain charm. I also much prefer the “science-plus-magic” setting to the common fantasy one, it’s almost a bit cyberpunk.

Finally got to level 99


What’s not great? Well it’s kinda easy: once you’ve figured out some tricks with the materia system the main story doesn’t have much challenge. I wish there was a configurable difficulty level (I heard the PC version has a mod to do this but haven’t tried). And the occasional mini-games, especially the snowboarding one, are annoying and haven’t aged well.

Generating perf maps with OpenJDK 17

February 28th, 2021

Linux perf is a fantastically useful tool for all sorts of profiling tasks. It’s a statistical profiler that works by capturing the program counter value when a particular performance event occurs. This event is typically generated by a timer (e.g. 1kHz) but can be any event supported by your processor’s PMU (e.g. cache miss, branch mispredict, etc.). Try perf list to see the events available on your system.

A table of raw program counter addresses isn’t particularly useful to humans so perf needs to associate each address with the symbol (function) that contains it. For ahead-of-time compiled programs and shared libraries perf can look this up in the ELF symbol table on disk, but for JIT-compiled languages like Java this information isn’t available as the code is generated on-the-fly in memory.

Let’s look at what perf top reports while running a CPU-intensive Java program:

Samples: 136K of event 'cycles', 4000 Hz, Event count (approx.): 57070116973 lost: 0/0 drop: 0/0
Overhead  Shared Object                       Symbol
  16.33%  [JIT] tid 41266                     [.] 0x00007fd733e40ec0
  16.15%  [JIT] tid 41266                     [.] 0x00007fd733e40e3b
  16.14%  [JIT] tid 41266                     [.] 0x00007fd733e40df1
  16.14%  [JIT] tid 41266                     [.] 0x00007fd733e40e81
   2.80%  [JIT] tid 41266                     [.] 0x00007fd733e40df5
   2.62%  [JIT] tid 41266                     [.] 0x00007fd733e40e41
   2.45%  [JIT] tid 41266                     [.] 0x00007fd733e40ec4
   2.43%  [JIT] tid 41266                     [.] 0x00007fd733e40e87

Perf marks these locations as [JIT] because the addresses are in part of the process’s address map not backed by a file. Because the addresses are all very similar we might guess they’re in the same method, but perf has no way to group them and shows each unique address separately. Apart from that it’s not very helpful for figuring out which method is consuming all the cycles.

As an aside, it’s worth briefly comparing perf’s approach, which samples the exact hardware instruction being executed when a PMU event occurs, with a traditional Java profiler like VisualVM which samples at the JVM level (i.e. bytecodes). A JVM profiler needs to interrupt the thread, then record the current method, bytecode index, and stack trace, and finally resume the thread. Obviously this has larger overhead but there is a deeper problem: JIT-ed code cannot be interrupted at arbitrary points because the runtime may not be able to accurately reconstruct the VM state at that point. For example one cannot inspect the VM state halfway through executing a bytecode. So at the very least the JIT-ed code needs to continue executing to the end of the current bytecode. But requiring the VM to be in a consistent state at the end of each bytecode places too many restrictions on an optimising JIT. Therefore the optimised code can typically only be interrupted at special “safepoints” inserted by the JIT – in Hotspot this is at method return and loop back-edges. That means that a JVM profiler can only see the thread stopped at one of these safepoints which may deviate from the actual hot parts of the code, sometimes wildly. This problem is known as safepoint bias.

So a hardware profiler can give us better accuracy, but how to translate the JIT-ed code addresses to Java method names? Currently there are at least two tools to do this, both of which are implemented as JVMTI plugins that load into a JVM process and then dump some metadata for perf to use.

The first is the “jitdump” plugin that is part of the Linux perf tree. After being loaded into a JVM, the plugin writes out all the machine code and metadata for each method that is JIT compiled. Later this file can be combined with recorded profile data using perf inject --jit to produce an annotated data file with the correct symbol names, as well as a separate ELF shared object for each method allowing perf to display the assembly code. I use this often at work when I need to do some detailed JIT profiling, but the offline annotation step is cumbersome and the data files can be 100s of MB for large programs. The plugin itself is complex and historically buggy. I’ve fixed several of those issues myself but wouldn’t be surprised if there are more lurking. This tool is mostly overkill for typical Java developers.

The second tool that I’m aware of is perf-map-agent. This generates a perf “map” file which is a simple text file listing start address, length, and symbol name for each JIT compiled method. Perf will load this file automatically if it finds one in /tmp. As the map file doesn’t contain the actual instructions it’s much smaller than the jitdump file, and doesn’t require an extra step to annotate the profile data so it can be used for live profiling (i.e. perf top). The downsides are that profile data is aggregated by method so you can’t drill down to individual machine instructions, and the map can become stale with a long-running VM as methods can be unloaded or recompiled. You also need to compile the plugin yourself as it’s not packaged in any distro and many people would rightly be wary of loading untrusted third-party code which has full access to the VM. So it would be much more convenient if the VM could just write this map file itself.

OpenJDK 17, which should be released early next month, has a minor new feature contributed by yours truly to do just this: either send the diagnostic command Compiler.perfmap to a running VM with jcmd, or run java with -XX:+DumpPerfMapAtExit, and the VM will write a perf map file that can be used to symbolise JIT-ed code.

$ jps
40885 Jps
40846 PiCalculator
$ jcmd 40846 Compiler.perfmap
40846:
Command executed successfully
$ head /tmp/perf-40846.map 
0x00007ff6dbe401a0 0x0000000000000238 void java.util.Arrays.fill(int[], int)
0x00007ff6dbe406a0 0x0000000000000338 void PiSpigout.calc()
0x00007ff6dbe40cc0 0x0000000000000468 void PiSpigout.calc()
0x00007ff6dbe41520 0x0000000000000138 int PiSpigout.invalidDigitsControl(int, int)
0x00007ff6d49091a0 0x0000000000000110 void java.lang.Object.<init>()
0x00007ff6d4909560 0x0000000000000350 int java.lang.String.hashCode()
0x00007ff6d4909b20 0x0000000000000130 byte java.lang.String.coder()
0x00007ff6d4909ea0 0x0000000000000170 boolean java.lang.String.isLatin1()

Here we can see for example that the String.hashCode() method starts at address 0x00007ff6d4909560 and is 0x350 bytes long. Let’s run perf top again:

Samples: 206K of event 'cycles', 4000 Hz, Event count (approx.): 94449755711 lost: 0/0 drop: 0/0
Overhead  Shared Object                         Symbol
  78.78%  [JIT] tid 40846                       [.] void PiSpigout.calc()
   0.56%  libicuuc.so.67.1                      [.] icu_67::RuleBasedBreakIterator::handleNext
   0.34%  ld-2.31.so                            [.] do_lookup_x
   0.30%  libglib-2.0.so.0.6600.3               [.] g_source_ref
   0.24%  libglib-2.0.so.0.6600.3               [.] g_hash_table_lookup
   0.24%  libc-2.31.so                          [.] __memmove_avx_unaligned_erms
   0.20%  libicuuc.so.67.1                      [.] umtx_lock_67

This is much better: now we can see that 79% of system-wide cycles are spent in one Java method PiSpigout.calc().

Alternatively we can do offline profiling with perf record:

$ perf record java -XX:+UnlockDiagnosticVMOptions -XX:+DumpPerfMapAtExit PiCalculator 50
3.1415926535897932384626433832795028841971693993750
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.030 MB perf.data (230 samples) ]
$ perf report
Samples: 340  of event 'cycles', Event count (approx.): 71930116
Overhead  Command          Shared Object       Symbol
  11.83%  java             [JIT] tid 44300     [.] Interpreter
   4.19%  java             [JIT] tid 44300     [.] flush_icache_stub
   3.87%  java             [kernel.kallsyms]   [k] mem_cgroup_from_task
...

Here the program didn’t run long enough for any method to be JIT compiled and we see most of the cycles are spent in the interpreter. (By the way, the interpreter still looks like JIT-ed code to perf because its machine code is generated dynamically by Hotspot at startup.)

I’d be interested to hear from anyone who tries this new feature and finds it useful, or otherwise!

YPbPr Mode vs RGB

February 26th, 2021

I was fiddling with my monitor settings today (Dell U2415) and noticed the “Input Color Format” was set to “YPbPr” instead of “RGB”. This is a compressed colour space where the chroma channel has half the resolution of the luminance channel. Normally this would be used for TVs or video encoding rather than a PC monitor. That said I’ve been using it this way for two years without noticing…

The problem is Dell monitors advertise this mode along with RGB in their HDMI EDID. The driver for my AMD graphics card see this and prefers it over RGB with no way to override the selection. There is one creative solution I found which involves patching a local copy of the EDID and telling the driver to load that from disk rather than reading it from the monitor. I took the simpler option of spending a few quid on a DisplayPort cable which only supports RGB.

The result? Fonts look a bit sharper… maybe… but it’s hard to tell.

Minhang Riverside Park

February 20th, 2021

On the last day of the holiday I decided to go for another walk, this time almost directly south to the Huangpu river.

The long straight section is walking along the “green corridor” next to metro line 5.

Underneath the Shanghai–Jiaxing–Huzhou expressway

Eventually I reached the Huangpu river, which I believe is the southern boundary of Minhang district. There’s a small park here called Minhang Riverside Park, which isn’t particularly notable. However the view of the river is dominated by a large single-span suspension bridge. Unfortunately Wikipedia, my usual source of bridge knowledge, failed me on this one. The words on the tower read 闵浦二桥 “MinPu 2nd Bridge”. Some web searching reveals that it was opened to traffic in 2010, the tower is 148 meters tall, and the deck is 40 meters above the water. It’s a rather prosaic concrete structure: they didn’t even bother to paint it. However it is notable for having two decks. The lower deck carries the metro line 5 extension to the Fengxian suburb, the only place the Shanghai metro crosses the Huangpu river above-water. You can just make out a train crossing the bridge in the picture below. They have to drive very slowly for some reason.

Minpu II bridge with metro train crossing

Speaking of metro lines, Shanghai’s brand new line 15 opened just last month and the southern terminus is a few miles from this spot. This line is generating a lot of excitement because all the trains are driverless, like London’s DLR. (Actually Shanghai already had driverless trains on the Pujiang line but it’s so out of the way not many people knew about it.) Obviously I ran straight to the front of the train and stared transfixed at the passing tunnel for the whole journey. I made a short video so you can experience it too.

Sijing Old Town

February 14th, 2021

I’ve got a week off for Chinese new year so it seemed like a good opportunity to go out for a walk. I planned to explore westwards along the Dianpu river which I’d walked before, but I didn’t get very far as the path along the river is intermittent at best and the big roads aren’t pleasant to walk along. So I took a detour to Sijing old town, which I found by chance while looking at the map.

The main attraction is probably Futian temple, a medium-sized Buddhist temple dating from the 18th century. There weren’t many visitors so it was very peaceful, much more so than other temples I’ve visited in China.

Outside the temple

The old town itself has seen better days. I’m not sure if it’s in the process of being demolished or renovated. Anyway most of the buildings are boarded up although there’s a free architecture exhibition that’s worth a look. I actually preferred wandering around here compared to some of the more over-comercialised old towns I’ve been too.

The waterfront view is more attractive

The pagoda was closed too although it seemed operational so that might just be because it’s the new year holiday.

Hipster cafe and bookshop

Emacs: enable whitespace-mode automatically in files with tabs

January 28th, 2021

I normally only use spaces to indent but sometimes I have to edit a file that contains tabs and I don’t want to break the existing indentation. In this situation Emacs’ whitespace-mode is very useful for showing exactly which whitespace characters are present in a file. But I don’t want to enable it all the time so I use this little function to check automatically if a file contains tabs, and if so enable whitespace-mode and also indent-tabs-mode so that any new lines are indented with tabs as well.

(defun check-for-tabs ()
  "Enable whitespace-mode and indent-tabs-mode if the buffer
contains tabs."
  (when (save-excursion
          (goto-char (point-min))
          (search-forward "\t" nil t))
    (whitespace-mode +1)
    (setq-local indent-tabs-mode t)))
 
(add-hook 'find-file-hook #'check-for-tabs)

Changing Emacs C align function to only apply once per line

January 3rd, 2021

I find myself using M-x align a lot in Emacs to line up the identifiers in blocks of C declarations or statements like the following:

struct foo {
   something x;
   int hello;
   long *blodgett;
};
 
int *q;
bazbazbaz *y;
static char fnork;
 
spoonful = sugar;
spork = 1;

After marking the region containing the declarations a quick M-x align transforms it into:

struct foo {
   something  x;
   int        hello;
   long      *blodgett;
};
 
int         *q;
bazbazbaz   *y;
static char  fnork;
 
spoonful = sugar;
spork    = 1;

Which looks much neater to my eyes. However when the declarations have initialisers the result is not at all pleasing:

int x = 5;
struct baz baz = { 1, 2 };
const int *this_is_a_pointer = NULL;

Becomes

int         x                 = 5;
struct baz  baz               = { 1, 2 };
const int  *this_is_a_pointer = NULL;

Really I only want it to align the identifiers and leave the = where they were. I’m sure I can’t be the only one with this problem but I’ve no idea what to type into Google or Stack Overflow to find the solution, and the manual doesn’t help either.

My current solution is to patch the list of alignment rules in align-rules-list with this bit of hackery:

;;; Define this in a file with `lexical-binding' set or else change the
;;; `let' below to `lexical-let'.
(defun nick-fix-c-align ()
  (let ((var-decl-regexp (alist-get 'regexp
                                    (alist-get 'c-variable-declaration
                                               align-rules-list))))
    (push `(valid . ,(lambda ()
                       (not (save-excursion
                              (end-of-line)
                              (looking-back var-decl-regexp)))))
          (alist-get 'c-assignment align-rules-list))))

The variable align-rules-list holds a list of rules for patterns like “C assignment”, “C variable declaration”, and so on. Each rule is an alist with a regex trigger, a list of modes to enable it in, an optional predicate valid to further restrict when it is run, and some other irrelevant options. The align function loops over this list and aligns any text where the regex matches and valid returns true.

The problem is C declarations with intialisers trigger both the c-variable-declaration and the c-assignment rules. There’s no way to tell it “stop processing rules if this matches” so the function above modifies the rules list to add an extra predicate to the c-assignment rule which says “do not apply this rule when the line also matches the regex for C variable declarations”.

align-load-hook is called after align.el has been loaded, which is a convienent time to patch the rules list. Afterwards M-x align lines up only the identifiers in declarations:

int         x = 5;
struct baz  baz = { 1, 2 };
const int  *this_is_a_pointer = NULL;

Mince Pie Scalpers

December 31st, 2020

It’s no secret that I love mince pies. But due to the pandemic I’m stuck in Shanghai this Christmas and the seasonal delicacy is hard to source. Now everyone who’s lived in China knows you can buy anything on Taobao, and sure enough I found a 代购 daigou specialising in British foodstuffs.

Unfortunately the price is around 5-10x what you’d normally pay in the UK. Feeling desperate I bought one box of Waitrose pies anyway, for a total of ¥154 with shipping (about £17). It eventually arrived two days before Christmas and a joyous time was had by all.

A delicous Waitrose regular mince pie

Filed in musings - Comments closed