Author Topic: GameBoy Emulation (Read 6030 times)

Spectere · « **on:** August 09, 2020, 02:20:56 PM »

So after a whole hell of a lot of research and development, I managed to emulate enough of a GameBoy to properly run the DMG boot ROM (it actually gets far enough to swap out the boot ROM for the first 0x100 bytes of the game cart, then start executing code from the game ROM, but the emulation isn't complete enough to run anything useful right now):

I still have plenty of work to do, but I think I'm off to a decent start.

Edit: I ended up doing a ton of improvements on this. After I got it to this point the emulator was unable to run at full speed on a mobile i9. No, I'm not joking. I didn't think to get an exact figure on how long one "second" of emulation time took in real-time, and my frontend doesn't support frame skipping, so it was effectively running at half speed.

It's kind of amazing how a few microseconds quickly start to add up when you're literally simulating five million clock ticks per second (CPU runs at ~1.05MHz, PPU dot clock is ~4.19MHz). I believe that my CPU core is cycle-accurate, and I'm aiming for cycle-accuracy on the PPU as well (to a point, anyway; some of the specific timings involved with the video generation phase, PPU mode 3, are unclear).

While there were a few little micro-optimizations that probably did more for debug builds than release builds (flipping from range-based for loops to index-based ones, converting some if/else blocks to switch blocks, etc.), the most impactful changes occurred in the memory mapper.

Basically, Plip was designed to be more of an emulation interface than a single emulator. It doesn't split off the cores as separate libraries à la RetroArch (though it probably could, honestly), but it's conceptually similar.

One of the things that I did was generalize memory access. Plip has a memory mapper, and that memory mapper takes PlipMemory objects (PlipMemory being a pure virtual class, with RAM and ROM implementations). The core then assigns those memory blocks to specific addresses, and when you want to access memory you tell the mapper to fetch a byte and it'll handle all the hard work for you. For instance, if you have a ROM at 0x0000-0x3FFF, system RAM at 0x4000-0x5FFF, and video RAM at 0x6000-0x7FFF, and you request a byte from 0x4800, it'll check the mapping table, find that the requested byte lives in the system RAM instance, subtract that block's base address (0x4000), and return the byte at offset 0x0800 in system RAM.
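
A minimal C++ sketch of that kind of lookup, for illustration only: `MemoryBlock`, `MappedRange`, `FindResult`, `MemoryMap`, and this `FindAddress` are hypothetical stand-ins rather than Plip's actual PlipMemory/PlipMemoryMap classes, and the linear scan is simply the most direct way to express the idea.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a RAM/ROM block (not Plip's PlipMemory class).
struct MemoryBlock {
    std::vector<uint8_t> data;
    explicit MemoryBlock(std::size_t size) : data(size, 0) {}
};

// One mapping-table entry: addresses [start, start + length) map onto block->data[offset...].
struct MappedRange {
    uint32_t start;
    uint32_t length;
    MemoryBlock* block;
    uint32_t offset;  // where in the block this range begins (useful for banking)
};

struct FindResult {
    MemoryBlock* block;  // nullptr if the address is unmapped
    uint32_t offset;     // address relative to the start of the block
};

class MemoryMap {
public:
    void Assign(MemoryBlock* block, uint32_t start, uint32_t length, uint32_t offset = 0) {
        m_ranges.push_back({start, length, block, offset});
    }

    // Scan the mapping table and return the owning block plus the block-relative offset.
    FindResult FindAddress(uint32_t address) const {
        for (const auto& r : m_ranges) {
            if (address >= r.start && address < r.start + r.length)
                return {r.block, address - r.start + r.offset};
        }
        return {nullptr, 0};
    }

    // Unmapped reads return 0xFF here (an assumption for this sketch).
    uint8_t GetByte(uint32_t address) const {
        FindResult f = FindAddress(address);
        return f.block ? f.block->data[f.offset] : 0xFF;
    }

private:
    std::vector<MappedRange> m_ranges;
};
```

With ROM at 0x0000, system RAM at 0x4000, and video RAM at 0x6000 assigned as in the example above, `FindAddress(0x4800)` yields the system RAM block with an offset of 0x0800.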

This system is nice because it inherently supports banked ROM and RAM. All I have to do is update the offset of a block and I'm suddenly in a new bank. Additionally, this ended up being useful for a little GameBoy quirk known as ECHO RAM (due to how the DMG's memory controller works, 0xC000-0xDDFF is mirrored to 0xE000-0xFDFF). All I had to do to simulate that was map the work RAM block a second time at the upper address range, and it Just Worked™.
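
Under the same kind of hypothetical mapper (the `Range` and `Map` names below are illustrative, not Plip's), both banking and ECHO RAM fall out of the data layout: a bank switch only rewrites a range's offset, and a mirror is just a second range pointing at the same block.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical minimal mapper: each range points at a shared block plus a block-relative offset.
struct Range {
    uint16_t start, end;           // inclusive bounds in the CPU address space
    std::vector<uint8_t>* block;   // the backing memory (may be shared by several ranges)
    uint32_t offset;               // changing this is all a bank switch needs
};

struct Map {
    std::vector<Range> ranges;

    // Returns a pointer to the backing byte, or nullptr if nothing is mapped there.
    uint8_t* Locate(uint16_t addr) {
        for (auto& r : ranges)
            if (addr >= r.start && addr <= r.end)
                return &(*r.block)[r.offset + (addr - r.start)];
        return nullptr;
    }
};
```

Because the echo window (0xE000-0xFDFF) is 0x1E00 bytes long, it only ever reaches work RAM up to 0xDDFF, which matches the mirroring described above.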

Now, there is a pretty substantial problem with this that I hinted at earlier: that find routine costs CPU cycles, and calling it unnecessarily is a huge problem. Obviously, the CPU needs to use the mapper for all of its memory access, since it doesn't really know any details about the memory layout. That isn't too bad on its own, since the GB's CPU cannot read or write memory more than once per machine cycle. The problem lies in the PPU. Not only is the dot clock four times faster than the CPU, but the PPU also has a bunch of registers that it needs to both read and update in order to display the image. This results in the find function being called millions of times per second.

The fix for this was simple: since the PPU is only reading and writing certain specified registers, I can easily get away with directly accessing the PlipMemory objects declared in the core (m_videoRam, m_oam, and m_ioRegisters, in my case). I just handled all of the arithmetic to get the appropriate addresses in the various "static const" declarations. Easy peasy.
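
Roughly what that looks like, as a sketch that assumes the I/O register block spans 0xFF00-0xFF7F; `IoRegisters`, `Ppu`, and the `Reg*` constants are hypothetical names. LCDC (0xFF40, with bit 7 as the LCD enable), SCX (0xFF43), and LY (0xFF44) are the real DMG register addresses.

```cpp
#include <array>
#include <cstdint>

// The I/O block is assumed to start at 0xFF00, so each register's block-relative
// index is computed once here instead of going through the mapper on every dot.
constexpr uint16_t IoBase = 0xFF00;
static const uint16_t RegLcdc = 0xFF40 - IoBase;  // LCD control
static const uint16_t RegScx  = 0xFF43 - IoBase;  // background X scroll
static const uint16_t RegLy   = 0xFF44 - IoBase;  // current scanline

struct IoRegisters {
    std::array<uint8_t, 0x80> bytes{};  // covers 0xFF00-0xFF7F
};

struct Ppu {
    IoRegisters* io;  // direct pointer to the core's I/O block

    // Direct indexed access: no mapping-table search.
    bool LcdEnabled() const { return (io->bytes[RegLcdc] & 0x80) != 0; }
    uint8_t ScrollX() const { return io->bytes[RegScx]; }
    void SetLy(uint8_t line) { io->bytes[RegLy] = line; }
};
```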

Even with that, it still wasn't fast enough, and there's a good reason why: because I needed a data structure that allows quick and easy inserts, I used std::list, STL's doubly-linked list implementation. Linked lists make insertion cheap, but since the nodes are disparate objects tied together with pointers, the compiler can't simply say "oh, it's just X address, plus the index times the size". It has to follow the trail of pointers, which makes iteration significantly slower: the nodes can be scattered across memory, so the CPU's caches and prefetcher can't predict where the next one lives. Since the memory mapper supports all sorts of fancy features, like being able to smash a block of memory on top of another one, I didn't want to abandon std::list altogether, because it made everything so clean and easy. Fortunately, assigning blocks of memory is done relatively rarely, so I changed it to do block assignments against an std::list and then build a far more efficient std::vector (which is basically a managed contiguous array) from the final contents of the list. Basically, I'm trading a minuscule amount of CPU time and memory to save a ton of cycles in the long run.
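
A sketch of the list-then-flatten idea, under simplifying assumptions: `FlattenedMap` and `Mapping` are hypothetical, a block is represented by an integer ID, and overlaps are resolved by first-match order in the flattened vector (newest assignment first) rather than by splitting ranges, which may differ from what Plip actually does.

```cpp
#include <cstdint>
#include <list>
#include <vector>

struct Mapping {
    uint32_t start, length;
    int blockId;  // stand-in for a pointer to a memory block
};

class FlattenedMap {
public:
    // Rare operation: insert into the std::list (newest first, so it overrides older
    // assignments), then rebuild the contiguous snapshot.
    void Assign(const Mapping& m) {
        m_assignments.push_front(m);
        Rebuild();
    }

    // Hot path: scan the contiguous std::vector; the first match wins.
    int Find(uint32_t address) const {
        for (const Mapping& m : m_flat)
            if (address >= m.start && address < m.start + m.length)
                return m.blockId;
        return -1;  // unmapped
    }

private:
    void Rebuild() { m_flat.assign(m_assignments.begin(), m_assignments.end()); }

    std::list<Mapping> m_assignments;  // easy to insert into and reorder
    std::vector<Mapping> m_flat;       // cache-friendly copy used for lookups
};
```

The boot ROM case from the first post maps naturally onto this: the boot ROM is assigned over the first 0x100 bytes of the cartridge ROM, so lookups below 0x100 hit it while everything else still resolves to the cartridge.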

And finally, there's the matter of the return type of FindAddress. I was using std::tuple<PlipMemory*, uint32_t> (the pointer to the memory object and the memory address offset relative to that block). I replaced that with a struct, which reduced the overhead of that function quite a bit. I think std::pair<> would have been a safe bet as well; I might test that at some point. From what I understand, the difference becomes moot on optimized builds, but this is one of those situations where micro-optimizations are actually useful in order to make the debugging process less awful.

I'm having a ton of fun with this project, in case it wasn't obvious. ;P

vladgd · « **Reply #1 on:** August 11, 2020, 01:52:30 PM »

I feel like that's substantial enough for its own topic. All of it is over my head, but what made you decide on gameboy of all things? Easier to code, or more personal interest?

Spectere · « **Reply #2 on:** August 11, 2020, 04:55:06 PM »

Yeah, you're not wrong, haha. I was originally going to just stick with the screenshot but that edit ended up taking on a life of its own. I'll go ahead and split this off.

I decided to start with the GameBoy because it has a simple architecture as far as systems are concerned, and because it does have some personal significance (the GBC was my second vidya game console, and my first Nintendo one). I was kind of juggling between the Master System and GB, but the latter won out because I have fonder memories of it.

When I say "simple architecture" I mean more compared to systems like the SNES. Emulating every single instruction of a CPU is still pretty tedious, and replicating all of the fun little hardware quirks, bugs, and errata without breaking other stuff often takes a bit of trial and error. The main thing that makes the GameBoy simple is that programming it mostly involves writing values to memory. The CPU doesn't have much in the way of I/O ports like many other ones do (in fact, the only I/O pins on the CPU are used for handling system input), and the instruction set is elegant and simple, barring a couple of oddball instructions. The PPU ("Picture Processing Unit") is also relatively straightforward, and doesn't have a million and a half modes that you have to worry about. Additionally, the memory bank controllers on the cartridge tend to be far simpler than the myriad of memory mappers that you'd find on, say, an NES cart.

So, yeah, overall it's a pretty good system for breaking into full-system emulation. There aren't too many moving parts, and you can generally get results fairly quickly (again, a very relative term—it probably took around 60 hours of research and development before I was able to get through the boot ROM).

Bobbias · « **Reply #3 on:** August 12, 2020, 05:35:49 PM »

I've been interested in attempting to write an emulator for some time, but never actually took the plunge to try my hand at it. This is seriously awesome.

The closest thing I have to an emulator is a simple virtual machine created for the Advent of Code 2019 problems, which is currently broken in some obscure way that I have yet to figure out. It worked fine until I tried to add what are effectively breakpoints, where I could break execution on any instruction.

Spectere · « **Reply #4 on:** August 12, 2020, 09:55:21 PM »

Not sure exactly how your project is structured, but one thing that helped me was the way I structured the game loop. I pretty much just have my frontend telling the core how many microseconds it should run in a given cycle (then using that to figure out how long to wait for the next frame). Not exactly thread-safe, but this is more of a research project for me. I probably would have done things a little differently if I were aiming for the quality of something like BGB.
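
One way the core side of that could work, sketched under assumptions (`CycleBudget` is a hypothetical name, and the core is assumed to be driven by the ~4.19MHz dot clock, which is exactly 4,194,304 Hz): convert each microsecond budget into whole ticks and carry the remainder forward so rounding error never accumulates.

```cpp
#include <cstdint>

// Hypothetical core-side helper: converts a microsecond budget from the frontend
// into whole dot-clock ticks, carrying the fractional remainder between calls.
class CycleBudget {
public:
    static constexpr uint64_t DotClockHz = 4194304;  // 2^22 Hz; one CPU M-cycle is 4 dots

    // Returns how many dot ticks to run for this slice of real time.
    uint64_t TicksFor(uint64_t microseconds) {
        uint64_t scaled = microseconds * DotClockHz + m_remainder;  // in tick-microseconds
        m_remainder = scaled % 1000000;                             // leftover fraction of a tick
        return scaled / 1000000;
    }

private:
    uint64_t m_remainder = 0;
};
```

A DMG frame is 70,224 dots (154 lines of 456 dots), so a budget of about 16,743 µs per call works out to roughly one frame at ~59.73 Hz.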

I don't really have a way of doing that just yet, but I'm currently working on adding a console that will allow me to query things, set breakpoints, and all that fun stuff while the emulation is running. The cores are exposed enough to the frontend that information gathering is possible, and things are set up in such a way that I can easily do single-stepping (at least for the GameBoy core, as everything is more or less timed off the same clock). Actual games aren't booting yet, and I'm not sure if that's due to bugs in the CPU core or unimplemented features. I figure I can author a few simple test ROMs to spot-check functionality before running the more comprehensive tests (like blargg's test suite, etc.). I'd kinda like to get my hands on an EverDrive at some point (maybe after I finish paying off my desk and monitors) so that I can compare the results to real hardware. I don't have a DMG at my disposal, but the SGB is close enough architecturally that it should be good enough for most things, barring a few SGB-specific quirks.

Honestly, I kinda wish I had developed the console before starting on the core. Kinda sucks to have to put everything on hold for this, but oh well. I had been using my debugger to get the boot ROM working, but lldb isn't really intended for debugging high-level things like this (I mean, you can do it, but it's tedious).

One habit I have been having a tough time breaking is my tendency to reach for malloc/free instead of new/delete. C habits die hard, apparently.

Bobbias · « **Reply #5 on:** August 13, 2020, 04:49:22 PM »

It's short enough I can toss it in a gist. I didn't include any of the actual programs provided by the Advent of Code problems, so you'd have to grab those yourself if you actually wanted to test it out. Documentation is somewhat lacking. The basic way it works is that you provide a queue for IO to and from the VM, and pass that into the run function, which runs until some stop condition is reached. By default, the stop condition is an output opcode, but some problems required it to wait on input instead of halting on output, because several of the questions involved running several VMs at once, with outputs from one VM being passed to the next one in sequence. Originally I only needed to halt when it output something, but once they started requiring multiple VMs to feed into each other I realized I had to modify how that worked. Adding the ability to halt on either input or output was what inspired me to refactor things so you could halt on any opcode. I originally wanted to try my hand at asynchronous operation, but I was struggling to get things to synchronize properly and eventually gave up on that idea because it was overly complicated and unnecessary.

Here's the gist.

Spectere · « **Reply #6 on:** August 14, 2020, 02:02:14 AM »

Without knowing the exact issue (or exact knowledge of the Advent of Code 2019 problems), I suspect your issue might be with this (on line 134):

Code:

self.break_on & op
Unless I'm misreading your intention, a bitwise AND isn't really appropriate in this situation since that'll cause any common bit to trigger the breakpoint. Basically, if break_on is set to 0xF0 and op is 0x1F, it'll still trigger the breakpoint.
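
To illustrate the difference (written in C++ to match the rest of the thread, although the snippet above comes from a Python VM; the function and class names are hypothetical): an AND fires on any shared bit, an equality test fires only on the exact opcode, and a set of opcodes covers the "halt on input or output" case without bit tricks.

```cpp
#include <bitset>
#include <cstdint>

// Bitwise AND fires whenever the two values share any set bit.
bool BreakOnAnd(uint8_t breakOn, uint8_t op) { return (breakOn & op) != 0; }

// Exact match fires only for the one opcode requested.
bool BreakOnEqual(uint8_t breakOn, uint8_t op) { return breakOn == op; }

// To break on several opcodes, keep a set (a 256-entry bitset here)
// instead of OR-ing opcode values together.
class OpcodeBreakpoints {
public:
    void Add(uint8_t op) { m_ops.set(op); }
    bool ShouldBreak(uint8_t op) const { return m_ops.test(op); }

private:
    std::bitset<256> m_ops;
};
```

In Intcode, input and output are opcodes 3 and 4 (the opcode being the instruction value modulo 100), so registering both gives "halt on either input or output".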

And yeah, I was kinda thinking of making Plip a bit more asynchronous—most notably splitting off the video and audio threads—but I knew I would be able to get a GameBoy emulator running on a single thread on pretty much any modern system and was very eager to get started on the meat of the project. I think that Plip is just modular enough that I may be able to split those things off into separate threads, but I don't think it'll matter much in the grand scheme of things.

The memory subsystem is by far the biggest sticking point of the project, with fetching values from memory consuming far more CPU time than I'd like. Being able to control memory access on a class level is fantastic: with the way the system currently works, the CPU should theoretically have the same view of the memory controller that a real GameBoy does without me having to do anything special, and the system has a few tricks that allow emulators to detect writes to ROM (which is what GameBoy memory bank controllers use to modify their registers). But when you multiply a tiny bit of overhead by a million, it turns into quite a bit of overhead. As-is, this approach likely won't be viable beyond the third console generation; it'll just take far too many CPU cycles.
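
A heavily simplified illustration of how writes to ROM become bank-controller register writes, using MBC1 as the example. `Mbc1Sketch` is hypothetical and only models the RAM-enable range (0x0000-0x1FFF) and the 5-bit ROM bank register (0x2000-0x3FFF); the upper-bank and mode registers are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified MBC1-style controller (assumption: only RAM enable and the 5-bit
// ROM bank register are modeled).
class Mbc1Sketch {
public:
    explicit Mbc1Sketch(std::vector<uint8_t> rom) : m_rom(std::move(rom)) {}

    // Reads from 0x0000-0x7FFF. Out-of-range banks wrap (assumes a power-of-two
    // ROM size, where % behaves like masking).
    uint8_t Read(uint16_t addr) const {
        if (addr < 0x4000) return m_rom[addr % m_rom.size()];
        std::size_t bankAddr = static_cast<std::size_t>(m_romBank) * 0x4000 + (addr - 0x4000);
        return m_rom[bankAddr % m_rom.size()];
    }

    // The ROM itself is read-only, so a "write" to 0x0000-0x7FFF is intercepted
    // and routed to the bank controller's registers instead.
    void Write(uint16_t addr, uint8_t value) {
        if (addr < 0x2000) {
            m_ramEnabled = (value & 0x0F) == 0x0A;  // 0x0A in the low nibble enables cart RAM
        } else if (addr < 0x4000) {
            m_romBank = value & 0x1F;
            if (m_romBank == 0) m_romBank = 1;     // MBC1 treats bank 0 here as bank 1
        }
    }

    bool RamEnabled() const { return m_ramEnabled; }

private:
    std::vector<uint8_t> m_rom;
    uint8_t m_romBank = 1;
    bool m_ramEnabled = false;
};
```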

I decided to toss mine up on GitHub, so here ya go (linking directly to the dmg branch, as the master branch only includes a fairly meh CHIP-8 emulator I threw together as a test of the audio/video/input subsystems): https://github.com/Spectere/Plip/tree/dmg. It should theoretically compile with MSVC, but seeing as 90% of the development was done on my Mac (clang) and the remainder was done on my Gentoo box (gcc) I can't guarantee that it'll work without modification.

I sort of like my approach to the CPU in particular. It likely suffers quite a bit of overhead compared to Gambatte's CPU core due to the number of function calls (which, again, pales in comparison to the time spent in PlipMemoryMap), but I aimed to use macros to express what the CPU is doing during each M-cycle, as well as to wrap some common functionality (e.g. fetching the next byte and advancing the PC) in macros.

The method for decoding the opcode used to be a bit more elegant, allowing for both one-off opcodes and masked opcodes (stuff like LD r, r' that encode distinct parameters in the opcode itself), but I opted for a switch block to effectively force the compiler to create a jump table. While the compiler would likely have created a jump table out of the non-masked if conditions in a release build, the lack of this optimization on debug builds made stepping through the op decode process extremely tedious.
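
A sketch of the "one-off vs. masked" distinction, written as a disassembler-style `Decode` function (hypothetical, not Plip's decoder): one-off opcodes live in a `switch`, while the LD r, r' block (0x40-0x7F, bit pattern 01dddsss) decodes its registers from bit fields. Note that 0x76, which would be LD (HL), (HL), is HALT instead.

```cpp
#include <cstdint>
#include <string>

// Register order used by the 3-bit fields of LD r, r' (index 6 is the byte at (HL)).
static const char* const RegNames[8] = {"B", "C", "D", "E", "H", "L", "(HL)", "A"};

std::string Decode(uint8_t op) {
    // One-off opcodes: a dense switch, which compilers typically lower to a jump table.
    switch (op) {
        case 0x00: return "NOP";
        case 0x76: return "HALT";  // the slot LD (HL), (HL) would occupy
        case 0xC3: return "JP a16";
        default: break;
    }
    // Masked opcodes: 01 ddd sss, destination in bits 5-3, source in bits 2-0.
    if (op >= 0x40 && op <= 0x7F) {
        return std::string("LD ") + RegNames[(op >> 3) & 7] + ", " + RegNames[op & 7];
    }
    return "???";  // opcodes this sketch doesn't cover
}
```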

Spectere · « **Reply #7 on:** August 14, 2020, 10:05:13 PM »

I just finished implementing the first part of Plip's console.

Right now it just echoes whatever you type in, but obviously it's going to do a little bit more than that after I'm finished with it.

It automatically pauses emulation as well, so the state of the core won't change while the console is open. I'm likely going to add some sort of simple overlay layer as well to indicate the state of emulation (pretty much just a pause icon if the emulation is paused) as well as a shortcut key to single-step the CPU if emulation is paused.

I drew that particular font (8x12) myself, but it should theoretically be able to seamlessly use any 256-character font arranged in a 16x16 grid. It doesn't support Unicode at all (it'll just interpret strings as 8-bit ASCII), but that's not really necessary in this case.
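
With a 16x16 grid, a character's byte value maps straight to a cell. `GlyphFor` and `GlyphRect` are hypothetical helpers; the cell size defaults to the 8x12 font mentioned above.

```cpp
// Source rectangle of one glyph within the font sheet, in pixels.
struct GlyphRect {
    int x, y, w, h;
};

// Column is the low nibble of the character code, row is the high nibble.
GlyphRect GlyphFor(unsigned char c, int cellW = 8, int cellH = 12) {
    return {(c % 16) * cellW, (c / 16) * cellH, cellW, cellH};
}
```

Taking the character as `unsigned char` keeps bytes 0x80-0xFF in rows 8-15 instead of producing negative indices on platforms where `char` is signed.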

The next step is going to be creating a basic command processor for the console. I'm kinda thinking I'm going to take the simplest approach: keep commands in an alphabetically sorted list and allow the nearest unambiguous match (for example, if "getPC" and "getReg" are registered, "get" would throw an error but "getr" would expand to "getReg"). I'm hesitant to spend too much time on this, but at the same time I think I'll regret not making it at least half-decent in the long run.

After that, I'm probably going to provide a more generalized way of reading the register state from the core's CPU. Right now I have a DumpRegisters() function that just spits everything out to a std::string (which is mostly used for core/CPU exception handlers at this time), but that's not always ideal. What I'll probably end up doing is writing a function that puts register info into a std::unordered_map. That would allow me to get the value of individual registers in an EmuCPU-agnostic way, without dealing with the more verbose DumpRegisters() output.
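
A rough sketch of that register map; `CpuRegisters`, `RegisterMap`, and `GetRegisters` are hypothetical names, and the register set shown is the GameBoy CPU's (A, F, B, C, D, E, H, L, SP, PC plus the 16-bit pairs).

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Illustrative register file; layout and names are assumptions.
struct CpuRegisters {
    uint8_t a = 0, f = 0, b = 0, c = 0, d = 0, e = 0, h = 0, l = 0;
    uint16_t sp = 0, pc = 0;
};

using RegisterMap = std::unordered_map<std::string, uint16_t>;

// Expose every register (and the 16-bit pairs) by name, so a frontend can query
// one value without parsing a formatted dump string.
RegisterMap GetRegisters(const CpuRegisters& r) {
    auto combine = [](uint8_t hi, uint8_t lo) { return static_cast<uint16_t>((hi << 8) | lo); };
    return {
        {"A", r.a}, {"F", r.f}, {"B", r.b}, {"C", r.c},
        {"D", r.d}, {"E", r.e}, {"H", r.h}, {"L", r.l},
        {"AF", combine(r.a, r.f)}, {"BC", combine(r.b, r.c)},
        {"DE", combine(r.d, r.e)}, {"HL", combine(r.h, r.l)},
        {"SP", r.sp}, {"PC", r.pc},
    };
}
```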

Spectere · « **Reply #8 on:** August 19, 2020, 12:39:27 AM »

Quote from: Spectere on August 14, 2020, 10:05:13 PM

Kinda thinking I'm going to take the simplest approach and keep commands in an alphabetically sorted list, and allowing the nearest unambiguous match (for example, if "getPC" and "getReg" are registered, "get" would throw an error but "getr" would expand to "getReg").

This is what I wound up doing, and it ended up being pretty simple.

The console system narrows down the command list using the first token in the user's input. If there are no candidates, it fires off an error. If there is only one candidate (or multiple candidates with an exact/unambiguous match) it'll select that. If there are multiple candidates with no exact matches, it displays each of the possible candidates.
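
The resolution rules above can be sketched like this, assuming case-insensitive matching (implied by "getr" expanding to "getReg"). `ResolveCommand` is a hypothetical name; it returns every candidate, so the caller can report an error (none), run the command (one), or list the options (several).

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Case-insensitive prefix test: does `name` start with `prefix`?
static bool StartsWithNoCase(const std::string& name, const std::string& prefix) {
    if (prefix.size() > name.size()) return false;
    return std::equal(prefix.begin(), prefix.end(), name.begin(), [](char a, char b) {
        return std::tolower(static_cast<unsigned char>(a)) ==
               std::tolower(static_cast<unsigned char>(b));
    });
}

// Walks the alphabetically sorted command list; candidates come back in sorted order.
std::vector<std::string> ResolveCommand(const std::vector<std::string>& sortedCommands,
                                        const std::string& input) {
    std::vector<std::string> candidates;
    for (const auto& name : sortedCommands) {
        if (!StartsWithNoCase(name, input)) continue;
        if (name.size() == input.size()) return {name};  // an exact match always wins
        candidates.push_back(name);
    }
    return candidates;
}
```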

When a command is registered, the console system expects the command's full name, as well as a function pointer (with a pointer to the Console instance and a vector of strings as parameters). When the command is called, the function pointer is called with the console instance and a tokenized parameter list attached.

The only supported commands at this point are "help" and "quit," but it's at a point where I can push ahead and start adding debug commands. I'm not too crazy about how the video subsystem interacts with the console (it's immensely obvious that it's an afterthought), but it's not really harming anything at this point. I might end up refactoring it down the road, but it's fine for now.

Spectere · « **Reply #9 on:** August 25, 2020, 10:18:44 PM »

I took a break this past week due to an absolutely horrendous stretch at work, and opted to pick up where I left off tonight. First, a fancy pictographical thing:

Register dumping! Neat. Breakpoints are going to be a bit more "interesting" due to how the frontend only talks to a generic system core object, but it'll still be doable. The onus will basically be on the CPU core to catch the condition and send a message back to halt execution.
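
One possible shape for "the CPU core catches the condition and reports back", sketched with hypothetical names (`CpuWithBreakpoints`, `StepResult`) and a placeholder instruction executor: PC breakpoints live in a set inside the core, and `Step()` returns a status the frontend can act on without knowing anything about the CPU.

```cpp
#include <cstdint>
#include <unordered_set>

enum class StepResult { Ok, BreakpointHit };

class CpuWithBreakpoints {
public:
    void AddBreakpoint(uint16_t pc) { m_breakpoints.insert(pc); }
    void RemoveBreakpoint(uint16_t pc) { m_breakpoints.erase(pc); }

    // Checks for a breakpoint before executing the instruction at PC.
    StepResult Step() {
        if (m_breakpoints.count(m_pc) != 0 && !m_resuming) {
            m_resuming = true;  // the next Step() will execute this instruction
            return StepResult::BreakpointHit;
        }
        m_resuming = false;
        ExecuteOne();
        return StepResult::Ok;
    }

    uint16_t Pc() const { return m_pc; }

private:
    void ExecuteOne() { ++m_pc; }  // placeholder: a real core would decode and run an opcode

    std::unordered_set<uint16_t> m_breakpoints;
    uint16_t m_pc = 0x0100;
    bool m_resuming = false;
};
```

The `m_resuming` flag lets the next `Step()` run the instruction the breakpoint stopped on, rather than hitting the same breakpoint forever.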

I did implement pausing, machine cycle stepping, and frame stepping. I suspect that'll help out pretty significantly in the long run.

In addition to that, I ended up doing a much-needed refactor of the game loop. Before, I had basically dumped all of that into main.cpp, but I pulled it into its own GameLoop class and relegated main.cpp to setting up the loop and transferring execution to it. On top of that, I changed the Console class from using C-style function pointers (which really don't work with non-static member functions) to using std::function and binding lambdas (which can capture and access instance members) to it.
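
A compact sketch of the std::function approach (hypothetical `Console` and `GameLoop` classes, not Plip's): the callback signature keeps the console-plus-tokenized-arguments shape described earlier, and the lambda captures `this`, which a plain C-style function pointer can't do.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

class Console;  // forward declaration for the callback signature

// Callbacks receive the console and the tokenized argument list.
using CommandFn = std::function<void(Console&, const std::vector<std::string>&)>;

class Console {
public:
    void Register(const std::string& name, CommandFn fn) { m_commands[name] = std::move(fn); }

    // Returns false if no command by that name is registered.
    bool Run(const std::string& name, const std::vector<std::string>& args) {
        auto it = m_commands.find(name);
        if (it == m_commands.end()) return false;
        it->second(*this, args);
        return true;
    }

private:
    std::map<std::string, CommandFn> m_commands;  // std::map keeps names sorted
};

class GameLoop {
public:
    explicit GameLoop(Console& console) {
        // The lambda captures `this`, so it can touch instance members directly.
        console.Register("quit", [this](Console&, const std::vector<std::string>&) {
            m_running = false;
        });
    }

    bool Running() const { return m_running; }

private:
    bool m_running = true;
};
```

One caveat with capturing `this`: the console must not call the callback after the object it captured has been destroyed, or the stored lambda dangles.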

Next step: breakpoints. Next next step: making things actually boot.

Edit: Scratch that. Next step: memory fudging.

Being able to get and set arbitrary memory locations is pretty useful. While in this case I just use it to corrupt a few tiles of the Nintendo logo and change the X scroll register (which is actually good, since now I know that my X scrolling implementation works), it gives me the ability to peek and poke at the system's various registers, since they're alllllll memory mapped.
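
A hypothetical sketch of peek/poke console commands (`MemoryConsole` is an illustrative name, and a flat 64 KiB array stands in for the core's memory map), parsing addresses and values as hex with `std::stoul`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

class MemoryConsole {
public:
    std::array<uint8_t, 0x10000> memory{};  // stand-in for the real memory map

    // poke <addr> <value>
    void Poke(const std::vector<std::string>& args) {
        if (args.size() != 2) throw std::invalid_argument("usage: poke <addr> <value>");
        memory.at(ParseHex(args[0], 0xFFFF)) = static_cast<uint8_t>(ParseHex(args[1], 0xFF));
    }

    // peek <addr>
    uint8_t Peek(const std::vector<std::string>& args) const {
        if (args.size() != 1) throw std::invalid_argument("usage: peek <addr>");
        return memory.at(ParseHex(args[0], 0xFFFF));
    }

private:
    // std::stoul throws std::invalid_argument on junk; we also reject trailing
    // characters and values above `max`.
    static unsigned long ParseHex(const std::string& s, unsigned long max) {
        std::size_t used = 0;
        unsigned long v = std::stoul(s, &used, 16);
        if (used != s.size() || v > max) throw std::out_of_range("bad value: " + s);
        return v;
    }
};
```

For example, `poke ff43 10` sets SCX (0xFF43), the background X scroll register, to 0x10.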

Bobbias · « **Reply #10 on:** August 27, 2020, 12:34:57 AM »

Looking good man. I'm excited to follow the progress. I still haven't taken the time to properly sort out my intcode computer. I'll try to get to it one of these days. The comparison you pointed out may well be wrong, but I don't believe it's the only issue with it. Pretty sure there's other bugs in there that need to be ironed out as well.

Spectere · « **Reply #11 on:** August 27, 2020, 02:07:51 PM »

Oh, trust me. I know that feeling, haha.

Another thing on the list is writing some simple test ROMs. Even blargg's test ROMs, simple as they are, have enough boilerplate that they muddy the waters at this stage (right now I don't know if the perpetual blank screen I get after the boot ROM concludes is caused by the CPU core, the PPU, or something else). I'll probably collect those in their own repo as I work on them.

Spectere · « **Reply #12 on:** August 31, 2020, 08:38:12 PM »

I ended up setting up breakpoints this past weekend, which have proven to be pretty helpful with tracking down weird PPU and CPU issues.

I made a few corrections to the interrupt handler, as well as to the way the LCD on/off flag (bit 7 of the LCDC register) is handled. This allows the FOSS SameBoy DMG boot ROM to both function and display as expected. This is good, because it goes through its animation a lot faster than the official DMG boot ROM.
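
For reference, the documented interrupt dispatch rules in a simplified sketch (`InterruptState` and `ServiceInterrupt` are hypothetical names; HALT wake-up and exact dispatch timing are left out): a request is serviced only when IME is set and its bit is set in both IE (0xFFFF) and IF (0xFF0F); the lowest set bit wins, VBlank being bit 0, and each source jumps to 0x40 + 8 * bit.

```cpp
#include <cstdint>
#include <optional>

struct InterruptState {
    bool ime = false;   // interrupt master enable
    uint8_t ie = 0;     // IE, 0xFFFF: enabled sources
    uint8_t iflag = 0;  // IF, 0xFF0F: pending requests
};

// Returns the vector to jump to, or nothing if no interrupt should be serviced.
std::optional<uint16_t> ServiceInterrupt(InterruptState& s) {
    uint8_t pending = s.ie & s.iflag & 0x1F;  // only five sources exist
    if (!s.ime || pending == 0) return std::nullopt;
    for (int bit = 0; bit < 5; ++bit) {      // bit 0 (VBlank) has the highest priority
        if (pending & (1 << bit)) {
            s.ime = false;                                 // block nested interrupts
            s.iflag &= static_cast<uint8_t>(~(1 << bit));  // acknowledge this request
            return static_cast<uint16_t>(0x40 + 8 * bit);  // caller pushes PC and jumps here
        }
    }
    return std::nullopt;
}
```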

I wrote a couple of test ROMs and found that my vblank interrupt handler worked on the first try (whoa). Plip's behavior does differ from SameBoy's on another test ROM, where I disable and re-enable the LCD during the same vblank period. I'm going to try it out on BGB at some point to see how that behaves (I suspect its behavior will match SameBoy's). I would test against BGB more often, but it's Windows-only and I do most of my development on a Mac (in case that wasn't obvious already). Oh well. At some point I think I'm going to try to get my hands on an EverDrive X7 so that I can do testing on actual hardware (I have a Super GameBoy, and I'm thinking of picking up a used GameBoy Pocket or Color as well).

My test ROM is fairly nice, though I need to improve its Makefile so that it doesn't constantly rebuild everything from scratch. I ended up setting it up so that I can override any of the major sections (RST routines, interrupt handlers, header variables) without having to fiddle around too much. I'll have to push that into a GitHub repo at some point.

Spectere · « **Reply #13 on:** September 24, 2020, 08:16:05 PM »

Ended up fixing a whole bunch of CPU issues and…

Seems just a tiny bit off, but this is the closest I've come to booting a real game so far. It's passing all of blargg's CPU tests now aside from the interrupts test (it throws an ERROR #04 because my timer is buggy). I'm working on the mooneye-gb test suite at the moment.
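
A simplified timer sketch based on the commonly documented DMG behaviour: DIV is the upper byte of a 16-bit counter that advances every T-cycle, and TIMA increments on the falling edge of the counter bit selected by TAC, reloading from TMA and requesting the timer interrupt (IF bit 2) on overflow. `Timer` is a hypothetical class, and it deliberately ignores the delayed TMA reload and the DIV/TAC write glitches, which are the kind of details mooneye-gb's timer tests probe.

```cpp
#include <cstdint>

// Simplified DMG timer, stepped once per T-cycle.
class Timer {
public:
    uint8_t tima = 0, tma = 0, tac = 0;
    bool interruptRequested = false;  // stands in for setting bit 2 of IF (0xFF0F)

    uint8_t Div() const { return static_cast<uint8_t>(m_counter >> 8); }
    void ResetDiv() { m_counter = 0; }  // a write to DIV clears the whole counter

    void Tick() {
        bool before = SelectedBit();
        ++m_counter;
        // TIMA increments on the falling edge of the counter bit chosen by TAC.
        if (before && !SelectedBit()) {
            if (++tima == 0) {  // overflow: reload immediately (real hardware delays this)
                tima = tma;
                interruptRequested = true;
            }
        }
    }

private:
    bool SelectedBit() const {
        // TAC bits 1-0 select 4096, 262144, 65536, or 16384 Hz; bit 2 enables the timer.
        static const int kBit[4] = {9, 3, 5, 7};
        return (tac & 0x04) && ((m_counter >> kBit[tac & 0x03]) & 1);
    }

    uint16_t m_counter = 0;
};
```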

This is one of those cases where writing your own tests is immensely helpful. I was running into a lot of really weird failures when I tried to run the standard tests, since they contain a ton of boilerplate code (initializing a scrolling text console, for example!). My testing "framework" is a bit simpler, instead just writing out solid blocks over the ROM logo to indicate pass/fail states. My "miscInstrs" tests basically pushed Plip to the point that it could run the other tests, and then blargg's CPU instruction tests further increased accuracy. I'm still not entirely sure if the CPU core is cycle-accurate (and won't be at all sure until my timer works), but at least it seems to give the correct result most of the time.

Bobbias · « **Reply #14 on:** September 25, 2020, 10:22:41 AM »

Dude, that's awesome. I've been fiddling around with Racket recently, researching techniques for... well, basically decompiling Intcode from Advent of Code 2019 to Racket. That's only the first challenge, though; Intcode gains enough features to be Turing complete later on. It's been quite the learning experience, but I have very little progress on actual code :/

spectere.net
