Kernel monitoring means tracking the performance or behavior of a whole system. It is only possible with the help of the kernel itself, either through device driver modules or through kernel functionality directly. The controlling tools and the collected data usually live in user space.
How does a process behave with respect to resource consumption? How much time is spent in user vs. system mode? How much memory is allocated? Garbage collectors provide monitoring interfaces which allow memory usage to be tracked. Utilities like ps, top, or the Windows Task Manager show process activity.
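The questions above can be asked from inside a process as well. The following is a minimal sketch, assuming a Python program: os.times() reports the user and system CPU time consumed so far, and tracemalloc tracks memory allocated through the Python allocator. The helper measure() and the workload sample_work() are hypothetical names used only for illustration.

```python
import os
import tracemalloc


def measure(work):
    """Run work() and report its user/system CPU time and peak Python heap use."""
    tracemalloc.start()
    before = os.times()
    result = work()
    after = os.times()
    _, peak_bytes = tracemalloc.get_traced_memory()  # (current, peak)
    tracemalloc.stop()
    return {
        "user_seconds": after.user - before.user,        # time spent in user mode
        "system_seconds": after.system - before.system,  # time spent in the kernel
        "peak_python_bytes": peak_bytes,                 # Python-level allocations only
        "result": result,
    }


def sample_work():
    # Hypothetical workload: allocate a list and sum it.
    data = [i * i for i in range(200_000)]
    return sum(data)


report = measure(sample_work)
print(report)
```

Note that tracemalloc only sees allocations made through the Python allocator; memory obtained by native libraries is better observed from outside with tools like ps or top.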
Applications (and user-level servers) ultimately need to call the operating system for critical functions like I/O. This is done through kernel traps, which makes them an ideal place to watch what a program does without interfering with it further or having to modify it.
Many applications use additional libraries or organize their own code into libraries. If those libraries are linked statically into the program, they become a fixed part of the application. Sometimes special versions of those libraries exist, with additional functions for tracking down problems. The application is rebuilt with such a library and then runs in a "debugging" mode.
Libraries can also be loaded at runtime. In this case it is easy to replace the library with a dummy or debug version, or simply to intercept the call from the application to the library function, do some monitoring, and then forward the call to the real library.
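The interception pattern (catch the call, monitor, forward to the real library) can be sketched in a few lines. This is only an analogue at the Python module level, not interposition on a native shared library: it swaps math.sqrt for a wrapper that records each call and then forwards it. The names intercept and call_log are hypothetical.

```python
import functools
import math
import time

call_log = []  # hypothetical in-memory log: (function name, args, elapsed seconds)


def intercept(module, name):
    """Replace module.<name> with a wrapper that logs each call and forwards it."""
    real = getattr(module, name)

    @functools.wraps(real)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return real(*args, **kwargs)  # forward to the real library function
        finally:
            call_log.append((name, args, time.perf_counter() - start))

    setattr(module, name, wrapper)
    return real  # returned so the caller can restore the original later


original = intercept(math, "sqrt")
value = math.sqrt(16.0)  # the application code is unchanged
math.sqrt = original     # remove the interception again
print(value, call_log)
```

The application keeps calling math.sqrt as before; only the binding behind that name changes, which is the same idea as substituting a library at load time.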
Most programming languages offer no easy way to intercept a program's internal calls to its own methods or functions. The compiler needs to help here. Driven by compile-time arguments, the compiler inserts extra code around function calls which performs the monitoring. This means the application has to be recompiled. It is easier if the program runs under the control of a virtual machine, because the VM usually has a monitoring interface which allows the user to switch into special monitoring or profiling modes. At this level people start talking about "profiling" the application's control flow via the "call graph" (which functions or methods call which, and in what order). Performance optimizations usually require this level of analysis.
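As an illustration of such a VM monitoring interface, here is a minimal sketch using the Python interpreter's sys.setprofile hook to count caller/callee edges of a call graph without changing or recompiling the monitored functions. The helper record_call_graph and the functions top, middle, and leaf are hypothetical examples.

```python
import sys
from collections import Counter


def record_call_graph(func, *args):
    """Run func(*args) and count (caller, callee) edges between Python functions."""
    edges = Counter()

    def profiler(frame, event, arg):
        if event == "call":  # a Python-level function was entered
            caller = frame.f_back.f_code.co_name if frame.f_back else "<root>"
            edges[(caller, frame.f_code.co_name)] += 1

    sys.setprofile(profiler)
    try:
        result = func(*args)
    finally:
        sys.setprofile(None)  # always switch profiling off again
    return result, edges


def leaf(x):
    return x + 1


def middle(x):
    return leaf(x) + leaf(x)


def top(x):
    return middle(x)


result, edges = record_call_graph(top, 1)
for (caller, callee), count in edges.items():
    print(f"{caller} -> {callee}: {count}")
```

In practice a ready-made profiler such as Python's cProfile module would be used instead; the sketch only shows that the VM exposes the necessary hook.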
Due to the lack of interfaces, this level is usually tracked by inserting debugging code (println/printf) into the program. This requires code changes, and the debugging code needs to be removed after the problems are found. An alternative is to use a debugger, which frequently also requires recompilation with special debug arguments.
Only debuggers can reach this level. With virtual machines it may also be possible to single-step through the bytecode.
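To make statement-level tracking concrete, the following is a minimal sketch, assuming the Python interpreter as the virtual machine. It uses sys.settrace, the hook on which Python's own debugger is built, to record every executed line of one function together with its local variables. It steps by source line rather than by bytecode instruction; trace_lines and accumulate are hypothetical names.

```python
import sys


def trace_lines(func, *args):
    """Run func(*args) and record (line offset, locals) for each executed line."""
    steps = []
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # do not trace any other function
        if event == "line":
            offset = frame.f_lineno - target.co_firstlineno  # 0 is the def line
            steps.append((offset, dict(frame.f_locals)))     # snapshot of locals
        return tracer  # keep tracing this frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always switch tracing off again
    return result, steps


def accumulate(n):
    total = 0
    for i in range(n):
        total += i
    return total


result, steps = trace_lines(accumulate, 3)
for offset, local_vars in steps:
    print(offset, local_vars)
```

Each recorded step shows the state just before the line runs, which is the same view a debugger offers when single-stepping.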