====== SPO600 2025 Winter Project ======

==== Before Starting ====

Before starting this project, please perform [[GCC Build Lab|Lab 4]].

===== Project Stage I: Create a Basic GCC Pass =====

Create a pass for the current development version of the GCC compiler which:
  - Iterates through the code being compiled;
  - Prints the name of every function being compiled;
  - Prints a count of the number of basic blocks in each function; and
  - Prints a count of the number of gimple statements in each function.

Your code must build on both of the [[SPO600 Servers]].

It is recommended that you proceed in steps:
  * Create a basic dummy pass with a diagnostic dump
  * Add logic to iterate through the code in the program being compiled
  * Incrementally add logic to count the basic blocks and gimple statements

It is recommended that you position your compiler pass __late__ in the compilation/optimization process.

==== Resources ====

  * [[Creating a GCC Pass]].
  * [[Building GCC]]

==== Recommendations for Building GCC ====

A reminder that the ''make'' utility will rebuild a codebase in as few steps as possible. It does this by comparing the timestamps of the dependencies (inputs) for each target (output) to determine which source (or other input files) have changed since the related targets were built, and then rebuilding only those targets.

This can effectively cut the build time for a complex project like GCC from hours to minutes. On my development system (a Ryzen 7735HS with 32 GB RAM), a null rebuild (no source changes - make is checking that everything is up-to-date) takes about 8.3 seconds, and a rebuild with edits to one pass source file take 23-30 seconds. On the [[SPO600 Servers]] the rebuild times are similar.

To take advantage of this capability, do an initial full build of GCC in a separate build directory as usual, then make whatever required edits to the source code in the source directory. Run ''make'' with appropriate options (including ''-j'' job values) in the build directory.

Remember to use [[Screen Tutorial|screen]] (or a similar program such as tmux) when building on remote systems in case your network connection gets interrupted, and it's a good idea to time every build (prepend ''time'' to your ''make'' command) and redirect both stdout and stderr to a log file: ''time make ... |& tee build.log'' if you also want to see the output on the terminal or ''time make ... &> build.log'' if you don't want to see the output.

You can do your development work on either architecture, but remember to test your work on both architectures.

==== Submitting your Project Stage I ====

Blog about your process and results:
  * Include detailed results for the items above. Be specific and conclusive in your reporting, and include detail such as build options and build time, specific files and directories identified as you learned to navigate the code, and the specific code used in your experimentation. 
  * Clearly identify the capabilities and limitations of your code.
  * Enable replication of your results. For example, you could provide links to specific content in a Git repository of your experiments. Avoid presenting code as screenshots whenever possible, because screenshots are not searchable, indexable, testable, nor accessible. **It must be possible to easily test your code.**
  * Add your reflections on the experience - what you learned, what you found interesting, what you found challenging, and what gaps you have identified in your knowledge and how you can address those gaps. 
  * I recommend that you blog about your work in multiple sections - blog as you go rather than waiting and writing one massive blog post at the end of each stage.
  * Assuming that the basic work is done well, extending your Stage 1 work with particularly well-formatted dump text or additional detail in the output could improve your mark.

==== Due Date ====

  * Stage 1 is due with the second batch of blog posts <del>on March 9, 2025</del> by 8 am on March 10, 2025.
  * This stage of the project is worth 15% of the course total.

===== Project Stage II: Clone-Pruning Analysis Pass =====

Create a pass for the GCC compiler which analyzes the program being compiled and:
  - Identifies one or more functions which have been cloned. These functions will have the name ''//function//.//variant//'' where the //function// portion is the same, and there will be a corresponding resolver named ''//function//.resolver''.
  - Examines the cloned functions to determine if they are substantially the same or different. "Substantially the same" means that they are identical, with the possible exception of identifiers such as temporary and single static assignment (SSA) variable names, labels, and basic block numbers. For example, two cloned functions may have two different names for the first declared integer variable, but the corresponding variables will appear at exactly the same points in the two functions and are therefore equivalent.
  - Emit a message in the GCC diagnostic dump for the pass that indicates if the functions should be pruned (in the case that they're substantially the same) or not pruned (if they are different). The diagnostic dump may contain other information.

It is recommended that you proceed in steps:
  * Start with your code from Stage I
  * Add the logic to find the cloned function(s)
  * Add the logic to compare the gimple representation of the funtion(s)
  * Add the code to output a decision on whether the functions should or should not be pruned

To limit complexity, you may make these assumptions:
  * There is only one cloned function in a program
  * There are only two versions (clones) of that function (ignoring the function resolver)

However, if you choose to handle multiple cloned functions, or more than two clones, that would be a welcome enhancement!

It is important that you position your compiler pass __late__ in the compilation/optimization process so that any significant optimizations, such as vectorization, are performed before your analysis. Ideally, it should be one of the last "tree" (gimple) passes performed.

Two possible approaches to this problem are (1) to iterate through the statements in each function, comparing them statement-by-statement; or (2) generating some type of hash or signature that uniquely identifies the implementation of the function and which can be compared to the hash/signature of a clone to see if they are different. In either case, you need to accomodate the variation in variable, label, and basic block names.

You must output one of these specific strings in your dump file //per function//, each on its own line, conditional on whether the cloned functions are the same (PRUNE) or different (NOPRUNE):
  * ''PRUNE: //function//''
  * ''NOPRUNE: //function//''
Where //function// is the original name of the function that should (or should not) be pruned, without the variant portion.

Your solution should build and execute successfully on both x86_64 and aarch64 systems, and should take into account the differences between the FMV implementations on those two architectures (for example, the munging algorithm used to create the suffixes for the cloned functions is different).

==== Test Cases for Pruning/No-Pruning ====

Each of the [[SPO600 Servers]] has a file ''/public/spo600-test-clone.tgz'' which is a tar archive containing code to build test cases on x86_64 or aarch64 systems. On each architecture, two binaries will be built, each containing one cloned function. Building these binaries with a copy of GCC that contains your analysis pass should result in a decision to prune (for the binary ''test-clone-//arch//-prune'') or not to prune (for the binary ''test-clone-//arch//-noprune''), where ''//arch//'' is either ''x86'' or ''aarch64''.

Refer to the README.txt file within the tgz file for more detail.

Your code must be able to correctly output PRUNE or NOPRUNE messages for the test programs on each platform.

==== Submitting your Project Stage II ====

Blog your results:
  * Include detailed results for the items above. Be specific, detailed, and conclusive in your reporting.
  * Clearly identify the capabilities and limitations of your code.
  * Enable __easy__ replication of your results. For example, you could provide links to specific content in a Git repository of your experiments. Avoid presenting code as screenshots whenever possible, because screenshots are not searchable, indexable, testable, nor accessible. **Your code __must__ be easily testable.**
  * Add your reflections on the experience - what you learned, what you found interesting, what you found challenging, and what gaps you have identified in your knowledge and how you can address those gaps. 
  * Identify technical issues and improvements you would like to work on in Stage III of your project.
  * I recommend that you blog about your work in multiple sections - blog as you go rather than waiting and writing one massive blog post at the end of each stage.

==== Due Date ====

  * Stage II is due with the third batch of blog posts on April 6, 2025.
  * This stage of the project is worth 20% of the course total.

===== Project Stage III: Tidy & Wrap =====

==== What was planned for Stage III ====

My original intention for Stage III was to provide a GCC codebase that included AFMV cloning -- that is to say, every function would be cloned, without having to add the ''target_clones'' attribute to each function -- that you could test against.

Here's the background: there are at least two ways that AFMV cloning can be implemented...
  * By changing the logic that applies function multiversioning to each function so that it will do so regardless of the attributes present on that function when AFMV is enabled. This can be performed by altering the ''expand_target_clones'' function in ''gcc/multiple_target.cc''.
  * Alternatively, a pass could be added early in the processing that would automatically add the ''target_clone'' attribute information to each function declaration node in the tree structure.

I attempted to use the first approach, incorporating some work started by a student in the summer of 2024. However, subsequent issues with this approach arose from the assumption built into the code code that information about the architectural variants should be available in the tree representation of the code (Gimple). This is not the case, leading to a segmentation fault (segfault) in the later portions of the AFMV processing, in the case where the compiler has determined that a function cannot be cloned due to a (pretty rare!) situation where there is a non-local ''goto'' statement. This exact situation occurs within the GCC code //itself//, which causes the normal bootstrap compilation to fail.

I spent a considerable effort to rectify this issue but was not able to do so in time to incorporate AFMV cloning into the codebase for Stage III of this semester's project; my current perspective is that the second approach outlined above is probably the cleaner way to tackle this issue.

Therefore, we're going to use Stage III to wrap up our project work with enhanced testing, //without// AFMV automatic cloning.

==== Requirements for Stage III ====

  * Extend the functionality of your code so that it can process multiple sets of cloned functions and make a PRUNE / NOPRUNE recommendation for each (your code may already do this). In other words, remove the assumption from Stage II that "There is only one cloned function in a program".
  * Create one or more test cases that contain a minimum of two cloned functions per program. You can do this by extending the test cases that I provided for Stage II; or you can create your own test cases.
  * Verify that your code works correctly on both architectures (x86_64 and aarch64), with multiple functions, in both the PRUNE and NOPRUNE scenarios.
  * Ensure that you test a scenario where one (or more) functions have a PRUNE recommendation while one (or more) other functions have a NOPRUNE recommendation. This is probably most easily done by having one function that does a simple operation such as I/O or scalar arithmetic, and another function that processes an array or buffer of data in a way that is vectorizable.
  * Clean up any remaining issues you have identified with your Stage II work.

==== Submitting your Project Stage III ====

Blog your results:
  * Include detailed results for the items above. Be specific, detailed, and conclusive in your reporting, especially regarding your test cases and test results.
  * Clearly identify the capabilities and limitations of your code, including the scope of coverage of your test cases.
  * Enable __easy__ replication of your results. For example, you could provide links to specific content in a Git repository of your experiments. Avoid presenting code as screenshots whenever possible, because screenshots are not searchable, indexable, testable, nor accessible. **Your code __must__ be easily testable, and include the code and build instructions (e.g., ''Makefile'') for your test cases.**
  * Add your final reflections on the experience. 

==== Due Date ====

  * Stage III is due with the fourth batch of blog posts at 11:59 PM on April 17, 2025.
  * This stage of the project is worth 25% of the course total.