Git and its Version Control Goodness

I would rather have learned about git from the sky down but hey. Find it here

This talk was recommended to me by one of the fine chaps at the Games Academy, Al, as a way for me to get my feet wet with version control. Well, I can report that my feet are indeed wet. The problem I was facing is one that I have spoken about before: I did not use version control, aside from copying the project file of whatever game I was working on and then working from that copy. The main issue there is disk space, and it cost me more space than my wife’s shoe collection. No, I don’t know why she likes Army boots either, shall we move on? There was also the small issue of perhaps losing a whole day’s worth of development should I introduce some issue that breaks the whole thing. The best I could do in that case would be to copy the project from the last time I knew it was ‘good’ and work from there. And then, of course, I learned the really hard way (I do that sometimes) and lost my game jam project.

Time for some version control. To be honest I don’t know why I didn’t do it earlier, as I had been introduced to the concept during a Udemy course I had been following. But I think that I got a little overwhelmed with having to learn C++, version control and Unreal all at the same time. I just don’t think I was ready for all of it. Also, and I hate to point the finger but it’s true, we were not advised to use source control during the BA top up course. I think, now that I have a little more experience, that this is a mistake and it should be introduced in the first week, as I have discovered that it really is an industry standard practice.

So I have a SMART goal for this that I have talked about recently, so I won’t go into that here. This talk is sort of outside of that really, but I like to know how things work behind the scenes, and I know that I will feel more comfortable using git in particular if I understand the how and why a little bit better. The rest of this post is very much straight from the presenter’s mouth and I am not passing this off as my own work, unless it’s worth a ton of marks, then yeah, all me. I was just trying to keep up with her and document my own understanding of what she was talking about. It was very interesting and I have learned much about what goes on behind the scenes. Sorry about the bullet style; I don’t write straight to the journal, I use WorkFlowy as reordering things is so very easy!

A Little History

  • 1972
    • The Source Code Control System SCCS
      • A delta table
        • A data structure that held deltas between changes in files
      • A set of control and tracking flags
        • Set permissions and release control on specific files
      • A set of control records
        • Keep track of insertions and deletions at the file level
  • 1982
    • Revision Control System RCS
      • This was not distributed and was only used on a single machine
  • 1990
    • Concurrent Version System CVS
      • This worked on the modern client/server model
      • However, this leads to the merge conflict, which happens when different developers edit the same file at the same time. This then needs to be resolved. How? CVS enforced the rule that a developer had to pull the most recent commit from the server before they could commit their own work. This essentially stops merge conflicts from happening on the server and, from the sound of it, was a first come, first served approach to development changes.
  • 2005
    • Git
      • This is decentralised. You can have multiple ‘first class’ copies of a repository in different places. Git also introduced a different way to resolve merge conflicts. You could work on an old version, pull the new version in and reconcile the two versions at that point.
    • GitHub is separate software from Git

Under the Hood

  • .git directory
    • Objects
      • What is an object?
        • It’s a data type that has
          • Type
          • Size
          • Content
        • Object types
          • Blobs
          • Trees
            • A tree is a reference to multiple blobs or multiple trees
          • Commit objects have
            • A reference to a particular tree
            • A timestamp
            • A committer, the user that made the commit
            • Commit message that goes along with it so that the developer can explain the changes made
          • Annotated tag
            • ‘cut’ a release
            • create a tag
            • contains many of the same details that can be found in the commit object
      • info
      • pack
        • pack .idx
        • pack .pack
        • use command ‘file’ to get more information about them
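To make the type, size and content idea a little more concrete, here is a minimal Python sketch of how a loose object can be hashed. It assumes the default SHA-1 object format, where the hash is taken over a `<type> <size>` header, a NUL byte and then the content. The function names (`encode_object`, `object_hash`, `loose_object_path`, `compress_object`) are my own and hypothetical, not anything from the talk or from Git itself.

```python
import hashlib
import zlib


def encode_object(obj_type: str, content: bytes) -> bytes:
    """Prefix the content with the '<type> <size>' header and a NUL byte."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return header + content


def object_hash(obj_type: str, content: bytes) -> str:
    """Return the 40 character hex SHA-1 of the header plus content."""
    return hashlib.sha1(encode_object(obj_type, content)).hexdigest()


def loose_object_path(hex_hash: str) -> str:
    """Hypothetical helper: directory is the first 2 characters, file name the other 38."""
    return f".git/objects/{hex_hash[:2]}/{hex_hash[2:]}"


def compress_object(obj_type: str, content: bytes) -> bytes:
    """Loose objects are stored zlib-compressed on disk."""
    return zlib.compress(encode_object(obj_type, content))


if __name__ == "__main__":
    blob_hash = object_hash("blob", b"hello world\n")
    print(blob_hash)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    print(loose_object_path(blob_hash))
```

The printed hash should match what `git hash-object` reports for a file containing `hello world` followed by a newline, which is a handy way to check the sketch against a real repository.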

Using git

Go girl! Got it from here
  • Clone the repository first
  • Create a branch, let’s call it the Feature branch
    • What’s in the HEAD file?
      • HEAD points at the ref file of the branch you are on, and the 40 character nonsense that the ref file shows is called a hash
      • You can print out the content of the object that is associated with that hash
  • Staging
    • When you stage a file, a new blob object is created that corresponds to that file. A blob is used to represent file data. You can fetch the hash associated with the blob object and print it. When you stage a file or project, a new blob object is added to the .git directory for every file that you have changed
  • Committing
    • When you make a commit, the commit object is stored in a directory whose name is the first two characters of the hash, in a file whose name is the other 38
      • 2 reasons for this
        • Some operating systems place a limit on the number of files that you can have in a directory
        • Because of the way that some operating systems search for files, it’s faster to split the files up over multiple directories.
    • The commit object, because of its connections to other blobs and trees, stores the state of the entire repository in one data structure.
  • Pushing
    • When you push, Git compresses the ‘loose objects’ that were created in the repository into a pack file.
      • The heuristic that is used to execute the compression is
        • The system goes through the directories and sorts files by type
          • Commit, blob, tree or tag
        • Then they are sorted by name
        • Then sorted by size
          • Because of the fact that files tend to grow over time with additions to the code base and refactoring, ordering by size is a good way to order by recency. You should have the most recent changes at the top of the directory so that the system can figure out the deltas on them.
        • Then the system uses a sliding window to compute the deltas between those adjacent objects. I don’t know why, but this is just really cool.
          • Linus’s Law – as time passes by the size of the file grows
          • If it detects only very small changes, instead of storing both objects, it will store one and a delta to the next
        • I think that they are compressed at this point? The presenter talked about compression and from what I can see this is the most obvious place for it to happen.
        • Then the index file is created that is used to resolve the hash to the file it represents in the compressed pack file via a pointer in the index.
  • Merging
    • Fast forward
      • If the Master branch has not changed, then the Feature branch created earlier can just be merged into Master using a fast forward, which moves the Master branch pointer to the most recent commit object on the branch you want to merge, the Feature branch in our case.
    • Recursive Merge Strategy
      • You will end up with a merge commit at the head of the new branch
      • After merging you can query the merge commit object and find that it has 2 parents: pointers to the commits that are on the Feature branch and the Master branch it was merged with.
    • Rebase
      • Instead of creating a merge commit, the system reconciles the differences into a single linear history.
    • When you make a merge commit, you maintain the explicit branching which means that the commit has an awareness of the two branches that it came from. When you do a Rebase you favour having a linear history
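The difference between a fast forward and a merge commit can be sketched with a toy model in Python. This is only an illustration under my own assumptions: `ToyRepo` and its methods are hypothetical names, commits are just strings with a tuple of parents, and there is no file content, conflict handling or rebase here.

```python
class ToyRepo:
    """A toy commit graph: branches are just pointers to commit ids."""

    def __init__(self):
        self.parents = {}   # commit id -> tuple of parent commit ids
        self.branches = {}  # branch name -> commit id it points at

    def commit(self, branch, commit_id):
        """Add a commit on top of the branch and move the branch pointer."""
        tip = self.branches.get(branch)
        self.parents[commit_id] = (tip,) if tip else ()
        self.branches[branch] = commit_id

    def branch(self, new_branch, from_branch):
        """A new branch starts as a second pointer to the same commit."""
        self.branches[new_branch] = self.branches[from_branch]

    def is_ancestor(self, ancestor, commit_id):
        """Walk parent pointers from commit_id looking for ancestor."""
        stack, seen = [commit_id], set()
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents[current])
        return False

    def merge(self, target, source):
        ours, theirs = self.branches[target], self.branches[source]
        if self.is_ancestor(theirs, ours):
            return "already up to date"
        if self.is_ancestor(ours, theirs):
            # Target has not moved since branching: just move its pointer.
            self.branches[target] = theirs
            return "fast-forward"
        # Both branches moved: record a merge commit with two parents.
        merge_id = f"merge({ours},{theirs})"
        self.parents[merge_id] = (ours, theirs)
        self.branches[target] = merge_id
        return "merge commit"


if __name__ == "__main__":
    repo = ToyRepo()
    repo.commit("master", "c1")
    repo.branch("feature", "master")
    repo.commit("feature", "c2")
    print(repo.merge("master", "feature"))  # fast-forward

    repo.commit("master", "c3")
    repo.commit("feature", "c4")
    print(repo.merge("master", "feature"))  # merge commit
    print(repo.parents[repo.branches["master"]])  # ('c3', 'c4')
```

In the first merge Master has not changed since the branch was made, so only the pointer moves. In the second both branches have new commits, so the result is a commit with two parents, which is the explicit branching that a rebase would flatten out.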

In Conclusion

Git represents key information as objects stored in the file system.

Git compresses loose objects into pack files to increase space efficiency using delta compression.
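As a rough illustration of the packing heuristic from the Pushing notes (sort by type, then name, then size, then slide a window over neighbours looking for small deltas), here is a toy Python sketch. It is not Git's real pack format: the copy/insert instruction tuples, the flat cost of 8 per copy instruction, the window size of 10 and all the function names are assumptions of mine, and `difflib` stands in for Git's own delta algorithm.

```python
import difflib


def make_delta(base: bytes, target: bytes):
    """Describe target as 'copy from base' and 'insert literal' instructions."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base bytes are simply not copied
    return ops


def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the target from the base and the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += base[start:start + length]
        else:
            out += op[1]
    return bytes(out)


def delta_cost(ops) -> int:
    """Assumed cost model: 8 bytes per copy, 1 byte plus the literal per insert."""
    return sum(8 if op[0] == "copy" else 1 + len(op[1]) for op in ops)


def pack(objects, window=10):
    """objects is a list of (type, name, content) tuples."""
    # Sort by type, then name, then size with the largest (likely newest) first.
    ordered = sorted(objects, key=lambda o: (o[0], o[1], -len(o[2])))
    packed = []
    for index, (obj_type, _name, content) in enumerate(ordered):
        best = None
        # Sliding window: only compare against the few objects just before this one.
        for base_index in range(max(0, index - window), index):
            base_type, _, base_content = ordered[base_index]
            if base_type != obj_type:
                continue
            ops = make_delta(base_content, content)
            cost = delta_cost(ops)
            if cost < len(content) and (best is None or cost < best[0]):
                best = (cost, base_index, ops)
        if best:
            packed.append(("delta", best[1], best[2]))
        else:
            packed.append(("full", content))
    return ordered, packed


def unpack(packed):
    """Resolve every entry back to full content, in packed order."""
    contents = []
    for entry in packed:
        if entry[0] == "full":
            contents.append(entry[1])
        else:
            contents.append(apply_delta(contents[entry[1]], entry[2]))
    return contents
```

Feeding it two versions of the same file, where the newer one is slightly longer, should leave the larger version stored in full and the smaller one stored as a delta against it, which matches the idea that the most recent version sits at the top.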

Rebases and merges differ in whether they give preference to maintaining a linear history or explicit branches.
