Kanji Koohii Development & OpenSource Updates

#1
http://kanji.koohii.com/news/id/309

Another thing I'd like to put on github eventually is the scripts I used to parse KANJIDIC / JMDICT. I could use help improving them; it's really old code and I never got the time to get into it. The scripts split the readings in all the JMDICT compounds.

Right now the last three issues (90, 91, 92) are super simple front end modifications which a contributor can use to get around the code base.

Support additional kanji sequences such as KKLD or RTK Lite (#70): this one is not very hard to do either. The code is mostly ready to support a new index since it already handles switching between 2 indexes (old / new editions). But it could be more complicated as there are probably a few semi-hard coded references to RTK in some places.

Label the actual Hard/No/Yes/Easy answers in the SRS review summary (#89) isn't very hard to do either. It mainly consists of finding some temporary storage for the answers, passing them through to the review summary page via POST, and then changing the table output to include the answer.

I am now looking at a better way of labeling issues (ie. article). Any suggestions? 

If you have any motivation whatsoever to improve Kanji Koohii just get in touch via Gitter (cf. README file).

Backend side, it's not terribly exciting, but there is room for improvement. It's a lot to ask, though, for contributors to improve / fix old code that probably doesn't make sense to them.

Front end side I have updated the build with Webpack / Babel / npm. You can use modern ES2015 Javascript and VueJS for components, so it should be more interesting.

Or if you are just interested in security you can help me improve the code, or find some problems. (on that subject https is coming in April hopefully).

And you can support me on Patreon.
Edited: 2017-09-26, 6:07 am
#2
Thank you! Fabrice, this is simply awesome! I'm really hard pressed IRL at present, but when things settle down a little I'll contribute without hesitation ;-). I know this had to be really time consuming and also require some mental switches to flip, so thanks for all your efforts.
#3
Sounds good faneca!

I am available most of today for the next 8-10 hours if anyone needs help setting up the repo just fire questions on Gitter.
#4
Can someone git me a few git tips?

- So when I created the repo and published it, it has a "master" branch.
- Locally, I always merge feature branches into "develop".
- And "master" is meant to be the last published & stable production version.

I'm not sure what to do now for contributors. Do I have to explicitly create a "develop" branch and push it?

What happens with a pull request? If I merge it locally into "develop", the contributor cannot have theirs merged into master, right? It would have to follow the same structure I use? Because then what happens when they update their repo? I suppose to see their changes, they would have to pull my develop branch? And then do they just delete their feature branch?
#5
If "develop" is the current production branch, you should make it available on github. Git is a distributed VCS, so every repository has exactly the entire history of any branches it contains.

Git is really smart about merges. There's no problem with merging two branches that independently merged the same other branch. Merges only really care about file state (right now) and the hashes of individual commits (http://codetunnel.io/merge-vs-rebase-par...mmit-hash/).

A merge will only fail if commits with different hashes change the same file in conflicting ways (which is why binary files can't be merged: git doesn't know how to define "conflicting ways" for binaries). Even then you just manually resolve the merge conflict by choosing the desired lines in each conflicting file (or the desired file in the case of binaries) and committing that. Each merge is its own commit, and merge conflict fixes are included in the same commit.
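
To make the point about independent merges concrete, here is a minimal sketch run in a throwaway repository: two branches each merge the same third branch, and are then merged into each other. The branch and file names (`shared`, `feature-a`, `feature-b`) are invented for illustration, and the identity variables are only there so the commits work on a machine without git configured.

```shell
set -e
# Throwaway identity (assumption: only needed if git is not configured).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # same branch name on any git version
echo base > base.txt
git add base.txt
git commit -q -m "base"

# A branch that both feature branches will merge independently.
git checkout -q -b shared
echo shared > shared.txt
git add shared.txt
git commit -q -m "shared work"

git checkout -q -b feature-a master
echo a > a.txt
git add a.txt
git commit -q -m "work on a"
git merge -q --no-ff -m "merge shared into feature-a" shared

git checkout -q -b feature-b master
echo b > b.txt
git add b.txt
git commit -q -m "work on b"
git merge -q --no-ff -m "merge shared into feature-b" shared

# Both branches contain the very same "shared work" commit (same hash),
# so merging one into the other does not conflict or duplicate it.
git checkout -q feature-a
git merge -q --no-ff -m "merge feature-b into feature-a" feature-b

shared_count=$(git log --oneline --grep="shared work" | wc -l | tr -d ' ')
files=$(git ls-files | xargs)
echo "copies of 'shared work' in history: $shared_count"
echo "tracked files: $files"
```

The last merge goes through without manual resolution because neither side changed the same file in a conflicting way, and the shared commit appears once in the resulting history.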
Edited: 2017-03-25, 6:33 am
#6
Since I created this public repo, I started anew. The history is fairly new, and there wasn't a develop branch.

I created a "develop" branch, but it doesn't get pushed with `git push`. So in that sense, it's not as intuitive as I expect. It doesn't reflect the branches. It seems like I have to add another remote for that?
#7
By default, git only pushes the branch you currently have checked out in your working directory. You can push all branches with --all, but I just check out the individual branches I want to push if I want to push specific ones. I think there's a way to push a specific branch without checking it out, but I don't remember the syntax.
#8
I think
Code:
git push -u origin develop
is the command you are after, but depending on how your upstream or your local repos are configured right now, you may need to do additional tasks first, such as adding remotes to your local repository.
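
As a rough sketch of what that command does, the script below pushes a `develop` branch by name while `master` stays checked out. It assumes a local bare repository standing in for the GitHub remote; the temporary paths and the demo identity are made up for the example.

```shell
set -e
# Throwaway identity (assumption: only needed if git is not configured).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

work=$(mktemp -d)
git init -q --bare "$work/origin.git"     # hypothetical stand-in for the GitHub repo
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master   # same branch name on any git version
git remote add origin "$work/origin.git"

git commit -q --allow-empty -m "initial commit"
git branch develop                        # created, but never checked out

git push -q origin master                 # only master reaches the remote
git push -q -u origin develop             # push develop by name, still on master

remote_branches=$(git ls-remote --heads origin | sed 's|.*refs/heads/||' | sort | xargs)
upstream=$(git rev-parse --abbrev-ref "develop@{upstream}")
echo "remote branches: $remote_branches"
echo "develop tracks:  $upstream"
```

The `-u` flag also records `origin/develop` as the upstream of the local `develop`, so a later plain `git push` from that branch knows where to go.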

Anyway, given your current workflow, I'd recommend you to have a look at the git flow module, which greatly simplifies your life at the commandline. And this cheatsheet which explains it all is golden.
#9
Thanks. I think I get it now. "origin" is just a convention for the main repo. It could be the name of a remote I set up for a fork, but I will always push to "origin".

So it's more like you have lots of small repos, each one a branch, really.. and unless you set up remotes, the only thing people have in common is when they sync the same branch.
#10
Is it a good idea to switch the repository to php 7?

7.0 has some useful additions: closure binding with ::call(), scalar type declarations in method signatures, null coalescing ?? operator is pretty handy, etc.

On the other hand they are not essential, and 5.x still seems to be the default on Ubuntu and Sierra stock installations.

The goal is to reduce friction, but also make the repository more conducive to contributions.
Edited: 2017-03-29, 9:19 am
#11
If there's one reason over anything else to defend the migration path, it's performance.
PHP7 + OpCache performs way, way better than 5. Also, the language has become a lot more professional, not just on the surface, as you already point out: finally having things like a true AST has opened the door to future optimization techniques that were impossible until now.

Additionally, I'm on Mint 18.1 (based on Ubuntu 16.04) and there is no php5.6 anymore, only php7. I think fedora 25 is the same, though not completely sure. So expect servers to be phasing it out sooner rather than later.
On your own computer, assuming you install one of the most recent distributions, if you absolutely need php5 for development, you can work around the situation by using phpbrew (custom php user installations in parallel) together with virtphp (python-like virtualenvs for php). If interested, google for it ;-).

OTOH, taking into account that your code base can have pretty old parts, some of the deprecations could hit you especially hard. It's very likely that just switching PHP versions won't work out of the box, so the extent of the work needed to do the migration has to be evaluated (more so if, as I imagine, your code doesn't have thorough testing implemented). Anyway, I'm confident it would be doable with minor pain; the worst part would be doing the manual testing in order to check everything works (or assume you're gonna break things here and there and wait for the issue reports to come).
#12
Alright, I will give it a shot. I found a symfony 1 fork (https://github.com/LExpress/symfony1) with php7 support in case I run into trouble.

faneca Wrote:(or assume you're gonna break things here and there and wait for the issue reports to come).

Dammit, I've been found :p
#13
Thank you faneca! Turned out to be pretty simple. Updating repository soon.

I only had to ditch an old library called Services_JSON.php which was really not needed. It had the advantage of being more lenient, and even allowing comments in JSON code for example. So now I use native php json encode/decode.

Actually I haven't tested symfony in depth. Site seems to work fine.

Git Question

Say I created this branch "refactor/php7". I merge with --no-ff because I like to see the individual commit history.

But I'm thinking, maybe it's better to keep a "topic" branch, like that to group commits thematically.

Scenario : I want to make more php7 refactor commits every now and then.

I'm not quite sure how that works. If I just check out that branch to make thematic changes, it means it's reverting back to an old version, because the main master or develop keeps developing. So don't I have to merge the latest master / develop into that topic branch before merging again? Oh.. then I am guessing what happens is that all those newer commits will be ignored when I merge the topic branch once more, since git no longer sees them as changes, and the topic branch ends up merging just the topic changes (does that make sense?).

But then I wonder what is the point? If I switch to that branch, I'm getting the full history, right? It's not like I'm seeing just the thematic commits now, since I need to update it to merge. Or did I miss something?

edit: Only thing that comes to mind is that I will see the branch graph in git tools and git log --graph. I suppose that seems like the only advantage?
Edited: 2017-03-31, 8:42 am
#14
(2017-03-31, 8:37 am)ファブリス Wrote: [...] So don't I have to merge the latest master / develop into that topic branch before merging again? [...]

Even though git is smart about merging, IMO work on a feature branch should be regularly updated with whatever's new on the main development branch. Letting branches diverge for too long can lead to trouble. Sometimes you do have to resolve conflicts manually and you want to get that sorted out the sooner the better. Merge from development into the feature branch and make sure nothing is broken, then keep going. When the feature is finished, merging to development should be painless.
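
A small sketch of that update-then-merge cycle, in a scratch repository, may help with the question of what ends up being merged. It reuses the `refactor/php7` branch name from the thread; the file names and commit messages are invented, and the demo identity is only there so commits work without git configuration.

```shell
set -e
# Throwaway identity (assumption: only needed if git is not configured).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/develop
echo base > base.txt
git add base.txt
git commit -q -m "base"

git checkout -q -b refactor/php7
echo one > topic.txt
git add topic.txt
git commit -q -m "php7: first pass"

# Meanwhile develop moves on.
git checkout -q develop
echo new > other.txt
git add other.txt
git commit -q -m "develop: unrelated work"

# Bring the topic branch up to date, then keep working on it.
git checkout -q refactor/php7
git merge -q --no-ff -m "merge develop into refactor/php7" develop
echo two >> topic.txt
git add topic.txt
git commit -q -m "php7: second pass"

# Commits a merge would bring into develop: reachable from the topic
# branch but not from develop. The develop commit is not among them.
pending=$(git log --no-merges --format=%s develop..refactor/php7)
pending_count=$(printf '%s\n' "$pending" | wc -l | tr -d ' ')
echo "$pending"

git checkout -q develop
git merge -q --no-ff -m "merge refactor/php7 into develop" refactor/php7
remaining=$(git rev-list --count develop..refactor/php7)
echo "left to merge afterwards: $remaining"
```

Only the two `php7:` commits show up as pending, which matches the guess that the commits pulled in from develop are not merged a second time.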
#15
My opinion is biased because my "teams" almost always include just me or two people at most. That said, I rarely merge develop again into a feature branch (when I really have to, I *rebase* the feature on a more recent commit of the develop branch... though other workflows are possible, of course).

One of the crucial points to make this work is making features small enough to be manageable. I usually keep a backlog of tasks and issues in the repository's issue tracker, and every time I tackle one of those, I create a new feature branch named after the issue / task number (though for something like a migration to PHP7 it makes sense to give it a proper name... otoh you could also use tags for that).
If one issue is going to take more than 8 hours, I divide it into smaller tasks if possible (appropriately entering the new resulting tasks in the tracker, documenting it in the original issue with links, and reflecting that in the features' branch names). That tends to reduce the clashing among modifications from different people working on different tasks to a minimum (of course, they still occur at times, but git makes it easy to resolve them).

When I finish one feature, I simply merge it back to develop and don't care much about the log's history: normally there's just a few commits (or even just one) that make a clear reference to the issue number and title in their commit message, and they all get merged at the same time, so reading and understanding the commit history isn't a complicated task.

Obviously there are some times (like this migration you have at hand, for instance) when you want to state things clearer and louder. What I do is simply tag the branch (--no-ff is the default when you have it tagged; just don't forget to upload the tags when pushing upstream with `git push --tags`).
#16
@faneca

I also rebase a lot on solo projects. But now with the public repo, I can't rebase a branch, right? Afaik, the branch history needs to be the same.
#17
Kanji Koohii along with the Open Source repository is now running on PHP7. Huzzah.
#18
(2017-04-01, 5:42 am)ファブリス Wrote: I also rebase a lot on solo project. But now with the public repo, I can't rebase a branch right? Afaik, the branch history needs to be the same.
Well, you can't rebase branches that other people expect not to be rebased (so master, any kind of per-release branch, a dev branch if you use that sort of workflow). But for your own work-in-progress topic branches that haven't yet been merged into master, a rebasing workflow can be OK. If you want to keep things clearly separate you can have your own repo for your own in-progress work that's not the same as the official 'upstream' one, and handle your own pull requests the same way you'd handle ones from another contributor.

(The main project I work on handles code review through email and so the convention is to always rebase while working on a patch series, so that the set of commits presented for code review are "clean" and easy to understand. Git allows lots of different workflow styles though.)
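
For what rebasing a private topic branch looks like in practice, here is a minimal sketch in a scratch repository. The branch name `topic`, the file names, and the demo identity are assumptions for the example, not anything taken from the Kanji Koohii repo.

```shell
set -e
# Throwaway identity (assumption: only needed if git is not configured).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
echo base > base.txt
git add base.txt
git commit -q -m "base"

# Private work-in-progress branch, never pushed anywhere.
git checkout -q -b topic
echo topic > topic.txt
git add topic.txt
git commit -q -m "topic work"
before=$(git rev-parse topic)

git checkout -q master
echo more > master.txt
git add master.txt
git commit -q -m "master moves on"

# Replay the topic commit on top of the current master.
git checkout -q topic
git rebase -q master
after=$(git rev-parse topic)

parent_subject=$(git log -1 --format=%s topic~1)
ahead=$(git rev-list --count master..topic)
echo "hash before rebase: $before"
echo "hash after rebase:  $after"
echo "topic now sits on:  $parent_subject ($ahead commit ahead of master)"
```

The commit gets a new hash because its parent changed, which is harmless here since nobody else has the old one; master itself is untouched and can take the branch with a normal merge.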
#19
Hmm still not 100% clear. So indeed if I have, say, a long term "refactor/php7" branch in which I'd make thematic commits with syntax changes and the like, I can rebase this branch if it is only on my end. Let's say I merge this branch into master. Indeed master itself is not rebased. Let's say I make more changes.

So let's say here, instead of merging master into it, I rebase my refactor/php7 branch. In theory, this creates new commits and new history. But, the branch is for me only. And when I merge it again, git will see only the topmost changes since I merged master into it, and from the master branch everything is fine. Did I get this right?

So that would mean I can rebase refactor/php7 from master once in a while, because its history is not shared, OR I can merge master into it.

Rebase would make my commits easier to review in that branch, as they would all appear towards the end I imagine.

But on the other hand, don't you run into trouble eventually as you rebase over files that have been changed in master, and start seeing conflicts that you need to resolve?

edit: Gah, I guess I'll just backup repo and experiment.. I suspect that merging master periodically into my topic branch is the safest option.
Edited: 2017-04-02, 5:50 am
#20
Once you first merge refactor/php7 into master, that topic branch is done, and the easiest thing is just to leave it alone. If you have more php7 refactoring to do in future, create a new refactor/php7-part2 from head of master to work in (it'll have all the changes from the refactor/php7 branch because you merged that into master).

It's possible to reuse the same branch name for the second lot of work (git doesn't care about branch names, only the commit graph) but it's probably just confusing to do that.

Regarding conflicts, you get those either way. Either you see them when you rebase your commits onto current master, or you see them when you merge current master into your branch. If there are conflicting changes between the two you're going to have to sort them out somehow. It's usually best not to let work-in-progress diverge too much from current master so that the resolution process isn't too painful.
Edited: 2017-04-02, 7:26 am
#21
Thanks I was confused by the subject of a "long running branch", which is what I had in mind when I said topic branch.

But yeah, I don't see why I couldn't just recreate the same named branch.

I don't like leaving branches behind now, because it confuses me when I pick up the code after a break. So as soon as I'm done with a topic branch, I use git br -D <branchname>
#22
My trick for avoiding being confused by old branches is a script I use to list branches which sorts them by most-recently-changed, like this:
Code:
[....]
s sigfix-noodling          2016-03-07  handle SA_RESTART properly
s osx-warnings             2016-06-18  configure: Make AVX2 test robust to non-ELF systems
s test-about               2016-08-16  ui/cocoa.m: Make a better about dialog
  test-gerd-cocoa          2017-01-26  cocoa: stop using MOUSE_EVENT_*
* master                   2017-03-14  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20170314' into staging
so the things I've done recently are always at the bottom and the thing I did back in 2014 scrolls happily off the top of the screen where I don't need to care about it. I do have an occasional spring clean of deleting dead branches sometimes though. (NB: for me most of those are purely local branches which I've never needed to push out to a public repo.)
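
The script itself is not shown in the post, so the following is only a guess at how a similar listing could be produced with `git for-each-ref`, sorted so the most recently committed branch comes last. The branch names, dates, and demo identity are invented, and the meaning of the `s` marker in the output above is not reproduced.

```shell
set -e
# Throwaway identity (assumption: only needed if git is not configured).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master

# Three branches whose tips have deliberately different commit dates.
GIT_COMMITTER_DATE="2016-03-07T12:00:00" git commit -q --allow-empty -m "old experiment"
git branch old-experiment
GIT_COMMITTER_DATE="2017-01-26T12:00:00" git commit -q --allow-empty -m "recent test"
git branch recent-test
GIT_COMMITTER_DATE="2017-03-14T12:00:00" git commit -q --allow-empty -m "latest work"

# One line per local branch: current-branch marker, name, date, subject,
# oldest tip first so recent work ends up at the bottom.
git for-each-ref --sort=committerdate \
  --format='%(HEAD) %(refname:short) %(committerdate:short) %(subject)' refs/heads

order=$(git for-each-ref --sort=committerdate --format='%(refname:short)' refs/heads | xargs)
echo "order: $order"
```

Wrapping the `for-each-ref` line in a git alias or a small script would give the same "old stuff scrolls off the top" effect described above.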
#23
@ファブリス: Nothing to add, pm215 explained better than I ever would ;-)

@pm215: Interesting script! Until now, I never had to deal with so many local branches that anything beyond `git branch -avv` was necessary, but on a few occasions I sat there staring at that command's output for several seconds before I could make sense of it. It's bookmarked for when I see the need, thank you!
#24
Bumpity Bump

I will make a (more formal) announcement soon that I am taking a backseat so to speak and no longer making significant developments myself. (after I switch to HTTPS).

Please pass the word about the project; I am 100% available as a project maintainer and on Gitter to help php / front end dev setting up and contributing.

https://github.com/fabd/kanji-koohii

Yes, it's not the prettiest / modern codebase, but it's maintainable (imho). This is an active site with thousands of users. At least a few hundred active SRS users as well. Improving the UX and fixing bugs makes a difference.

I wouldn't say this is a good project to learn modern php, but there's no reason why you can't write good php either. The main downside is that it runs the old Symfony 1.x.

The database layer is custom, but it's very functional AFAIK. The API is very similar to the Zend_Db one. And there's no need to try to figure out everything at once, you can just write a single string query with quoted params like ->query("select foo, bar where id = ?", $id) and that's securely escaped, and you (or I) can always pretty things up later.

I started documenting but put it on hold until I see some interest. This takes time and I'd like to see some minimal interest. I have a bunch of docs already; they were running on another Symfony 1.x app, so I figured it was better to convert them to an online, easily maintainable format.

There are quite a few help-wanted issues which I labelled because they are medium to low complexity. In fact there are a couple recently that I could have done in a few days, so it's annoying, but if I don't put them there, then I don't have tasks for new contributors, as the other stuff tends to involve refactoring bits and pieces.

Note as such, I will update Patreon eventually, which will continue to cover the web hosting / https cert costs.

If you have any questions about the project just ask here or Gitter.


PS: basically I need to learn new things. I also want to reconnect to my roots as game dev and in general programming with compiled languages. So plan is to learn iOS dev over the summer 5-6 months while I am unemployed. Maybe have fun making a small game as a side project. Which is precisely why I am available during the coming months to help anybody set up the code and contribute.
Edited: 2017-04-20, 10:39 am
#25
By the way I realized since Kanji Koohii is Open Source you have the possibility to bring back Reviewing the Hanzi, the Chinese version supporting RTH / RSH.

If someone comes forward with motivation to maintain it, I already had James Heisig and Timothy W. Richardson's blessings.

I already had the data for RTH. The site was closed mid 2014 due to low activity and me wanting to reduce complexity and focus on a single site.

I don't remember what went on regarding RSH. I think the multiple edition support was added *afterwards*, which means in theory the site already supports it: instead of 5th / 6th editions you'll have RSH / RTH.

There are also still some bits and pieces left of the method I used to switch the code between Chinese / Japanese, for example the CJ_HANZI constant is still in a few places.

Then there were obviously other index.php entry points, where the main switch was defined. See applicationConfiguration.php

If someone *REALLY* motivated comes forward, who is communicative via Gitter, I can bring back the subdomain (presumably hanzi.koohii.com), deploy the site etc.

I have a backup still from February 2015 when I pulled the plug, where the database contains the required RTH data so you can go ahead focusing on fixing the code.

PROS

- RTH site will now have the mobile / responsive improvements
- should now support RTH / RSH (but you need to fill in the RSH data, we have the spreadsheets)

CONS

- it's non trivial, I deleted some if / else cases in some places because they were obsolete from my point of view, but nothing too complicated.
- TESTING : you have to test EVERYTHING. adding cards, removing cards, editing keywords, all flashcard modes etc.
- ARCHITECTURE : you have to consider each and every feature, if not supported for RTH / RSH discuss with me how to handle it in a CLEAN way so as not to complexify the codebase (ie. avoid duplicating code, avoid too many if / else cases, this has to be carefully considered case by case.. it could make sense, eg. to have a separate flashcard page for Hanzi, so it can be customized more freely)

Finally if you have a RTH / RSH working, without breaking Kanji Koohii, then I can bring it back.

I think it should be in the same code base, as RTH / RSH / RTK have many commonalities. But I'm not sure. Perhaps an organization on github, and having two separate repos, would work better? I don't know how you'd sync things like bugfixes across them, as much of the code is shared.
Edited: 2017-05-17, 9:08 am